Explainable Artificial Intelligence for Deep Learning in Food | Research Square

Osman Mutlu, Bas van der Velden, Ali Hürriyetoğlu, Anna Fensel

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7289201/v1
This work is licensed under a CC BY 4.0 License
Status: Posted (Version 1)

Abstract

The integration of artificial intelligence (AI) in food systems is accelerating. Explainable AI can help in understanding AI, but the literature is fragmented in food systems. In light of regulatory imperatives such as the European Union's AI Act, this systematic review brings together current research on explainable AI in food, highlighting key patterns and gaps. We find that most studies use off-the-shelf explainable AI tools that fail to address the complexities of food data. Beyond model transparency, explainable AI offers broader value in model enhancement, supporting trust, and knowledge discovery.
However, most studies do not adequately evaluate the explainable AI methods they use. Advancing explainable AI in food systems requires tailored and carefully evaluated approaches to ensure responsible and effective AI deployment. Domain and AI experts from the entire food system should collaborate on an evaluation framework in explainable AI for food to provide better guidance, tools, and evaluation.

Social science/Science, technology and society
Scientific community and society/Agriculture
Biological sciences/Computational biology and bioinformatics/Machine learning

1 Introduction

Artificial intelligence (AI) is revolutionizing our lives, industry, and research. This includes food systems [1]. AI, commonly referring to deep learning, is prominently used in all stages of the supply chain for food quality, food safety, and food security [2,3]. Despite the success of deep learning, it has pivotal drawbacks: it depends heavily on its training data and is prone to biases, both known and unknown, that can be difficult to detect [4]. It is difficult to "see inside" deep learning models due to their "deep" and non-linear nature, leading to challenges in understanding how AI works (i.e., explainability). Such a "black box" in food systems creates unacceptable risks for human health and the environment [5].

Explainability of deep learning is defined as the ability to explain a model's reasoning, functioning, and/or behavior in human-understandable terms [6]. The European Union recognized the importance of explainability and proposed the AI legislative act in 2021 [7], which came into force in 2024. Explainability is an essential part of applications affecting our health and society, such as those on food systems.

Explainable AI (XAI) has recently emerged as a response to the explainability challenges [8,9]. Despite the growing interest in XAI, there remains a significant gap in the literature specifically focusing on its application within food.
Guided by the recently introduced AI Act, this systematic review addresses this gap by providing a comprehensive and structured analysis of XAI in food. This review (i) examines the volume and trends of XAI research in food; (ii) offers practical guidance on what type of XAI to apply, when to apply it, and what outcomes to expect; (iii) reveals the motivations for using XAI; (iv) describes the strategies for XAI evaluation; and (v) explores the future of XAI in food with a focus on methods tailored to food data.

2 Results

We included 239 out of 2,876 studies in our review (see Methods section). All resulting categories and sub-categories, and their descriptions, can be found in Table 1.

Table 1. This table contains all categories, sub-categories and their definitions. The "Method type", "Model agnostic vs specific explanations", "Model-based vs post hoc explanations" and "Global vs local explanations" categories are per XAI method, while the rest of the categories are per study.

| Category | Sub-category | Description |
|---|---|---|
| Method Type | Backpropagation-based | XAI methods for neural networks that use the gradient information from backpropagation or layer activations. |
| Method Type | Dimension reduction | XAI methods that use dimension reduction techniques on extracted features. |
| Method Type | Graph-based | XAI methods for explainability that make use of a graph structure, either by learning a graph structure and visualizing it, or using a knowledge graph to explain a model. |
| Method Type | Interpretable concepts | XAI methods that provide an interpretation in terms of human-friendly concepts such as textures, stripes, or shapes. |
| Method Type | Occlusion-based | XAI methods that occlude/mask/delete a part of the input data to understand its relevance to the output. |
| Method Type | Perturbation-based | XAI methods that change the input to see the difference between the original and changed output. |
| Method Type | Trainable attention | Visualizations of attention mechanisms integrated into the model training process. |
| Method Type | Other | Any other XAI method. |
| Model agnostic vs specific explanations | Model agnostic | XAI methods that are applicable to any model. These usually require only the input and output of a model, which are universal for machine learning models. |
| Model agnostic vs specific explanations | Model specific | XAI methods that use extra information, e.g. gradients or activations, to generate explanations. This category also includes all model-based explanations. |
| Model agnostic vs specific explanations | Multiple | Multiple XAI methods with different explanations are used. |
| Model-based vs post hoc explanations | Model-based | Explanations generated by explainable-by-design models. Also called integrated explanations or inherently explainable. |
| Model-based vs post hoc explanations | Post hoc | Explanations generated after the model training is finished. |
| Model-based vs post hoc explanations | Multiple | Multiple XAI methods with different explanations are used. |
| Global vs local explanations | Global | The explanations provide global information about the model, mostly about the importance of individual input features. |
| Global vs local explanations | Local | The explanations provide information about the model with respect to a single data sample. It is sometimes possible to aggregate these explanations and achieve pseudo-global explanations. |
| Global vs local explanations | Global and local | The XAI method provides both global and local information at the same time. |
| Global vs local explanations | Multiple | Multiple XAI methods with different explanations are used. |
| Food Domain | Food security | The Food and Agriculture Organization of the United Nations (FAO) refers to a definition of food security from the World Food Summit of 1996: "Food security exists when all people, at all times, have physical and economic access to sufficient, safe and nutritious food that meets their dietary needs and food preferences for an active and healthy life.", and identifies four main dimensions of food security: availability of food, access to food, food utilization, and stability of these dimensions over time. |
| Food Domain | Food quality | Food quality comprises quality characteristics of food such as the external (e.g. color, size) and internal (e.g. microbial, chemical) factors, and texture and flavor. Desired characteristics are often shaped by the food industry, and consumers and their behavior. |
| Food Domain | Food safety | Food safety refers to ensuring that food is safe to consume by humans. The definition of safe can change over time and place. |
| Food Domain | Nutrition science | Nutrition science is the science that studies the physiological process of nutrition (primarily human nutrition), interpreting the nutrients and other substances in food in relation to maintenance, growth, reproduction, health and disease of an organism. ("Joint Collection Development Policy: Human Nutrition and Food". US National Library of Medicine, National Institutes of Health. 14 October 2014. Retrieved 13 December 2014.) |
| Food Domain | Food fraud | Food fraud "is deception of consumers using food products, ingredients and packaging for economic gain and includes substitution, unapproved enhancements, misbranding, counterfeiting, stolen goods or others." (GFSI Position on Mitigating the Public Health Risk of Food Fraud, 2014.) |
| Food Domain | Other | Other tasks related to food. |
| Data Modality | Tabular | Data that is presented in a table format, i.e. rows and columns. |
| Data Modality | Image | Data that is in image format. |
| Data Modality | Spectral | Chemical data, like reflectance data, NIRS etc. |
| Data Modality | Text | Raw text data. |
| Data Modality | Genomics | Genome data, such as DNA or RNA strings of nucleotides. |
| Data Modality | Molecular | Strings of elements that make up molecules. |
| Data Modality | Graph | Data that is in graph format where there are connections between nodes in the graph. |
| Data Modality | Sound | Raw sound data, specifically Fbanks representing audio as a series of frequency bands. |
| Benefit | Model understanding | XAI was used to understand and evaluate the model. This is the default category. |
| Benefit | Model enhancement | XAI was used to enhance the model, in performance or robustness. This category requires explanations to be integrated into the model. |
| Benefit | Downstream task | XAI was used to predict the output for a downstream task of the original model's task. |
| Benefit | Knowledge discovery | XAI was used to discover new knowledge about the task. |
| Benefit | User trust | XAI was used to gain the trust of the end-users. |
| Evaluation | Domain expert | The explanations were evaluated by domain experts in a systematic way. |
| Evaluation | Domain- or method-specific metric | The explanations were evaluated by a specific metric that is related to the XAI method or the task domain. |
| Evaluation | Downstream task performance | The explanations were evaluated by comparing the predictions on a downstream task to the labels in the downstream task. |
| Evaluation | Human improvement | The explanations were evaluated by comparing human annotation performance before and after training the annotators with the explanations. |
| Evaluation | Model performance | The explanations were evaluated by comparing the performance of the model with and without the integrated explanations. |
| Evaluation | Multiple method comparison | The explanations were evaluated by comparing multiple explanations from different XAI methods. |
| Evaluation | User survey | The explanations were evaluated by performing a user survey with relevant questions. |
| Evaluation | No evaluation | The explanations were not evaluated. Studies that show only a couple of explanations as evidence fall into this category. |

2.1 Nearly all papers used off-the-shelf XAI

Researchers have employed deep learning at a high rate in food as the amount, variance, and availability of collected data increased [2,3]. Subsequently, we observed an exponential increase in XAI (Fig. 1). While most studies used off-the-shelf, popular and easily available XAI methods (207/239, 87%), some studies either introduced a new method or adapted an existing one to their domain, which we refer to as tailor-made methods (32/239, 13%). We found 65 unique XAI methods, and grouped these into eight method types. We further categorized the methods as model agnostic versus model specific, model-based versus post hoc, and producing global versus local explanations [10]. All methods, their corresponding type, and other information are discussed in Supplementary Material Table 1.
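To make the categories in Table 1 concrete, the following is a minimal sketch of an occlusion-based explanation. It assumes a hypothetical scoring function `toy_predict` over a short spectral vector; the function, its weights, and the helper name `occlusion_relevance` are illustrative assumptions and do not come from any reviewed study. The method needs only the inputs and outputs of the model (model agnostic), is applied after training (post hoc), and explains a single sample (local).

```python
def occlusion_relevance(predict, x, window=1, fill=0.0):
    """Relevance of each input position = drop in the model output
    when the window containing that position is replaced by `fill`."""
    base = predict(x)
    relevance = [0.0] * len(x)
    for start in range(0, len(x), window):
        stop = min(start + window, len(x))
        occluded = list(x)
        for j in range(start, stop):
            occluded[j] = fill          # mask this part of the input
        drop = base - predict(occluded)  # how much the output depended on it
        for j in range(start, stop):
            relevance[j] = drop
    return relevance


def toy_predict(spectrum):
    # Hypothetical model: responds only to wavelengths at index 1 and 3.
    return 2.0 * spectrum[1] + 1.0 * spectrum[3]


sample = [1.0, 1.0, 1.0, 1.0]
relevance = occlusion_relevance(toy_predict, sample)
print(relevance)
```

In this toy case the relevance is non-zero only at the two positions the model actually uses. For real models the choice of `window` and `fill` value can change the resulting explanation, which is one reason such explanations benefit from evaluation.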
The most commonly used XAI methods were Grad-CAM (n=58), SHAP (n=58), and LIME (n=34) [11,12,13]. Model specific XAI methods were used twice as often as model agnostic XAI methods. Local explanations only were provided by 194/239 (81%) studies, while 33/239 (14%) studies provided only global explanations and 4/239 (2%) studies provided both global and local explanations. XAI was applied after model training (post hoc) in 189/239 (79%) studies, and 46/239 (19%) studies used model-based explanations. The remaining studies used multiple XAI methods with different characteristics.

Data types play a role in selecting XAI methods as well. We identified eight data types (Table 1). Studies with tabular data used 29% more perturbation-based XAI than average, and 35% less backpropagation-based XAI. Out of 62 studies that had tabular data, 31/62 (50%) used SHAP, and 6/62 (10%) used LIME. Meanwhile, studies with image data used backpropagation-based XAI 12% more than average and perturbation-based XAI 10% less than average. For image data, Grad-CAM and LIME were used most.

A total of 47/239 (20%) studies used multiple XAI methods, with 13/47 (28%) using more than two methods. Of the studies using multiple XAI methods, 19/47 (40%) used methods from a single method type, 21/47 (45%) from two method types, and 7/47 (15%) from three method types.

State of explainable AI in food

We identified six domains in food based on tasks performed: food security, food quality, food safety, food fraud, nutrition, and other (Fig. 2).

The most common task for food security was crop disease detection (n=81). Approximately half of the XAI methods used were backpropagation-based, presumably since images were the dominant data type for this task. In addition, LIME was used in 20% of these studies, compared to an average LIME usage of 11%. This task accounts for the most studies that introduced tailor-made XAI methods in food (n=11).
Tailor-made XAI methods included a biologically informed, inherently explainable deep learning model that outperformed similarly complex models in terms of accuracy and robustness [14]. The authors achieved this by incorporating prior knowledge about the biophysical and biochemical attributes of the target crops and their hierarchical structures. Shi et al. (2023) [15] aimed to detect wheat yellow rust disease from satellite images as opposed to leaf images. They proposed a Fast Fourier Transform-based kernel for extracting explainable information from image time-series data, and outperformed multiple baselines.

Another common task for food security was crop yield prediction (n=20), which helps farmers in precision agriculture and policymakers in decision-making. Backpropagation-based methods were preferred over perturbation-based methods even though the data were mostly tabular, contrary to the overall pattern for tabular data. Togninalli et al. (2023) [16] used multi-modal deep learning for crop yield prediction using thermal image, digital elevation model, multispectral and genomics data, and utilized an attention mechanism to gauge the contribution of each data type to the outcome of the model. Their model improved yield prediction for new crop lines in unseen environments when using genomics data compared to other data types.

The remaining 50 studies in food security performed 29 unique tasks. Chelali et al. (2021) [17] used satellite image data for land cover mapping, and created semantic maps using Grad-CAM explanations based on a combination of the spatial dimensions and the time dimension. Batchuluun et al. (2022) [18] utilized ground truth heatmaps generated from thermal images of plants to better train their model for plant identification.

Among the twenty-nine food quality studies, 30% used Grad-CAM, 12% above the average, while 49% used CAM-based XAI methods, as opposed to an average of 28%.
The studies included tasks such as freshness prediction, tastant classification, and quality inspection. Hsu et al. (2019) [19] built an explainable neural network approximation of correspondence analysis to scale it up to large and high-dimensional datasets with continuous features, and tested their method on a wine quality dataset. Chang et al. (2021) [20] adapted saliency map explanations to spectral wavelength data for coffee flavor prediction. This allowed them to determine the effects of different molecular content on coffee flavor.

Our analysis resulted in a total of 13 food safety studies, three of which were cross-domain, performing pest recognition, contamination/quality warning and fraud detection. Notably, food safety included the only two studies using graph representation learning and sharing their resulting graphs. For example, Hao et al. (2024) [21] performed heavy metal pollution prediction in soil-rice systems and shared the resulting graph of the effects of different environmental factors. In another food safety study, Bowler et al. (2022) [22] performed allergen classification on multiple types of food powder with different allergens using a deep learning model trained with spectral data, and applied a feature importance XAI method to show how much individual wavelengths, and therefore molecules, contribute to the decision.

The remaining 50 studies included tasks about nutrition and food fraud, and tasks that did not directly fall into any domain. Seven out of nine nutrition studies performed food recognition on images to provide nutritional information. Fu et al. (2023) [23] created a knowledge graph on dietary nutrition and human health by automatically extracting information from scientific texts. They used the resulting knowledge graph to create an explainable question answering system. Alongside the aforementioned fraud detection study, there were two studies on origin identification that both used XAI to select a subset of wavelengths in their spectral data.
Finally, the 38 studies in the Other category covered tasks such as carbon footprint prediction, soil fungal diversity, greenhouse gas flux prediction and anticancer food molecule prediction. Xie et al. (2024) [24] built a model on abnormal vocalization detection for livestock pig welfare using raw sound and spectral data. They showed attention scores and Grad-CAM explanations to differentiate between a normal sound, a cough, or a scream.

2.2 Benefit of XAI is not limited to model understanding

In addition to legal requirements imposed by the EU AI Act, there are multiple further reasons for researchers in food to employ XAI (Fig. 3).

XAI mainly provides insights into how a deep learning model works. It is imperative to know that the model behaves in the intended way for the current task, and does not use a shortcut to find a well explainable but incorrect solution [4,10]. XAI was used for better understanding and gauging the correctness of trained models in 172/239 (72%) studies. Di Martino et al. (2023) [25] introduced Gradient Sequential Latent Activation Mapping (Grad-SLAM), which is an extension of Grad-CAM, to explain how the classification of crops changed over time in satellite images, focusing on the temporal dimension instead of the spatial dimensions.

XAI also enhanced the performance, robustness, or runtime of deep learning models (47/239, 20%). To do so, XAI must be directly integrated in the model, producing model-based explanations by definition. Trainable attention, explainable feature extraction, and feature importance for feature selection are some examples. Shah et al. (2022) [26] built two models (teacher-student architecture) trained in an end-to-end fashion, reconstructing input images with explainable highlights and feeding the reconstructed image into the second model. Katafuchi et al. (2021) [27] exploited the color contrast between healthy and unhealthy parts of a leaf.
They trained a Generative Adversarial Network on healthy leaf images by color reconstruction, and for prediction, they checked the color difference between the reconstructed and the original image to highlight diseased parts, if any exist.

XAI enhanced end-user trust in AI for 11/239 (5%) studies, plausibly to convince users with a non-scientific background to follow advice from a deep learning model. Yuan et al. (2023) [28] extended Concept Activation Vectors (CAV) by introducing Automatic Visual Concept-based Explanation Generation (AVCEG). AVCEG added global explanations with directed graphs to make end-users trust a pest recognition application.

Lastly, XAI was used for downstream tasks in 6/239 (3%) studies and for knowledge discovery in 3/239 (1%) studies. Coulibaly et al. (2022) [29] used explanations from multiple XAI methods that were applied on an image classifier trained on a pest recognition task to create bounding boxes for pests. Qiu et al. (2021) [30] created a fine-grained severity score from saliency maps as opposed to having discrete severity classes of powdery mildew disease in grapes. Finally, for knowledge discovery, Akagi et al. (2020) [31] applied XAI to gather novel insights on calyx-end cracking in persimmon fruits. Aside from finding color unevenness, an identifier already known to experts, they also stated that "substantial relevance peaks around the apex would not be interpretable from conventional empirical diagnosis".

2.3 Most XAI are not evaluated

The evaluation strategies of XAI were classified into a framework (Fig. 3). Most XAI was not evaluated (190/239, 79%).

Among XAI that were evaluated, model performance was the most common evaluation criterion (27/49, 55%), typically measured through ablation studies comparing model accuracy with and without integrated (model-based) explanations. This evaluation approach was coupled with the benefit of model enhancement. Shulman et al. (2024) [32] built a physics-guided, inherently explainable neural network for crop quality classification. The explainable neural network performed better than an identical black-box baseline, especially with limited data.

Another evaluation approach was based on functionally comparing explanations from multiple XAI methods (7/49, 14%). Bengamra et al. (2023) [33] extended an existing XAI method, and compared their proposed XAI method with its predecessor by creating deletion and insertion curves while gradually altering the input. A limitation of this approach is that the explanations must be of similar types in order to be comparable using a single evaluation function.

The accuracy of a downstream task of the original task was also used to evaluate XAI. This approach was used when the employed XAI method generates explanations in a downstream task, as mentioned in Section 2.2. The performance of the explanations in the new task served as the evaluation of the XAI in 6/49 (12%) studies. Yang et al. (2023) [34] employed multiple Class Activation Mapping (CAM)-based XAI methods on models trained for crop disease detection, then evaluated these methods by comparing explanations against ground truth disease masks.

Most of the quality datasets used for training deep learning are created manually by domain experts; this also applies to evaluating explanations from XAI. Our findings indicate that this is not always feasible (5/49, 10%). Ghosal et al. (2018) [35] used around a thousand expert-annotated crop disease masks to evaluate their proposed XAI method. The method consisted of a clustering and aggregation algorithm for CAM explanations from different layers of a CNN.

Other approaches of evaluation included user surveys, applying a domain- or method-specific metric, and measuring human improvement in a task. Chhetri et al. (2023) [36] performed user surveys and compared results for SHAP explanations with explanations from their proposed approach.
The surveys included basic questions on usefulness, comprehension, and the user's profession. Wang et al. (2022) [37] built a causal inference model on crop yield prediction and performed refutation tests, which are specific to causal models. Yu et al. (2023) [38] manually extracted features from activation maps of a model trained for freshness prediction for oranges. By teaching the extracted features or explanations to human annotators, they showed a 10% overall improvement in correct prediction of the shelf-life of oranges against a control group.

3 Discussion

We saw an exponential increase in food studies and applications that used XAI. The uptake of XAI is an essential step towards better AI understanding, and therefore better control over applications that have a direct effect on our society and health. Our results indicate three major implications that need to be addressed. First, nearly all studies used off-the-shelf XAI methods, often overlooking specifics of food data. Second, the benefit of using XAI is not limited to model understanding. Third, most studies do not evaluate their XAI.

Eighty-seven percent of the studies use known, off-the-shelf XAI methods due to their popularity. Despite having been introduced several years ago, and despite the emergence of newer methods, Grad-CAM (2017), SHAP (2017), and LIME (2016) remain the most widely used XAI methods. In fact, their use has not declined in recent years; in some cases, it has even increased. SHAP and LIME both produce additive explanations, which have been shown to be unfaithful to non-additive models such as deep learning models [39,40]. Meanwhile, Draelos et al. (2020) [41] illustrated that Grad-CAM may highlight locations the model does not use. Rudin (2019) [42] argues that XAI methods producing post hoc explanations are misleading, and that, to move forward and achieve reliable explainability for deep learning models, we need explainable-by-design XAI methods that are a part of the model building process.
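As a toy illustration of why additive explanations can misrepresent non-additive models, the sketch below computes exact Shapley values for a hypothetical two-feature model, `interaction_model(x) = x[0] * x[1]`, against an all-zero baseline. The model, the baseline, and the helper names are assumptions made for illustration and are not taken from the cited works. The computation enumerates all feature orderings, which is only practical for a handful of features.

```python
from itertools import permutations


def interaction_model(x):
    # Hypothetical non-additive model: output exists only if both features are on.
    return x[0] * x[1]


def exact_shapley(predict, x, baseline):
    """Average marginal contribution of each feature over all orderings,
    switching features from their baseline value to their value in x."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]
            value = predict(current)
            phi[i] += value - previous
            previous = value
    return [p / len(orderings) for p in phi]


x = [1.0, 1.0]
baseline = [0.0, 0.0]
phi = exact_shapley(interaction_model, x, baseline)


def additive_reading(active):
    # What the attributions imply if read as an additive model over on/off features.
    return interaction_model(baseline) + sum(p for p, on in zip(phi, active) if on)


print(phi)                            # attribution per feature
print(additive_reading([1, 0]))       # implied output with only feature 0 on
print(interaction_model([1.0, 0.0]))  # actual output with only feature 0 on
```

Both features receive an attribution of 0.5, and the attributions sum to the model output at the explained point. Read as an additive model, however, they imply that switching on the first feature alone yields 0.5, whereas the model itself returns 0 because the output depends entirely on the interaction.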
Explainable-by-design XAI methods are one example showing that there are benefits to using XAI other than model understanding. XAI can drive model enhancement, build user trust, support downstream tasks, and enable knowledge discovery. We argue that these benefits have direct and practical implications in food applications, particularly because food data are difficult and expensive to collect. One common example is feature selection: using XAI to identify the most relevant features can reduce data collection costs while maintaining model accuracy and, in some cases, increasing robustness. Building trust and effectively communicating insights to various stakeholders is also crucial in food systems. Furthermore, XAI can support downstream tasks by reducing the need for costly annotation, such as when detailed labels are required, through leveraging insights from models trained on coarser labels. Finally, XAI facilitates the discovery of new insights during the annotation process, leading to higher-quality datasets and deeper domain understanding.

We observed that the evaluation of XAI methods is largely overlooked. The main incentive for XAI is to create a control mechanism against errors in models. If the method of control, XAI, is also erroneous and uncontrolled, we essentially create a new unknown while trying to understand the old one. Grad-CAM, SHAP, and LIME are the most used XAI methods, yet 88% of the studies using them do not perform any evaluation.

Our review suggests several evaluation strategies relevant to the food domain, each with its own trade-offs. A simple approach is to compare explanations from different XAI methods, especially across method types, to see which fits best with a given model and context. However, the comparison is only between the XAI methods, and does not reflect the actual quality of the explanations. Using model performance as an evaluation proxy shares these limitations.
Domain expert evaluation remains the most reliable method but is often costly and difficult; scoring a Grad-CAM heatmap is far easier than assessing a detailed knowledge graph. Another option, which requires additional data, is to evaluate the accuracy of downstream task predictions. Finally, domain- or method-specific metrics, though not always feasible, offer a promising direction for assessing explanation quality.

In conclusion, we highlight the need for improvement in the usage of XAI in food, show inspiring benefits of XAI, and argue that evaluating explanations is as important as utilizing them in the first place. In medical science, a clear framework was developed to guide and evaluate clinical XAI [43], but none of the off-the-shelf XAI methods passed this framework. Currently, food science has no such framework with which to even perform this test, although one is essential to develop better domain-specific XAI. Domain and AI experts from the entire food system should collaborate on such an evaluation framework in XAI for food to provide better guidance, tools, and evaluation.

4 Methods

Our review followed the PRISMA [44] guidelines, and the resulting flow diagram can be found in Fig. 4. We used four inclusion criteria: (i) using XAI, (ii) being in the food domain, (iii) applying or integrating XAI methods on deep learning (neural networks), and (iv) being peer-reviewed articles in English. The main categories for food were food security, food quality, food safety, food fraud, and nutrition. These categories were not predefined but emerged naturally based on the primary tasks addressed in the relevant studies. The studies that did not directly fall into the aforementioned main categories were labelled as "Other". Subjects such as consumer behavior and opinion or marketing of food products were out of scope for this review. The full list of exclusion criteria is in the Supplementary Materials under "Exclusion criteria".
4.1 Search query and results

We constructed our search query to broadly capture studies related to both XAI and the food domain, while minimizing irrelevant results. We used two main keyword clusters, one for XAI and one for food-related topics (Box 1). This query was designed to be recall oriented without introducing unnecessary clutter in the search results. For example, we ensured that the terms "explanation" or "interpretation" were only included when clearly linked to AI. Similarly, the term "food*" was chosen to match a range of relevant domains such as food safety, food security, food quality, and more.

TITLE-ABS-KEY(explainab* OR interpretab* OR xai OR exml OR ((explanation OR interpretation) AND ("machine learning" OR "deep learning" OR "artificial intelligence"))) AND TITLE-ABS-KEY(food* OR agro* OR agri* OR "crop *")

Box 1. Selected search query for Scopus. This query can be considered as two semantic parts, each part inside the parentheses of TITLE-ABS-KEY.

We searched Scopus, specifically titles, abstracts and keywords, with the selected query and retrieved 2,897 studies on August 1, 2024. After deduplication via direct title matching, keeping the first instance of each duplicate, 2,876 unique studies were left.

4.2 Abstract screening

We used version 1.5 of the ASReview tool [45] throughout the abstract screening process. ASReview is labelling software specifically tailored for abstract screening that utilizes active learning. Active learning refers to updating a predictive model as the labelling process continues and ranking the samples from the most relevant to the least relevant. This method allows the annotator to see more relevant samples more quickly than with a random selection, which is especially useful where the number of relevant samples is low. As more abstracts are labelled, the active learning model's accuracy improves.
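The screening loop described above can be sketched as follows. This is a minimal illustration and not ASReview's implementation: it swaps in a smoothed bag-of-words log-odds scorer for the learned text features and classifier, always queries the top-ranked record, and uses hypothetical names (`rank_pool`, `screen_all`, `oracle`) and made-up example abstracts. As in this review, it continues until every abstract has been labelled rather than stopping early.

```python
import math
from collections import Counter


def tokens(text):
    return set(text.lower().split())


def rank_pool(labelled, pool):
    """Rank unlabelled abstracts from most to least likely relevant,
    using word counts from the abstracts labelled so far."""
    pos, neg = Counter(), Counter()
    n_pos = n_neg = 0
    for text, relevant in labelled:
        if relevant:
            pos.update(tokens(text))
            n_pos += 1
        else:
            neg.update(tokens(text))
            n_neg += 1

    def score(text):
        # Sum of smoothed log-odds that each word appears in relevant abstracts.
        total = 0.0
        for tok in tokens(text):
            p = (pos[tok] + 1) / (n_pos + 2)
            q = (neg[tok] + 1) / (n_neg + 2)
            total += math.log(p / q)
        return total

    return sorted(pool, key=score, reverse=True)


def screen_all(seed, pool, oracle):
    """Repeatedly show the top-ranked abstract to the annotator (`oracle`),
    add the new label, and re-rank. Returns the order of presentation."""
    labelled = list(seed)
    remaining = list(pool)
    order = []
    while remaining:
        candidate = rank_pool(labelled, remaining)[0]
        remaining.remove(candidate)
        labelled.append((candidate, oracle(candidate)))
        order.append(candidate)
    return order


seed = [
    ("explainable deep learning for crop disease", True),
    ("food marketing consumer survey", False),
]
pool = [
    "consumer marketing trends",
    "explainable crop yield model",
    "grad-cam for crop disease images",
]
order = screen_all(seed, pool, oracle=lambda text: "crop" in text)
print(order)
```

In this toy run the abstracts resembling the relevant seed are presented first and the marketing abstract last, which is the behavior that lets an annotator reach relevant records sooner than with random ordering.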
In principle, a stopping criterion can be applied once the remaining abstracts are unlikely to be relevant. We opted to screen all of the abstracts instead of stopping after encountering an arbitrary number of consecutive irrelevant records. The active learning model was configured with Sentence BERT as the feature extraction technique, a fully connected neural network as the classifier, Mixed as the query strategy, and Dynamic resampling as the balance strategy.

4.3 Data generation

After completing the screening process, we selected 96 relevant articles, which formed the basis for identifying the main categories and sub-categories used in our framework. Our strategy for defining sub-categories was to be as specific as possible, to reduce the need to revisit articles later. For example, rather than grouping all XAI methods into broad categories (such as backpropagation-based or perturbation-based), we initially left these labels open-ended. Any generalization was deferred until the data generation was complete. After we specified the categories and established their framework, we retrieved the rest of the articles and processed them according to the framework. One main caveat of the categorization was that sub-categories were not mutually exclusive: 26% of articles had at least one column containing multiple sub-categories. The details and definitions of each category and sub-category can be found in Table 1.

4.4 Limitations

Due to our keyword selection, we could only detect papers that explicitly mention explainability in their title, abstract, or keywords. Some authors did not mention in their abstracts that they used visualizations, attention mechanisms, inherently explainable models, or XAI methods, simply because it was not the focus of their study. We only included English articles, which introduces a language and culture bias. In addition, we only included peer-reviewed articles, which might underrepresent emerging or unconventional uses of XAI in food.

References

1. Nguyen, H. (2018). Sustainable food systems: Concept and framework.
2. Zhou, L., Zhang, C., Liu, F., Qiu, Z., & He, Y. (2019). Application of deep learning in food: a review. Comprehensive Reviews in Food Science and Food Safety, 18(6), 1793-1811.
3. Yang, B., & Xu, Y. (2021). Applications of deep-learning approaches in horticultural research: a review. Horticulture Research, 8.
4. Geirhos, R., Jacobsen, J. H., Michaelis, C., Zemel, R., Brendel, W., Bethge, M., & Wichmann, F. A. (2020). Shortcut learning in deep neural networks. Nature Machine Intelligence, 2(11), 665-673.
5. Tzachor, A., Devare, M., King, B., Avin, S., & Ó hÉigeartaigh, S. (2022). Responsible artificial intelligence in agriculture requires systemic understanding of risks and externalities. Nature Machine Intelligence, 4(2), 104-109.
6. Nauta, M., Trienes, J., Pathak, S., Nguyen, E., Peters, M., Schmitt, Y., ... & Seifert, C. (2023). From anecdotal evidence to quantitative evaluation methods: A systematic review on evaluating explainable AI. ACM Computing Surveys, 55(13s), 1-42.
7. Proposal for a Regulation of the European Parliament and of the Council laying down harmonised rules on Artificial Intelligence (Artificial Intelligence Act) and amending certain Union legislative acts, COM(2021) 206 final.
8. Adadi, A., & Berrada, M. (2018). Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access, 6, 52138-52160.
9. Hassija, V., Chamola, V., Mahapatra, A., Singal, A., Goel, D., Huang, K., ... & Hussain, A. (2024). Interpreting black-box models: a review on explainable artificial intelligence. Cognitive Computation, 16(1), 45-74.
10. Van der Velden, B. H., Kuijf, H. J., Gilhuijs, K. G., & Viergever, M. A. (2022). Explainable artificial intelligence (XAI) in deep learning-based medical image analysis. Medical Image Analysis, 79, 102470.
11. Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision (pp. 618-626).
12. Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30.
13. Ribeiro, M. T., Singh, S., & Guestrin, C. (2016, August). "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135-1144).
14. Shi, Y., Han, L., Huang, W., Chang, S., Dong, Y., Dancey, D., & Han, L. (2021). A biologically interpretable two-stage deep neural network (BIT-DNN) for vegetation recognition from hyperspectral imagery. IEEE Transactions on Geoscience and Remote Sensing, 60, 1-20.
15. Shi, Y., Han, L., González-Moreno, P., Dancey, D., Huang, W., Zhang, Z., ... & Dai, M. (2023). A fast Fourier convolutional deep neural network for accurate and explainable discrimination of wheat yellow rust and nitrogen deficiency from Sentinel-2 time series data. Frontiers in Plant Science, 14, 1250844.
16. Togninalli, M., Wang, X., Kucera, T., Shrestha, S., Juliana, P., Mondal, S., ... & Poland, J. (2023). Multi-modal deep learning improves grain yield prediction in wheat breeding by fusing genomics and phenomics. Bioinformatics, 39(6), btad336.
17. Chelali, M., Kurtz, C., Puissant, A., & Vincent, N. (2021). Deep-STaR: Classification of image time series based on spatio-temporal representations. Computer Vision and Image Understanding, 208, 103221.
18. Batchuluun, G., Nam, S. H., & Park, K. R. (2022). Deep learning-based plant classification and crop disease classification by thermal camera. Journal of King Saud University-Computer and Information Sciences, 34(10), 10474-10486.
19. Hsu, H., Salamatian, S., & Calmon, F. P. (2019, April). Correspondence analysis using neural networks. In The 22nd International Conference on Artificial Intelligence and Statistics (pp. 2671-2680). PMLR.
20. Chang, Y. T., Hsueh, M. C., Hung, S. P., Lu, J. M., Peng, J. H., & Chen, S. F. (2021). Prediction of specialty coffee flavors based on near-infrared spectra using machine- and deep-learning methods. Journal of the Science of Food and Agriculture, 101(11), 4705-4714.
21. Hao, H., Li, P., Li, K., Shan, Y., Liu, F., Hu, N., ... & Jiao, W. (2024). A novel prediction approach driven by graph representation learning for heavy metal concentrations. Science of The Total Environment, 947, 174713.
22. Bowler, A. L., Ozturk, S., Rady, A., & Watson, N. (2022). Domain adaptation for in-line allergen classification of agri-food powders using near-infrared spectroscopy. Sensors, 22(19), 7239.
23. Fu, C., Pan, X., Wu, J., Cai, J., Huang, Z., van Harmelen, F., ... & He, T. (2023). KG4NH: a comprehensive knowledge graph for question answering in dietary nutrition and human health. IEEE Journal of Biomedical and Health Informatics.
24. Xie, Y., Wang, J., Chen, C., Yin, T., Yang, S., Li, Z., ... & Gan, L. (2024). Sound identification of abnormal pig vocalizations: Enhancing livestock welfare monitoring on smart farms. Information Processing & Management, 61(4), 103770.
25. Di Martino, T., Guinvarc’h, R., Thirion-Lefevre, L., & Colin, É. (2023). Grad-SLAM: Explaining convolutional autoencoders’ latent space of satellite image time series. IEEE Geoscience and Remote Sensing Letters, 20, 1-5.
26. Shah, D., Trivedi, V., Sheth, V., Shah, A., & Chauhan, U. (2022). ResTS: Residual deep interpretable architecture for plant disease detection. Information Processing in Agriculture, 9(2), 212-223.
27. Katafuchi, R., & Tokunaga, T. (2020). Image-based plant disease diagnosis with unsupervised anomaly detection based on reconstructability of colors. arXiv preprint arXiv:2011.14306.
28. Yuan, Z., Liu, K., Li, S., & Yang, P. (2023, July). Automatic generation of visual concept-based explanations for pest recognition. In 2023 IEEE 21st International Conference on Industrial Informatics (INDIN) (pp. 1-6). IEEE.
29. Coulibaly, S., Kamsu-Foguem, B., Kamissoko, D., & Traore, D. (2022). Explainable deep convolutional neural networks for insect pest recognition. Journal of Cleaner Production, 371, 133638.
30. Qiu, T., Underhill, A., Sapkota, S. D., Cadle-Davidson, L., & Jiang, Y. (2021). Deep learning-based saliency maps for the quantification of grape powdery mildew at the microscopic level. In 2021 ASABE Annual International Virtual Meeting (p. 1). American Society of Agricultural and Biological Engineers.
31. Akagi, T., Onishi, M., Masuda, K., Kuroki, R., Baba, K., Takeshita, K., ... & Ise, T. (2020). Explainable deep learning reproduces a ‘professional eye’ on the diagnosis of internal disorders in persimmon fruit. Plant and Cell Physiology, 61(11), 1967-1973.
32. Shulman, D., Israeli, A., Botnaro, Y., Margalit, O., Tamir, O., Naschitz, S., ... & Dattner, I. (2024). Physics-guided inverse regression for crop quality assessment. Journal of Agricultural, Biological and Environmental Statistics, 1-24.
33. Bengamra, S., Zagrouba, E., & Bigand, A. (2023, August). Explainable AI for deep learning based potato leaf disease detection. In 2023 IEEE International Conference on Fuzzy Systems (FUZZ) (pp. 1-6). IEEE.
34. Yang, S., Xing, Z., Wang, H., Gao, X., Dong, X., Yao, Y., ... & Liu, Z. (2023). Classification and localization of maize leaf spot disease based on weakly supervised learning. Frontiers in Plant Science, 14, 1128399.
35. Ghosal, S., Blystone, D., Singh, A. K., Ganapathysubramanian, B., Singh, A., & Sarkar, S. (2018). An explainable deep machine vision framework for plant stress phenotyping. Proceedings of the National Academy of Sciences, 115(18), 4613-4618.
36. Chhetri, T. R., Hohenegger, A., Fensel, A., Kasali, M. A., & Adekunle, A. A. (2023). Towards improving prediction accuracy and user-level explainability using deep learning and knowledge graphs: A study on cassava disease. Expert Systems with Applications, 233, 120955.
37. Wang, Y., Chandrasekaran, J., Haberkorn, F., Dong, Y., Gopinath, M., & Batarseh, F. A. (2022, October). DeepFarm: AI-driven management of farm production using explainable causality. In 2022 IEEE 29th Annual Software Technology Conference (STC) (pp. 27-36). IEEE.
38. Yu, Y., Deng, H., Chen, J., Cheng, Y., Xu, R., Li, S., & Chen, Y. (2023). Improving human intuition for vision-based freshness prediction of Citrus reticulata Blanco using machine learning. Scientia Horticulturae, 321, 112300.
39. Gosiewska, A., & Biecek, P. (2019). Do not trust additive explanations. arXiv preprint arXiv:1903.11420.
40. Molnar, C., König, G., Herbinger, J., Freiesleben, T., Dandl, S., Scholbeck, C. A., ... & Bischl, B. (2020, July). General pitfalls of model-agnostic interpretation methods for machine learning models. In International Workshop on Extending Explainable AI Beyond Deep Models and Classifiers (pp. 39-68). Cham: Springer International Publishing.
41. Draelos, R. L., & Carin, L. (2020). Use HiResCAM instead of Grad-CAM for faithful explanations of convolutional neural networks. arXiv preprint arXiv:2011.08891.
42. Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206-215.
43. Jin, W., Li, X., Fatehi, M., & Hamarneh, G. (2023). Guidelines and evaluation of clinical explainable AI in medical image analysis. Medical Image Analysis, 84, 102684.
44. Page, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., ... & Moher, D. (2021). The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ, 372.
45. ASReview LAB developers. (2024). ASReview LAB - A tool for AI-assisted systematic reviews (v1.5). Zenodo. https://doi.org/10.5281/zenodo.10464713

Additional Declarations

There is NO Competing Interest.
Supplementary Files

SupplementaryTable1methodtable.xlsx (Supplementary Table 1)
SupplementaryTable2studytable.xlsx (Supplementary Table 2)
SupplementaryMaterial.docx
\u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"8\" valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003eEvaluation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003eDomain expert\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 374px;\"\u003e\n \u003cp\u003eThe explanations were evaluated by domain experts in a systematic way.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003eDomain- or method-specific metric\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 374px;\"\u003e\n \u003cp\u003eThe explanations were evaluated by a specific metric that is related to the XAI method or the task domain.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003eDownstream task performance\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 374px;\"\u003e\n \u003cp\u003eThe explanations were evaluated by comparing the predictions on a downstream task to the labels in the downstream task.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003eHuman improvement\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 374px;\"\u003e\n \u003cp\u003eThe explanations were evaluated by comparing human annotation performance before and after training the annotators with the explanations.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003eModel performance\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 374px;\"\u003e\n \u003cp\u003eThe explanations were evaluated by comparing the performance of the model with and without the integrated explanations.\u003c/p\u003e\n 
\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003eMultiple method comparison\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 374px;\"\u003e\n \u003cp\u003eThe explanations were evaluated by comparing multiple explanations from different XAI methods.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003eUser survey\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 374px;\"\u003e\n \u003cp\u003eThe explanations were evaluated by performing a user survey with relevant questions.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003eNo evaluation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 374px;\"\u003e\n \u003cp\u003eThe explanations were not evaluated. Studies that show only a couple of explanations as evidence fall into this category.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003ch2\u003e2.1 Nearly all papers used off-the-shelf XAI\u003c/h2\u003e\n\u003cp\u003eResearchers have employed deep learning at a high rate in food as the amount of data, and the variance and availability of collected data increased\u003csup\u003e2,3\u003c/sup\u003e. Subsequently, we observed an exponential increase in XAI (\u003cem\u003eFig. 1)\u003c/em\u003e. While most studies used off-the-shelf, popular and easily available XAI methods (207/239, 87%), some studies either introduce a new method or adapt an old one to their domain, which we refer to as tailor-made methods (32/239, 13%).\u003c/p\u003e\n\u003cp\u003eWe found 65 unique XAI methods, and grouped these into eight method types. 
We further categorized the methods as model-agnostic versus model-specific, model-based versus post hoc, and producing global versus local explanations\u003csup\u003e10\u003c/sup\u003e. All methods, their corresponding types, and other information are given in \u003cem\u003eSupplementary Material Table 1.\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eThe most commonly used XAI methods were Grad-CAM (n=58), SHAP (n=58), and LIME (n=34)\u003csup\u003e11,12,13\u003c/sup\u003e. Model-specific XAI methods were used twice as often as model-agnostic XAI methods. Of the 239 studies, 194 (81%) provided only local explanations, 33 (14%) provided only global explanations, and 4 (2%) provided both global and local explanations. XAI was applied after model training (post hoc) in 189/239 (79%) studies, and 46/239 (19%) studies used model-based explanations. The remaining studies used multiple XAI methods with different characteristics.\u003c/p\u003e\n\u003cp\u003eData types play a role in selecting XAI methods as well. We identified eight data types (\u003cem\u003eTable 1\u003c/em\u003e). Studies with tabular data used 29% more perturbation-based XAI than average, and 35% less backpropagation-based XAI. Of the 62 studies that had tabular data, 31 (50%) used SHAP and 6 (10%) used LIME. Meanwhile, studies with image data used backpropagation-based XAI 12% more than average and perturbation-based XAI 10% less than average. For image data, Grad-CAM and LIME were used most.\u003c/p\u003e\n\u003cp\u003eA total of 47/239 (20%) studies used multiple XAI methods, with 13/47 (28%) using more than two methods. 
Of the studies using multiple XAI methods, 19/47 (40%) used methods from a single method type, 21/47 (45%) from two method types, and 7/47 (15%) from three method types.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eState of explainable AI in food\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe identified six domains in food based on the tasks performed: food security, food quality, food safety, food fraud, nutrition, and other (\u003cem\u003eFig. 2\u003c/em\u003e).\u003c/p\u003e\n\u003cp\u003eThe most common task for food security was crop disease detection (n=81). Approximately half of the XAI methods used were backpropagation-based, presumably since images were the dominant data type for this task. In addition, LIME was used in 20% of these studies, compared with an average LIME usage of 11%. This task had the largest number of studies introducing tailor-made XAI methods in food (n=11). Tailor-made XAI methods included a biologically inherently explainable deep learning model that outperformed similarly complex models in terms of accuracy and robustness\u003csup\u003e14\u003c/sup\u003e. The authors achieved this by incorporating prior knowledge about the biophysical and biochemical attributes of the target crops and their hierarchical structures. Shi et al. (2023)\u003csup\u003e15\u003c/sup\u003e aimed to detect wheat yellow rust disease from satellite images as opposed to leaf images. They proposed a Fast Fourier Transform-based kernel for extracting explainable information from image time-series data, and outperformed multiple baselines.\u003c/p\u003e\n\u003cp\u003eAnother common task for food security was crop yield prediction (n=20), which helps farmers in precision agriculture and policymakers in decision-making. Backpropagation-based methods were preferred over perturbation-based methods even though the data were mostly tabular, contrary to the overall pattern for tabular data. Togninalli et al. 
(2023)\u003csup\u003e16\u003c/sup\u003e used multi-modal deep learning for crop yield prediction using thermal image, digital elevation model, multispectral and genomics data, and utilized an attention mechanism to gauge the contribution of each data type to the outcome of the model. Using genomics data, their model predicted yield for new crop lines in unseen environments better than with other data types. The remaining 50 studies in food security performed 29 unique tasks. Chelali et al. (2021)\u003csup\u003e17\u003c/sup\u003e used satellite image data for land cover mapping, and created semantic maps using Grad-CAM explanations based on a combination of the spatial dimensions and the time dimension. Batchuluun et al. (2022)\u003csup\u003e18\u003c/sup\u003e utilized ground truth heatmaps generated from thermal images of plants to better train their model for plant identification.\u003c/p\u003e\n\u003cp\u003eAmong the 29 food quality studies, 30% used Grad-CAM, a 12% increase over the average, and 49% used CAM-based XAI methods, compared with an average of 28%. The studies included tasks such as freshness prediction, tastant classification, and quality inspection. Hsu et al. (2019)\u003csup\u003e19\u003c/sup\u003e built an explainable neural network approximation of correspondence analysis to scale it up to large and high-dimensional datasets with continuous features, and tested their method on a wine quality dataset. Chang et al. (2021)\u003csup\u003e20\u003c/sup\u003e adapted saliency map explanations to spectral wavelength data for coffee flavor prediction. This allowed them to determine the effects of the content of different molecules on coffee flavor.\u003c/p\u003e\n\u003cp\u003eOur analysis identified a total of 13 food safety studies, three of which were cross-domain, performing pest recognition, contamination/quality warning, and fraud detection. 
Notably, food safety included the only two studies that used graph representation learning and shared their resulting graphs. For example, Hao et al. (2024)\u003csup\u003e21\u003c/sup\u003e performed heavy metal pollution prediction in soil-rice systems and shared the resulting graph of the effects of different environmental factors. In another food safety study, Bowler et al. (2022)\u003csup\u003e22\u003c/sup\u003e performed allergen classification on multiple types of food powder with different allergens using a deep learning model trained with spectral data, and applied a feature importance XAI method to show how much individual wavelengths, and therefore molecules, contribute to the decision.\u003c/p\u003e\n\u003cp\u003eThe remaining 50 studies covered nutrition, food fraud, and tasks that did not directly fall into any domain. Seven out of nine nutrition studies performed food recognition on images to provide nutritional information. Fu et al. (2023)\u003csup\u003e23\u003c/sup\u003e created a knowledge graph on dietary nutrition and human health by automatically extracting information from scientific texts. They used the resulting knowledge graph to create an explainable question answering system. Alongside the aforementioned fraud detection study, there were two studies on origin identification that both used XAI to select a subset of wavelengths in their spectral data. Finally, the 38 studies that fell into the \u003cem\u003eOther\u003c/em\u003e category covered tasks such as carbon footprint prediction, soil fungal diversity, greenhouse gas flux prediction, and anticancer food molecule prediction. Xie et al. (2024)\u003csup\u003e24\u003c/sup\u003e built a model for abnormal vocalization detection for livestock pig welfare using raw sound and spectral data. 
They showed attention scores and Grad-CAM explanations to differentiate between a normal sound, a cough, and a scream.\u003c/p\u003e\n\u003ch2\u003e2.2 Benefit of XAI is not limited to model understanding\u003c/h2\u003e\n\u003cp\u003eIn addition to the legal requirements imposed by the EU AI Act, researchers in food have several further reasons to employ XAI (\u003cem\u003eFig. 3\u003c/em\u003e).\u003c/p\u003e\n\u003cp\u003eXAI mainly provides insights into how a deep learning model works. It is imperative to know that the model behaves in the intended way for the current task, and does not use a shortcut to find a well-explainable but incorrect solution\u003csup\u003e4,10\u003c/sup\u003e. XAI was used for better understanding and gauging the correctness of trained models in 172/239 (72%) studies. Di Martino et al. (2023)\u003csup\u003e25\u003c/sup\u003e introduced Gradient Sequential Latent Activation Mapping (Grad-SLAM), which is an extension of Grad-CAM, to explain how the classification of crops changed over time in satellite images, focusing on the temporal dimension instead of the spatial dimensions.\u003c/p\u003e\n\u003cp\u003eXAI also enhanced the performance, robustness, or runtime of deep learning models (47/239, 20%). To do so, XAI must be directly integrated into the model, which by definition produces model-based explanations. Examples include trainable attention, explainable feature extraction, and feature importance used for feature selection. Shah et al. (2022)\u003csup\u003e26\u003c/sup\u003e built two models in a teacher-student architecture, trained end to end: the first reconstructs input images with explainable highlights, and the reconstructed image is fed into the second model. Katafuchi et al. (2021)\u003csup\u003e27\u003c/sup\u003e exploited the color contrast between healthy and unhealthy parts of a leaf. 
They trained a Generative Adversarial Network on healthy leaf images by color reconstruction, and for prediction, they checked the color difference between the reconstructed and the original image to highlight diseased parts, if any.\u003c/p\u003e\n\u003cp\u003eXAI was used to enhance end-user trust in AI in 11/239 (5%) studies, presumably to convince users without a scientific background to follow advice from a deep learning model. Yuan et al. (2023)\u003csup\u003e28\u003c/sup\u003e extended Concept Activation Vectors (CAV) by introducing Automatic Visual Concept-based Explanation Generation (AVCEG). AVCEG added global explanations with directed graphs to help end-users trust a pest recognition application.\u003c/p\u003e\n\u003cp\u003eLastly, XAI was used for downstream tasks in 6/239 (3%) studies and for knowledge discovery in 3/239 (1%) studies. Coulibaly et al. (2022)\u003csup\u003e29\u003c/sup\u003e applied multiple XAI methods to an image classifier trained on a pest recognition task and used the resulting explanations to create bounding boxes for pests. Qiu et al. (2021)\u003csup\u003e30\u003c/sup\u003e created a fine-grained severity score from saliency maps, as opposed to using discrete severity classes, for powdery mildew disease in grapes. Finally, for knowledge discovery, Akagi et al. (2020)\u003csup\u003e31\u003c/sup\u003e applied XAI to gather novel insights on calyx-end cracking in persimmon fruits. Aside from finding an identifier already known to experts, color unevenness, they also stated that \u0026ldquo;substantial relevance peaks around the apex would not be interpretable from conventional empirical diagnosis\u0026rdquo;.\u003c/p\u003e\n\u003ch2\u003e2.3 Most XAI are not evaluated\u003c/h2\u003e\n\u003cp\u003eThe evaluation strategies of XAI were classified into a framework (\u003cem\u003eFig. 3\u003c/em\u003e).\u003c/p\u003e\n\u003cp\u003eMost XAI was not evaluated (190/239, 79%). 
Among the studies that did evaluate XAI, model performance was the most common evaluation criterion (27/49, 55%), typically measured through ablation studies comparing model accuracy with and without integrated (model-based) explanations. This evaluation approach was coupled with the benefit of model enhancement. Shulman et al. (2024)\u003csup\u003e32\u003c/sup\u003e built a physics-guided, inherently explainable neural network for crop quality classification. The explainable neural network performed better than an identical black-box baseline, especially with limited data.\u003c/p\u003e\n\u003cp\u003eAnother evaluation approach was based on functionally comparing explanations from multiple XAI methods (7/49, 14%). Bengamra et al. (2023)\u003csup\u003e33\u003c/sup\u003e extended an existing XAI method, and compared their proposed XAI method with its predecessor by creating deletion and insertion curves while gradually altering the input. A limitation of this approach is that the explanations must be of similar types to be comparable under a single evaluation function.\u003c/p\u003e\n\u003cp\u003eAccuracy on a task downstream of the original model\u0026apos;s task was also used to evaluate XAI. This approach was used when the employed XAI method generated explanations for a downstream task, as mentioned in \u003cem\u003eSection 2.2\u003c/em\u003e. The performance of the explanations in the new task served as the evaluation of the XAI in 6/49 (12%) studies. Yang et al. (2023)\u003csup\u003e34\u003c/sup\u003e employed multiple Class Activation Mapping (CAM)-based XAI methods on models trained for crop disease detection, then evaluated these methods by comparing the explanations with ground truth disease masks.\u003c/p\u003e\n\u003cp\u003eMost high-quality datasets used for training deep learning models are created manually by domain experts; the same applies to evaluating explanations from XAI. Expert evaluation was rare in the reviewed studies (5/49, 10%), which indicates that it is not always feasible. Ghosal et al. 
(2018)\u003csup\u003e35\u003c/sup\u003e used around a thousand crop disease expert-annotated masks to evaluate their proposed XAI method. The method consisted of a clustering and aggregation algorithm for CAM explanations from different layers of a CNN.\u003c/p\u003e\n\u003cp\u003eOther approaches of evaluation included user surveys, applying a domain- or method-specific metric, and measuring human improvement in a task. Chhetri et al. (2023)\u003csup\u003e36\u003c/sup\u003e performed user surveys and compared results for SHAP explanations with explanations from their proposed approach. The surveys included basic questions on usefulness, comprehension, and user\u0026rsquo;s profession. Wang et al. (2022)\u003csup\u003e37\u003c/sup\u003e built a causal inference model on crop yield prediction and performed refutation tests, which are specific to causal models. Yu et al. (2023)\u003csup\u003e38\u003c/sup\u003e manually extracted features from activation maps of a model trained for freshness prediction for oranges. Teaching the extracted features or explanations to human annotators, they showed 10% overall improvement in correct prediction of shelf-life of oranges against a control group.\u003c/p\u003e"},{"header":"3 Discussion","content":"\u003cp\u003eWe saw an exponential increase in food studies and applications that used XAI. The uptake of XAI is an essential step towards better AI understanding, and therefore better control over applications that have a direct effect on our society and health. Our results indicate three major implications that need to be addressed. First, nearly all studies used off-the-shelf XAI methods, often overlooking specifics of food data. Second, the benefit of using XAI is not limited to model understanding. Third, most studies do not evaluate their XAI.\u003c/p\u003e\n\u003cp\u003eEighty-seven percent of the studies use known, off-the-shelf XAI methods due to their popularity. 
Although they were introduced several years ago and newer methods have since emerged, Grad-CAM (2019), SHAP (2017), and LIME (2016) remain the most widely used XAI methods. In fact, their use has not declined in recent years; in some cases, it has even increased. SHAP and LIME both produce additive explanations, which have been shown to be unfaithful to non-additive models, including deep learning models\u003csup\u003e39,40\u003c/sup\u003e. Meanwhile, Draelos et al. (2020)\u003csup\u003e41\u003c/sup\u003e illustrated that Grad-CAM may highlight locations the model does not use. Rudin (2019)\u003csup\u003e42\u003c/sup\u003e argues that XAI methods producing post hoc explanations are misleading, and that achieving reliable explainability for deep learning models requires explainable-by-design XAI methods that are part of the model building process.\u003c/p\u003e\n\u003cp\u003eExplainable-by-design XAI methods are one example showing that XAI has benefits beyond model understanding. XAI can drive model enhancement, build user trust, support downstream tasks, and enable knowledge discovery. We argue that these benefits have direct and practical implications in food applications, particularly with respect to data, which are difficult and expensive to collect. One common example is feature selection: using XAI to identify the most relevant features can reduce data collection costs while maintaining model accuracy and, in some cases, increasing robustness. Building trust and effectively communicating insights to various stakeholders are also crucial in food systems. Furthermore, XAI can support downstream tasks by reducing the need for costly annotation, such as when detailed labels are required, by leveraging insights from models trained on coarser labels. 
Finally, XAI facilitates the discovery of new insights during the annotation process, leading to higher-quality datasets and deeper domain understanding.\u003c/p\u003e\n\u003cp\u003eWe observed that the evaluation of XAI methods is largely overlooked. The main incentive for XAI is to create a control mechanism against errors in models. If the method of control, XAI, is also erroneous and uncontrolled, we essentially create a new unknown while trying to understand the old one. Grad-CAM, SHAP, and LIME are the most used XAI methods, yet 88% of the studies using them do not perform any evaluation. Our review suggests several evaluation strategies relevant to the food domain, each with its own trade-offs. A simple approach is to compare explanations from different XAI methods, especially across method types, to see which fits best with a given model and context. However, the comparison is only between the XAI methods, and does not reflect the actual quality of the explanations. Using model performance as an evaluation proxy shares this limitation. Domain expert evaluation remains the most reliable method but is often costly and difficult; for instance, scoring a Grad-CAM heatmap is far easier than assessing a detailed knowledge graph. Another option, which requires labelled data for the downstream task, is to evaluate the accuracy of downstream task predictions. Finally, domain- or method-specific metrics, though not always feasible, offer a promising direction for assessing explanation quality.\u003c/p\u003e\n\u003cp\u003eIn conclusion, we highlight the need for improvement in the usage of XAI in food, we show inspiring benefits of XAI, and we argue that evaluating explanations is as important as utilizing them in the first place. In medical science, a clear framework was developed to guide and evaluate clinical XAI\u003csup\u003e43\u003c/sup\u003e, but none of the off-the-shelf XAI methods passed this framework. 
Currently, food science has no comparable framework with which to even perform this test, although one is essential for developing better domain-specific XAI. Domain and AI experts from the entire food system should collaborate on such an evaluation framework in XAI for food to provide better guidance, tools, and evaluation.\u003c/p\u003e"},{"header":"4 Methods","content":"\u003cp\u003eOur review followed the PRISMA\u003csup\u003e44\u003c/sup\u003e guidelines, and the resulting flow diagram can be found in \u003cem\u003eFig. 4\u003c/em\u003e. We used four inclusion criteria: (i) using XAI, (ii) being in the food domain, (iii) applying or integrating XAI methods on deep learning (neural networks), and (iv) being peer-reviewed articles in English. The main categories for food were food security, food quality, food safety, food fraud, and nutrition. These categories were not predefined but emerged naturally based on the primary tasks addressed in the relevant studies. The studies that did not directly fall into the aforementioned main categories were labelled as \u0026ldquo;Other\u0026rdquo;. Subjects such as consumer behavior and opinion or marketing of food products were out of scope for this review. The full list of exclusion criteria is in the Supplementary Materials under \u0026ldquo;Exclusion criteria\u0026rdquo;.\u003c/p\u003e\n\u003ch2\u003e4.1\u0026nbsp; \u0026nbsp;Search query and results\u003c/h2\u003e\n\u003cp\u003eWe constructed our search query to broadly capture studies related to both XAI and the food domain, while minimizing irrelevant results. We used two main keyword clusters: one for XAI and one for food-related topics (Box 1). This query was designed to be recall-oriented without introducing unnecessary clutter in the search results. For example, we ensured that the terms \u0026ldquo;explanation\u0026rdquo; and \u0026ldquo;interpretation\u0026rdquo; were only included when clearly linked to AI. 
Similarly, the term \u0026quot;food*\u0026quot; was chosen to match a range of relevant domains such as food safety, food security, food quality, and more.\u003c/p\u003e\n\u003cp\u003eTITLE-ABS-KEY(explainab* OR interpretab* OR xai OR exml OR ((explanation OR interpretation) AND (\u0026quot;machine learning\u0026quot; OR \u0026quot;deep learning\u0026quot; OR \u0026quot;artificial intelligence\u0026quot;))) AND TITLE-ABS-KEY(food* OR agro* OR agri* OR \u0026quot;crop *\u0026quot;)\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eBox 1. Selected search query for Scopus. The query consists of two semantic parts, each enclosed in its own TITLE-ABS-KEY clause.\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eWe searched Scopus, specifically titles, abstracts and keywords, with the selected query and retrieved 2897 studies on August 1, 2024. After deduplication by direct title matching, keeping the first instance of each duplicate, 2876 unique studies remained.\u003c/p\u003e\n\u003ch2\u003e4.2\u0026nbsp; \u0026nbsp;Abstract screening\u003c/h2\u003e\n\u003cp\u003eWe used the ASReview tool\u003csup\u003e45\u003c/sup\u003e (version 1.5) throughout the abstract screening process. ASReview is labelling software specifically tailored for abstract screening that utilizes active learning. Active learning refers to updating a predictive model as the labelling process continues and ranking the remaining samples from the most relevant to the least relevant. This allows the annotator to see relevant samples sooner than under random selection, which is especially useful when the number of relevant samples is low. As more abstracts are labelled, the active learning model\u0026rsquo;s accuracy improves.\u003c/p\u003e\n\u003cp\u003eIn principle, a stopping criterion can be applied once the remaining abstracts are unlikely to be relevant. 
We opted to screen all of the abstracts instead of stopping after encountering an arbitrary number of irrelevant records back-to-back. Options for the active learning model were \u003cem\u003eSentence BERT\u003c/em\u003e, \u003cem\u003eFully connected neural network\u003c/em\u003e, \u003cem\u003eMixed\u003c/em\u003e, and \u003cem\u003eDynamic resampling\u003c/em\u003e for \u003cem\u003eFeature extraction technique\u003c/em\u003e, \u003cem\u003eClassifier\u003c/em\u003e, \u003cem\u003eQuery strategy\u003c/em\u003e, and \u003cem\u003eBalance strategy\u003c/em\u003e, respectively.\u003c/p\u003e\n\u003ch2\u003e4.3\u0026nbsp; \u0026nbsp;Data generation\u003c/h2\u003e\n\u003cp\u003eAfter completing the screening process, 96 relevant articles were selected. These formed the basis for identifying the main categories and sub-categories used in our framework. Our strategy for defining sub-categories was to be as specific as possible to reduce the need for revisiting articles later. For example, rather than grouping all XAI methods into broad categories (like backpropagation-based or perturbation-based), we initially left these labels open-ended. Any generalization was deferred until after the data generation was complete. After we specified the categories and established their framework, we retrieved the rest of the articles and processed them according to the framework.\u003c/p\u003e\n\u003cp\u003eOne main caveat of the categorization was non-exclusivity of sub-categories. 26% of articles had at least one column that contained multiple sub-categories. The details and definitions for each category and sub-categories may be found in \u003cem\u003eTable 1\u003c/em\u003e.\u003c/p\u003e\n\u003ch2\u003e4.4\u0026nbsp; \u0026nbsp;Limitations\u003c/h2\u003e\n\u003cp\u003eDue to our keyword selection, we could only detect papers that explicitly mention explainability in their content. 
Some authors did not mention in their abstracts that they used visualizations, attention mechanisms, inherently explainable models, or XAI methods, simply because this was not the focus of their study, so such studies may have been missed.\u003c/p\u003e\n\u003cp\u003eWe included only English-language articles, which may introduce a language and cultural bias. In addition, we only included peer-reviewed articles, which might underrepresent emerging or unconventional uses of XAI in food.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eNguyen, H. (2018). Sustainable food systems: Concept and framework.\u003c/li\u003e\n\u003cli\u003eZhou, L., Zhang, C., Liu, F., Qiu, Z., \u0026amp; He, Y. (2019). Application of deep learning in food: a review. Comprehensive Reviews in Food Science and Food Safety, 18(6), 1793-1811.\u003c/li\u003e\n\u003cli\u003eYang, B., \u0026amp; Xu, Y. (2021). Applications of deep-learning approaches in horticultural research: a review. Horticulture Research, 8.\u003c/li\u003e\n\u003cli\u003eGeirhos, R., Jacobsen, J. H., Michaelis, C., Zemel, R., Brendel, W., Bethge, M., \u0026amp; Wichmann, F. A. (2020). Shortcut learning in deep neural networks. Nature Machine Intelligence, 2(11), 665-673.\u003c/li\u003e\n\u003cli\u003eTzachor, A., Devare, M., King, B., Avin, S., \u0026amp; \u0026Oacute; h\u0026Eacute;igeartaigh, S. (2022). Responsible artificial intelligence in agriculture requires systemic understanding of risks and externalities. Nature Machine Intelligence, 4(2), 104-109.\u003c/li\u003e\n\u003cli\u003eNauta, M., Trienes, J., Pathak, S., Nguyen, E., Peters, M., Schmitt, Y., ... \u0026amp; Seifert, C. (2023). From anecdotal evidence to quantitative evaluation methods: A systematic review on evaluating explainable AI. 
ACM Computing Surveys, 55(13s), 1-42.\u003c/li\u003e\n\u003cli\u003eProposal for a Regulation of the European Parliament and of the Council laying down harmonised rules on Artificial Intelligence (Artificial Intelligence Act) and amending certain Union legislative acts, COM(2021) 206 final.\u003c/li\u003e\n\u003cli\u003eAdadi, A., \u0026amp; Berrada, M. (2018). Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access, 6, 52138-52160.\u003c/li\u003e\n\u003cli\u003eHassija, V., Chamola, V., Mahapatra, A., Singal, A., Goel, D., Huang, K., ... \u0026amp; Hussain, A. (2024). Interpreting black-box models: a review on explainable artificial intelligence. Cognitive Computation, 16(1), 45-74.\u003c/li\u003e\n\u003cli\u003eVan der Velden, B. H., Kuijf, H. J., Gilhuijs, K. G., \u0026amp; Viergever, M. A. (2022). Explainable artificial intelligence (XAI) in deep learning-based medical image analysis. Medical Image Analysis, 79, 102470.\u003c/li\u003e\n\u003cli\u003eSelvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., \u0026amp; Batra, D. (2017). Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision (pp. 618-626).\u003c/li\u003e\n\u003cli\u003eLundberg, S. M., \u0026amp; Lee, S. I. (2017). A unified approach to interpreting model predictions. Advances in neural information processing systems, 30.\u003c/li\u003e\n\u003cli\u003eRibeiro, M. T., Singh, S., \u0026amp; Guestrin, C. (2016, August). \u0026quot;Why should I trust you?\u0026quot; Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1135-1144).\u003c/li\u003e\n\u003cli\u003eShi, Y., Han, L., Huang, W., Chang, S., Dong, Y., Dancey, D., \u0026amp; Han, L. (2021). 
A biologically interpretable two-stage deep neural network (BIT-DNN) for vegetation recognition from hyperspectral imagery. IEEE Transactions on Geoscience and Remote Sensing, 60, 1-20. \u003c/li\u003e\n\u003cli\u003eShi, Y., Han, L., Gonz\u0026aacute;lez-Moreno, P., Dancey, D., Huang, W., Zhang, Z., ... \u0026amp; Dai, M. (2023). A fast Fourier convolutional deep neural network for accurate and explainable discrimination of wheat yellow rust and nitrogen deficiency from Sentinel-2 time series data. Frontiers in Plant Science, 14, 1250844. \u003c/li\u003e\n\u003cli\u003eTogninalli, M., Wang, X., Kucera, T., Shrestha, S., Juliana, P., Mondal, S., ... \u0026amp; Poland, J. (2023). Multi-modal deep learning improves grain yield prediction in wheat breeding by fusing genomics and phenomics. Bioinformatics, 39(6), btad336. \u003c/li\u003e\n\u003cli\u003eChelali, M., Kurtz, C., Puissant, A., \u0026amp; Vincent, N. (2021). Deep-STaR: Classification of image time series based on spatio-temporal representations. Computer Vision and Image Understanding, 208, 103221. \u003c/li\u003e\n\u003cli\u003eBatchuluun, G., Nam, S. H., \u0026amp; Park, K. R. (2022). Deep learning-based plant classification and crop disease classification by thermal camera. Journal of King Saud University-Computer and Information Sciences, 34(10), 10474-10486. \u003c/li\u003e\n\u003cli\u003eHsu, H., Salamatian, S., \u0026amp; Calmon, F. P. (2019, April). Correspondence analysis using neural networks. In The 22nd International Conference on Artificial Intelligence and Statistics (pp. 2671-2680). PMLR. \u003c/li\u003e\n\u003cli\u003eChang, Y. T., Hsueh, M. C., Hung, S. P., Lu, J. M., Peng, J. H., \u0026amp; Chen, S. F. (2021). Prediction of specialty coffee flavors based on near‐infrared spectra using machine‑and deep‐learning methods. Journal of the Science of Food and Agriculture, 101(11), 4705-4714. \u003c/li\u003e\n\u003cli\u003eHao, H., Li, P., Li, K., Shan, Y., Liu, F., Hu, N., ... 
\u0026amp; Jiao, W. (2024). A novel prediction approach driven by graph representation learning for heavy metal concentrations. Science of The Total Environment, 947, 174713. \u003c/li\u003e\n\u003cli\u003eBowler, A. L., Ozturk, S., Rady, A., \u0026amp; Watson, N. (2022). Domain adaptation for in-line allergen classification of agri-food powders using near-infrared spectroscopy. Sensors, 22(19), 7239. \u003c/li\u003e\n\u003cli\u003eFu, C., Pan, X., Wu, J., Cai, J., Huang, Z., van Harmelen, F., ... \u0026amp; He, T. (2023). KG4NH: a comprehensive knowledge graph for question answering in dietary nutrition and human health. IEEE journal of biomedical and health informatics. \u003c/li\u003e\n\u003cli\u003eXie, Y., Wang, J., Chen, C., Yin, T., Yang, S., Li, Z., ... \u0026amp; Gan, L. (2024). Sound identification of abnormal pig vocalizations: Enhancing livestock welfare monitoring on smart farms. Information Processing \u0026amp; Management, 61(4), 103770. \u003c/li\u003e\n\u003cli\u003eDi Martino, T., Guinvarc\u0026rsquo;h, R., Thirion-Lefevre, L., \u0026amp; Colin, \u0026Eacute;. (2023). Grad-SLAM: Explaining convolutional autoencoders\u0026rsquo; Latent space of satellite image time series. IEEE Geoscience and Remote Sensing Letters, 20, 1-5. \u003c/li\u003e\n\u003cli\u003eShah, D., Trivedi, V., Sheth, V., Shah, A., \u0026amp; Chauhan, U. (2022). ResTS: Residual deep interpretable architecture for plant disease detection. Information Processing in Agriculture, 9(2), 212-223. \u003c/li\u003e\n\u003cli\u003eKatafuchi, R., \u0026amp; Tokunaga, T. (2020). Image-based plant disease diagnosis with unsupervised anomaly detection based on reconstructability of colors. arXiv preprint arXiv:2011.14306. \u003c/li\u003e\n\u003cli\u003eYuan, Z., Liu, K., Li, S., \u0026amp; Yang, P. (2023, July). Automatic generation of visual concept-based explanations for pest recognition. In 2023 IEEE 21st International Conference on Industrial Informatics (INDIN) (pp. 1-6). IEEE. 
\u003c/li\u003e\n\u003cli\u003eCoulibaly, S., Kamsu-Foguem, B., Kamissoko, D., \u0026amp; Traore, D. (2022). Explainable deep convolutional neural networks for insect pest recognition. Journal of Cleaner Production, 371, 133638.\u003c/li\u003e\n\u003cli\u003eQiu, T., Underhill, A., Sapkota, S. D., Cadle-Davidson, L., \u0026amp; Jiang, Y. (2021). Deep learning-based saliency maps for the quantification of grape powdery mildew at the microscopic level. In 2021 ASABE Annual International Virtual Meeting (p. 1). American Society of Agricultural and Biological Engineers. \u003c/li\u003e\n\u003cli\u003eAkagi, T., Onishi, M., Masuda, K., Kuroki, R., Baba, K., Takeshita, K., ... \u0026amp; Ise, T. (2020). Explainable deep learning reproduces a \u0026lsquo;professional eye\u0026rsquo; on the diagnosis of internal disorders in persimmon fruit. Plant and Cell Physiology, 61(11), 1967-1973. \u003c/li\u003e\n\u003cli\u003eShulman, D., Israeli, A., Botnaro, Y., Margalit, O., Tamir, O., Naschitz, S., ... \u0026amp; Dattner, I. (2024). Physics-Guided Inverse Regression for Crop Quality Assessment. Journal of Agricultural, Biological and Environmental Statistics, 1-24. \u003c/li\u003e\n\u003cli\u003eBengamra, S., Zagrouba, E., \u0026amp; Bigand, A. (2023, August). Explainable AI for deep learning based potato leaf disease detection. In 2023 IEEE International Conference on Fuzzy Systems (FUZZ) (pp. 1-6). IEEE. \u003c/li\u003e\n\u003cli\u003eYang, S., Xing, Z., Wang, H., Gao, X., Dong, X., Yao, Y., ... \u0026amp; Liu, Z. (2023). Classification and localization of maize leaf spot disease based on weakly supervised learning. Frontiers in Plant Science, 14, 1128399. \u003c/li\u003e\n\u003cli\u003eGhosal, S., Blystone, D., Singh, A. K., Ganapathysubramanian, B., Singh, A., \u0026amp; Sarkar, S. (2018). An explainable deep machine vision framework for plant stress phenotyping. Proceedings of the National Academy of Sciences, 115(18), 4613-4618. \u003c/li\u003e\n\u003cli\u003eChhetri, T. 
R., Hohenegger, A., Fensel, A., Kasali, M. A., \u0026amp; Adekunle, A. A. (2023). Towards improving prediction accuracy and user-level explainability using deep learning and knowledge graphs: A study on cassava disease. Expert Systems with Applications, 233, 120955. \u003c/li\u003e\n\u003cli\u003eWang, Y., Chandrasekaran, J., Haberkorn, F., Dong, Y., Gopinath, M., \u0026amp; Batarseh, F. A. (2022, October). Deepfarm: AI-driven management of farm production using explainable causality. In 2022 IEEE 29th annual software technology conference (STC) (pp. 27-36). IEEE. \u003c/li\u003e\n\u003cli\u003eYu, Y., Deng, H., Chen, J., Cheng, Y., Xu, R., Li, S., \u0026amp; Chen, Y. (2023). Improving human intuition for vision-based freshness prediction of Citrus reticulata Blanco using machine learning. Scientia Horticulturae, 321, 112300.\u003c/li\u003e\n\u003cli\u003eGosiewska, A., \u0026amp; Biecek, P. (2019). Do not trust additive explanations. \u003cem\u003earXiv preprint arXiv:1903.11420\u003c/em\u003e.\u003c/li\u003e\n\u003cli\u003eMolnar, C., K\u0026ouml;nig, G., Herbinger, J., Freiesleben, T., Dandl, S., Scholbeck, C. A., ... \u0026amp; Bischl, B. (2020, July). General pitfalls of model-agnostic interpretation methods for machine learning models. In International Workshop on Extending Explainable AI Beyond Deep Models and Classifiers (pp. 39-68). Cham: Springer International Publishing.\u003c/li\u003e\n\u003cli\u003eDraelos, R. L., \u0026amp; Carin, L. (2020). Use HiResCAM instead of Grad-CAM for faithful explanations of convolutional neural networks. \u003cem\u003earXiv preprint arXiv:2011.08891\u003c/em\u003e.\u003c/li\u003e\n\u003cli\u003eRudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature machine intelligence, 1(5), 206-215.\u003c/li\u003e\n\u003cli\u003eJin, W., Li, X., Fatehi, M., \u0026amp; Hamarneh, G. (2023). 
Guidelines and evaluation of clinical explainable AI in medical image analysis. \u003cem\u003eMedical image analysis\u003c/em\u003e, \u003cem\u003e84\u003c/em\u003e, 102684.\u003c/li\u003e\n\u003cli\u003ePage, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., ... \u0026amp; Moher, D. (2021). The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Bmj, 372.\u003c/li\u003e\n\u003cli\u003eASReview LAB developers. (2024). ASReview LAB - A tool for AI-assisted systematic reviews (v1.5). Zenodo. https://doi.org/10.5281/zenodo.10464713\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-7289201/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7289201/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"The integration of artificial intelligence (AI) in food systems is accelerating. Explainable AI can help in understanding AI, but the literature is fragmented in food systems. In light of regulatory imperatives such as the European Union's AI Act, this systematic review brings together current research on explainable AI in food, highlighting key patterns and gaps. We find that most studies use off-the-shelf explainable AI tools that fail to address the complexities of food data. Beyond model transparency, explainable AI offers broader value in model enhancement, supporting trust, and knowledge discovery. However, most studies do not adequately evaluate the explainable AI methods they use. Advancing explainable AI in food systems requires tailored and carefully evaluated approaches to ensure responsible and effective AI deployment. Domain and AI experts from the entire food system should collaborate on an evaluation framework in explainable AI for food to provide better guidance, tools, and evaluation.","manuscriptTitle":"Explainable Artificial Intelligence for Deep Learning in Food","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-08-21 09:38:00","doi":"10.21203/rs.3.rs-7289201/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"7c2b5a4a-776b-412e-95fc-ec8f5a5ae61d","owner":[],"postedDate":"August 21st, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":52625994,"name":"Social science/Science, technology and society"},{"id":52625995,"name":"Scientific community and society/Agriculture"},{"id":52625996,"name":"Biological sciences/Computational biology and bioinformatics/Machine learning"}],"tags":[],"updatedAt":"2025-12-09T14:20:26+00:00","versionOfRecord":[],"versionCreatedAt":"2025-08-21 09:38:00","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-7289201","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7289201","identity":"rs-7289201","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}