Regional prediction of deoxynivalenol contamination in spring oats in Sweden using machine learning

Xinxin Wang, Thomas BÖRJESSON, Johanna Wetterlind, HJ van der Fels-Klerx

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-3979106/v1

This work is licensed under a CC BY 4.0 License.

Status: Published. Journal publication published 04 Oct, 2024; read the published version in npj Science of Food.

Abstract

Weather conditions and agronomical factors are known to affect Fusarium spp. growth and ultimately deoxynivalenol (DON) contamination in oat. This study aimed to develop predictive models for the contamination of spring oat at harvest with DON on a regional basis in Sweden using machine-learning algorithms.
Three models were developed as regional risk-assessment tools for farmers, crop collectors, and food safety inspectors, respectively. Data included weather data from different oat growing periods, agronomical data, site-specific data, and DON contamination data from the previous year. The region, year, spring oat variety, type of cultivation (organic or not), and whether the oats were intended for feed or food were used as inputs to predict DON contamination, with entries assigned to classes of low (< 500 µg/kg), medium (≥ 500 µg/kg and < 1000 µg/kg), and high (≥ 1000 µg/kg). A random forest (RF) algorithm was applied to train the models. Results showed that: 1) RF models were able to predict DON contamination at harvest with a total classification accuracy of at least 0.72 over the years 2012-2019, and above 0.90 in the years 2016-2017, but not for individual years excluded from the training of the models (external validation); 2) good predictions could already be made in June, but using weather variables from the full growing season could improve the model’s robustness; 3) weather variables were the most important for predicting DON contamination, but adding agronomical and site-specific factors to weather variables as model inputs could improve the overall model performance; 4) rainfall, relative humidity, and wind speed in different oat growing stages, followed by crop variety and elevation, were the most important features for predicting DON contamination in spring oats at harvest. In future studies, it might be of interest to explore whether including data for other agronomic variables, such as fertilization, irrigation, and pest control, as well as satellite image data, could further improve the model performance.
Subject areas: Biological sciences/Microbiology/Fungi/Fungal biology; Scientific community and society/Agriculture

Keywords: grain, DON, mycotoxin, food safety, forecasting, machine learning, crop variety, crop rotation, agronomical factors, agronomy, food safety management, feature impact analysis

Highlights

• A classification model for regional prediction of deoxynivalenol contamination in oats using machine learning has been developed.
• Model results showed prediction accuracy of > 70% for internal validation and < 70% for external validation of different classes.
• Rainfall, relative humidity, and wind speed in different growing stages, as well as crop variety and elevation, were the most important features for DON contamination in oats.
• Agronomic and site-specific features were shown to improve the overall performance of the weather-based model.

1. Introduction

Oats can be susceptible to fungal infection by Fusarium spp. and subsequent deoxynivalenol (DON) contamination during the cultivation season (Hjelkrem et al., 2017; Munkvold, 2014). The presence of DON in oat-derived feed and food can affect human and animal health (Chain et al., 2017). In Europe, the European Commission has set maximum legal limits (1750 µg/kg) for the presence of DON in unprocessed durum wheat and oats (Commission, 2006), and has defined maximum recommended thresholds (8000 µg/kg) for the presence of DON in cereals and cereal products (with the exception of maize by-products) used for feed (Commission, 2006b). In Sweden, DON concentrations were too high for human consumption in half of all oats in 2011 and, since then, DON contamination of oats has gained significant attention (Hartman et al., 2021). Since 2011, almost all oat products have been monitored for DON contamination, which generates a high cost to stakeholders such as farmers, crop collectors, and food safety authorities.
Early forecasting of regions with high DON contamination in oats at the regional (grid) level could provide timely advice on the need for crop protection and for risk-based monitoring, reducing both the chance of contaminated oats entering the food chain and the monitoring costs. Weather conditions, such as temperature, relative humidity, and precipitation, have a significant effect on the presence of DON contamination in oats (Persson et al., 2017). This is because weather conditions affect the life cycle of toxigenic fungi, influence the interaction between the pathogen and host, and affect the pathogen’s ability to produce DON (Moretti et al., 2019; Perrone et al., 2020). Apart from weather conditions, agronomical factors could directly or indirectly promote the infection of grains by Fusarium spp. These factors may include crop variety, crop rotation, soil type, elevation, and geolocation of the fields (Champeil et al., 2004; Czaban et al., 2015; Janssen et al., 2018; Krebs et al., 2000). Previous studies aimed at developing prediction models for DON in oats have used weather data (Kaukoranta et al., 2019; Marzec-Schmidt et al., 2021; Persson et al., 2017; Xu et al., 2013). Considering only weather data may limit the model’s application in regions or farms with different oat agronomic practices. One study concluded that DON prediction in oats could be improved by combining agronomic factors with weather-based risk index outputs (Persson et al., 2017). To date, only a few studies have used weather variables combined with agronomic and site-specific variables for early forecasting of DON contamination in oats. Apart from the data available for model development, the model algorithm used also affects the model performance. A study by Lindblad et al. (2012), which aimed to predict DON in oats, stated that very little of the variation in DON could be explained by weather conditions using a statistical model.
In addition to statistical models, machine learning has been shown to add value in the prediction of mycotoxins in grains (Camardo Leggieri et al., 2021; Wang, Liu, et al., 2022; Castano-Duque et al., 2022; Liu et al., 2018; Liu et al., 2021). One of these studies applied a deep neural network to predict mycotoxin contamination in maize and concluded that the machine learning approach adds value over classical statistical approaches (i.e., simple or multiple linear regression models) (Camardo Leggieri et al., 2021). One study applied the random forest algorithm to predict multi-mycotoxin occurrence in wheat in Europe with > 90% accuracy (Wang et al., 2022). Another study applied gradient boosting and Bayesian network modeling to predict mycotoxin contamination in maize in the USA with an overall accuracy of 94% (Castano-Duque et al., 2022). These studies showed high prediction accuracy using machine learning; however, they mainly focused on predicting the presence of different mycotoxins in maize and wheat, not in oats. In addition, only a few studies provided explanations of model prediction results (i.e., the impact of input variables on the different mycotoxin contamination levels). Machine learning approaches are often seen as black boxes that provide recommendations without sufficient explanation of “which and how input variables generated the result”. This is not workable in practice when the model results are to be used as support for decision-making. Furthermore, effects of single management practices (such as cultivar, tillage, and longitude and latitude) on DON contamination in wheat have been investigated for mycotoxin prevention and control (Li et al., 2023), but collective effects of multiple management practices (such as combinations of regional characteristics) on DON contamination have not been explored yet. Such a collective effect is essential for providing advice on reducing DON contamination in oats.
The aims of this study were to 1) develop predictive models for the contamination of DON in spring oats on a regional basis in Sweden using machine-learning algorithms, 2) explore the impacts of weather features, agronomical features, and site-specific features on the DON contamination levels, and 3) explore the collective effect of multiple management practices (combination of cultivar, crop rotation, and regional characteristics) on DON contamination and provide advice to reduce DON contamination in oats.

2. Materials and Methods

Three models were developed as regional risk-assessment tools to be used by farmers, crop collectors, and food safety inspectors, respectively. To provide a timely forecast of DON contamination for the different user groups, the three models aimed to provide regional DON predictions at three different times during the oat vegetation period:

i) SS-model: Start of Season model (Nov 1 to June 1), which would allow for recommendations on crop protection activities for farmers;
ii) MS-model: Mid-Season model (Nov 1 to July 1), which would allow for recommendations on sampling strategies and serve as an early warning concerning regional differences for crop collectors and food safety inspectors;
iii) FS-model: Full Season model (Nov 1 to Aug 15), which would allow for a more reliable indication on how to plan for sampling strategies.

In each of the three models, the predicted DON contamination was assigned to one of three levels: low (< 500 µg/kg), medium (≥ 500 µg/kg and < 1000 µg/kg), and high (≥ 1000 µg/kg). Weather factors and relevant agronomical and site-specific factors were used as model inputs. The weather features were selected as the monthly average from November to April and the weekly average from May to August.
The reason for using more detailed information from May to August is that this is the period from oat stem elongation to harvest, when oats are known to be more sensitive to fungal infection (Hjelkrem et al., 2017; Munkvold, 2014). The other input factors were selected because they are known to be relevant to the DON contamination of crops, including the oat variety, crop rotation, and other agronomical features (Selvaraj et al., 2015; Blandino et al., 2010; Landschoot et al., 2013; Maiorano et al., 2008), and site-specific factors such as soil type and elevation (Torelli et al., 2012; Lindblad et al., 2012). For example, crop variety influences the susceptibility of crops to abiotic factors, such as drought stress, that favour fungal growth and ultimately mycotoxin contamination (Kolawole et al., 2021; Polišenská et al., 2020). Crop rotation has an impact on DON contamination in grain because debris from the earlier crop that is contaminated with Fusarium spp. can survive on the soil surface for a long period and act as a reservoir for contamination (Blandino et al., 2010; Bottalico & Perrone, 2002; Landschoot et al., 2013; Selvaraj et al., 2015).

2.1. Data

This study used DON contamination data, weather data, agronomical data, and site-specific data from Sweden. These data were selected for the period Nov 1 of the previous year to August 15 of the current oat growing year, to include all relevant stages of fungal infection and DON contamination of spring oat. Data were first pre-processed (see the section related to each dataset below) and then linked together into one dataset based on the grid (11 km × 11 km), year, and crop variety (Fig. 1). Data from the period Nov 1 to June 1 were used for developing the Start of Season (SS) model; data from the period Nov 1 to July 1 for the Mid-Season (MS) model; and data from the period Nov 1 to Aug 15 for the Full Season (FS) model.
Then, two datasets were composed for modeling, each using a different set of input variables. Dataset 1: weather and crop variety variables from the years 2012–2019. Dataset 2: weather, crop variety, agronomical, and site-specific variables from the years 2016–2017. Agronomical and site-specific variables were only available for the years 2016–2017.

DON contamination data

Data related to DON concentration in spring oats include 8 years (2012–2019) of monitoring results from oats grown in Sweden (54350 records in total) at the grid level (11 km × 11 km). These data were derived from analyses of oats delivered to Lantmännen elevators in Sweden. The variety was known for most of the samples, and the following varieties occurred and were used as input group variables: Belinda, Ingeborg, Galant, Guld, Symphony, Fatima, Kerstin, and Matilda. Furthermore, one group called Feed oats (which could comprise different varieties) was also recorded, as well as one group for which the variety had not been specified. Whether the oats were grown for feed or food use, whether they were organically cultivated (EKO) or not, and the mean DON value of the previous year in the same grid were used as model input variables. Mean DON values represented the average DON concentration of each oat variety group in a particular grid in each year, provided that the number of oat deliveries of that variety group was more than 10 in that grid in that year. DON contamination levels were used as the model output variable and were defined based on the mean values of DON concentration per region per year.
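The aggregation described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the tuple layout of a delivery record, the function names, and the choice to look up the previous-year mean per variety group (rather than per grid overall) are hypothetical and not taken from the study. The class thresholds and the "more than 10 deliveries" rule come from the text.

```python
from collections import defaultdict
from statistics import mean

# Class thresholds in µg/kg, taken from the text.
LOW_MAX = 500
HIGH_MIN = 1000


def don_class(mean_don):
    """Map a mean DON concentration (µg/kg) to low / medium / high."""
    if mean_don < LOW_MAX:
        return "low"
    if mean_don < HIGH_MIN:
        return "medium"
    return "high"


def aggregate_deliveries(deliveries, min_deliveries=11):
    """Average DON per (grid, year, variety); keep groups with > 10 deliveries.

    `deliveries` is an iterable of (grid_id, year, variety, don) tuples;
    this record layout is an assumption made for the example.
    """
    groups = defaultdict(list)
    for grid_id, year, variety, don in deliveries:
        groups[(grid_id, year, variety)].append(don)
    return {
        key: mean(values)
        for key, values in groups.items()
        if len(values) >= min_deliveries
    }


def build_records(mean_don):
    """Attach the class label and the previous year's mean DON (None if absent)."""
    records = []
    for (grid_id, year, variety), value in sorted(mean_don.items()):
        records.append({
            "grid": grid_id,
            "year": year,
            "variety": variety,
            "mean_don": value,
            "don_class": don_class(value),
            # Assumption: previous-year mean is taken for the same variety group.
            "prev_year_mean_don": mean_don.get((grid_id, year - 1, variety)),
        })
    return records


# Made-up deliveries for one grid.
deliveries = (
    [("G1", 2015, "Belinda", 400.0)] * 11
    + [("G1", 2016, "Belinda", 1200.0)] * 12
    + [("G1", 2016, "Kerstin", 900.0)] * 5  # only 5 deliveries: dropped
)
records = build_records(aggregate_deliveries(deliveries))
```

In this toy input, the Kerstin group is discarded because it has too few deliveries, and the 2016 Belinda record carries the 2015 mean as its previous-year feature.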
In total, 31% of the records had DON concentrations below the limit of quantification (LOQ = 100 µg/kg) of the analytical methods used (Ridascreen ELISA or Charm Lateral Flow Devices); 4% of the records were above the maximum legal limit in foodstuffs (1750 µg/kg); and 0.2% of the records were above the maximum recommended threshold in feed (8000 µg/kg). Three contamination levels were set: low (< 500 µg/kg; 82% of records), medium (≥ 500 µg/kg and < 1000 µg/kg; 9% of records), and high (≥ 1000 µg/kg; 8% of records). These settings were chosen from a practical farming point of view: below 500 µg/kg, there is no need for farmers to take any action, whereas above 1000 µg/kg, farmers are recommended to always consider spraying or to check the level by taking a reference sample.

Weather data

Weather data include 8 years (2012–2019) of weather features in Sweden at grid level (11 km × 11 km). These data were derived from the Swedish Meteorological and Hydrological Institute (SMHI). Selected variables included the maximum air temperature (˚C) (HTEMP), minimum air temperature (˚C) (LTEMP), mean air temperature (˚C) (XTEMP), rainfall (mm) (NED), mean relative humidity (%) (XHUM), minimum relative humidity (%) (LHUM), maximum relative humidity (%) (HHUM), wind speed (XVH), wind direction (XVR), and global radiation (XM). Weekly mean values and weekly sum values per grid of the above-mentioned weather features were calculated for different oat growing periods for the development of the three prediction models: SS-model (weeks 18–21), MS-model (weeks 18–26), and FS-model (weeks 18–33). In addition, monthly mean values and monthly sum values per grid of the above-mentioned weather features from Nov 1 of the previous year to April 30 of the current year were calculated and added to the three models.
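As an illustration of the weekly and monthly aggregation described above, the following standard-library sketch computes the features for one weather variable in one grid. Several details are assumptions rather than facts from the study: the input is a hypothetical mapping from calendar date to daily value, ISO week numbering is used because the week convention is not stated, and the feature keys only imitate the naming seen in the figures (MAVE/MSUM for monthly, AVE/SUM for weekly).

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

# Last weekly feature per model, as given in the text (weeks start at 18).
LAST_WEEK = {"SS": 21, "MS": 26, "FS": 33}
WINTER_MONTHS = (11, 12, 1, 2, 3, 4)  # Nov (previous year) to Apr


def weather_features(daily, harvest_year, model="FS", var="NED"):
    """Aggregate daily values of one weather variable for one grid.

    `daily` maps datetime.date -> value. ISO week numbering is assumed here;
    the week convention used in the study is not stated.
    """
    weekly, monthly = defaultdict(list), defaultdict(list)
    for day, value in daily.items():
        in_winter = (
            (day.year == harvest_year - 1 and day.month in (11, 12))
            or (day.year == harvest_year and day.month in (1, 2, 3, 4))
        )
        if in_winter:
            monthly[day.month].append(value)
        elif day.year == harvest_year:
            week = day.isocalendar()[1]
            if 18 <= week <= LAST_WEEK[model]:
                weekly[week].append(value)

    features = {}
    for month in WINTER_MONTHS:
        if monthly[month]:
            features[(f"{var}_MAVE", month)] = mean(monthly[month])
            features[(f"{var}_MSUM", month)] = sum(monthly[month])
    for week in sorted(weekly):
        features[(f"{var}_AVE", week)] = mean(weekly[week])
        features[(f"{var}_SUM", week)] = sum(weekly[week])
    return features


# Made-up series: 1 mm of rain every day from 1 Nov 2015 to 16 Aug 2016.
start = date(2015, 11, 1)
daily = {start + timedelta(days=i): 1.0 for i in range(290)}
ss = weather_features(daily, harvest_year=2016, model="SS")
fs = weather_features(daily, harvest_year=2016, model="FS")
```

The SS-model feature set stops at week 21 while the FS-model set runs to week 33, so the same daily series yields progressively wider inputs for the three models.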
Agronomical and site-specific data

Agronomical and site-specific data include 2 years (2016–2017) of agronomical features in Sweden at oat field level, aggregated to the grid level (11 km × 11 km). These data were derived by linking the oat deliveries from one producer to the fields on which oats had been grown by that producer that year, together with the geographical information for those fields. Data were extracted from several sources and then linked with DON contamination levels per grid (11 km × 11 km) per year. The derived variables included: oat variety; year; the value range and mean value of clay, sand, and elevation; and the percentages of oat, ley, other cereals (except oat), and other crops grown in the fields in the previous year (pre-crop) and two years before (pre-pre-crop). Information on pre-crops was extracted from the Land Parcel Identification System Maps provided by the Swedish Board of Agriculture. Elevation data were extracted from a 2 × 2 m digital elevation model in raster format provided by Lantmäteriet (Swedish National Survey, Gävle, Sweden), and soil texture information was extracted from a digital soil mapping of arable land in Sweden (Piikki & Söderström, 2019).

2.2. Data split for model training and validation

Figure 2 shows the model development steps using dataset 1 and dataset 2. For dataset 1, records from the years 2012–2019 (except 2016) were split randomly into a training set (80%) for model learning and a test set (20%) for internal model validation. Data from the year 2016 were used for external model validation only. The reason is that the distribution of DON contamination levels in the year 2016 was close to the average of the years 2012–2019. The predicted model results for the test set were graphically compared with the measured (observed) mycotoxin data to visualize the model prediction ability. For dataset 2, records were split randomly into a training set (80%) for model learning and a test set (20%) for internal model validation.
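The split for dataset 1 can be sketched as follows, using only the standard library. The record layout (a list of dicts with a "year" key), the function name, and the fixed random seed are hypothetical choices for the example; the held-out year and the 80/20 proportions are taken from the text.

```python
import random


def split_dataset_1(records, holdout_year=2016, test_fraction=0.2, seed=0):
    """Return (train, test, external) lists of records.

    Records from `holdout_year` are reserved for external validation; the
    rest are shuffled and split 80/20. `records` is any list of dicts with a
    "year" key (a hypothetical layout for this example).
    """
    external = [r for r in records if r["year"] == holdout_year]
    internal = [r for r in records if r["year"] != holdout_year]
    rng = random.Random(seed)
    rng.shuffle(internal)
    n_test = round(len(internal) * test_fraction)
    return internal[n_test:], internal[:n_test], external


# Made-up records: 10 per year for 2012-2019.
records = [{"year": y, "id": i} for y in range(2012, 2020) for i in range(10)]
train, test, external = split_dataset_1(records)
```

Holding out a whole year is a stricter check than the random 80/20 split, because no record from that year's weather pattern is seen during training.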
Because agronomical and site-specific data were only available for the years 2016 and 2017, no external validation was conducted here. In addition, to test the importance of adding other features to weather data for the model’s predictive accuracy, for each dataset, the model performance was compared when using weather features only and when using weather features together with agronomical and site-specific features (the result of this comparison is given in the Appendix).

2.3. Predictive model

A machine learning module was developed to predict the contamination of DON in oats at the grid level in Sweden, in three levels for the likelihood of contamination (low, medium, high), using the above-mentioned variables as input. A random forest (RF) algorithm was applied because RF can automatically handle missing values, can efficiently handle non-linear parameters, is comparatively little impacted by noise, is robust to outliers and new data, avoids overfitting, is able to deal with unbalanced data, and is widely used to deal with spatial data (Biau & Scornet, 2016). The Python programming language (version 3.9) and the data analysis library Scikit-learn (version 1.0) were used. Confusion matrices, classification accuracy, and generalization ability were used as criteria to evaluate the performance of the predictive model (Géron, 2019) (Fig. 3). Confusion matrices show actual values on one axis and predicted values on the other. Classification accuracy for each level and total classification accuracy reflect the model performance on each level and on all levels, respectively. Generalization ability reflects the model’s capability to adapt and react properly to previously unseen, new data. In this study, we performed five-fold cross-validation for model training (hyperparameter tuning) (Yang & Shami, 2020). A predictive model was first trained on dataset 1, and model performance was evaluated based on the above-mentioned aspects.
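A minimal sketch of this workflow with scikit-learn is shown below. It uses synthetic data, an arbitrary small hyperparameter grid, and a stratified split, all of which are assumptions for illustration; the study's actual features, tuned hyperparameters, and tuning procedure are not given in the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split

LABELS = ["low", "medium", "high"]

# Synthetic stand-in data: 600 rows, 8 numeric features (not the study's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 8))
score = X[:, 0] + 0.5 * X[:, 1]
y = np.where(score < 0.3, "low", np.where(score < 1.0, "medium", "high"))

# 80/20 split; stratification is a choice made here, not stated in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Five-fold cross-validation over a small, illustrative hyperparameter grid.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
total_accuracy = accuracy_score(y_test, y_pred)

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred, labels=LABELS)
# Row-normalised diagonal = share of each actual class predicted correctly.
per_class_accuracy = cm.diagonal() / cm.sum(axis=1)
```

Here `search.best_score_` holds the mean five-fold cross-validation accuracy of the selected configuration, which can be compared with `total_accuracy` on the held-back 20% as a rough check of generalization.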
Then, following the same model development procedure, we trained the predictive model on dataset 2 to analyze the importance of weather features, agronomical features, and site-specific features. The feature impact of the input variables of the developed models was analyzed and sorted. The Tree SHAP (SHapley Additive exPlanations) algorithm was used to perform the feature impact analysis (Lundberg et al., 2020). Tree SHAP allows the interpretation of predictions made by often complex black-box machine learning algorithms. Feature impact provides an (often desirable) interpretation of the contribution of the model input variables to the model prediction and highlights the positive and negative impact of such variables on identifying the different levels of DON contamination.

3. Results

3.1. Descriptive analysis of data

Table 1 The count of grids with DON contamination levels low, medium and high in the different years.

Contamination level   2012  2013  2014  2015  2016  2017  2018  2019
low                    292   374   663   571   535   449   519   755
medium                 139   163    86     9    39     5    19     3
high                   154   123    68    12    55     5    12     3

Figure 4 shows the grids with oats used in the investigation in 2012–2019, which were drawn using geo-referenced grid points. Large variation in DON contamination levels can be seen in Table 1. The changes in weather variables from 2012 to 2019 are displayed in Appendix Figure A1, where large variation in monthly and weekly rainfall and temperature can be observed. The weather during the summers of 2013 and 2015–2017 was considered relatively normal, with somewhat drier weather in 2013 and cooler weather in 2015. The summer of 2012 was colder and wetter than normal, whereas 2014 was warm but extremely wet in August in the southwestern part of Sweden. In 2018, the summer was exceptionally dry and warm, with large negative effects on crop yields. 2019 was also a warm year, especially in the very south of Sweden, with normal amounts of rain.

3.1.
Model result on dataset 1

Following the model development procedure, the predictive model for DON contamination level (low, medium, high) in spring oats in Sweden was trained using training data from dataset 1 (80% of 2012–2019, except for 2016). The five-fold cross-validation results (mean prediction accuracies) for the SS-model, MS-model, and FS-model were 0.73, 0.72, and 0.72, respectively [1]. Then, the models were tested on the “new” data (20% of all records of 2012–2019 except for the year 2016). Model results were consistent with the cross-validation results. The total prediction accuracies for the SS-model, MS-model, and FS-model were 0.73, 0.72, and 0.73, respectively. Figure 5 displays in detail the prediction results for each DON contamination level (low, medium, high) of the internal validation (20% of 2012–2019 except 2016). The confusion matrix (upper) visualizes the internal model validation results by comparing the actual and predicted DON contamination levels. Taking the SS-model as an example (upper left), 554 + 6 + 7 samples were predicted as belonging to the low-contamination level; of these, 554 samples were correctly predicted, and 6 + 7 samples were wrongly predicted, their true levels being the medium and high-contamination classes, respectively. The normalized matrix for the SS-model (lower left) shows that the prediction accuracies for the high, medium, and low contamination levels are 0.76, 0.61, and 0.58, respectively. When using weather data only (removing crop variety from dataset 1) for model training and validation, the total accuracy on the test dataset for the three models was 0.72, 0.72, and 0.73 (Appendix Figure A3). Thus, adding crop variety to the weather data did not improve the overall DON contamination classification accuracy of each class. Also, adding data for the full season (FS-model) did not improve the models’ performance as compared with only using data from parts of the season (SS- and MS-models).
The external validation result for the prediction of DON contamination levels (low, medium, high) in oats in Sweden in 2016, using the model trained on weather data and crop variety from 2012 to 2019 except 2016, is shown in Figure A5. The total classification accuracies for the SS-model, MS-model, and FS-model were 0.83, 0.28, and 0.71, respectively. The external validation results of the models were not as good as the internal validation results, indicating that a good model performance on the training and testing dataset does not guarantee a good performance for a “new” year (for an explanation of the low performance of the model, please refer to the discussion section).

3.2. Model result on dataset 2

To analyze the feature impact on DON contamination levels, taking into account the weather, agronomical, and site-specific features, a predictive model was developed using dataset 2 (2016 and 2017) following the same model development procedure as described in section 3.1. Note that agronomical and site-specific features (except crop variety) were only available for the years 2016 and 2017, which is the reason a separate model was developed using data from those two years. The total accuracy for the SS-model, MS-model, and FS-model was 0.94, 0.95, and 0.96, respectively (Fig. 6). When using weather data only, the total accuracy for the three models was 0.82, 0.81, and 0.88, respectively (Appendix Figure A4). The result shows that 1) weather features are the most important variables for DON contamination model development, and 2) adding crop variety and agronomical variables could improve the overall DON contamination classification accuracy, as well as the accuracy of each class.

3.3. Feature impact analysis

SHAP (SHapley Additive exPlanations) values were used to explain how much each independent variable contributes to the final prediction of DON in oats in Sweden. Features were ranked based on their importance for predicting DON contamination levels.

Figure 7.
Feature average impact ranking (top 20) using dataset 2 for the FS-model, showing the overall average impact (a), the directionality (positive or negative) of impact on the low contamination level (b), the directionality of impact on the medium contamination level (c), and the directionality of impact on the high contamination level (d). The feature is indicated on the y-axis and its SHAP value is shown on the x-axis. Positive SHAP values represent a positive impact on the contamination level; negative SHAP values represent a negative impact on the contamination level. Numbers 11, 12, 1, 2, 3, and 4 represent the months of November, December, January, February, March, and April. Numbers ranging from 18 to 33 represent week numbers from 18 to 33. MSUM and MAVE represent monthly sum values and monthly mean values of weather features. AVE and SUM represent weekly mean values and weekly sum values of weather features. Weather data include maximum air temperature (HTEMP), minimum air temperature (LTEMP), mean air temperature (XTEMP), rainfall (NED), mean relative humidity (XHUM), minimum relative humidity (LHUM), maximum relative humidity (HHUM), wind speed (XVH), wind direction (XVR), and global radiation (XM).

Figure 7a shows the overall average impact, with variables ordered by importance (in terms of the absolute value of their contribution). For example, from Fig. 7a it can be seen that the most important variable in determining DON contamination levels was the average rainfall in December (“NED_MAVE,12”). Figures 7b, 7c, and 7d show the directionality of the impact on the low contamination level (7b), the medium contamination level (7c), and the high contamination level (7d), respectively. Positive SHAP values represent a positive impact on the contamination level; negative SHAP values represent a negative impact on the contamination level. For example, the results in Fig.
7b indicate that lower average rainfall in December (“NED_MAVE,12”) contributes to low levels of DON contamination. Conversely, the results in Figs. 7c and 7d indicate that higher average rainfall in December contributes to both medium and high levels of DON contamination.

Feature impact analysis on weather features

Figure 8 presents in detail the feature impact on the model outcomes for several weather features, based on feature dependency analysis. The two variables average rainfall in December (“NED_MAVE,12”) and weekly average maximum temperature at the beginning of August (“HTEMP_AVE,32”) were selected because these were the input weather features that had the highest impact on the model output. For example, in Fig. 8, the three panels at the top show that low precipitation in December contributed to the low DON contamination level (positive contribution). This was the other way around for medium and high levels of DON contamination (negative contribution). The three panels at the bottom show that a lower average maximum temperature at the beginning of August contributed to a high frequency of the medium and high levels of DON contamination (positive contribution), whereas this was the other way around for low levels of DON contamination (negative contribution).

Feature impact analysis on agronomical features

Figure 9 presents a detailed explanation of the impact of agronomical features on the model outputs, using feature dependency analysis. The three crop variety variables BELINDA, GALANT, and KERSTIN were selected because they were the non-weather features that had the highest impact on the model output. If the crop variety was BELINDA (1.0 on the x-axis), it contributed to low levels of DON contamination. The crop variety GALANT contributed to low levels of DON contamination as well. The crop variety KERSTIN contributed to medium and high levels of DON contamination.
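To make the sign convention of these contributions concrete, the sketch below computes exact Shapley values by brute-force enumeration for a made-up two-feature score. This is not the Tree SHAP algorithm used in the study, which obtains the same kind of attribution efficiently for tree ensembles; the score function, the feature values, and the baseline are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at `x` relative to `baseline`.

    Features outside a coalition are set to their baseline value. This
    brute-force version is exponential in the number of features; Tree SHAP
    computes the same kind of attribution efficiently for tree ensembles.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                total += weight * (value(set(subset) | {i}) - value(set(subset)))
        phi.append(total)
    return phi


# Hypothetical score for the "high" class: wetter Decembers raise it,
# a (made-up) resistant-variety indicator lowers it, with an interaction.
def toy_high_score(features):
    december_rain, resistant_variety = features
    return 0.2 + 0.01 * december_rain - 0.1 * resistant_variety * (december_rain > 40)


x = [60.0, 1.0]         # wet December, resistant variety
baseline = [30.0, 0.0]  # reference point (assumed)
phi = shapley_values(toy_high_score, x, baseline)
```

In this toy case the rainfall feature receives a positive value and the variety indicator a negative one, and the two values sum to the difference between the score at `x` and at the baseline, which is the additive property that SHAP plots rely on.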
Crop rotation did not explain much of the variation in DON contamination levels and was therefore not displayed here (see Appendix Figure A6).

Feature impact analysis on site-specific features

Figure 10 shows the feature dependency analysis of the mean value and range of elevation and the mean value of soil type (percentage of sand or clay) variables on DON contamination levels. These variables were the site-specific features that had the highest impact on the model output. For example, larger variations in elevation within fields contributed to higher DON levels (elevation range of 25 m: positive contribution to high DON levels). On high-elevation fields (> 60 m), soils with high sand content and low clay content contributed to high DON levels; the reason could be that stress (due to drought in the high-elevation fields) makes the crop more vulnerable to fungal infection.

[1] Five-fold cross-validation results for the SS-model were 0.72, 0.70, 0.72, 0.72, 0.72 with a mean of 0.72; for the MS-model, values were 0.72, 0.71, 0.74, 0.72, 0.72 with a mean of 0.72; for the FS-model, values were 0.71, 0.70, 0.73, 0.72, 0.73 with a mean of 0.72.

4. Discussion and Conclusions

In the current study, three different predictive models (SS-, MS-, and FS-model) for DON contamination levels at the regional scale in oats in Sweden were developed. Model classification accuracy proved to be high, ranging from 0.7 to 0.9 depending on the year and model. The developed models can provide valuable information to three different stakeholder groups in the oat supply chain (farmers, crop collectors, and food safety authorities) as a tool that can help in the management of mycotoxins in the oat supply chain and in risk-based testing.
Results showed that 1) weather variables are the most important for predicting DON contamination in oats; 2) adding relevant agronomical and site-specific factors, such as crop variety, crop rotation, soil type, and DON contamination in the previous year, could improve the performance of the models; 3) good predictions could already be made in June using the SS-model, based on internal validation; and 4) rainfall, relative humidity, and wind speed in different growing stages, as well as crop variety and elevation, were the most important features for predicting DON contamination levels in oats. However, predicting individual years not included in the training of the models proved difficult.

To date, few studies have combined weather, agronomical, and site-specific data to predict regional DON contamination in oats using machine learning, although several studies have paved the way for using these data. One study modelling the effects of weather features on DON contamination in oats indicated that model accuracy could be improved if more factors (such as field tillage and soil type) were included in addition to the weather data (Marzec-Schmidt et al., 2021). Another study investigated the association of several agronomic factors (including harvest date, crop season, county, farming system, moisture, test weight, oat variety, and previous crop) with the occurrence of Fusarium mycotoxins in Irish oats (Kolawole et al., 2021). That study concluded that the level of DON was modelled best by the variables previous crop and oat variety, and indicated the importance of exploring crop rotation in future studies.
Another study investigated the prevention and control of mycotoxins in grains and emphasized the importance of matching crop varieties to a specific agro-ecological zone with specific weather conditions (Matumba et al., 2021), indicating the necessity of linking weather data to crop variety in model development. In our study, we used weather factors together with relevant agronomical and site-specific factors as model inputs. As in the previous studies, oat variety was the non-weather feature with the highest impact on the model output, whereas information on the previous crop explained little of the DON variation in our study. One reason for the lack of influence of pre-crop information could be the aggregation of the data to the weather grids.

The comparison of model performance using weather data with and without agronomical and site-specific factors confirmed that weather variables are the most important factors for the performance of the DON predictive models, and that adding agronomical and site-specific factors could further improve the overall classification accuracy (from 0.72 to 0.73 using dataset 1, and from 0.81 to 0.95 using dataset 2). This was in line with a previous study, which suggested that DON prediction in oats could potentially be improved by combining weather-based risk index outputs with agronomic factors (Persson et al., 2017).

The feature impact analysis indicated that rainfall, relative humidity, and wind speed in different oat growing stages, as well as oat variety and elevation, were the most important features for predicting DON contamination levels in oats. In general, weather variables (e.g., temperature, rainfall) in December of the previous year, weather variables (e.g., humidity, wind speed) around the end of June (close to the flowering season), and weather variables (e.g., humidity, temperature) around August (weeks 31, 32, and 33, close to the harvest season) were the most important features (Fig.
7). These results are in line with Hjelkrem et al. (2017), who showed that dry periods during germination (March to April) contributed to high DON contamination of oats, and that warm, rainy, and humid weather around flowering contributed to high DON accumulation in oats. Marzec-Schmidt et al. (2021) also confirmed that high relative humidity and precipitation around flowering correlated with high DON contamination levels in oats. Interestingly, the site-specific characteristics associated with high DON contamination levels in our study, high elevation and sandy soils, are related to dry conditions, which may indicate that drought stress was important in the dataset from 2016–2017.

A previous study applied different models, including statistical analysis and machine learning techniques, for DON prediction in oats, resulting in different model performances (Lindblad et al., 2012). Their results showed that very little of the variation in DON levels could be explained by agronomical or weather factors, and that it was not possible to predict DON levels based on these variables. This low model performance could have been caused by unbalanced DON contamination data, meaning that only a few records had high DON values while most records had low DON values. Poor model performance for predicting high mycotoxin contamination levels due to unbalanced datasets has also been encountered in other studies (Liu et al., 2018; Liu et al., 2021), in which the developed models performed better for samples with low-level contamination than for samples with high-level contamination. Our study applied a machine learning technique (the random forest algorithm) to handle unbalanced data, resulting in a relatively balanced classification accuracy across the DON contamination levels (high, medium, low). Using both datasets 1 and 2, the model showed a good classification accuracy of at least 0.7.
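The distinction between total classification accuracy and the accuracy within each contamination level can be illustrated with a small confusion matrix. The sketch below assumes that the per-level accuracy is the fraction of observed cases in a level that were predicted correctly (per-class recall). The counts are made up for illustration and are not results from the study.

```python
LEVELS = ["low", "medium", "high"]

# Made-up counts for illustration only: rows are observed levels,
# columns are predicted levels, both in the order given by LEVELS.
confusion = [
    [70, 20, 10],  # observed low
    [5, 15, 5],    # observed medium
    [3, 4, 18],    # observed high
]


def per_class_recall(matrix):
    """Fraction of each observed class that was predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def total_accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


for level, recall in zip(LEVELS, per_class_recall(confusion)):
    print(f"{level}: {recall:.2f}")
print(f"total: {total_accuracy(confusion):.2f}")
```

Because the low class dominates the made-up counts, the total accuracy is driven mainly by the low-level row, which is why the per-level values are reported alongside it.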
An accuracy of at least 0.7 means that a rather large proportion of the sites in each of the three DON contamination levels was correctly classified. However, for the medium and high contamination levels, a portion of the sites (< 40%) was still wrongly predicted. One reason could be that our data are also very skewed, with a large number of sites belonging to the low level and far fewer sites belonging to the other two levels (medium, high). A total classification accuracy higher than 0.7 could be reached by sacrificing the prediction accuracy for the high and medium contamination levels. There are usually trade-offs between the prediction accuracies of the individual contamination levels when using machine learning algorithms: if a model predicts the high and medium contamination levels with high accuracy, accuracy in predicting the low contamination class is sacrificed, and vice versa. In our study, most regions with a high or medium contamination level were correctly predicted, while some low-contamination regions were also predicted as having a high or medium contamination level. The models were designed to predict the three contamination levels while reducing the number of false negatives, i.e., regions predicted as having a low contamination level that in fact have a medium or high contamination level. This comes at the expense of more false positives, i.e., regions predicted as having a medium or high contamination level that in fact have a low contamination level. For oat supply chain stakeholders, it is more important not to miss a region with high contamination than to erroneously regard a non-contaminated region as contaminated. In addition, instead of the three-class setting for DON contamination levels, a better prediction result could be obtained with a two-class setting (e.g.
when using 500 ppb as the threshold value, the total prediction accuracy reached 0.84 for both internal and external validation; results not presented). The three models could easily be adapted to achieve a higher prediction accuracy for the low contamination class, or a higher total classification accuracy, at the expense of lower prediction accuracy for the high and medium contamination levels, depending on what the stakeholders prioritize.

The SS-, MS-, and FS-models were developed for different oat growing periods, and their results provided several insights. First, in the internal validation using dataset 1 (20% of the data from 2012–2019 except 2016), the total classification accuracy was > 0.7 for each of the three models (relatively good performance for all levels). This indicates that good predictions could already be made by June (SS-model), which is in line with Hjelkrem et al. (2017), who showed that a prediction model using only pre-flowering weather data could adequately forecast DON contamination in oats. Although good predictions could already be made by June, it is recommended to use weather variables from the full season whenever possible when implementing the model in practice, because DON contamination was mostly associated with the weather features around flowering as well as close to harvest (Hjelkrem et al., 2017).

The SS-, MS-, and FS-models performed relatively well for all three contamination levels in the internal validation, but poorly for all three contamination levels in the external validation (dataset 1, 2016). The reason for the poor performance in the external validation could be the large variation in the distribution of DON contamination over the years (Table 1). The trained model assigned weights to the different DON contamination levels (e.g., inversely related to the proportions of the three contamination levels).
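The class weighting mentioned above can be sketched as follows. This assumes the common "balanced" heuristic, in which each class weight equals n_samples / (n_classes × class count); this is the heuristic behind scikit-learn's `class_weight="balanced"` option, but the text does not state which formula was actually used. The function name and the label counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Made-up training labels dominated by the low class.
train_labels = ["low"] * 80 + ["medium"] * 12 + ["high"] * 8
weights = balanced_class_weights(train_labels)
print(weights)
```

Weights derived this way reflect the class proportions of the training years only, so they can fit poorly when a held-out year has a very different distribution of contamination levels.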
When the distribution of DON contamination levels in the external validation dataset was very different from that in the training dataset and the model used the pre-assigned weights, performance in the external validation could have been poor.

In this study, data were split randomly into a training set (80%) for model learning, in which five-fold cross-validation was performed for hyperparameter tuning. The models were then validated on “new” data using the internal and external validation datasets (Fig. 2). Other cross-validation methods, such as leave-one-year-out, could be applied (Hjelkrem et al., 2017). However, the distribution of DON contamination varied greatly between years, and there was a clear overall trend of reduced contamination over the years (Table 1). Leave-one-year-out validation therefore remains unlikely to achieve a high validation accuracy, due to the large changes in the distribution of DON over the three contamination levels. When considering the proposed implementation of the models in practice, the validation results may not truly reflect the prediction performance for a new, upcoming year, which is highly related to the distribution of contamination levels in that year. One possibility for further improving model performance is to add data from more years to the training dataset, so that the model can learn as many patterns as possible. In addition, future studies could use binary levels instead of multiple levels for DON contamination to increase the classification accuracy.

One limitation of our study was that we did not consider all biologically relevant factors for DON prediction, due to a lack of related data. Other relevant factors could include crop management practices, such as fertilization, irrigation, and pest control (Munkvold, 2014), the use of fungicides against Fusarium spp.
around flowering (Van der Fels-Klerx et al., 2021; Liu et al., 2018; Torelli et al., 2012), and the harvest conditions (such as timely harvest). This information needs to be collected via field surveys with oat farmers. In this study, we were able to obtain a large amount of data at the expense of detailed information on agronomical features. Collecting data from individual farmers is a different approach, which takes time, is prone to introducing errors, and results in a smaller dataset. In future studies, it might be of interest to determine whether the inclusion of other variables could further improve the performance of the models. Variables from open sources could also be included, such as satellite data, which could be a great asset for further improving the prediction models (Yudarwati et al., 2020). Another limitation of our study is the aggregation of the data, as some of the variation in the additional site-specific information might be smoothed out, obscuring possible more local relationships. Our study only considered DON contamination of oats, not other mycotoxins; future studies could predict multi-mycotoxin contamination if data on other mycotoxins become available. Measures that could be taken to limit contamination could also be of interest to add, as a decision-making tool, in a next step.

It can be concluded that machine learning algorithms for predicting DON contamination levels in oats at the regional level in Sweden provide good prediction results when several years are considered. Unfortunately, the models were not general enough to predict DON levels for individual years not included in the training of the model, i.e., model performance in external validation using a leave-one-year-out approach was not as high as in internal validation. One reason could be that the DON contamination levels varied so much across the investigated years.
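The leave-one-year-out idea discussed above can be sketched as follows: each year is held out in turn for validation while all remaining years are used for training. This is an illustrative sketch only. The function name `leave_one_year_out`, the record layout, and the sample records are hypothetical and do not reproduce the study's actual data handling.

```python
def leave_one_year_out(records, year_key="year"):
    """Yield (held_out_year, train, test) with one whole year held out per split."""
    years = sorted({rec[year_key] for rec in records})
    for held_out in years:
        train = [rec for rec in records if rec[year_key] != held_out]
        test = [rec for rec in records if rec[year_key] == held_out]
        yield held_out, train, test


# Made-up records: only the year and the observed DON class matter here.
records = [
    {"year": 2012, "don_class": "high"},
    {"year": 2012, "don_class": "medium"},
    {"year": 2016, "don_class": "low"},
    {"year": 2016, "don_class": "low"},
    {"year": 2019, "don_class": "low"},
]

for year, train, test in leave_one_year_out(records):
    print(year, len(train), len(test))
```

In the made-up records, the held-out year 2012 contains only medium and high entries while the remaining years contain only low entries, which mirrors the kind of shift in class distribution between years that makes external validation difficult.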
When contamination levels vary this much between years, it seems a better strategy to use only two risk levels, above or below a certain level. However, this strategy has not been tested adequately in this investigation. Such models could be used as regional risk-assessment tools by farmers, crop collectors, and food safety inspectors for logistics in the oat supply chain, improved mycotoxin control, and risk-based testing. Under Regulation (EU) 2017/625, food safety authorities need to apply risk-based controls. Regions with a medium or high predicted DON contamination level can be sampled and tested for the presence of DON more intensively than regions with a low predicted contamination class. Oat collectors and food safety authorities can also use the model predictions to decide on testing frequencies, and for routing and logistics in the oat supply chain.

Declarations

Author Contributions
Conceptualization, IF, TB, JW, XW; methodology, XW; formal analysis, XW; investigation, XW; resources, TB, JW; data curation, TB, JW; writing (original draft preparation), XW; writing (review and editing), XW, TB, JW, IF; visualization, XW; supervision, IF; project administration, IF; funding acquisition, IF, TB. All authors have read and agreed to the published version of the manuscript.

Funding
This study was funded by the European Commission through the EIP-Agri grant 2017-1345 Infofusion Fusarium, administered by the Swedish Board of Agriculture. The FORMAS project Baby Grain Passport (grant 2019-02280) co-funded this study.

Data Availability Statement
The data presented in this study are not available because DON contamination data are highly sensitive for the individual farmers.

Acknowledgments
The authors acknowledge the contribution of the private partners and all growers who participated in this project.

Conflicts of Interest
The authors declare no conflict of interest.
The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

European Commission. (2006). Recommendation 2006/576/EC on the presence of deoxynivalenol, zearalenone, ochratoxin A, T-2 and HT-2 and fumonisins in products intended for animal feeding. Off J Eur Union, 229, 7–9.
Hartman, E. Slutrapport Beträffande Foder & Spannmåls Projekt om Förekomst av DON i 2020 års Spannmålsskörd i Sverige. Available online: https://www.foderochspannmal.se/material-om-mykotoxiner-51.aspx (accessed on 6 June 2021). (In Swedish).
Biau, G., & Scornet, E. (2016). A random forest guided tour. Test, 25(2), 197–227.
Castano-Duque, L., Vaughan, M., Lindsay, J., Barnett, K., & Rajasekaran, K. (2022). Gradient boosting and bayesian network machine learning models predict aflatoxin and fumonisin contamination of maize in Illinois–First USA case study. Frontiers in microbiology, 13.
EFSA Panel on Contaminants in the Food Chain (CONTAM), Knutsen, H. K., Alexander, J., Barregård, L., Bignami, M., Brüschweiler, B., Ceccatelli, S., Cottrill, B., Dinovi, M., & Grasl-Kraupp, B. (2017). Risks to human and animal health related to the presence of deoxynivalenol and its acetylated and modified forms in food and feed. EFSA journal, 15(9), e04718.
Champeil, A., Doré, T., & Fourbet, J.-F. (2004). Fusarium head blight: epidemiological origin of the effects of cultural practices on head blight attacks and the production of mycotoxins by Fusarium in wheat grains. Plant science, 166(6), 1389–1415.
European Commission. (2006). Commission Regulation (EC) No 1881/2006 of 19 December 2006 setting maximum levels for certain contaminants in foodstuffs. Off. J. Eur. Union, 364, 5–24.
European Commission. (2006b). Commission Recommendation of 17 August 2006 on the presence of deoxynivalenol, zearalenone, ochratoxin A, T-2 and HT-2 and fumonisins in products intended for animal feeding (2006/576/EC). Off J Eur Union, L229, 7–9.
Czaban, J., Wróblewska, B., Sułek, A., Mikos, M., Boguszewska, E., Podolska, G., & Nieróbca, A. (2015). Colonisation of winter wheat grain by Fusarium spp. and mycotoxin content as dependent on a wheat variety, crop rotation, a crop management system and weather conditions. Food Additives & Contaminants: Part A, 32(6), 874–910.
Géron, A. (2019). Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O'Reilly Media, Inc.
Hjelkrem, A.-G. R., Torp, T., Brodal, G., Aamot, H. U., Strand, E., Nordskog, B., Dill-Macky, R., Edwards, S. G., & Hofgaard, I. S. (2017). DON content in oat grains in Norway related to weather conditions at different growth stages. European Journal of Plant Pathology, 148(3), 577–594.
Janssen, E., Liu, C., & Van der Fels-Klerx, H. (2018). Fusarium infection and trichothecenes in barley and its comparison with wheat. World Mycotoxin Journal, 11(1), 33–46.
Kaukoranta, T., Hietaniemi, V., Rämö, S., Koivisto, T., & Parikka, P. (2019). Contrasting responses of T-2, HT-2 and DON mycotoxins and Fusarium species in oat to climate, weather, tillage and cereal intensity. European Journal of Plant Pathology, 155(1), 93–110.
Krebs, H., Dubois, D., Kulling, C., & Forrer, H. (2000). Effects of preceding crop and tillage on the incidence of Fusarium spp. and mycotoxin deoxynivalenol content in winter wheat grain. Agrarforschung, 7(6), 264–268.
Li, S., Liu, N., Cai, D., Liu, C., Ye, J., Li, B., Wu, Y., Li, L., Wang, S., & van der Fels-Klerx, H. (2023). A predictive model on deoxynivalenol in harvested wheat in China: Revealing the impact of the environment and agronomic practicing. Food Chemistry, 405, 134727.
Lindblad, M., Börjesson, T., Hietaniemi, V., & Elen, O. (2012). Statistical analysis of agronomical factors and weather conditions influencing deoxynivalenol levels in oats in Scandinavia. Food Additives & Contaminants: Part A, 29(10), 1566–1571.
Liu, C., Manstretta, V., Rossi, V., & Van der Fels-Klerx, H. (2018). Comparison of three modelling approaches for predicting deoxynivalenol contamination in winter wheat. Toxins, 10(7), 267.
Liu, N., Liu, C., Dudaš, T., Loc, M., Bagi, F., & Van der Fels-Klerx, H. (2021). Improved aflatoxin and fumonisin forecasting models for maize (PREMA and PREFUM), using combined mechanistic and Bayesian network modelling–Serbia as a case study. Frontiers in microbiology, 12, 630.
Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., Katz, R., Himmelfarb, J., Bansal, N., & Lee, S.-I. (2020). From local explanations to global understanding with explainable AI for trees. Nature machine intelligence, 2(1), 56–67.
Marzec-Schmidt, K., Börjesson, T., Suproniene, S., Jędryczka, M., Janavičienė, S., Góral, T., Karlsson, I., Kochiieru, Y., Ochodzki, P., & Mankevičienė, A. (2021). Modelling the Effects of Weather Conditions on Cereal Grain Contamination with Deoxynivalenol in the Baltic Sea Region. Toxins, 13(11), 737.
Moretti, A., Pascale, M., & Logrieco, A. F. (2019). Mycotoxin risks under a climate change scenario in Europe. Trends in food science & technology, 84, 38–40.
Munkvold, G. (2014). Crop management practices to minimize the risk of mycotoxins contamination in temperate-zone maize. Mycotoxin reduction in grain chains, 1, 59–77.
Perrone, G., Ferrara, M., Medina, A., Pascale, M., & Magan, N. (2020). Toxigenic fungi and mycotoxins in a climate change scenario: Ecology, genomics, distribution, prediction and prevention of the risk. Microorganisms, 8(10), 1496.
Persson, T., Eckersten, H., Elen, O., Roer Hjelkrem, A.-G., Markgren, J., Söderström, M., & Börjesson, T. (2017). Predicting deoxynivalenol in oats under conditions representing Scandinavian production regions. Food Additives & Contaminants: Part A, 34(6), 1026–1038.
Piikki, K., & Söderström, M. (2019). Digital soil mapping of arable land in Sweden–Validation of performance at multiple scales. Geoderma, 352, 342–350.
Wang, X., Liu, C., & van der Fels-Klerx, H. (2022). Regional prediction of multi-mycotoxin contamination of wheat in Europe using machine learning. Food Research International, 159, 111588.
Xu, X., Madden, L. V., Edwards, S. G., Doohan, F. M., Moretti, A., Hornok, L., Nicholson, P., & Ritieni, A. (2013). Developing logistic models to relate the accumulation of DON associated with Fusarium head blight to climatic conditions in Europe. European Journal of Plant Pathology, 137(4), 689–706.
Yang, L., & Shami, A. (2020). On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing, 415, 295–316.
Yudarwati, R., Hongo, C., Sigit, G., Barus, B., & Utoyo, B. (2020). Bacterial Leaf Blight Detection in Rice Crops Using Ground-Based Spectroradiometer Data and Multi-temporal Satellites Images. J. Agric. Sci, 12, 38.
Additional Declarations
(Not answered)

Supplementary Files
Appendix.docx

Version 1 posted
Editorial decision: revise, 15 May, 2024
Review #2 received at journal, 10 May, 2024
Reviewer #2 agreed at journal, 30 Apr, 2024
Review #1 received at journal, 15 Apr, 2024
Reviewer #1 agreed at journal, 31 Mar, 2024
Reviewers invited by journal, 05 Mar, 2024
Submission checks completed at journal, 23 Feb, 2024
First submitted to journal, 22 Feb, 2024
Editor assigned by journal, 22 Feb, 2024
© Research Square 2026 | ISSN 2693-5015 (online)
{"props":{"pageProps":{"initialData":{"identity":"rs-3979106","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":276466874,"identity":"59907754-0353-4971-84aa-90aa5ce411ff","order_by":0,"name":"Xinxin Wang","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA20lEQVRIiWNgGAWjYDACZiBOYLCBcD5AhBqI0ZIGZjPOgFAEtEDAYYh2HmK0GBznPSbxcAczkHH28GvbtsP5DNKN+LVINvMlGySeYWMwOJOXZp3bdtiyQeYgfi38zDyGDxLbeBjMDuSYGeduO2zAIJGIXwsbM4/BgcQ2CQaz82/MjC2J0QK1xYDB7EaO8WNGYrRINvMYGyS2JfDY33hjxtj7L92AjZAWg/NnzCR/tv2Xk+zPMf7w44y1Ab9E8gG8WmAAFCNsEmDfEaUeCpg/kKJ6FIyCUTAKRg4AAFYcQIHBDZNTAAAAAElFTkSuQmCC","orcid":"https://orcid.org/0000-0002-8633-8971","institution":"wageningen","correspondingAuthor":true,"prefix":"","firstName":"Xinxin","middleName":"","lastName":"Wang","suffix":""},{"id":276466875,"identity":"332fcd57-266e-4c56-81c0-79c748148352","order_by":1,"name":"Thomas BÖRJESSON","email":"","orcid":"","institution":"agrovast","correspondingAuthor":false,"prefix":"","firstName":"Thomas","middleName":"","lastName":"BÖRJESSON","suffix":""},{"id":276466876,"identity":"93845790-4e4e-43d9-b20d-e5ebf226a7c4","order_by":2,"name":"Johanna Wetterlind","email":"","orcid":"","institution":"Swedish University of Agricultural Sciences","correspondingAuthor":false,"prefix":"","firstName":"Johanna","middleName":"","lastName":"Wetterlind","suffix":""},{"id":276466877,"identity":"c50bb7d7-172d-4942-bba7-bf00c9ee3ab2","order_by":3,"name":"HJ van der Fels-Klerx","email":"","orcid":"https://orcid.org/0000-0002-7801-394X","institution":"
[email protected]","correspondingAuthor":false,"prefix":"","firstName":"HJ","middleName":"van der","lastName":"Fels-Klerx","suffix":""}],"badges":[],"createdAt":"2024-02-22 16:02:12","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-3979106/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-3979106/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1038/s41538-024-00310-w","type":"published","date":"2024-10-04T04:00:00+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":52127270,"identity":"20a82c3e-afbf-4b05-bbac-8da3c9da103b","added_by":"auto","created_at":"2024-03-07 07:03:36","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":55616,"visible":true,"origin":"","legend":"\u003cp\u003eThree datasets, related to DON contamination, agronomical and site-specific features, and weather features in three different periods (SS-model (Nov 1 - June 1), MS-model (Nov 1 - July 1), and FS-model (Nov 1 – Aug 15)) were linked per grid cell in Sweden (11km × 11km grid) for each year and each crop variety. 
Note that the site-specific data are not collected grid-wise, but field-wise.\u003c/p\u003e","description":"","filename":"image1.png","url":"https://assets-eu.researchsquare.com/files/rs-3979106/v1/112d606e99262b49b4541727.png"},{"id":52127517,"identity":"3370aedf-6018-456d-b325-dac2e028b30e","added_by":"auto","created_at":"2024-03-07 07:11:40","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":142495,"visible":true,"origin":"","legend":"\u003cp\u003eData splitting for model training and validation\u003cstrong\u003e.\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"image2.png","url":"https://assets-eu.researchsquare.com/files/rs-3979106/v1/5d4faf818cbfc8ced935618b.png"},{"id":52127271,"identity":"1e47abc3-56a9-43be-b4ff-ce565a58ad93","added_by":"auto","created_at":"2024-03-07 07:03:36","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":127036,"visible":true,"origin":"","legend":"\u003cp\u003eIllustration of the evaluation of the model performance.\u003c/p\u003e","description":"","filename":"image3.png","url":"https://assets-eu.researchsquare.com/files/rs-3979106/v1/b56d54aed938ca1e1a143783.png"},{"id":52127276,"identity":"59e76bf8-5352-4b8e-ac99-b1d975864ca0","added_by":"auto","created_at":"2024-03-07 07:03:37","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":538704,"visible":true,"origin":"","legend":"\u003cp\u003eMaps of grids with oats used in the investigation in the period of 2012 to 2019.\u003c/p\u003e","description":"","filename":"image4.png","url":"https://assets-eu.researchsquare.com/files/rs-3979106/v1/0ae60546c1ef4fabd970c9d8.png"},{"id":52127280,"identity":"d8803482-791f-4311-bd5e-1311674ac15d","added_by":"auto","created_at":"2024-03-07 07:03:37","extension":"png","order_by":5,"title":"Figure 
5","display":"","copyAsset":false,"role":"figure","size":157678,"visible":true,"origin":"","legend":"\u003cp\u003ePrediction results using weather data and crop variety. The confusion matrix (upper) presents internal model validation results using the internal validation dataset 1 (20% of 2012 - 2019 except 2016) to predict the contamination levels (low contamination, medium contamination, and high contamination) of DON in oats in Sweden between 2012 and 2019 except 2016.\u003c/p\u003e","description":"","filename":"image5.png","url":"https://assets-eu.researchsquare.com/files/rs-3979106/v1/81d0dc6d5d90283d5673e91f.png"},{"id":52127514,"identity":"2dbb5c55-26d4-4b16-8b06-3514be5794f8","added_by":"auto","created_at":"2024-03-07 07:11:37","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":159598,"visible":true,"origin":"","legend":"\u003cp\u003ePrediction results using the combined data of weather, crop variety, agronomical, and site specific features. 
The confusion matrix (upper) presents internal model validation results using the test dataset (2016-2017) to predict the contamination levels (low contamination, medium contamination, high contamination) of DON in oats in Sweden during 2016 and 2017.\u003c/p\u003e","description":"","filename":"image6.png","url":"https://assets-eu.researchsquare.com/files/rs-3979106/v1/f57dc214ad6cdf522d6506d6.png"},{"id":52127275,"identity":"d7012517-af8a-4255-9980-1721878f9a42","added_by":"auto","created_at":"2024-03-07 07:03:37","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":193393,"visible":true,"origin":"","legend":"\u003cp\u003eFeature average impact ranking (top 20) using the dataset 2 for FS-model, showing the overall average impact (a), the directionality (positive or negative) of impact on the low contamination level (b), the directionality of impact on the medium contamination level (c), and the directionality of impact on the high contamination level (d). The feature is indicated on the y-axis and the SHAP value of it is shown on the x-axis. Positive SHAP values represent the positive impact on the contamination level, negative SHAP values represent the negative impact on the contamination level. Number 11, 12, 1, 2, 3, and 4 represent the month of November, December, January, February, March, and April. Numbers ranging from 18 to 33 represent week numbers from 18 to 33. MSUM and MAVE represent monthly mean values and sum values of weather features. AVE and SUM represent weekly mean values and sum values of weather features. 
Weather data include maximum air temperature (HTEMP), minimum air temperature (LTEMP), mean air temperature (XTEMP), rainfall (NED), mean relative humidity (XHUM), and minimum relative humidity (LHUM), maximum relative humidity (HHUM), wind speed (XVH), wind direction (XVR), and global radiation (XM).\u003c/p\u003e","description":"","filename":"image7.png","url":"https://assets-eu.researchsquare.com/files/rs-3979106/v1/b7758a889156ac6921a20fd3.png"},{"id":52127515,"identity":"4741cb46-3d0e-4b16-b709-7bd61271380b","added_by":"auto","created_at":"2024-03-07 07:11:38","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":196579,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eFeature dependency analysis for weather features (dataset 2) for the FS-model.\u003c/strong\u003e Feature dependency of average rainfall in December (“NED_MAVE”, 12) and average maximum temperature in the beginning of August (“HTEMP_AVE”, 32) on the impact of the low, medium, and high contamination levels. The value of the feature shows on the x-axis and the SHAP value shows on the y-axis. 
Positive SHAP values represent the positive impact on the contamination level, negative SHAP values represent the negative impact on the contamination level.\u003c/p\u003e","description":"","filename":"image8.png","url":"https://assets-eu.researchsquare.com/files/rs-3979106/v1/e62e643059d7e800cfd77761.png"},{"id":52127281,"identity":"5fc67d13-f6a7-4854-811c-8847589dc221","added_by":"auto","created_at":"2024-03-07 07:03:37","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":228549,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eFeature dependency analysis for agronomic features (dataset 2) for the FS-model.\u003c/strong\u003e The feature dependency of the oats variety BELINDA, GALANT, and KERSTIN on the impact of the low, medium, and high DON contamination levels.\u003c/p\u003e","description":"","filename":"image9.png","url":"https://assets-eu.researchsquare.com/files/rs-3979106/v1/aa27282a15846a8de886f887.png"},{"id":52127278,"identity":"7aed0aa8-a6ce-4418-9ea1-459a6281218c","added_by":"auto","created_at":"2024-03-07 07:03:37","extension":"png","order_by":10,"title":"Figure 10","display":"","copyAsset":false,"role":"figure","size":425891,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eFeature dependency analysis for site-specific features (dataset 2) for the FS-model.\u003c/strong\u003e Figure shows examples of the feature dependency of the elevation and soil type on the impact of the low, medium, and high DON contamination levels.\u003c/p\u003e","description":"","filename":"image10.png","url":"https://assets-eu.researchsquare.com/files/rs-3979106/v1/7d4ed828f512b3c2d5d841b6.png"},{"id":65966535,"identity":"c1d89ae6-c991-4386-b5a4-36c07138d943","added_by":"auto","created_at":"2024-10-05 
07:10:29","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2599675,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-3979106/v1/6f1b2392-2fb3-4e50-ae4e-e224842b025f.pdf"},{"id":52127272,"identity":"05d89d1c-3f40-4769-84fb-2a45d8630e8e","added_by":"auto","created_at":"2024-03-07 07:03:36","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":950142,"visible":true,"origin":"","legend":"","description":"","filename":"Appendix.docx","url":"https://assets-eu.researchsquare.com/files/rs-3979106/v1/092b4db1c13499af3c33c64c.docx"}],"financialInterests":"(Not answered)","formattedTitle":"Regional prediction of deoxynivalenol contamination in spring oats in Sweden using machine learning","fulltext":[{"header":"Highlights","content":"\u003cp\u003e\u0026bull; A classification model of regional prediction of deoxynivalenol contamination in oats using machine learning has been developed.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u0026bull; Model results showed prediction accuracy of \u0026gt;70% for internal validation and \u0026lt; 70% for external validation of different classes.\u003c/p\u003e\n\u003cp\u003e\u0026bull; Rainfall, relative humidity, and wind speed in different growing stages as well as crop variety and elevation were the most important features for DON contamination in oats.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u0026bull; Agronomic and site-specific features showed to improve the overall performance of the model based on weather.\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;\u003c/p\u003e"},{"header":"1. Introduction","content":"\u003cp\u003eOats can be susceptible to fungal infection of \u003cem\u003eFusarium\u003c/em\u003e spp. 
and subsequent deoxynivalenol (DON) contamination during the cultivation season (Hjelkrem et al., \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2017\u003c/span\u003e; Munkvold, \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2014\u003c/span\u003e). The presence of DON in oats-derived feed and food can affect human and animal health (Chain et al., \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). In Europe, the European Commission has set maximum legal limits (1750 \u0026micro;g/kg) for the presence of DON in unprocessed durum wheat and oats (Commission, \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2006\u003c/span\u003e), and has defined maximum recommendation thresholds (8000 \u0026micro;g/kg) for the presence of DON for cereals and cereal products (with the exception of maize by-products) used for feed (Commission, \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2006\u003c/span\u003eb). In Sweden, DON concentrations were too high to be fit for human consumption in half of all oats in 2011 and, since then, DON contamination of oats has gained significant attention (Hartman et al., 2021). After 2011, almost all oat products are monitored for DON contamination, which generates a high cost to stakeholders such as farmers, crop collectors, and food safety authorities. Early forecasting of the high-contamination regions of DON in oats at the regional (grid) level could provide timely advice on the need for crop protection and for risk based monitoring to reduce the chance of contaminated oats entering the food chain and reduce the monitoring costs.\u003c/p\u003e \u003cp\u003eWeather conditions, such as temperature, relative humidity, and precipitation, have a significant effect on the presence of DON contamination in oats (Persson et al., \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). 
This is because weather conditions affect the life cycle of toxigenic fungi, the interaction between the pathogen and host, and the pathogen\u0026rsquo;s ability to produce DON (Moretti et al., \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Perrone et al., \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). Apart from weather conditions, agronomical factors could directly or indirectly promote the infection of \u003cem\u003eFusarium\u003c/em\u003e spp. in grains. These factors may include crop variety, crop rotation, soil type, elevation, and geolocation of the fields (Champeil et al., \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2004\u003c/span\u003e; Czaban et al., \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2015\u003c/span\u003e; Janssen et al., \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Krebs et al., \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2000\u003c/span\u003e).\u003c/p\u003e \u003cp\u003ePrevious studies aimed at developing prediction models for DON in oats have used weather data (Kaukoranta et al., \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Marzec-Schmidt et al., \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Persson et al., \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2017\u003c/span\u003e; Xu et al., \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2013\u003c/span\u003e). Considering only weather data may limit the model\u0026rsquo;s application in different regions or farms with different oat agronomic practices. One study concluded that DON prediction in oats could be improved by combining agronomic factors with weather-based risk index outputs (Persson et al., \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). 
To date, only a few studies have used weather variables combined with agronomic and site-specific variables for early forecasting of DON contamination in oats.\u003c/p\u003e \u003cp\u003eApart from the data available for model development, the model algorithm used also affects the model performance. A study by Lindblad et al. (\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2012\u003c/span\u003e), which aimed to predict DON in oats, stated that very little of the variation in DON could be explained by weather conditions using a statistical model. In addition to statistical models, machine learning has been shown to be of added value in the prediction of mycotoxins in grains (Camardo Leggieri et al., 2021; Wang, Liu, et al., 2022; Castano-Duque et al., \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Liu et al., \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Liu et al., \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). One of these studies applied a deep neural network to predict mycotoxin contamination in maize and concluded that the machine learning approach adds value over classical statistical approaches (i.e., simple or multiple linear regression models) (Camardo Leggieri et al., 2021). One study applied the random forest algorithm to predict multi-mycotoxin occurrence in wheat in Europe with \u0026gt;\u0026thinsp;90% accuracy (Wang et al., \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Another study applied gradient boosting and Bayesian network modeling to predict mycotoxin contamination in maize in the USA with an overall accuracy of 94% (Castano-Duque et al., \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). 
These studies showed high prediction accuracy using machine learning; however, they mainly focused on predicting the presence of different mycotoxins in maize and wheat, not in oats. In addition, only a few studies provided explanations of model prediction results (i.e., the impact of input variables on the different mycotoxin contamination levels). Machine learning approaches are often seen as black boxes that provide recommendations without sufficient explanation of \u0026ldquo;which and how input variables generated the result\u0026rdquo;. This limits their usefulness when the model results are to be used as support for decision-making. Furthermore, effects of single management practices (such as cultivar, tillage, and longitude and latitude) on DON contamination in wheat have been investigated for mycotoxin prevention and control (Li et al., \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2023\u003c/span\u003e), but the collective effects of multiple management practices (such as a combination of regional characteristics) on DON contamination have not been explored yet. Understanding such collective effects is essential to provide advice for reducing DON contamination in oats.\u003c/p\u003e \u003cp\u003eThe aim of this study was to 1) develop predictive models for the contamination of DON in spring oats on a regional basis in Sweden using machine-learning algorithms, 2) explore the impacts of weather features, agronomical features, and site-specific features on the DON contamination levels, and 3) explore the collective effect of multiple management practices (combination of cultivar, crop rotation, and regional characteristics) on DON contamination and provide advice to reduce DON contamination in oats.\u003c/p\u003e"},{"header":"2. Materials and Methods","content":"\u003cp\u003eThree models were developed as regional risk-assessment tools to be used by farmers, crop collectors, and food safety inspectors, respectively. 
To provide a timely forecast of DON contamination for the different user groups, the three models aimed to provide regional DON predictions at three different times during the oats vegetation period: i) SS-model: Start of Season model (Nov 1 to June 1), which would allow for recommendations on crop protection activities for farmers, ii) MS-model: Mid-Season model (Nov 1 to July 1), which would allow for recommendations on sampling strategies and serve as an early warning concerning regional differences for crop collectors and food safety inspectors, and iii) FS-model: Full Season model (Nov 1 to Aug 15), which would allow for a more reliable indication on how to plan for sampling strategies. In each of the three models, DON contamination was predicted as one of three levels: low (\u0026lt;\u0026thinsp;500 \u0026micro;g/kg), medium (\u0026ge;\u0026thinsp;500 \u0026micro;g/kg and \u0026lt;\u0026thinsp;1000 \u0026micro;g/kg), and high (\u0026ge;\u0026thinsp;1000 \u0026micro;g/kg).\u003c/p\u003e \u003cp\u003eWeather factors and relevant agronomical and site-specific factors were used as model inputs. The weather features were aggregated as monthly averages from November to April and as weekly averages from May to August. The reason for using more detailed information from May to August is that this is the period from oats stem elongation to harvest, when oats are known to be more sensitive to fungal infection (Hjelkrem et al., \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2017\u003c/span\u003e; Munkvold, \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2014\u003c/span\u003e). 
The other input factors were selected since they are known to be relevant to the DON contamination of crops, including the oats variety, crop rotation, and other agronomical features (Selvaraj et al., 2015; Blandino et al., 2010; Landschoot et al., 2013; Maiorano et al., 2008), and site-specific factors such as soil type and elevation (Torelli et al., 2012; Lindblad et al., \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2012\u003c/span\u003e). For example, crop variety influences the susceptibility of crops to abiotic factors, such as drought stress, that favour fungal growth and ultimately mycotoxin contamination (Kolawole et al., 2021; Polišensk\u0026aacute; et al., 2020). Crop rotation has an impact on DON contamination in grain because debris from the earlier crop contaminated with Fusarium spp. can persist on the soil surface for a long period and act as a reservoir for contamination (Blandino et al., 2010; Bottalico \u0026amp; Perrone, 2002; Landschoot et al., 2013; Selvaraj et al., 2015).\u003c/p\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1. Data\u003c/h2\u003e \u003cp\u003eThis study used DON contamination data, weather data, agronomical data and site-specific data from Sweden. These data were selected for the period Nov 1 of the previous year to August 15 of the current oat growing year, to include all relevant stages of fungal infection and DON contamination of spring oat. Data were first pre-processed (see section related to each dataset below) and then linked together into one dataset based on the grid (11 x 11 km), year, and crop variety (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Data from the period Nov 1 to June 1 were used for developing the Start of Season (SS) model; the period Nov 1 to July 1 for developing the Mid-Season (MS) model; and data from the period Nov 1 to Aug 15 for the Full Season (FS) model. 
Then, two datasets were composed for modeling, each using a different set of input variables. Dataset 1: weather and crop variety variables from the years 2012\u0026ndash;2019. Dataset 2: weather, crop variety, agronomical and site-specific variables from the years 2016\u0026ndash;2017. Agronomical and site-specific variables were only available for the years 2016\u0026ndash;2017.\u003c/p\u003e \u003cp\u003e \u003cem\u003eDON contamination data\u003c/em\u003e \u003c/p\u003e \u003cp\u003eData related to DON concentration in spring oats include 8 years (2012\u0026ndash;2019) of monitoring results from oats grown in Sweden (54350 records in total) at the grid level (11 km \u0026times; 11 km). These data were derived from analyses of oats delivered to Lantm\u0026auml;nnen elevators in Sweden. The variety was known for most of the samples, and the following varieties occurred and were used as input group variables: Belinda, Ingeborg, Galant, Guld, Symphony, Fatima, Kerstin and Matilda. Furthermore, one group called Feed oats (that could be different varieties) was also recorded, as well as one group for which the variety had not been specified. Whether the oats were grown for feed or food use, whether they were organically cultivated (EKO) or not, and the mean DON value of the previous year in the same grid were used as model input variables. Mean DON values represented the average DON concentration of each oats variety group in the particular grid in each year, provided the number of oat deliveries of that variety group was more than 10 in that particular grid in that year. DON contamination levels were used as the model output variable and were defined based on the mean values of DON concentration per region per year. 
31% of the records referred to DON concentrations that were below the limit of quantification (LOQ\u0026thinsp;=\u0026thinsp;100 \u0026micro;g/kg) of the analytical methods used (Ridascreen ELISA or Charm Lateral Flow Devices); 4% of the records referred to DON concentrations that were above the maximum legal limit in foodstuffs (1750 \u0026micro;g/kg); and 0.2% of the records were above the maximum recommended threshold in feed (8000 \u0026micro;g/kg). Three contamination levels were set: low (82% of records) (\u0026lt;\u0026thinsp;500 \u0026micro;g/kg), medium (9% of records) (\u0026ge;\u0026thinsp;500 \u0026micro;g/kg and \u0026lt;\u0026thinsp;1000 \u0026micro;g/kg), and high (8% of records) (\u0026ge;\u0026thinsp;1000 \u0026micro;g/kg). These settings were chosen from a practical farming point of view; below 500 \u0026micro;g/kg, there is no need for farmers to take any action, whereas above 1000 \u0026micro;g/kg, farmers are recommended to always consider spraying or to check the level by taking out a reference sample.\u003c/p\u003e \u003cp\u003e \u003cem\u003eWeather data\u003c/em\u003e \u003c/p\u003e \u003cp\u003eWeather data include 8 years (2012\u0026ndash;2019) of weather features in Sweden at grid level (11 km \u0026times; 11 km). These data were derived from the Swedish Meteorological and Hydrological Institute (SMHI). Selected variables included the maximum air temperature (˚C) (HTEMP), minimum air temperature (˚C) (LTEMP), mean air temperature (˚C) (XTEMP), rainfall (mm) (NED), mean relative humidity (%) (XHUM), minimum relative humidity (%) (LHUM), maximum relative humidity (%) (HHUM), wind speed (XVH), wind direction (XVR), and global radiation (XM). Weekly mean values and weekly sum values per grid of the above-mentioned weather features were calculated in different oat growing periods for the development of three prediction models: SS-model (week 18\u0026ndash;21), MS-model (week 18\u0026ndash;26), and FS-model (week 18\u0026ndash;33). 
In addition, monthly mean values and monthly sum values per grid of the above-mentioned weather features from Nov 1 of the previous year to April 30 of the current year were calculated and added to the three models.\u003c/p\u003e \u003cp\u003e \u003cem\u003eAgronomical and site-specific data\u003c/em\u003e \u003c/p\u003e \u003cp\u003eAgronomical and site-specific data include 2 years (2016\u0026ndash;2017) of agronomical features in Sweden at oat field level aggregated to the grid level (11km \u0026times; 11km). These data were derived by linking the oats deliveries from one producer to the fields at which oats had been grown by that producer that year, and that geographical information. Data were extracted from several sources, and then linked with DON contamination levels per grid per year (11km \u0026times; 11km). The derived variables included: oats variety; year; the value range and mean value of clay, sand, and elevation; the percentage of oat, ley, other cereals except for oat; and other crops grown in the fields in the previous year (pre-crop); and two years before (pre-pre crop). Information on pre-crops was extracted from the Land Parcel Identification System Maps provided by the Swedish Board of Agriculture. Elevation data were extracted from a 2x2 m digital elevation model in raster format provided by Lantm\u0026auml;teriet (Swedish National Survey, G\u0026auml;vle, Sweden) and soil texture information was extracted from a digital soil mapping of arable land in Sweden (Piikki \u0026amp; S\u0026ouml;derstr\u0026ouml;m, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2019\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.2. 
Data split for model training and validation\u003c/h2\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e shows the model development steps using dataset 1 and dataset 2.\u003c/p\u003e \u003cp\u003eFor dataset 1, records from the years 2012\u0026ndash;2019 (except 2016) were split randomly into a training set (80%) for model learning, and a test set (20%) for internal model validation. Data from the year 2016 were used for external model validation only. The reason is that the distribution of DON contamination levels in the year 2016 was close to the average of year 2012\u0026ndash;2019. The predicted model results for the test set were graphically compared with the measured (observed) mycotoxin data to visualize the model prediction ability.\u003c/p\u003e \u003cp\u003eFor dataset 2, records were split randomly into a training set (80%) for model learning, and a testing set (20%) for internal model validation. Because agronomical and site-specific data were only available in the year 2016 and 2017, no external validation was conducted here.\u003c/p\u003e \u003cp\u003eIn addition, to test the importance of adding other features to weather data in promoting the model\u0026rsquo;s predictive accuracy, for each dataset, the model performance was compared when using weather features only and when using weather with agronomical and site-specific features (the result of this comparison is added in the \u003cspan refid=\"Sec12\" class=\"InternalRef\"\u003eAppendix\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e2.3. Predictive model\u003c/h2\u003e \u003cp\u003eA machine learning module was developed to predict the contamination of DON in oats at the grid level in Sweden, in three levels for the likelihood of contamination (low, medium, high) using above mentioned variables as input. 
A random forest (RF) algorithm was applied because RF can automatically handle missing values, can efficiently handle non-linear parameters, is comparatively little impacted by noise, is robust to outliers and new data, avoids overfitting, is able to deal with unbalanced data, and is widely used to deal with spatial data (Biau \u0026amp; Scornet, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2016\u003c/span\u003e). The Python programming language (version 3.9) and the data analysis library Scikit-learn (version 1.0) were used. Confusion matrices, classification accuracy, and generalization ability were used as criteria to evaluate the performance of the predictive model (G\u0026eacute;ron, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2019\u003c/span\u003e) (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). Confusion matrices show actual values on one axis and predicted values on the other. Classification accuracy for each level and total classification accuracy reflect the model performance on each level and on all levels, respectively. Generalization ability reflects the model\u0026rsquo;s capability to adapt and react properly to previously unseen, new data. In this study, we performed five-fold cross validation for model training (hyperparameter tuning) (Yang \u0026amp; Shami, \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2020\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eA predictive model was first trained on dataset 1, and model performance was evaluated based on the above-mentioned criteria. Then, following the same model development procedure, we trained the predictive model on dataset 2 to analyze the importance of weather features, agronomical features and site-specific features.\u003c/p\u003e \u003cp\u003eThe feature impact of the input variables of the developed models was analyzed and sorted. 
The Tree SHAP (SHapley Additive exPlanations) algorithm was used to perform the feature impact analysis (Lundberg et al., \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). Tree SHAP allows interpreting predictions made by often complex black box machine learning algorithms. Feature impact provides an (often desirable) interpretation of the model input variables\u0026rsquo; contribution towards the model prediction and highlights the positive and negative impact of such variables on identifying the different DON contamination levels.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"3. Results","content":"\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e\n \u003ch2\u003e3.1. Descriptive analysis of data\u003c/h2\u003e\n \u003cdiv class=\"gridtable\"\u003e\n \u003cdiv class=\"colspec\" align=\"char\"\u003e\u0026nbsp;\u003c/div\u003e\n \u003ctable id=\"Tab1\" border=\"1\"\u003e\n \u003ccaption\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eThe count of grids with low, medium, and high DON contamination levels in the different years.\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth rowspan=\"2\" align=\"left\"\u003e\n \u003cp\u003eContamination levels\u003c/p\u003e\n \u003c/th\u003e\n \u003cth colspan=\"8\" align=\"left\"\u003e\n \u003cp\u003eYear\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e2012\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e2013\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e2014\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e2015\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e2016\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n 
\u003cp\u003e2017\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e2018\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e2019\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003elow\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e292\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e374\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e663\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e571\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e535\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e449\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e519\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e755\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003emedium\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e139\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e163\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e86\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e39\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e19\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ehigh\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n 
\u003cp\u003e154\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e123\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e68\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e12\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e55\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e12\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003cp\u003eFigure \u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e shows the grids with oats used in the investigation in 2012\u0026ndash;2019, which were drawn using geo-referenced grid points. Large variation in DON contamination levels can be seen in Table \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e. The changes in weather variables from 2012 to 2019 are displayed in \u003cspan class=\"InternalRef\"\u003eappendix\u003c/span\u003e figure A1, where large variation in monthly and weekly rainfall and temperature can be observed. The weather during the summers of 2013 and 2015\u0026ndash;2017 was considered relatively normal, with somewhat drier weather in 2013 and cooler weather in 2015. The summer of 2012 was colder and wetter than normal, whereas 2014 was warm but extremely wet in August in the southwestern part of Sweden. In 2018, the summer was exceptionally dry and warm, with large negative effects on crop yields. 2019 was also a warm year, especially in the very south of Sweden, with normal amounts of rain.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\n \u003ch2\u003e3.1. 
Model result on dataset 1\u003c/h2\u003e\n \u003cp\u003eFollowing the model development procedure, the predictive model for DON contamination level (low, medium, high) in spring oats in Sweden was trained using training data from dataset 1 (80% of 2012\u0026ndash;2019, except for 2016). The five-fold cross-validation results (mean prediction accuracies) for the SS-model, MS-model, and FS-model were 0.73, 0.72, and 0.72, respectively\u003csup\u003e1\u003c/sup\u003e.\u003c/p\u003e\n \u003cp\u003eThen, models were tested on the \u0026ldquo;new\u0026rdquo; data (20% of all records of 2012\u0026ndash;2019 except for the year 2016). The model results were consistent with the cross-validation results. The total prediction accuracies for the SS-model, MS-model, and FS-model were 0.73, 0.72, and 0.73, respectively. Figure\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003e displays the prediction results for each DON contamination level (low, medium, high) of the internal validation (20% of 2012\u0026ndash;2019 except 2016) in detail. The confusion matrix (upper) visualizes the internal model validation results by comparing the actual and predicted DON contamination levels. Taking the SS-model as an example (upper left), 554\u0026thinsp;+\u0026thinsp;6\u0026thinsp;+\u0026thinsp;7 samples were predicted as low contamination; of these, 554 samples were correctly predicted as belonging to the low-contamination level, whereas 6 and 7 samples were wrongly predicted, their true levels being the medium- and high-contamination class, respectively. 
The normalized matrix for the SS-model (lower left) shows that the prediction accuracies for the high, medium, and low contamination level are 0.76, 0.61, and 0.58 respectively.\u003c/p\u003e\n \u003cp\u003eWhen using weather data only (crop variety removed from dataset 1) for model training and validation, the total accuracy on the test dataset for each model was 0.72, 0.72, and 0.73 (\u003cspan class=\"InternalRef\"\u003eAppendix\u003c/span\u003e Figure A3). Thus, adding crop variety to the weather data did not improve the overall DON contamination classification accuracy or the accuracy of each class. Also, adding data for the full season (FS model) did not improve the models\u0026rsquo; performance as compared with only using data from parts of the season (SS and MS models).\u003c/p\u003e\n \u003cp\u003eThe external validation result for the prediction of DON contamination levels (low, medium, high) in oats in Sweden in 2016 using the model trained on weather data and crop variety from 2012 to 2019 except 2016 is shown in Figure \u003cspan class=\"InternalRef\"\u003eA5\u003c/span\u003e. The total classification accuracies for the SS-model, MS-model, and FS-model were 0.83, 0.28, and 0.71, respectively. The external validation result of the model was not as good as internal validation results, indicating that a good model performance on the training and testing dataset doesn\u0026rsquo;t guarantee a good performance for a \u0026ldquo;new\u0026rdquo; year (for explanation about the low performance of the model, please refer to the discussion section).\u003c/p\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e\n \u003ch2\u003e3.2. 
Model result on dataset 2\u003c/h2\u003e\n \u003cp\u003eTo analyze the feature impact on DON contamination levels, taking into account the weather, agronomical, and site-specific features, a predictive model was developed using dataset 2 (2016 and 2017) following the same model development procedure as described in section \u003cspan class=\"InternalRef\"\u003e3.1\u003c/span\u003e. Note that agronomical and site-specific features (except crop variety) were only available for the years 2016 and 2017, which is why a separate model was developed using data from those two years. The total accuracy for the SS-model, MS-model, and FS-model was 0.94, 0.95, and 0.96, respectively (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003e). When only weather data were used, the total accuracy for the three models was 0.82, 0.81, and 0.88, respectively (\u003cspan class=\"InternalRef\"\u003eAppendix\u003c/span\u003e Figure A4). The results show that 1) weather features are the most important variables for DON contamination model development, and 2) adding crop variety and agronomical variables could improve the overall DON contamination classification accuracy, as well as the accuracy of each class.\u003c/p\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e\n \u003ch2\u003e3.3. Feature impact analysis\u003c/h2\u003e\n \u003cp\u003eSHAP (SHapley Additive exPlanations) values were used to explain how much each independent variable contributes to the final prediction of DON in oats in Sweden. Features were ranked based on their importance for predicting DON contamination levels.\u003c/p\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003cp\u003eFigure \u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003e. 
Feature average impact ranking (top 20) using dataset 2 for the FS-model, showing the overall average impact (a), the directionality (positive or negative) of impact on the low contamination level (b), the directionality of impact on the medium contamination level (c), and the directionality of impact on the high contamination level (d). Each feature is indicated on the y-axis and its SHAP value is shown on the x-axis. Positive SHAP values represent a positive impact on the contamination level; negative SHAP values represent a negative impact on the contamination level. Numbers 11, 12, 1, 2, 3, and 4 represent the months of November, December, January, February, March, and April. Numbers ranging from 18 to 33 represent week numbers from 18 to 33. MSUM and MAVE represent monthly sum values and mean values of weather features. AVE and SUM represent weekly mean values and sum values of weather features. Weather data include maximum air temperature (HTEMP), minimum air temperature (LTEMP), mean air temperature (XTEMP), rainfall (NED), mean relative humidity (XHUM), minimum relative humidity (LHUM), maximum relative humidity (HHUM), wind speed (XVH), wind direction (XVR), and global radiation (XM).\u003c/p\u003e\n \u003cp\u003eFigure \u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003ea shows the overall average impact, with variables ordered by importance (in terms of the absolute value of their contribution). For example, from Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003ea it can be seen that the most important variable in determining DON contamination levels was the average rainfall in December (\u0026ldquo;NED_MAVE,12\u0026rdquo;). 
Figures\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003eb, \u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003ec, and \u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003ed show the directionality of the impact on the low contamination level (7b), the medium contamination level (7c), and the high contamination level (7d), respectively. Positive SHAP values represent a positive impact on the contamination level; negative SHAP values represent a negative impact on the contamination level. For example, the results in Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003eb indicate that lower average rainfall in December (\u0026ldquo;NED_MAVE,12\u0026rdquo;) contributes to low levels of DON contamination. Conversely, the results in Figs.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003ec and \u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003ed indicate that higher average rainfall in December contributes to both medium and high levels of DON contamination.\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eFeature impact analysis on weather features\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003eFigure \u003cspan class=\"InternalRef\"\u003e8\u003c/span\u003e presents in detail the feature impact on the model outcomes for several weather features based on feature dependency analysis. The two variables average rainfall in December (\u0026ldquo;NED_MAVE,12\u0026rdquo;) and weekly average maximum temperature at the beginning of August (\u0026apos;HTEMP_AVE\u0026apos;, 32) were selected since these were the input weather features that had the highest impact on the model output. For example, in Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e8\u003c/span\u003e, the three panels at the top show that low precipitation in December contributed to a low DON contamination level (positive contribution). 
This was the other way around for medium and high levels of DON contamination (negative contribution). The three panels at the bottom show that a lower average maximum temperature at the beginning of August contributed to a high frequency of the medium and high levels of DON contamination (positive contribution), whereas this was the other way around for low levels of DON contamination (negative contribution).\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eFeature impact analysis on agronomical features\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003eFigure \u003cspan class=\"InternalRef\"\u003e9\u003c/span\u003e presents in detail the impact of agronomical features on the model outputs using feature dependency analysis. The three crop variety variables BELINDA, GALANT, and KERSTIN were selected because they were the non-weather features that had the highest impact on the model output. If the crop variety was BELINDA (1.0 on the x-axis), it contributed to low levels of DON contamination. The crop variety GALANT contributed to low levels of DON contamination as well. The crop variety KERSTIN contributed to medium and high levels of DON contamination. Crop rotation did not explain much of the variation in DON contamination levels and is therefore not displayed here (see \u003cspan class=\"InternalRef\"\u003eappendix\u003c/span\u003e Figure A6).\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eFeature impact analysis on site-specific features\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003eFigure \u003cspan class=\"InternalRef\"\u003e10\u003c/span\u003e shows the feature dependency analysis of the mean value and range value of elevation and the mean value of soil type (percentage of sand or clay) variables on DON contamination levels. These variables were the site-specific features that had the highest impact on the model output. 
For example, larger variations in elevation within fields contributed to higher DON levels (elevation range\u0026thinsp;\u0026lt;\u0026thinsp;25 m negative contribution, and elevation range\u0026thinsp;\u0026gt;\u0026thinsp;25 m positive contribution to high DON levels). On high-elevation fields (\u0026gt;\u0026thinsp;60 m), soils with high sand content and low clay content contributed to high DON levels; the reason could be that stress (due to drought in the high-elevation fields) makes the crop more vulnerable to fungal infection.\u003c/p\u003e\n\u003c/div\u003e\n\u003cp\u003e[1] Five-fold cross-validation results for the SS-model were \u0026nbsp;0.72, 0.70, 0.72, 0.72, 0.72 with a mean of 0.72; for the MS-model, values were 0.72, 0.71, 0.74, 0.72, 0.72 with a mean of 0.72; for the FS-model, values were 0.71, 0.70, 0.73, 0.72, 0.73 with a mean of 0.72.\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;\u003c/p\u003e"},{"header":"4. Discussion and Conclusions","content":"\u003cp\u003eIn the current study, three different predictive models (SS, MS, and FS model) for DON contamination levels at the regional scale in oats in Sweden were developed. Model classification accuracy was high, ranging from 0.7 to 0.9 depending on the year and model. The developed models can provide valuable information to three different stakeholder groups in the oat supply chain (farmers, crop collectors, and food safety authorities) as a tool that can help in the management of mycotoxins in the oat supply chain and in risk-based testing. 
Results showed that 1) weather variables are the most important for predicting DON contamination in oats, 2) adding relevant agronomical and site specific factors, such as crop variety, crop rotation, soil type and DON contamination condition in the previous year could improve the performance of the models, 3) good predictions could be made already in June by using the SS-model, as based on internal validation, and 4) rainfall, relative humidity, and wind speed in different growing stages as well as crop variety and elevation were the most important features for predicting DON contamination levels in oats. However, predicting individual years not included in the training of the models proved to be difficult.\u003c/p\u003e \u003cp\u003eTo date, few studies have incorporated weather, agronomical, and site-specific data to predict the regional DON contamination in oats using machine learning. But many studies have paved the way for using these data for DON contamination prediction. One study modeling the effects of weather features on DON contamination in oats indicated that the model accuracy could be improved if more factors (such as field tillage and the soil type) were included, in addition to the weather data (Marzec-Schmidt et al., \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). One study investigated the association of several agronomic factors (including harvest date, crop season, county, farming system, moisture, test weight, oats variety, and previous crop) to the occurrence of Fusarium mycotoxins in Irish oats (Kolawole et al., 2021). This study concluded that the level of DON was modelled best by the variables of the previous crop and oat variety, and indicated the importance of exploring crop rotation in future studies. 
Another study investigated the prevention and control of mycotoxins in grains, and emphasized the importance of matching crop varieties to a specific agro-ecological zone with specific weather conditions (Matumba et al., 2021), indicating the necessity of linking weather data to the crop variety for model development. In our study, we used weather factors and relevant agronomical and site-specific factors as model inputs. Similarly to the previous studies, oat variety was the non-weather feature that had the highest impact on the model output, whereas information on previous crop could not explain much of the DON variation in our study. One of the reasons for the lack of influence of pre-crop information could perhaps be the aggregation of the data to the weather grids. The comparison of model performance using weather data, with and without agronomical and site specific factors confirmed that, in improving the performance of the DON predictive models, weather variables are the most important factors, and adding agronomical and site specific factors could further improve the overall classification accuracy (from 0.72 to 0.73 using dataset 1, from 0.81 to 0.95 using dataset 2). This was in line with the expectation of a previous study which suggested that DON prediction in oats could potentially be improved by combining weather-based risk index outputs with agronomic factors (Persson et al., \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2017\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThe feature impact analysis indicated that rainfall, relative humidity, and wind speed in different oat growing stages as well as oat variety and elevation were the most important features for predicting DON contamination levels in oats. 
In general, weather variables (e.g., temperature, rainfall) in December of the previous year, weather variables (e.g., humidity, wind speed) around the end of June (close to the flowering season), and weather variables (e.g., humidity, temperature) around August (weeks 31, 32, and 33, close to the harvest season) were the most important features (Fig.\u0026nbsp;\u003cspan refid=\"Fig18\" class=\"InternalRef\"\u003e7\u003c/span\u003e). These results are in line with Hjelkrem et al. (\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2017\u003c/span\u003e), who showed that dry periods during germination (March to April) contributed to high DON contamination of oats, and that warm, rainy, and humid weather around flowering contributed to high DON accumulation in oats. Marzec-Schmidt et al. (\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2021\u003c/span\u003e) also confirmed that high relative humidity and precipitation around flowering correlated with high DON contamination levels in oats. Interestingly, the site-specific characteristics associated with high DON contamination levels in our study, high elevation and sandy soils, are related to dry conditions, which may indicate that drought stress was important in the dataset from 2016\u0026ndash;2017.\u003c/p\u003e \u003cp\u003eA previous study applied different models, including statistical analysis and machine learning techniques, for DON prediction in oats, resulting in different model performances (Lindblad et al., \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2012\u003c/span\u003e). Their results showed that very little of the variation in DON levels could be explained by agronomical or weather factors, and it was not possible to predict DON levels based on these variables. This low model performance could have been caused by unbalanced data on DON contamination, meaning that only a few records were related to high DON values and most of the records were related to low DON values. 
Poor model performance for predicting high mycotoxin contamination levels due to unbalanced datasets has also been encountered in other studies (Liu et al., \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Liu et al., \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). Their results showed that the developed models performed better in predicting samples with low-level contamination than samples with high-level contamination. Our study applied a machine learning technique (the random forest algorithm) to handle unbalanced data, resulting in a relatively balanced classification accuracy across the DON contamination levels (high, medium, low). Using both datasets 1 and 2, the model showed good classification accuracy of at least 0.7. This means that a rather large proportion of the sites in each of the three DON contamination levels were correctly classified. However, for the medium and high contamination levels, a portion of the sites (\u0026lt;\u0026thinsp;40%) was still wrongly predicted. One reason could be that our data are also very skewed, with a large number of sites belonging to the low level and far fewer sites belonging to the two other levels (medium, high). A total classification accuracy higher than 0.7 could be reached for the predictive model by sacrificing the prediction accuracy of the high and medium contamination levels. There are usually trade-offs between the prediction accuracies of the contamination levels when using machine learning predictive algorithms. If a model was able to predict the high and medium contamination levels with high accuracy, the accuracy in predicting the low contamination class was sacrificed, and vice versa. In our study, most of the regions with high and medium contamination levels were correctly predicted, while some of the low-contamination regions were also predicted as having a high or medium contamination level. 
The models were designed to predict the three contamination levels and at the same time reduce the number of false negatives, i.e. regions that are predicted as having low contamination level but that in fact have medium or high contamination level. This is done at the expense of more false positives, i.e. regions that are predicted as having medium or high contamination level, but in fact have a low contamination level. For oat supply chain stakeholders, it is more important not to miss a region with high contamination, than to erroneously regard a non-contaminated region as contaminated. In addition, instead of using three class setting of DON contamination level, a better prediction result could be obtained by setting two class level (e.g. when using 500 ppb as the threshold value, the total prediction accuracy reached 0.84 for both internal and external validation, results not presented). The three models we designed could easily be adapted to achieve a higher prediction accuracy for the low contamination class or the higher total classification accuracy, at the expense of lower prediction accuracy on the high and medium contamination levels, depending on what the stakeholders prioritize.\u003c/p\u003e \u003cp\u003eThe SS, MS, and FS models were developed for different oat growing periods, and the results of these models provided several insights. First, on the internal validation using dataset 1 (20% of the data 2012\u0026ndash;2019 except 2016), results showed a total classification accuracy of \u0026gt;\u0026thinsp;0.7 for each of the three models (relatively good performance for all levels). This indicated that good predictions could be made already by June (SS-model). This result is in line with the study from Hjelkrem et al., (\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). Their study showed that the prediction model using only pre-flowering weather data could adequately forecast the DON contamination in oat. 
Although good predictions could be made already by June, it is recommended to use weather variables in the full season when implementing the model in practice whenever possible. This is because DON contamination was mostly associated with the weather features around flowering as well as close to harvest (Hjelkrem et al., \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2017\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThe SS, MS, and FS models resulted in relatively good performance for three contamination levels, but low performance for all three contamination levels with the external validation (dataset 1, 2016). The reason for the poor model performance with the external validation (dataset 1, 2016) could be the large variation in DON contamination distribution over the years (Table \u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). The trained model assigned weights (e.g., opposite to the rates of three contamination levels) to different DON contamination levels. When the distribution of DON contamination levels of the external validation dataset was very different from the trained dataset, and the model used the pre-assigned weights, it could have resulted in a poor performance in the external validation. In this study, data were split randomly into a training set (80%) for model learning where five-fold cross-validation was performed for model hyperparameter tuning. Then, the models were validated on \u0026ldquo;new\u0026rdquo; data using the internal and external validation dataset (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). Other cross validation methods such as leave-one-year out could be applied (Hjelkrem et al., \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). 
However, the DON contamination distribution varied greatly between the years, and there was a clear overall trend of reduced contamination over the years (Table \u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Leave-one-year-out validation would therefore still struggle to achieve a high validation accuracy, due to the large changes in the distribution of DON across the three contamination levels. When considering the proposed implementation of the models in practice, the validation results perhaps do not truly reflect the prediction performance for a new, coming year. The prediction performance for a coming year is highly related to the distribution of contamination levels. One possibility to further improve the model performance is to extend the training dataset with data from more years, so that the model can learn as many patterns as possible. In addition, future studies could use binary levels instead of multiple levels for DON contamination to increase the model classification accuracy.\u003c/p\u003e \u003cp\u003eOne limitation of our study was that we did not consider all biologically relevant factors for DON prediction in our model due to a lack of related data. Other relevant factors could include crop management practices, such as fertilization, irrigation and pest control (Munkvold, \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2014\u003c/span\u003e), the use of fungicides against \u003cem\u003eFusarium\u003c/em\u003e spp. around flowering (Van der Fels-Klerx et al., 2021; Liu et al., \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Torelli et al., 2012) and the harvest conditions (such as timely harvest). This information needs to be collected via field surveys with oat farmers. We were able to obtain a large amount of data in this study at the expense of detailed information on agronomical features. 
Collecting data from individual farmers is a different approach, which takes time, is prone to introducing errors, and yields a smaller dataset. In future studies, it might be of interest to determine whether the inclusion of other variables could further improve the performance of the models. Variables from open sources could also be included, such as satellite data, which could be a great asset for further improving the prediction models (Yudarwati et al., \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). Another limitation of our study is the aggregation of the data, as some of the variation in the additional site-specific information might be smoothed out, weakening possible more local relationships. Our study only considers DON contamination of oats, not other mycotoxins; future studies could predict multi-mycotoxin contamination when data on other mycotoxins become available as well. Measures that could be taken to limit the contamination could also be of interest to add, as a decision-making tool in the next step.\u003c/p\u003e \u003cp\u003eIt can be concluded that the use of machine learning algorithms for DON prediction in oats, using contamination levels at the regional level in Sweden, provides good prediction results when considering several years. Unfortunately, the models were not general enough to predict DON levels for individual years not included in the training of the model, i.e., model performance in the external validation using a leave-one-year-out approach was not as high as in the internal validation. One reason could be that the DON contamination levels changed considerably between the investigated years. Under such circumstances, it seems to be a better strategy to use only two risk levels, above or below a certain threshold. However, this strategy has not been tested adequately in this investigation. 
Such models could be used as regional risk-assessment tools for farmers, crop collectors, and food safety inspectors for logistics in the oat supply chain, improved mycotoxin control, and risk-based testing. Given Regulation (EU) 2017/625, food safety authorities need to apply risk-based control. Regions with a medium or high predicted DON contamination level can be sampled and tested for the presence of DON more intensively than regions with a low predicted contamination class. Oat collectors and food safety authorities can also use the model predictions for deciding on testing frequencies, and collectors can use the predictions for routing and logistics in their oat supply chain.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAuthor Contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eConceptualization, IF, TB, JW, XW; methodology, XW; formal analysis, XW; investigation, XW; resources, TB, JW; data curation, TB, JW; writing\u0026mdash;original draft preparation, XW; writing\u0026mdash;review and editing: XW, TB, JW, IF; visualization, XW; supervision: IF; project administration: IF; funding acquisition: IF, TB. All authors have read and agreed to the published version of the manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study is funded by the European Commission through the EIP-agri grant 2017-1345 Infofusion Fusarium, administered by the Swedish Board of Agriculture. 
The FORMAS project Baby Grain Passport, Grant 2019-02280, has co-funded this study.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData Availability Statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe data presented in this study are not available because DON contamination data are highly sensitive for the individual farmers.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgments\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors acknowledge the contribution of the private partners and all growers who participated in this project.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConflicts of Interest\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eCommission, E. (2006). Recommendation 2006/576/EC on the presence of deoxynivalenol, zearalenone, ochratoxin A, T-2 and HT-2 and fumonisins in products intended for animal feeding. Off J Eur Union, 229, 7\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHartman, E. Slutrapport Betr\u0026auml;ffande Foder \u0026amp; Spannm\u0026aring;ls Projekt om F\u0026ouml;rekomst av DON i 2020 \u0026aring;rs Spannm\u0026aring;lssk\u0026ouml;rd i Sverige. Available online: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.foderochspannmal.se/material-om-mykotoxiner-51.aspx\u003c/span\u003e\u003cspan address=\"https://www.foderochspannmal.se/material-om-mykotoxiner-51.aspx\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (accessed on 6 June 2021). (In Swedish).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBiau, G., \u0026amp; Scornet, E. (2016). A random forest guided tour. 
Test, \u003cem\u003e25\u003c/em\u003e(2), 197\u0026ndash;227.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCastano-Duque, L., Vaughan, M., Lindsay, J., Barnett, K., \u0026amp; Rajasekaran, K. (2022). Gradient boosting and bayesian network machine learning models predict aflatoxin and fumonisin contamination of maize in Illinois\u0026ndash;First USA case study. \u003cem\u003eFrontiers in microbiology\u003c/em\u003e, \u003cem\u003e13\u003c/em\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChain, E. P. o. C. i. t. F., Knutsen, H. K., Alexander, J., Barreg\u0026aring;rd, L., Bignami, M., Br\u0026uuml;schweiler, B., Ceccatelli, S., Cottrill, B., Dinovi, M., \u0026amp; Grasl-Kraupp, B. (2017). Risks to human and animal health related to the presence of deoxynivalenol and its acetylated and modified forms in food and feed. \u003cem\u003eEFSA journal\u003c/em\u003e, \u003cem\u003e15\u003c/em\u003e(9), e04718.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChampeil, A., Dor\u0026eacute;, T., \u0026amp; Fourbet, J.-F. (2004). Fusarium head blight: epidemiological origin of the effects of cultural practices on head blight attacks and the production of mycotoxins by Fusarium in wheat grains. Plant science, \u003cem\u003e166\u003c/em\u003e(6), 1389\u0026ndash;1415.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCommission, E. (2006). Commission Regulation (EC) No 1881/2006 of 19 December 2006 setting maximum levels for certain contaminants in foodstuffs. Off. J. Eur. Union, \u003cem\u003e364\u003c/em\u003e, 5\u0026ndash;24.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEuropean Commission. 2006b. Commission Recommendation of 17 August 2006 on the presence of deoxynivalenol, zearalenone, ochratoxin A, T-2 and HT-2 and fumonisins in products intended for animal feeding (2006/576/EC). 
Off J Eur Union. L229:7–9.
Czaban, J., Wróblewska, B., Sułek, A., Mikos, M., Boguszewska, E., Podolska, G., & Nieróbca, A. (2015). Colonisation of winter wheat grain by Fusarium spp. and mycotoxin content as dependent on a wheat variety, crop rotation, a crop management system and weather conditions. Food Additives & Contaminants: Part A, 32(6), 874–910.
Géron, A. (2019). Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O'Reilly Media.
Hjelkrem, A.-G. R., Torp, T., Brodal, G., Aamot, H. U., Strand, E., Nordskog, B., Dill-Macky, R., Edwards, S. G., & Hofgaard, I. S. (2017). DON content in oat grains in Norway related to weather conditions at different growth stages. European Journal of Plant Pathology, 148(3), 577–594.
Janssen, E., Liu, C., & Van der Fels-Klerx, H. (2018). Fusarium infection and trichothecenes in barley and its comparison with wheat. World Mycotoxin Journal, 11(1), 33–46.
Kaukoranta, T., Hietaniemi, V., Rämö, S., Koivisto, T., & Parikka, P. (2019). Contrasting responses of T-2, HT-2 and DON mycotoxins and Fusarium species in oat to climate, weather, tillage and cereal intensity. European Journal of Plant Pathology, 155(1), 93–110.
Krebs, H., Dubois, D., Kulling, C., & Forrer, H. (2000). Effects of preceding crop and tillage on the incidence of Fusarium spp. and mycotoxin deoxynivalenol content in winter wheat grain. Agrarforschung, 7(6), 264–268.
Li, S., Liu, N., Cai, D., Liu, C., Ye, J., Li, B., Wu, Y., Li, L., Wang, S., & van der Fels-Klerx, H. (2023). A predictive model on deoxynivalenol in harvested wheat in China: Revealing the impact of the environment and agronomic practicing. Food Chemistry, 405, 134727.
Lindblad, M., Börjesson, T., Hietaniemi, V., & Elen, O. (2012). Statistical analysis of agronomical factors and weather conditions influencing deoxynivalenol levels in oats in Scandinavia. Food Additives & Contaminants: Part A, 29(10), 1566–1571.
Liu, C., Manstretta, V., Rossi, V., & van der Fels-Klerx, H. J. (2018). Comparison of three modelling approaches for predicting deoxynivalenol contamination in winter wheat. Toxins, 10(7), 267.
Liu, N., Liu, C., Dudaš, T., Loc, M., Bagi, F., & van der Fels-Klerx, H. J. (2021). Improved aflatoxin and fumonisin forecasting models for maize (PREMA and PREFUM), using combined mechanistic and Bayesian network modelling – Serbia as a case study. Frontiers in Microbiology, 12, 630.
Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., Katz, R., Himmelfarb, J., Bansal, N., & Lee, S.-I. (2020). From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2(1), 56–67.
Marzec-Schmidt, K., Börjesson, T., Suproniene, S., Jędryczka, M., Janavičienė, S., Góral, T., Karlsson, I., Kochiieru, Y., Ochodzki, P., & Mankevičienė, A. (2021). Modelling the Effects of Weather Conditions on Cereal Grain Contamination with Deoxynivalenol in the Baltic Sea Region. Toxins, 13(11), 737.
Moretti, A., Pascale, M., & Logrieco, A. F. (2019). Mycotoxin risks under a climate change scenario in Europe. Trends in Food Science & Technology, 84, 38–40.
Munkvold, G. (2014). Crop management practices to minimize the risk of mycotoxins contamination in temperate-zone maize. Mycotoxin reduction in grain chains, 1, 59–77.
Perrone, G., Ferrara, M., Medina, A., Pascale, M., & Magan, N. (2020). Toxigenic fungi and mycotoxins in a climate change scenario: Ecology, genomics, distribution, prediction and prevention of the risk. Microorganisms, 8(10), 1496.
Persson, T., Eckersten, H., Elen, O., Roer Hjelkrem, A.-G., Markgren, J., Söderström, M., & Börjesson, T. (2017). Predicting deoxynivalenol in oats under conditions representing Scandinavian production regions. Food Additives & Contaminants: Part A, 34(6), 1026–1038.
Piikki, K., & Söderström, M. (2019). Digital soil mapping of arable land in Sweden – Validation of performance at multiple scales. Geoderma, 352, 342–350.
Wang, X., Liu, C., & van der Fels-Klerx, H. (2022). Regional prediction of multi-mycotoxin contamination of wheat in Europe using machine learning. Food Research International, 159, 111588.
Xu, X., Madden, L. V., Edwards, S. G., Doohan, F. M., Moretti, A., Hornok, L., Nicholson, P., & Ritieni, A. (2013). Developing logistic models to relate the accumulation of DON associated with Fusarium head blight to climatic conditions in Europe. European Journal of Plant Pathology, 137(4), 689–706.
Yang, L., & Shami, A. (2020). On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing, 415, 295–316.
Yudarwati, R., Hongo, C., Sigit, G., Barus, B., & Utoyo, B. (2020). Bacterial Leaf Blight Detection in Rice Crops Using Ground-Based Spectroradiometer Data and Multi-temporal Satellites Images. J. Agric. Sci., 12, 38.
Keywords: grain, DON, mycotoxin, food safety, forecasting, machine learning, crop variety, crop rotation, agronomical factors, agronomy, food safety management, feature impact analysis
Subject areas: Biological sciences/Microbiology/Fungi/Fungal biology; Scientific community and society/Agriculture
Submitted to npj Science of Food: 2024-02-22
Preprint posted: March 7th, 2024 (version 1), https://doi.org/10.21203/rs.3.rs-3979106/v1
Published version: npj Science of Food, October 4th, 2024, https://doi.org/10.1038/s41538-024-00310-w
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)