A Machine Learning Approach to Genome-Wide Association Mapping of Disease Resistance and Geographic Origin in Sorghum | Research Square

Research Article

Ezekiel Ahn, Insuck Baek, Sunchung Park, Louis K. Prom, Seunghyun Lim, and 5 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6222361/v1
This work is licensed under a CC BY 4.0 License
Status: Published
Journal Publication published 28 Feb, 2026; published version in BMC Plant Biology
Version 1 posted 17

Abstract

Background

Sorghum, often considered the fifth most important cereal crop globally, faces significant production constraints caused by various fungal diseases. Understanding the genetic basis of disease resistance and adaptation to geographic origin is crucial for developing improved varieties.
This study investigates these aspects in a diverse panel of 377 sorghum accessions using a machine learning-enabled genome-wide association study (GWAS).

Results

The study analyzed a panel of 377 sorghum accessions, including a mini core collection and additional accessions from Senegal. Phenotypic evaluation for resistance to anthracnose, head smut, and downy mildew was conducted on the mini core collection. Genotypic data comprising nearly 300,000 SNP markers were used for GWAS with Bootstrap Forest models. While phenotypic clustering based on disease resistance did not directly correlate with geographic origin, significant genetic differentiation was observed among geographic origins. Machine learning-driven GWAS identified SNPs associated with geographic origin, particularly on chromosome 10, with candidate genes including transcription factors. For disease resistance, SNPs were identified near genes with known or predicted roles in plant defense, such as genes encoding zinc-binding proteins for anthracnose and LRR- and NB-ARC-containing proteins for head smut.

Conclusions

This research provides insights into the complex genetic architecture of disease resistance and geographic adaptation in sorghum. In addition to the resistance genes previously identified through traditional GWAS, the candidate genes and associated SNPs identified here offer valuable resources for enhancing disease resistance in sorghum breeding programs through marker-assisted selection and other advanced breeding techniques.

Keywords: Sorghum, Geographic origin, Genetic diversity, Machine learning, GWAS, Anthracnose, Head smut, Downy mildew

Introduction

Sorghum (Sorghum bicolor L. Moench) is a staple cereal crop for over 500 million people worldwide, particularly in the semi-arid tropics of Africa and Asia, where it serves as a critical source of food, feed, and biofuel [1–3].
Sorghum's high tolerance to drought and heat makes it an excellent candidate crop for ensuring future food security, particularly in regions susceptible to climate change [4]. However, its production is significantly constrained by both biotic and abiotic stresses [5]. Fungal diseases such as anthracnose, head smut, and downy mildew can cause significant yield losses in sorghum, placing severe pressure on farmers' livelihoods and on the food security of dependent populations [6]. Disease-resistant cultivars are often considered the best method for managing sorghum diseases [7]. Hence, sorghum breeding that aims to generate varieties with improved disease resistance and better adaptation to various environments is critically important.

Anthracnose, caused by the fungal pathogen Colletotrichum sublineola, is a widespread and destructive sorghum disease, affecting all aerial parts of the plant and causing significant reductions in grain yield and quality [8]. Yield losses exceeding 50% have been reported under severe anthracnose epidemics [9, 10]. Head smut, caused by the biotrophic fungus Sporisorium reilianum (Kühn) Langdon & Fullerton, is another major constraint to sorghum production, particularly in humid and warmer growing regions [11]. Infection by S. reilianum leads to replacement of the sorghum panicle with a large, black sorus filled with fungal spores, resulting in complete grain yield loss in infected plants [12]. Downy mildew, caused by the oomycete Peronosclerospora sorghi, can also cause substantial losses, particularly in susceptible varieties grown under humid conditions [13]. P. sorghi infection manifests as localized leaf lesions and systemic infection, leading to stunting, reduced tillering, and the development of characteristic "downy" abaxial growth on infected leaves [14].
Moreover, systemically infected seedlings turn pale yellow or show light-colored streaking on the leaves, become chlorotic and stunted, and die prematurely [7]. While some sources of resistance to these diseases have been identified and deployed in sorghum breeding programs, new pathogen races and the complex, polygenic nature of resistance pose ongoing challenges. Further investigation is needed into the genetic architecture of resistance to these three diseases to develop sorghum varieties with durable and broad-spectrum resistance.

Sorghum is an ancient crop of Africa that spread across diverse environments, especially in the semiarid tropics of Africa and South Asia [15]. The geographic origin of plants plays a pivotal role in shaping genetic diversity [16], as geographically distant populations have adapted under different selective pressures, such as climate, soil types, and microbe populations. As a result, locally adapted landraces with unique trait combinations have evolved. Studies in sorghum and other crops have demonstrated that geographic patterns of genetic variation often mirror patterns of environmental variation, suggesting a strong link between adaptation and geographic origin [17–19]. For instance, accessions in arid regions may possess enhanced drought tolerance, while accessions from regions with high disease pressure have evolved unique and multiple resistance genes [20, 21]. In addition, the historical movement and exchange of plant germplasm by humans have contributed to the complex patterns of genetic diversity observed in sorghum [22–24]. A deep understanding of how geographic origin has shaped sorghum's genetic diversity is essential both for uncovering its evolutionary history and for crop improvement. By analyzing the genomes of sorghum accessions from different regions, we can identify genes and pathways that have been selected under diverse environmental pressures.
This knowledge, in turn, can be directly applied to breeding more resilient and robust sorghum varieties. The availability of diverse germplasm is also fundamental to understanding the genetic basis of adaptive traits and making significant progress in crop improvement [25–27]. Broadening the genetic base of breeding programs by including landraces and geographically diverse accessions can introduce novel alleles and allelic combinations not currently present in elite cultivars. This is particularly important for traits such as disease resistance, where pathogens continuously evolve and overcome existing resistance mechanisms [28, 29].

In 2001, the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT) gene bank developed a core collection of 2,247 sorghum accessions from its larger germplasm collection of over 37,000 accessions [6, 30]. However, this core collection was too large for many replicated evaluation studies. Hence, a sorghum mini core collection comprising 10% of the core collection was developed to represent a global snapshot of sorghum diversity [6, 30]. This mini core collection has been widely used to study the genetic architecture of various traits such as disease resistance [31]. Based on this mini core collection, our previous study performed a GWAS of resistance to anthracnose, head smut, and downy mildew [6].

The current study builds upon this foundation with two primary objectives. First, we characterized the genetic relationships among an expanded panel of 377 sorghum accessions, including the mini core collection and accessions from Senegal, to understand the population structure and identify potential sources of unique genetic variation. Second, we sought to identify genomic regions associated with both geographic origin and disease resistance using the machine learning model Bootstrap Forest (a variant of Random Forest) [32] for GWAS.
The machine learning algorithm allows us to capture complex relationships between genetic markers and traits, providing a more nuanced understanding of the genetic architecture underlying these complex traits [33]. We hypothesized that this integrated approach, combining diverse germplasm, high-density SNP genotyping, and machine learning, would enable us to uncover novel genetic diversity, identify genomic regions underlying adaptation to different geographic origins, and pinpoint candidate genes associated with disease resistance. By leveraging machine learning in GWAS, this study is expected to provide valuable insights for developing improved sorghum varieties with enhanced resilience and productivity.

Materials and methods

Phenotypic and genotypic data

This study uses phenotypic and genotypic data from a previous study [6] that included phenotypic evaluations of disease resistance for a subset of the sorghum mini core collection, as well as genotypic data (SNP markers) for 377 accessions (242 mini core accessions and an additional 135 accessions from Senegal). The 135 Senegalese accessions were not phenotyped for disease resistance and are included in the present study solely for analyses of genetic diversity and geographic origin. For the mini core accessions, resistance to three major sorghum diseases (anthracnose, head smut, and downy mildew) was assessed as described below, although not all 242 accessions were evaluated for all three diseases. Anthracnose resistance was evaluated for 216 accessions using a spray inoculation method with a mixture of five C. sublineola isolates as described by Ahn et al. [6]. Disease severity was rated on a scale of 1 to 5, where 1 indicated no symptoms and 5 indicated severe infection, with accessions classified as resistant or susceptible based on the presence of acervuli (fungal fruiting bodies) [6].
Downy mildew resistance was evaluated for 213 accessions using the sandwich inoculation technique with P. sorghi pathotype 6, and accessions with 10% or less disease incidence were considered resistant [6]. Head smut resistance was assessed for 204 accessions using a syringe inoculation method with S. reilianum sporidia, and accessions were scored as resistant if no infection was detected [6]. Genotypic data for these accessions comprised 297,876 SNP markers and were obtained from this previous study [6]. In short, the SNP data were based on the sorghum reference genome version 3.1.1, initially generated using genotyping-by-sequencing (GBS) [34–37] and subsequently imputed using Beagle 4.1 [38] to address missing data.

Multivariate analysis of phenotypic and genotypic data

Principal Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE) were performed to assess overall phenotypic variation and relationships among sorghum accessions based on disease resistance. PCA, conducted using JMP Pro 17 software (SAS Institute Inc., Cary, NC) [39], was used to determine the contribution of each disease resistance phenotype to this overall variation. t-SNE, a non-linear dimensionality reduction technique, was also employed using JMP Pro 17 with default parameters (output dimensions = 2, perplexity = 30, maximum iterations = 1,000, initial principal component dimensions = 50, convergence criterion = 1e−8, initial scale = 0.0001, Eta (learning rate) = 200, inflate iterations = 250, and random seed = 123) to further explore the underlying structure of the phenotypic variation and identify potential clusters of accessions with similar disease resistance profiles. Genetic distances among accessions were calculated based on the 297,876 SNP markers using the Identity by State (IBS) method implemented in TASSEL 5 software [40].
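As an illustration of what an IBS-based distance measures, the following minimal sketch computes one common form of it for a pair of accessions: one minus the average probability that an allele drawn at random from each accession at the same site is identical. The function name, the 0/1/2 alternate-allele coding, and the use of None for missing calls are assumptions made for this example; TASSEL's own implementation may differ in details such as missing-data handling.

```python
from typing import Optional, Sequence


def ibs_distance(g1: Sequence[Optional[int]], g2: Sequence[Optional[int]]) -> float:
    """Return 1 - mean IBS probability between two accessions.

    Genotypes are coded as alternate-allele counts (0, 1, or 2); None marks a
    missing call. Sites missing in either accession are skipped (an assumption).
    """
    total = 0.0
    n_sites = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        pa, pb = a / 2.0, b / 2.0  # alt-allele frequency within each accession
        # Probability that one random allele from each accession is the same.
        total += pa * pb + (1.0 - pa) * (1.0 - pb)
        n_sites += 1
    if n_sites == 0:
        raise ValueError("no sites with calls in both accessions")
    return 1.0 - total / n_sites


# Hypothetical genotypes for two accessions at four SNPs.
accession_a = [0, 2, 1, 2]
accession_b = [0, 0, 1, None]
distance = ibs_distance(accession_a, accession_b)
```

Under this definition, identical homozygous genotypes contribute a distance of 0, opposite homozygotes contribute 1, and any comparison involving a heterozygote contributes 0.5.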
Two-tailed t-tests were performed to assess differences in genetic distances between resistant and susceptible accessions for each disease. A one-way analysis of variance (ANOVA) was used to evaluate differences in genetic distances among groups based on their country and geographic region of origin. We also performed hierarchical clustering using the Ward method on the SNP markers, with the default settings in JMP Pro 17, to investigate the genetic relationships among the sorghum accessions.

Genome-wide association analysis of geographic origin and disease resistance using Bootstrap Forest

We performed a GWAS using a machine learning approach based on the Bootstrap Forest model to identify genomic regions associated with geographic origin and disease resistance. These models used the same 297,876 SNP markers as predictors, with either the geographic origin or the binary resistant/susceptible classification for each disease as the target variable. Unlike traditional GWAS methods that rely on linear or logistic regression and generate p-values based on statistical significance under a specific model, the Bootstrap Forest model does not directly produce p-values. Instead, model performance is assessed through measures like accuracy, and the contribution of individual SNPs is quantified using importance scores.

Geographic origin: Bootstrap Forest models were trained to predict the geographic origin of the sorghum accessions, using either country of origin or broader geographic region as the target variable. The models were trained on the SNP markers, with the dataset randomly split (random seed = 1 in JMP Pro 17) into 80% for training and 20% for validation.
The data split was conducted using the default settings in JMP Pro 17, with the following parameters for the Bootstrap Forest models: number of trees in the forest = 100, number of terms sampled per split = 1, bootstrap sample rate = 1, minimum splits per tree = 10, maximum splits per tree = 2000, minimum size split = 5. We first assessed the prediction accuracy on the training and validation sets to evaluate model performance. After that, we trained another model using the same settings but with 100% of the data for training, which allowed us to use the full dataset for identifying important SNPs after the modeling approach had been checked on held-out data. The top SNPs associated with the country and region of origin were identified based on their importance scores (portion, i.e., the total contribution of a SNP to the model's accuracy). To explore potential candidate genes involved in geographic adaptation, the nearest genes to these top SNPs were identified using the Sorghum bicolor reference genome v3.1.1 from Phytozome 12 (https://phytozome-next.jgi.doe.gov/) [41].

Disease resistance: The genetic basis of disease resistance was investigated using the same Bootstrap Forest modeling approach described for geographic origin. Separate models were trained to predict resistance to each disease: anthracnose, head smut, and downy mildew. These models used the same SNP markers and the phenotypic data (resistant/susceptible) for each disease as the target variable. Model parameters, data splitting procedures, and variable importance assessments were identical to those used in the geographic origin analyses. The top SNPs associated with resistance to each disease were identified, and the nearest genes to these SNPs were identified using the Sorghum bicolor reference genome v3.1.1 from Phytozome 12 to explore potential genetic mechanisms underlying disease resistance [41]. A random seed of 1 was used for all machine learning model training to ensure reproducibility.
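To make the idea of an importance score expressed as a "portion" more concrete, the sketch below builds a deliberately simplified forest in plain Python. It is not the JMP Pro Bootstrap Forest implementation: each "tree" here is a single split (a stump) on one randomly sampled SNP, fitted to a bootstrap sample, and a SNP's portion is its share of the total Gini impurity decrease across the forest. The function names, the toy genotype matrix, and the stump simplification are all assumptions made for illustration.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split_gain(values, labels):
    """Largest sample-weighted Gini decrease over the two genotype thresholds."""
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    for threshold in (0.5, 1.5):  # splits 0 | 1,2 and 0,1 | 2
        left = [y for x, y in zip(values, labels) if x < threshold]
        right = [y for x, y in zip(values, labels) if x >= threshold]
        if not left or not right:
            continue
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, n * (parent - child))
    return best


def stump_forest_portions(genotypes, labels, n_trees=100, seed=1):
    """Share of total impurity decrease attributed to each SNP."""
    rng = random.Random(seed)
    n_rows, n_snps = len(genotypes), len(genotypes[0])
    contribution = [0.0] * n_snps
    for _ in range(n_trees):
        # Bootstrap sample rate of 1: draw n_rows rows with replacement.
        rows = [rng.randrange(n_rows) for _ in range(n_rows)]
        # One term sampled per split: a single random SNP per stump.
        snp = rng.randrange(n_snps)
        values = [genotypes[r][snp] for r in rows]
        sampled_labels = [labels[r] for r in rows]
        contribution[snp] += best_split_gain(values, sampled_labels)
    total = sum(contribution)
    return [c / total if total > 0 else 0.0 for c in contribution]


# Toy data (hypothetical): SNP 0 tracks the phenotype, SNP 1 is unrelated,
# SNP 2 is monomorphic and can never produce a split.
labels = ["R"] * 6 + ["S"] * 6
noise = [0, 1, 2, 0, 1, 2]
genotypes = [[2 if lab == "R" else 0, noise[i % 6], 1] for i, lab in enumerate(labels)]
portions = stump_forest_portions(genotypes, labels)
```

The portions sum to 1, so each value can be read as a fraction of the forest's total split contribution. A full forest grows many splits per tree, so real importance scores across hundreds of thousands of SNPs are far smaller than in this toy case.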
Results

Phenotypic and genetic diversity of disease resistance in the sorghum mini core collection

We examined the distribution of resistance and susceptibility phenotypes for each of the three diseases within the subset of the mini core collection that was phenotyped (Fig. 1). For anthracnose, 216 accessions were evaluated, with 105 classified as resistant and 111 as susceptible. Head smut resistance was assessed in 204 accessions, revealing 92 resistant and 112 susceptible lines. Downy mildew showed a different pattern, with 213 accessions evaluated and a higher proportion of susceptible (163) than resistant (50) accessions.

To explore the phenotypic variation in disease resistance, a t-SNE analysis was performed (Fig. 2a). The t-SNE visualization revealed four distinct clusters and a fifth, more diffuse central group. While these clusters show some degree of structure, they do not correspond directly to geographic origin, indicating that phenotypic similarity in disease resistance is not solely driven by geographic proximity. This pattern suggests the existence of four primary resistance profiles within the mini core collection, along with a group of accessions with intermediate or mixed resistance phenotypes.

The contributions of each disease resistance phenotype to overall phenotypic variation were further examined using a loading plot derived from a PCA (Fig. 2b). The first two principal components, PC1 and PC2, explained 36% and 32.3% of the total phenotypic variation, respectively. The loading plot illustrated that resistance and susceptibility responses for each disease were negatively correlated. While all three diseases contributed significantly to the observed variation, downy mildew resistance strongly influenced PC2, whereas anthracnose resistance had the most substantial influence on PC1. This suggests that these two diseases have been major drivers of phenotypic divergence in the mini core collection.
Genetic diversity among the mini core accessions was assessed using SNP data. Genetic distances were calculated based on the IBS method (Fig. 2c). When grouped by disease resistance phenotype, susceptible accessions exhibited slightly greater genetic distances compared to resistant accessions for anthracnose and head smut (two-tailed t-test: p = 0.0011 for anthracnose, p = 0.0196 for head smut). However, no significant difference was observed for downy mildew (p = 0.541). This suggests that anthracnose and head smut resistance may be more genetically conserved within the mini core collection. In contrast, downy mildew resistance might be more genetically diverse or influenced by fewer genes with larger effects.

Analysis of genetic distances by country of origin revealed significant variation (one-way ANOVA: p < 0.0001). Notably, accessions from Nicaragua (0.13), Lesotho (0.138), and Thailand (0.14) exhibited the lowest genetic distances, while those from China (0.197), Algeria (0.198), Mali (0.206), Gambia (0.275), and Sierra Leone (0.284) exhibited the highest. Genetic distances also differed significantly by geographic region (one-way ANOVA: p < 0.0001), with accessions from South America (0.15), Southern Africa (0.151), and Central America & Caribbean (0.153) showing the lowest genetic distances, and those from the Middle East (0.191), West Africa (0.193), and East Asia (0.196) displaying the highest. These patterns hint that geographic origin plays a role in shaping the genetic diversity and, consequently, the disease resistance response of the mini core collection. At the same time, the presence of accessions with similar resistance in geographically distant regions hints at disease-driven convergent evolution or historical exchange of germplasm contributing to the observed patterns.
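The one-way ANOVA used to compare genetic distances among countries or regions rests on an F ratio of between-group to within-group variance. The short sketch below computes that ratio in plain Python; the per-accession distance values and group names are invented for illustration, and converting F to a p-value (which requires the F distribution) is left out.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group sizes times squared mean offsets.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: squared deviations from each group's mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical per-accession mean genetic distances for three regions.
distances_by_region = {
    "Region A": [0.148, 0.152, 0.150, 0.151],
    "Region B": [0.190, 0.195, 0.192, 0.194],
    "Region C": [0.170, 0.168, 0.173, 0.171],
}
f_stat, df_between, df_within = one_way_anova_f(list(distances_by_region.values()))
```

A large F relative to its degrees of freedom indicates that group means differ by more than the within-group scatter would explain, which is the pattern reported above for both country and region.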
Genetic relationships among mini core and Senegalese accessions based on SNP markers

To investigate the genetic relationships among the mini core and additional Senegalese accessions, we performed hierarchical clustering on a combined dataset of 377 accessions (including controls such as SC748-5) using all 297,876 SNP markers (Fig. 3a). Application of the Ward method, which minimizes within-cluster variance, identified 17 distinct clusters based on the similarity of SNP genotype profiles. Notably, the Senegalese accessions grouped into approximately six distinct clusters, as indicated by the gray blocks on the dendrogram, highlighting their genetic distinctiveness relative to the mini core collection. The relationships among these 17 clusters are also presented in a constellation plot (Fig. 3b).

Genome-wide association analysis of geographic origin using Bootstrap Forest models

To further investigate the genetic basis of geographic origin within the sorghum mini core collection, we performed a GWAS employing a machine learning approach, specifically Bootstrap Forest models. This analysis aimed to identify the genomic regions (SNPs) most strongly associated with the geographic origin of the accessions, categorized by either country or broader region. Initial model validation was performed using an 80/20 training/validation split (Table 1). The training set accuracies for the geographic origin models were high (0.9136 for Country and 0.9383 for Region). However, the validation set accuracies were considerably lower: 0.5 for Country and 0.6481 for Region. Critically, the Generalized R-square for the Country model was negative (-0.534), indicating poor model fit on the validation set.
While the Region model showed a positive Generalized R-square (0.7866), the low validation accuracy for both geographic origin models suggests that predicting geographic origin, especially at the country level, is challenging with this dataset, likely due to factors such as small sample sizes, the complex history of sorghum dispersal, and potential misclassification of origin in the original germplasm collection.

Training set accuracies for the disease resistance models were also high (anthracnose: 0.9753; head smut: 0.9675; downy mildew: 0.9814). However, validation set performance varied considerably across traits. The anthracnose resistance model showed a relatively low but potentially usable validation accuracy of 0.6296 and a low Generalized R-square (0.0548). Validation performance for the head smut and downy mildew resistance models was weak, with negative Generalized R-square values (-0.06 and -0.063, respectively) and validation accuracies of 0.52 and 0.846, respectively. Although the downy mildew accuracy appears high, it should be read against the class imbalance for this trait (163 susceptible versus 50 resistant accessions) and a validation AUC of only 0.5653. Neither model is therefore reliable for prediction. As noted above, the Country-level geographic origin model was likewise insufficiently validated (validation accuracy 0.5, Generalized R-square -0.534), and the Region-level model, although better (validation accuracy 0.6481, Generalized R-square 0.7866), still fell short of our initial target.

Table 1. Model performance metrics for Bootstrap Forest models predicting geographic origin and disease resistance. This table presents the results of the initial 80/20 split validation for the Bootstrap Forest models. Metrics are shown for the training and validation sets of each trait: country of origin, geographic region, anthracnose resistance, head smut resistance, and downy mildew resistance. The metrics reported are: number of samples, entropy R-square, misclassification rate, Area Under the ROC Curve (AUC), Root Average Squared Error (RASE), and generalized R-square.
Trait         Set         # of samples  Entropy RSquare  Misclassification Rate  AUC     RASE     Generalized RSquare
Country       Training    162           0.7044           0.0864                  1       0.60458  0.9922
Country       Validation  54            -0.073           0.5                     0.8234  0.82237  -0.534
Region        Training    162           0.7819           0.0617                  0.9999  0.38833  0.978
Region        Validation  54            0.3605           0.3519                  0.8896  0.63063  0.7866
Anthracnose   Training    162           0.6695           0.0247                  0.9991  0.22873  0.8061
Anthracnose   Validation  54            0.0303           0.3704                  0.6598  0.48742  0.0548
Head smut     Training    154           0.6538           0.0325                  0.9966  0.23841  0.7935
Head smut     Validation  50            -0.032           0.48                    0.5737  0.51023  -0.06
Downy mildew  Training    161           0.6096           0.0186                  1       0.23597  0.7372
Downy mildew  Validation  52            -0.042           0.1538                  0.5653  0.37079  -0.063

Given the inconsistent and, in some cases, weak validation performance, we proceeded with extreme caution. While the training set accuracies for all models were high (ranging from 0.9136 to 0.9814), indicating that the models can capture relationships between SNPs and the traits within the training data, the limited validation results suggest that these relationships may not generalize well to new data. Therefore, we focused primarily on the importance scores from Bootstrap Forest models trained on the full dataset (100% of the accessions) to identify potential candidate SNPs and genes. This approach allows us to leverage all available data. However, the results, especially for those traits with low validation performance, should be considered exploratory and require further validation in independent datasets. Based on their importance scores from this final model, the top SNPs associated with geographic origin and disease resistance are discussed below (Fig. 4 & Table 2 for geographic origin, Fig. 5 and Table 3 for disease resistance).
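The accuracies quoted in the text follow from Table 1 as accuracy = 1 − misclassification rate. The sketch below reproduces the validation accuracies from the tabulated rates and, as a point of comparison for downy mildew, computes the accuracy a trivial majority-class predictor would reach. One assumption is flagged in the code: the class counts used for the baseline (163 susceptible, 50 resistant) are those of all phenotyped accessions, because the class balance within the validation split is not reported.

```python
# Validation rows from Table 1: (number of samples, misclassification rate).
validation = {
    "Country": (54, 0.5),
    "Region": (54, 0.3519),
    "Anthracnose": (54, 0.3704),
    "Head smut": (50, 0.48),
    "Downy mildew": (52, 0.1538),
}


def accuracy(misclassification_rate):
    """Accuracy is the complement of the misclassification rate."""
    return round(1.0 - misclassification_rate, 4)


accuracies = {trait: accuracy(rate) for trait, (_, rate) in validation.items()}


def majority_baseline(class_counts):
    """Accuracy of always predicting the most frequent class."""
    return max(class_counts.values()) / sum(class_counts.values())


# Assumption: counts over all 213 phenotyped accessions, not the validation split.
downy_mildew_baseline = majority_baseline({"susceptible": 163, "resistant": 50})

for trait, acc in accuracies.items():
    print(f"{trait}: validation accuracy {acc}")
print(f"Downy mildew majority-class baseline: {downy_mildew_baseline:.3f}")
```

Comparing a model's accuracy against such a baseline, together with AUC, helps separate genuine predictive signal from the effect of an unbalanced trait.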
For the country of origin model, the SNPs with the highest importance scores were: S10_5199488 (nearest gene: Sobic.010G065500 , BHLH transcription factor PTF1-like), S1_16693424 (nearest gene: Sobic.001G188800 , Cysteine-rich repeat secretory protein), S10_7769197 (nearest gene: Sobic.010G089700 , EREBP-like factor), S10_7675418 (nearest gene: Sobic.010G089300 , DUF6598 domain-containing protein), S6_53674050 and S6_53674064 (both nearest gene: Sobic.006G181900 , Aluminium-activated malate transporter), and S10_8370408 (nearest gene: Sobic.010G094100 , No annotation) (Fig. 4a, Table 2). For the broader geographic region model, a different set of SNPs emerged as most important: S10_7678887, S10_7680604, and S10_7675418 (all nearest gene: Sobic.010G089300 , DUF6598 domain-containing protein), S9_53912320 and S9_53912321 (both nearest gene: Sobic.009G186100 , No annotation), S10_7790015 and S10_7769197 (both nearest gene: Sobic.010G089700 , EREBP-like factor), S10_7715727 (nearest gene: Sobic.010G089600 , Trehalose 6-phosphate phosphatase), S5_3709735 (nearest gene: Sobic.005G040800 , Protein of unknown function (DUF2921)), S10_5199488 (nearest gene: Sobic.010G065500 , BHLH domain-containing protein), S9_53923701 and S9_53923509 (nearest gene: Sobic.009G186300 , Tyrosine-protein kinase), S4_50278385 (nearest gene: Sobic.004G158100 , Phosphatidylinositol transfer protein), and S3_20398659 (nearest gene: Sobic.003G163301 , No annotation) (Fig. 4b, Table 2). The prominence of SNPs on chromosome 10 in both the country-level and region-level analyses, particularly those near genes encoding a DUF6598 domain-containing protein and an EREBP-like factor, suggests that this chromosome may harbor genes or regulatory elements that have significantly influenced the adaptation and diversification of sorghum across different geographic regions. When the analysis was performed using the country of origin as the target variable (Fig. 
4a), the most important SNP was S10_5199488, followed by S1_16693424, S10_7769197, S10_7675418, S6_53674050, S6_53674064, and S10_8370408. Similarly, when the analysis used the broader geographic region as the target variable (Fig. 4b), a distinct set of SNPs emerged as highly important. The most important SNPs for predicting region of origin were S10_7675418, S10_7678887, S10_7680604, S9_53912320, S10_7769197, S10_7790015, followed by a number of other SNPs on chromosomes 9 and 10 (S10_7715727, S9_53912321, S10_5199488) and chromosome 3, 4, 5 and 6 (S3_20398659, S4_50278385, S5_3709735, S6_51602713, respectively). Table 2. Details of the most important SNPs associated with the geographic origin (country and region) of sorghum mini core accessions identified by Bootstrap Forest models. The table includes the SNP identifier (SNP ID), the nearest gene and its putative function (if known), the distance in base pairs between the SNP and the nearest gene (0 indicates that the SNP is within the gene), and the importance score (portion) of the SNP in the model. SNPs are grouped by whether they were identified in the country-level or region-level analysis. 
SNP ID                                  Nearest gene      Putative function                       Base pairs away  Importance score (portion)
Country
S10_5199488                             Sobic.010G065500  BHLH transcription factor PTF1-like     0                0.0753
S1_16693424                             Sobic.001G188800  Cysteine-rich repeat secretory protein  189              0.0186
S10_7769197                             Sobic.010G089700  EREBP-like factor                       0                0.0185
S10_7675418                             Sobic.010G089300  DUF6598 domain-containing protein       10,801           0.0158
S6_53674050, S6_53674064                Sobic.006G181900  Aluminium-activated malate transporter  0                0.0126
S10_8370408                             Sobic.010G094100  No annotation                           10,311           0.012
Region
S10_7678887, S10_7680604, S10_7675418   Sobic.010G089300  DUF6598 domain-containing protein       7,332            0.0652
S9_53912320, S9_53912321                Sobic.009G186100  No annotation                           49               0.0307
S10_7790015, S10_7769197                Sobic.010G089700  EREBP-like factor                       0                0.0273
S10_7715727                             Sobic.010G089600  Trehalose 6-phosphate phosphatase       8,601            0.0103
S5_3709735                              Sobic.005G040800  Protein of unknown function (DUF2921)   0                0.0089
S10_5199488                             Sobic.010G065500  BHLH domain-containing protein          0                0.0068
S9_53923701, S9_53923509                Sobic.009G186300  Tyrosine-protein kinase                 254              0.0115
S4_50278385                             Sobic.004G158100  Phosphatidylinositol transfer protein   54               0.0111
S3_20398659                             Sobic.003G163301  No annotation                           0                0.0109

Identification of SNPs and candidate genes associated with disease resistance using Bootstrap Forest models

Genotypic and phenotypic data were used as input to Bootstrap Forest models to identify genomic regions highly associated with resistance to the three major sorghum diseases. For each disease, a separate model was trained on the same SNP markers. Each model identified SNPs with notably high importance scores, suggesting that these SNPs contribute substantially to predicting disease resistance. The importance scores and details of the top SNPs for each disease, including their nearest genes and associated putative functions, are presented in Fig. 5 & Table 3.
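The SNP identifiers in the tables encode chromosome and position (for example, S10_5199488 is position 5,199,488 on chromosome 10), and the "base pairs away" column is 0 when a SNP lies within its nearest gene. A minimal sketch of both conventions follows. The helper names are hypothetical, and the gene coordinates in the usage lines are invented placeholders rather than values from the Sorghum bicolor v3.1.1 annotation.

```python
import re
from typing import NamedTuple


class Snp(NamedTuple):
    chromosome: int
    position: int


def parse_snp_id(snp_id: str) -> Snp:
    """Split an ID such as 'S10_5199488' into chromosome and position."""
    match = re.fullmatch(r"S(\d+)_(\d+)", snp_id)
    if match is None:
        raise ValueError(f"unrecognized SNP ID: {snp_id!r}")
    return Snp(int(match.group(1)), int(match.group(2)))


def distance_to_gene(position: int, gene_start: int, gene_end: int) -> int:
    """Return 0 if the SNP is inside the gene, else bp to the nearer boundary."""
    if gene_start <= position <= gene_end:
        return 0
    return min(abs(position - gene_start), abs(position - gene_end))


snp = parse_snp_id("S10_5199488")

# Placeholder gene interval (not real annotation coordinates).
inside = distance_to_gene(snp.position, gene_start=5_199_000, gene_end=5_200_500)
outside = distance_to_gene(snp.position, gene_start=5_200_000, gene_end=5_201_000)
```

With real annotation coordinates, the same distance function applied to every gene on the SNP's chromosome, followed by taking the minimum, would yield the nearest gene.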
For anthracnose resistance, the SNPs with the highest importance scores were S3_61650227 and S3_61650258 (nearest gene: Sobic.003G281450 , Reverse transcriptase zinc-binding domain-containing protein), S1_6061658 (nearest gene: Sobic.001G078900 , Putative Myb-like DNA-binding protein), S4_48780280 (nearest gene: Sobic.004G154200 , Ankyrin repeat-containing domain), S3_69017597 (nearest gene: Sobic.003G375500 , Zinc finger PHD-type domain-containing protein), S1_71075125 (nearest gene: Sobic.001G431800 , Uncharacterized protein DUF292), and S10_6800171 (nearest gene: Sobic.010G079900 , H15 domain-containing protein) (Fig. 5a). For head smut resistance, the top SNPs were S1_73522544 (nearest gene: Sobic.001G459600 , Leucine-rich repeats-containing protein), S1_73523267 and S1_73523586 (nearest gene: Sobic.001G459600 , NB-ARC domain-containing protein), S1_73516778 (nearest gene: Sobic.001G459500 , Leucine-rich repeats-containing protein), S2_719163 (nearest gene: Sobic.002G007700 , Nodulin-like domain-containing protein), S6_38161717 (nearest gene: Sobic.006G051700 , NB-ARC domain // WRKY DNA -binding domain), S5_1701005 (nearest gene: Sobic.005G018900 , Phosphoinositide-specific phospholipase C), and S8_49187870 (nearest gene: Sobic.008G104801 , DUF4220 domain-containing protein) (Fig. 5b). For downy mildew resistance, the top SNPs were S3_30166225 (nearest gene: Sobic.003G169966 , 3-5 exonuclease), S8_56098111 (nearest gene: Sobic.008G133300 , UNC-93 like protein), S10_8958624 (nearest gene: Sobic.010G098932 , Uncharacterized protein), S2_54739711 (nearest gene: Sobic.002G173500 , Glycosyltransferase), S8_58354598 (nearest gene: Sobic.008G150400 , F-box and WD40 domain protein), S2_8823410 (nearest gene: Sobic.002G082500 , Protein LURP-one-related 15), and S1_74545705 (nearest gene: Sobic.001G473300 , Protein phosphatase) (Fig. 5c). 
The identification of multiple SNPs with high importance scores for each disease supports the hypothesis that resistance to anthracnose, head smut, and downy mildew in sorghum is likely controlled by multiple genes or genomic regions, consistent with a polygenic inheritance pattern.

Table 4. List of top SNPs associated with resistance to three sorghum diseases, identified by Bootstrap Forest models. Details of the most important SNPs associated with resistance to anthracnose, head smut, and downy mildew in sorghum, as identified by Bootstrap Forest models. The table provides the SNP ID, which includes chromosome number and position; the nearest gene to the SNP, along with its putative function; the distance in base pairs between the SNP and the nearest gene; and the importance score of the SNP in the Bootstrap Forest model, which reflects the contribution of the SNP to predicting disease resistance.

SNP ID | Nearest gene and function | Base pairs away | Importance score (portion)
Anthracnose
S3_61650227, S3_61650258 | Sobic.003G281450, Reverse transcriptase zinc-binding domain-containing protein | 1,345 | 0.0255
S1_6061658 | Sobic.001G078900, Putative MYB-like DNA-binding protein | 0 | 0.0101
S4_48780280 | Sobic.004G154200, Ankyrin repeat-containing domain | 63,053 | 0.0091
S3_69017597 | Sobic.003G375500, Zinc finger PHD-type domain-containing protein | 0 | 0.0064
S1_71075125 | Sobic.001G431800, Uncharacterized protein DUF292 | 0 | 0.0062
S10_6800171 | Sobic.010G079900, H15 domain-containing protein | 850 | 0.0057
Head smut
S1_73522544 | Sobic.001G459600, Leucine-rich repeats (LRR)-containing protein | 222 | 0.00138
S1_73523267, S1_73523586 | Sobic.001G459600, NB-ARC domain-containing protein | 0 | 0.00130
S1_73516778 | Sobic.001G459500, Leucine-rich repeats (LRR)-containing protein | 270 | 0.0081
S2_719163 | Sobic.002G007700, Nodulin-like domain-containing protein | 0 | 0.0060
S6_38161717 | Sobic.006G051700, NB-ARC domain // WRKY DNA-binding domain | 0 | 0.0053
S5_1701005 | Sobic.005G018900, Phosphoinositide-specific phospholipase C | 0 | 0.0052
S8_49187870 | Sobic.008G104801, DUF4220 domain-containing protein | 466 | 0.0045
Downy mildew
S3_30166225 | Sobic.003G169966, 3'-5' exonuclease | 11,050 | 0.0064
S8_56098111 | Sobic.008G133300, UNC-93 like protein | 0 | 0.0062
S10_8958624 | Sobic.010G098932, Uncharacterized protein | 2,458 | 0.0055
S2_54739711 | Sobic.002G173500, Glycosyltransferase | 18,134 | 0.0051
S8_58354598 | Sobic.008G150400, F-box and WD40 domain protein | 0 | 0.0044
S2_8823410 | Sobic.002G082500, Protein LURP-one-related 15 | 2,370 | 0.0044
S1_74545705 | Sobic.001G473300, Protein phosphatase | 0 | 0.0044

Discussion

This study investigates the complex interplay of geographic origin, genetic diversity, and disease resistance in sorghum. We primarily analyzed the global mini core collection, which represents a broad spectrum of sorghum diversity [30], alongside additional Senegalese lines. This collection has been valuable for assessing resistance to various diseases, specifically anthracnose, leaf blight, and rust [31]. To enhance our assessment of genetic diversity, we included SNP data from an additional set of Senegalese accessions, which allowed for a more comprehensive evaluation of the relationship between geographic origin, genetic variation, and disease resistance in sorghum. By combining phenotypic evaluations for resistance to three major diseases, extensive genotypic data from 297,876 SNP markers, and a machine learning approach, we aimed to gain a deeper understanding of the genetic architecture of sorghum disease resistance and to identify genomic regions associated with geographic adaptation.

Our analyses revealed complex relationships among geographic origin, genetic diversity, and disease resistance profiles in sorghum. While the t-SNE analysis (Fig. 2a) of disease resistance phenotypes in mini core lines showed some clustering, this clustering did not directly correlate with geographic origin: each cluster contained accessions from a mix of countries of origin.
This suggests that, although adaptation to local environments, including pathogen pressures [42], likely plays a role, the observed patterns of disease resistance are not solely explained by geographic proximity. The loading plot from the PCA (Fig. 2b) further showed how each disease contributed to the observed variation. Resistance and susceptibility responses for each disease were negatively correlated. While all three diseases contributed significantly to the observed variation, downy mildew resistance strongly influenced PC2, whereas anthracnose resistance had the most influence on PC1. Moreover, genetic distances differed significantly by country or geographic region of origin (ANOVA, p < 0.0001) (Fig. 2c), indicating that geographic origin is likely a significant factor in shaping the genetic structure of the sorghum accessions [15]. Furthermore, hierarchical clustering based on SNP data (Fig. 3a and b) revealed that the Senegalese accessions formed approximately six distinct clusters, largely separate from the mini core collection. This clear genetic separation between the Senegalese lines and most of the mini core accessions points to a distinct evolutionary trajectory for sorghum in Senegal, likely driven by adaptation to specific local environmental pressures, such as local pathogen populations or soil conditions, and potentially by different agricultural practices.

To gain deeper insights into the genetic foundations of the observed geographic patterns, we conducted a GWAS employing Bootstrap Forest models, a machine learning approach capable of identifying complex associations between genotype and phenotype [43]. This analysis aimed to pinpoint specific genomic regions (SNPs) strongly associated with the geographic origin of the accessions, using both country (56 countries) and broader regional (13 regions) classifications.
While the training set accuracies for these models were high (exceeding 90%), the validation set accuracies varied considerably and were, in some cases, low (Table 1). Notably, the model for predicting the country of origin showed poor validation performance, with a negative Generalized R-square. The model used to predict broader geographic regions performed better, although its validation accuracy was still notably lower than its training accuracy. This discrepancy between training and validation accuracy, particularly for the country-level model, suggests that predicting specific geographic origins from SNP data in this dataset is challenging and may require more data. Because regional predictions were comparatively more accurate, the difference may simply reflect the number of classification categories: 13 regions versus 56 countries. While these models explain the training data well, we acknowledge that their predictive power on unseen data is limited. Consequently, we focused our interpretation on the SNP importance scores derived from Bootstrap Forest models trained on the full dataset. This approach allows us to leverage all available information to identify potential candidate genes.

The analysis identified numerous SNPs with high importance scores, indicating their substantial contribution to distinguishing between accessions from different geographic locations (Fig. 4, Table 2). When using the country of origin as the target variable, the most important SNP was S10_5199488, located within a gene (Sobic.010G065500) encoding a BHLH transcription factor (Table 2) [44]. This family of transcription factors plays diverse roles in plant development and responses to environmental stimuli, which offers a potential link between this genomic region and adaptation to specific ecological conditions within different countries [44–46].
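Two points from the model-performance comparison above can be made concrete with a short calculation. First, with balanced classes, uniform random guessing would be right about 1.8% of the time for 56 countries but about 7.7% for 13 regions, so the country task is harder before any genetics is considered. Second, a likelihood-based R-square turns negative whenever the model assigns lower likelihood to held-out data than a no-predictor baseline does. The sketch assumes the reported Generalized R-square follows the common likelihood-ratio definition scaled to a maximum of 1 (often attributed to Nagelkerke); the validation-set size and log-likelihood values are hypothetical, and real classes are unlikely to be balanced.

```python
import math


def chance_accuracy(n_classes):
    """Expected accuracy of uniform random guessing with balanced classes."""
    return 1.0 / n_classes


def generalized_r2(loglik_model, loglik_null, n):
    """Likelihood-based R-square scaled so that a perfect model scores 1.

    R2 = (1 - (L0 / Lm) ** (2 / n)) / (1 - L0 ** (2 / n)),
    evaluated from log-likelihoods to avoid numerical underflow.
    """
    numerator = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)
    denominator = 1.0 - math.exp(2.0 * loglik_null / n)
    return numerator / denominator


# Hypothetical validation set: 100 accessions, with the baseline taken to be
# a uniform guess over 56 countries (an assumption for illustration only).
n_validation = 100
loglik_null = n_validation * math.log(1.0 / 56)
loglik_worse_than_null = loglik_null - 50.0  # confident but wrong predictions
loglik_better_than_null = loglik_null + 200.0

print(f"chance accuracy, 56 countries: {chance_accuracy(56):.3f}")
print(f"chance accuracy, 13 regions:   {chance_accuracy(13):.3f}")
r2_bad = generalized_r2(loglik_worse_than_null, loglik_null, n_validation)
r2_good = generalized_r2(loglik_better_than_null, loglik_null, n_validation)
print(f"R2 when the model is worse than the baseline:  {r2_bad:.3f}")
print(f"R2 when the model is better than the baseline: {r2_good:.3f}")
```

Read this way, a negative validation value signals predictions that are confidently wrong on unseen accessions, which is consistent with the overfitting the text describes.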
Other important SNPs for the country of origin were found on chromosomes 1, 6, and 10 (Table 2), further underscoring the polygenic nature of geographic adaptation. For instance, the SNP S1_16693424 is located near the gene Sobic.001G188800, which encodes a cysteine-rich repeat secretory protein (CRRSP). CRRSPs are a family of proteins known to be involved in various processes, including plant development [47], stress responses [47, 48], and signaling. In Arabidopsis, CRRSPs are induced by pathogen infection and by treatment with reactive oxygen species or salicylic acid [49].

Similarly, when the analysis was performed using the broader geographic region as the target variable, a distinct set of SNPs emerged as highly important, with several located on chromosome 10 (Fig. 4b, Table 2). The most important SNP for the region of origin, S10_7675418 (and two other nearby SNPs), is located near a gene (Sobic.010G089300) encoding a DUF6598 domain-containing protein. The precise function of this gene in sorghum is yet to be explored, but a gene encoding a DUF6598 domain-containing protein has been reported as a candidate for powdery mildew resistance in wheat [50]. Additionally, SNP S10_7790015, also found to be important for predicting the region of origin, is located near a gene encoding an EREBP-like factor, further highlighting the potential role of transcription factors in sorghum's adaptation to abiotic and biotic stresses, such as drought [51] and fungal pathogens [52], which can vary significantly across broad geographic regions. The recurring prominence of SNPs on chromosome 10 in both analyses suggests that this chromosome, in particular, carries genes or regulatory elements that have played a central role in the adaptation and diversification of sorghum across different geographic regions.
A similar pattern was observed in Medicago, where SNP-based clustering and GWA analysis using machine learning highlighted the importance of chromosome 8 in distinguishing geographic origins [53]. That study likewise highlighted the power of machine learning approaches such as Bootstrap Forest for identifying complex genetic associations with plant origin and local adaptation [53].

Building upon the insights into geographic adaptation, we extended our investigation to explore the genetic architecture of resistance to three major sorghum diseases: anthracnose, head smut, and downy mildew. Employing the same Bootstrap Forest modeling approach, we trained a separate model for each disease using identical SNP markers. Numerous SNPs had high importance scores in each model, indicating their substantial contribution to predicting disease resistance phenotypes (Fig. 5, Table 3). The most important SNPs for anthracnose resistance were found on chromosomes 1, 3, 4, and 10, while those for head smut resistance were located on chromosomes 1, 2, 5, 6, and 8. Similarly, SNPs associated with downy mildew resistance were identified on chromosomes 1, 2, 3, 8, and 10. This wide distribution of important SNPs across chromosomes suggests that resistance to these diseases is a complex, polygenic trait in sorghum, involving many genes with minor to moderate phenotypic effects. The nearest genes are involved in a wide array of biological processes, suggesting that diverse molecular mechanisms contribute to the overall resistance phenotype in sorghum.

For anthracnose resistance, the top SNPs, S3_61650227 and S3_61650258, are located near a gene (Sobic.003G281450) encoding a reverse transcriptase zinc-binding domain-containing protein. While the precise role of this gene in anthracnose resistance is unknown, its zinc-binding domain is of particular interest.
Zinc-binding proteins have been implicated in plant defense responses; for instance, a zinc-binding citrus metallothionein protein can act as a plant defense factor [54]. In Arabidopsis, zinc has been shown to trigger signaling mechanisms and defense responses that promote resistance to Alternaria brassicicola [55]. Moreover, a zinc metalloprotease in Fusarium graminearum targets a wheat zinc-binding protein, contributing to the pathogen's overall virulence [56]. These findings suggest that zinc-binding proteins, such as the one encoded by Sobic.003G281450, may play a role in sorghum's defense against anthracnose. Interestingly, a linear mixed model identified this region as a top candidate for anthracnose resistance in our previous traditional GWAS analysis [6]. This strengthens the case that this region, and potentially the zinc-binding domain-containing protein encoded by Sobic.003G281450 (or nearby genes), is involved in sorghum's defense response. Other genes with diverse functions were found near top SNPs, including genes encoding a putative MYB-like DNA-binding protein, an ankyrin repeat-containing domain, and a zinc finger PHD-type domain-containing protein, among others.

For head smut resistance, the most important SNP, S1_73522544, is located near the gene Sobic.001G459600, which encodes a leucine-rich repeat (LRR)-containing protein. LRR-containing proteins are well-established components of plant immune systems, playing a substantial role in pathogen recognition and the activation of downstream defense responses [57, 58]. Notably, another LRR-containing protein, encoded by the gene Sobic.001G459500, was also identified as a top candidate for head smut resistance in our previous traditional GWAS analysis [6]. This gene is located near the SNP S1_73516778, further supporting the importance of LRR-containing proteins in sorghum's defense against head smut.
In addition to these LRR-containing proteins, other genes associated with head smut resistance in this study point to a complex interplay of defense signaling and recognition mechanisms. These include genes encoding an NB-ARC domain-containing protein and a protein with both NB-ARC and WRKY DNA-binding domains. NB-ARC domains are characteristic of intracellular immune receptors (R proteins) that recognize pathogen effectors and trigger defense responses, while WRKY transcription factors are key regulators of defense gene expression [59]. Notably, a study in chickpea demonstrated a direct physical interaction between a CC-NB-ARC-LRR protein and a WRKY transcription factor, promoting resistance to Fusarium wilt [60].

Finally, for downy mildew resistance, the top SNPs include S2_54739711, located near Sobic.002G173500, a gene encoding a glycosyltransferase. Glycosyltransferases are involved in the biosynthesis of various cell wall components, and modifications to the cell wall can influence pathogen penetration and spread [61]. We also identified a gene encoding an F-box protein as a potential candidate, which is noteworthy because F-box proteins are known to play roles in plant defense, and multiple studies have similarly found an F-box protein gene associated with fungal resistance in sorghum [62–64].

The diversity of these candidate genes suggests that resistance to each disease is not only polygenic but also involves a complex interplay of various defense mechanisms, potentially including pathogen recognition [65], signal transduction [66], cell wall modification [67], and other metabolic processes. Additional research, including gene expression studies and functional characterization, is needed to elucidate the precise roles of these candidate genes in conferring resistance to anthracnose, head smut, and downy mildew in sorghum.
However, it is important to acknowledge a limitation regarding the generalizability of the disease resistance findings. The phenotypic evaluations used in this study were based on pathogen isolates relevant to a specific geographic region (Southern U.S.). While these evaluations are valuable for understanding resistance, the prevalence, virulence, and genetic diversity of C. sublineola (anthracnose), S. reilianum (head smut), and P. sorghi (downy mildew) can vary across different geographic regions. Still, the identified loci and genes remain valuable candidates for further investigation.

The candidate genes identified through the Bootstrap Forest models offer promising targets for marker-assisted selection (MAS) [68] and gene editing technologies such as CRISPR-Cas9 [69]. By focusing on these genes, previously known candidates, and their associated pathways, breeders can more efficiently select and pyramid resistance genes, developing varieties with durable and broad-spectrum resistance. For instance, identifying LRR-containing proteins as potential players in head smut resistance provides a clear avenue for targeted breeding, as these proteins are known to be involved in pathogen recognition. Similarly, the association of glycosyltransferases with downy mildew resistance suggests that modifying cell wall composition could be a promising strategy for enhancing resistance to this disease. However, follow-up studies are needed to validate the functional roles of these candidate genes. Furthermore, future studies should continue to explore the genetic diversity of underrepresented germplasm collections, as they likely contain a wealth of untapped genetic variation for disease resistance and other valuable traits.
Conclusion

This study employed a comprehensive approach, integrating phenotypic evaluations, high-density SNP genotyping, and machine learning analyses to investigate the genetic architecture of disease resistance and geographic origin in sorghum. Our findings underscore the influence of geographic origin in shaping the genetic diversity of sorghum accessions, although disease resistance phenotypes did not cluster directly by origin. The inclusion of a genetically distinct collection of Senegalese accessions significantly expanded the diversity under investigation, highlighting the importance of exploring underrepresented germplasm for valuable traits.

We identified numerous SNPs associated with geographic origin and disease resistance through genome-wide association analysis using Bootstrap Forest models. Notably, chromosome 10 emerged as a potential hotspot for genes and regulatory elements involved in adaptation to different geographic regions. We identified several candidate genes located near the most important SNPs, including those encoding transcription factors (BHLH and EREBP-like factors), a cysteine-rich repeat secretory protein, and a DUF6598 domain-containing protein. These findings suggest that diverse molecular mechanisms, such as transcriptional regulation, stress responses, and potentially novel pathways, contribute to sorghum's adaptation to varying environments.

For disease resistance, our analysis revealed a complex, polygenic architecture with important SNPs distributed across multiple chromosomes. We identified candidate genes associated with resistance to anthracnose, head smut, and downy mildew, including genes encoding zinc-binding proteins, LRR-containing proteins, F-box proteins, glycosyltransferases, and others involved in various cellular processes. Interestingly, several candidate genes identified in this study were also identified in our previous traditional GWAS analysis, further strengthening the evidence for their involvement in disease resistance.
The identification of these candidate genes and the characterization of genetically diverse accessions provide valuable resources for sorghum breeding programs. These resources can be used to develop improved varieties with enhanced resistance to multiple diseases and better adaptation to diverse environments through marker-assisted selection, gene editing, and other advanced breeding techniques. Ultimately, this research contributes to a deeper understanding of the genetic basis of disease resistance and geographic adaptation in sorghum, paving the way for the development of more resilient and productive varieties that can contribute to global food security in the face of evolving pathogen populations and changing environmental conditions.

Declarations

Data availability
Data from this study can be provided by the corresponding authors upon request.

Acknowledgements
We are also grateful to the reviewers for their constructive feedback. Mention of any trade names or commercial products in this article is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture. USDA is an equal opportunity provider and employer, and all agency services are available without discrimination.

Funding
This work is supported by the U.S. Department of Agriculture, Agricultural Research Service, In-House Projects No. 8042-21220-258-000-D and 3091-22000-040-000-D.

Author information

Authors and Affiliations

Sustainable Perennial Crops Laboratory, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD, 20705, USA
Ezekiel Ahn, Sunchung Park, Seunghyun Lim, Jae Hee Jang & Lyndel W. Meinhardt

Environmental Microbial and Food Safety Laboratory, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD, 20705, USA
Insuck Baek, Seok Min Hong & Moon S.
Kim

Insect Control and Cotton Disease Research, Agricultural Research Service, Southern Plains Agricultural Research Center, United States Department of Agriculture, College Station, TX, 77845, USA
Louis K. Prom

Department of Civil Urban Earth and Environmental Engineering, Ulsan National Institute of Science and Technology, UNIST-gil 50, Ulsan, 44919, Republic of Korea
Seok Min Hong

Department of Plant Pathology and Microbiology, Texas A&M University, College Station, TX, 77843, USA
Clint Magill

Contributions
Ezekiel Ahn conceived and designed the project. Sunchung Park, Seunghyun Lim, Insuck Baek, Seok Min Hong, Jae Hee Jang, and Ezekiel Ahn performed computational work and analyzed data. Ezekiel Ahn wrote the manuscript. Louis K. Prom, Moon S. Kim, Lyndel W. Meinhardt, and Clint Magill provided resources and contributed to methodology development. All authors revised the manuscript. All authors read and approved the manuscript.

Ethics declarations

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors declare no competing interests.

Clinical Trial Number
Not applicable.

Additional information

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Rather MA, Thakur R, Hoque M, Das RS, Miki KSL, Teixeira-Costa BE, et al. Sorghum (Sorghum bicolor). Nutri-Cereals Nutraceutical Techno-Funct Potential; 2023.
2. Mwamahonje A, Mdindikasi Z, Mchau D, Mwenda E, Sanga D, Garcia-Oliveira AL, et al. Advances in Sorghum Improvement for Climate Resilience in the Global Arid and Semi-Arid Tropics: A Review. Agronomy. 2024;14:3025.
3. Mutalik Desai S, Vaidya PS, Pardo PA. Commercial sector breeding of sorghum: Current status and future prospects. Sorghum 21st Century Food–fodder–feed–fuel Rapidly Chang World. 2020;:333–54.
4. Hadebe S, Modi A, Mabhaudhi T. Drought tolerance and water use of cereal crops: A focus on sorghum as a food security crop in sub-Saharan Africa. J Agron Crop Sci. 2017;203:177–91.
5. Abreha KB, Enyew M, Carlsson AS, Vetukuri RR, Feyissa T, Motlhaodi T, et al. Sorghum in dryland: morphological, physiological, and molecular responses of sorghum under drought stress. Planta. 2022;255:1–23.
6. Ahn E, Hu Z, Perumal R, Prom LK, Odvody G, Upadhyaya HD, et al. Genome wide association analysis of sorghum mini core lines regarding anthracnose, downy mildew, and head smut. PLoS ONE. 2019;14:e0216671.
7. Das I, Rajendrakumar P. Disease resistance in sorghum. Biotic stress resistance in millets. Elsevier; 2016. pp. 23–67.
8. Frederiksen RA. Compendium of sorghum diseases. 1986.
9. Acharya B, O’Quinn TN, Everman W, Mehl HL. Effectiveness of fungicides and their application timing for the management of sorghum foliar anthracnose in the mid-Atlantic United States. Plant Dis. 2019;103:2804–11.
10. Mengistu G, Shimelis H, Laing M, Lule D. Breeding for anthracnose (Colletotrichum sublineolum Henn.) resistance in sorghum: Challenges and opportunities. Aust J Crop Sci. 2018;12:1911–20.
11. Ahn E, Fall C, Botkin J, Curtin S, Prom LK, Magill C. Inoculation and screening methods for major sorghum diseases caused by fungal pathogens: Claviceps africana, Colletotrichum sublineola, Sporisorium reilianum, Peronosclerospora sorghi and Macrophomina phaseolina. Plants. 2023;12:1906.
12. Little CR, Perumal R. The biology and control of sorghum diseases. Sorghum State Art Future Perspectives. 2019;58:297–346.
13. Radwan GL, Perumal R, Isakeit T, Magill CW, Prom LK, Little CR. Screening exotic sorghum germplasm, hybrids, and elite lines for resistance to a new virulent pathotype (P6) of Peronosclerospora sorghi causing downy mildew. Plant Health Prog. 2011;12:17.
14. Tesso TT, Perumal R, Little CR, Adeyanju A, Radwan GL, Prom LK, et al. Sorghum pathology and biotechnology-a fungal disease perspective: Part II. Anthracnose, stalk rot, and downy mildew. Eur J Plant Sci Biotechnol. 2012;6 Special Issue 1:31–44.
15. Venkateswaran K, Elangovan M, Sivaraj N. Origin, domestication and diffusion of Sorghum bicolor. Breeding Sorghum for diverse end uses. Elsevier; 2019. pp. 15–31.
16. Cavender-Bares J, Ackerly DD, Hobbie SE, Townsend PA. Evolutionary legacy effects on ecosystems: biogeographic origins, plant traits, and implications for management in the era of global change. Annu Rev Ecol Evol Syst. 2016;47:433–62.
17. Morris GP, Ramu P, Deshpande SP, Hash CT, Shah T, Upadhyaya HD, et al. Population genomic and genome-wide association studies of agroclimatic traits in sorghum. Proc Natl Acad Sci. 2013;110:453–8.
18. Casa AM, Pressoir G, Brown PJ, Mitchell SE, Rooney WL, Tuinstra MR, et al. Community resources and strategies for association mapping in sorghum. Crop Sci. 2008;48:30–40.
19. Linnen CR, Poh Y-P, Peterson BK, Barrett RD, Larson JG, Jensen JD, et al. Adaptive evolution of multiple traits through multiple mutations at a single gene. Science. 2013;339:1312–6.
20. Lasky JR, Upadhyaya HD, Ramu P, Deshpande S, Hash CT, Bonnette J, et al. Genome-environment associations in sorghum landraces predict adaptive traits. Sci Adv. 2015;1:e1400218.
21. Vavilov NI, Dorofeev VF. Origin and geography of cultivated plants. Cambridge University Press; 1992.
22. Harlan JR. Crops and Man. American Society of Agronomy. Crop Sci Soc Am Madison Wis. 1992;16:63–262.
23. Gepts P. Crop domestication as a long-term selection experiment. Plant Breed Rev. 2004;24:1–44.
24. Bellon MR. The dynamics of crop infraspecific diversity: A conceptual framework at the farmer level. Econ Bot. 1996;:26–39.
25. Frankel O. The conservation of plant biodiversity. Cambridge University Press; 1995.
26. Tanksley SD, McCouch SR. Seed banks and molecular maps: unlocking genetic potential from the wild. Science. 1997;277:1063–6.
27. Hoisington D, Khairallah M, Reeves T, Ribaut J-M, Skovmand B, Taba S, et al. Plant genetic resources: what can they contribute toward increased crop productivity? Proc Natl Acad Sci. 1999;96:5937–43.
28. Jones JD, Dangl JL. The plant immune system. Nature. 2006;444:323–9.
29. McDonald BA, Linde C. Pathogen population genetics, evolutionary potential, and durable resistance. Annu Rev Phytopathol. 2002;40:349–79.
30. Upadhyaya H, Pundir R, Dwivedi S, Gowda C, Reddy VG, Singh S. Developing a mini core collection of sorghum for diversified utilization of germplasm. Crop Sci. 2009;49:1769–80.
31. Sharma R, Upadhyaya H, Manjunatha S, Rao V, Thakur R. Resistance to foliar diseases in a mini-core collection of sorghum germplasm. Plant Dis. 2012;96:1629–33.
32. Breiman L. Random forests. Mach Learn. 2001;45:5–32.
33. Boulesteix A, Janitza S, Kruppa J, König IR. Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics. Wiley Interdiscip Rev Data Min Knowl Discov. 2012;2:493–507.
34. Upadhyaya HD, Wang Y-H, Gowda C, Sharma S. Association mapping of maturity and plant height using SNP markers with the sorghum mini core collection. Theor Appl Genet. 2013;126:2003–15.
35. Wang Y-H, Upadhyaya HD, Burrell AM, Sahraeian SME, Klein RR, Klein PE. Genetic structure and linkage disequilibrium in a diverse, representative collection of the C4 model plant, Sorghum bicolor. G3 Genes Genomes Genet. 2013;3:783–93.
36. Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, et al. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE. 2011;6:e19379.
37. Hu Z, Olatoye MO, Marla S, Morris GP. An integrated genotyping-by-sequencing polymorphism map for over 10,000 sorghum genotypes. Plant Genome. 2019;12:180044.
38. Browning BL, Browning SR. Genotype imputation with millions of reference samples. Am J Hum Genet. 2016;98:116–26.
39. Klimberg R. Fundamentals of predictive analytics with JMP. SAS Institute; 2023.
40. Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007;23:2633–5.
41. Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, et al. Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res. 2012;40:D1178–86.
42. Thrall PH, Burdon J, Bever JD. Local adaptation in the Linum marginale—Melampsora lini host-pathogen interaction. Evolution. 2002;56:1340–51.
43. Bureau A, Dupuis J, Falls K, Lunetta KL, Hayward B, Keith TP, et al. Identifying SNPs predictive of phenotype using random forests. Genet Epidemiol Off Publ Int Genet Epidemiol Soc. 2005;28:171–82.
44. Toledo-Ortiz G, Huq E, Quail PH. The Arabidopsis basic/helix-loop-helix transcription factor family. Plant Cell. 2003;15:1749–70.
45. Chinnusamy V, Schumaker K, Zhu J. Molecular genetic perspectives on cross-talk and specificity in abiotic stress signalling in plants. J Exp Bot. 2004;55:225–36.
46. Manavella PA, Arce AL, Dezar CA, Bitton F, Renou J, Crespi M, et al. Cross-talk between ethylene and drought signalling pathways is mediated by the sunflower Hahb-4 transcription factor. Plant J. 2006;48:125–37.
47. Zhang Y, Tian H, Chen D, Zhang H, Sun M, Chen S, et al. Cysteine-rich receptor-like protein kinases: emerging regulators of plant stress responses. Trends Plant Sci. 2023;28:776–94.
48. Wang Y, Teng Z, Li H, Wang W, Xu F, Sun K, et al. An activated form of NB-ARC protein RLS1 functions with cysteine-rich receptor-like protein RMC to trigger cell death in rice. Plant Commun. 2023;4.
49. Chen Z. A superfamily of proteins with novel cysteine-rich repeats. Plant Physiol. 2001;126:473–6.
50. Kaur R, Vasistha NK, Ravat VK, Mishra VK, Sharma S, Joshi AK, et al. Genome-Wide Association Study Reveals Novel Powdery Mildew Resistance Loci in Bread Wheat. Plants. 2023;12:3864.
51. Maghraby A, Alzalaty M. Genome-wide identification and evolutionary analysis of the AP2/EREBP, COX and LTP genes in Zea mays L. under drought stress. Sci Rep. 2024;14:7610.
52. Cao Y, Wu Y, Zheng Z, Song F. Overexpression of the rice EREBP-like gene OsBIERF3 enhances disease resistance and salt tolerance in transgenic tobacco. Physiol Mol Plant Pathol. 2005;67:202–11.
53. Lim S, Park S, Baek I, Botkin J, Jang JH, Hong SM, et al. Integrative analysis of seed morphology, geographic origin, and genetic structure in Medicago with implications for breeding and conservation. BMC Plant Biol. 2025;25:274.
54. Nishimura S, Tatano S, Miyamoto Y, Ohtani K, Fukumoto T, Gomi K, et al. A zinc-binding citrus protein metallothionein can act as a plant defense factor by controlling host-selective ACR-toxin production. Plant Mol Biol. 2013;81:1–11.
55. Martos S, Gallego B, Cabot C, Llugany M, Barceló J, Poschenrieder C. Zinc triggers signaling mechanisms and defense responses promoting resistance to Alternaria brassicicola in Arabidopsis thaliana. Plant Sci. 2016;249:13–24.
56. Wang X, Liu K, Li Y, Ren Y, Li Q, Wang B. Zinc metalloprotease FgM35, which targets the wheat zinc-binding protein TaZnBP, contributes to the virulence of Fusarium graminearum. Stress Biol. 2024;4:1–17.
57. Jones DA, Jones JD. The role of leucine-rich repeat proteins in plant defences. Advances in botanical research. Elsevier; 1997. pp. 89–167.
58. Padmanabhan M, Cournoyer P, Dinesh-Kumar S. The leucine-rich repeat domain in plant innate immunity: a wealth of possibilities. Cell Microbiol. 2009;11:191–8.
59. Afzal M, Alghamdi SS, Nawaz H, Migdadi HH, Altaf M, El-Harty E, et al. Genome-wide identification and expression analysis of CC-NB-ARC-LRR (NB-ARC) disease-resistant family members from soybean (Glycine max L.) reveal their response to biotic stress. J King Saud Univ-Sci. 2022;34:101758.
60. Chakraborty J, Priya P, Dastidar SG, Das S. Physical interaction between nuclear accumulated CC-NB-ARC-LRR protein and WRKY64 promotes EDS1 dependent Fusarium wilt resistance in chickpea. Plant Sci. 2018;276:111–33.
61. Gibeaut DM. Nucleotide sugars and glycosyltransferases for synthesis of cell wall matrix polysaccharides. Plant Physiol Biochem. 2000;38:69–80.
62. Ahn E, Prom LK, Hu Z, Odvody G, Magill C. Genome-wide association analysis for response of Senegalese sorghum accessions to Texas isolates of anthracnose. Plant Genome. 2021;14:e20097.
63. Ahn E, Fall C, Prom LK, Magill C. Genome-wide association study of Senegalese sorghum seedlings responding to a Texas isolate of Colletotrichum sublineola. Sci Rep. 2022;12:13025.
64. Birhanu C, Girma G, Mekbib F, Nida H, Tirfessa A, Lule D, et al. Exploring the genetic basis of anthracnose resistance in Ethiopian sorghum through a genome-wide association study. BMC Genomics. 2024;25:677.
65. Gómez-Gómez L. Plant perception systems for pathogen recognition and defence. Mol Immunol. 2004;41:1055–62.
66. Blumwald E, Aharon GS, Lam BC. Early signal transduction pathways in plant–pathogen interactions. Trends Plant Sci. 1998;3:342–6.
67. Miedes E, Vanholme R, Boerjan W, Molina A. The role of the secondary cell wall in plant resistance to pathogens. Front Plant Sci. 2014;5:358.
68. Ribaut J-M, Hoisington D. Marker-assisted selection: new tools and strategies. Trends Plant Sci. 1998;3:236–9.
69. Yin K, Gao C, Qiu J-L. Progress and prospects in plant genome editing. Nat Plants. 2017;3:1–6.

Additional Declarations
No competing interests reported.
Status: Published Journal Publication published 28 Feb, 2026 Read the published version in BMC Plant Biology → Version 1 posted Editorial decision: Revision requested 02 Jul, 2025 Reviews received at journal 15 Apr, 2025 Reviews received at journal 14 Apr, 2025 Reviews received at journal 12 Apr, 2025 Reviews received at journal 07 Apr, 2025 Reviewers agreed at journal 07 Apr, 2025 Reviewers agreed at journal 06 Apr, 2025 Reviewers agreed at journal 06 Apr, 2025 Reviewers agreed at journal 05 Apr, 2025 Reviewers agreed at journal 04 Apr, 2025 Reviewers agreed at journal 04 Apr, 2025 Reviewers agreed at journal 30 Mar, 2025 Reviewers invited by journal 29 Mar, 2025 Editor invited by journal 20 Mar, 2025 Editor assigned by journal 16 Mar, 2025 Submission checks completed at journal 16 Mar, 2025 First submitted to journal 13 Mar, 2025
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-6222361","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":443549919,"identity":"890059c4-df52-4bb7-b6d3-6b7486aac389","order_by":0,"name":"Ezekiel Ahn","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA7UlEQVRIie3NsYrCQBCA4ZHAVhu2nTT6ChsWRPDEV9lFME26awSv2CqV4gPIVb5IZCFVsA5oERCsLA6uuSKIEfHAZm/LA/evZmA+BsDn+58FIGEAgNCpb4srwRsJuDuBOyHoRBgzhtczBLZeFnPaDHvANgXaSKSLiZQlAh52032YJbHG09RK+HYhcpW1N1Xa34fadDSWgtvI2LDvXF0Qei15p40Z/0k4oYFUGoG3JKDEKM0WcW0jWBLBZYE0rlIRfWbJJEOirIStzDH6+Xjrdqs0/jo3w9GKmTy3kUf0dyIoXcDzY6cfPp/P90JdAWxKRAaSx+clAAAAAElFTkSuQmCC","orcid":"","institution":"Beltsville Agricultural Research Center","correspondingAuthor":true,"prefix":"","firstName":"Ezekiel","middleName":"","lastName":"Ahn","suffix":""},{"id":443549920,"identity":"0c8f9bf2-f820-4f58-9bba-5a89cc480ec2","order_by":1,"name":"Insuck Baek","email":"","orcid":"","institution":"Beltsville Agricultural Research Center","correspondingAuthor":false,"prefix":"","firstName":"Insuck","middleName":"","lastName":"Baek","suffix":""},{"id":443549921,"identity":"a6993b35-5aa2-42b2-b01f-d6cc99a0a991","order_by":2,"name":"Sunchung Park","email":"","orcid":"","institution":"Beltsville Agricultural Research Center","correspondingAuthor":false,"prefix":"","firstName":"Sunchung","middleName":"","lastName":"Park","suffix":""},{"id":443549922,"identity":"d1a96878-e139-445e-aad8-60cdb0e81928","order_by":3,"name":"Louis K. 
Prom","email":"","orcid":"","institution":"Southern Plains Agricultural Research Center","correspondingAuthor":false,"prefix":"","firstName":"Louis","middleName":"K.","lastName":"Prom","suffix":""},{"id":443549923,"identity":"a297ea24-7bc6-4faf-9112-c83106de66dc","order_by":4,"name":"Seunghyun Lim","email":"","orcid":"","institution":"Beltsville Agricultural Research Center","correspondingAuthor":false,"prefix":"","firstName":"Seunghyun","middleName":"","lastName":"Lim","suffix":""},{"id":443549924,"identity":"dacb4a9d-9400-48c4-956e-52ed2a930f9e","order_by":5,"name":"Jae Hee Jang","email":"","orcid":"","institution":"Beltsville Agricultural Research Center","correspondingAuthor":false,"prefix":"","firstName":"Jae","middleName":"Hee","lastName":"Jang","suffix":""},{"id":443549925,"identity":"37b5cea4-cf10-43dd-a656-a44d88601ea4","order_by":6,"name":"Seok Min Hong","email":"","orcid":"","institution":"Ulsan National Institute of Science and Technology, UNIST-gil 50","correspondingAuthor":false,"prefix":"","firstName":"Seok","middleName":"Min","lastName":"Hong","suffix":""},{"id":443549928,"identity":"7b90ec30-31ef-42ac-b587-1ec41bcb2dc7","order_by":7,"name":"Moon S. Kim","email":"","orcid":"","institution":"Beltsville Agricultural Research Center","correspondingAuthor":false,"prefix":"","firstName":"Moon","middleName":"S.","lastName":"Kim","suffix":""},{"id":443549930,"identity":"4e18533c-a56d-4566-8021-3fa5931d51af","order_by":8,"name":"Lyndel W. 
Meinhardt","email":"","orcid":"","institution":"Beltsville Agricultural Research Center","correspondingAuthor":false,"prefix":"","firstName":"Lyndel","middleName":"W.","lastName":"Meinhardt","suffix":""},{"id":443549932,"identity":"adfd8a69-2131-45fc-919f-0c28b3d96288","order_by":9,"name":"Clint Magill","email":"","orcid":"","institution":"Texas A\u0026M University","correspondingAuthor":false,"prefix":"","firstName":"Clint","middleName":"","lastName":"Magill","suffix":""}],"badges":[],"createdAt":"2025-03-13 19:23:13","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6222361/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6222361/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1186/s12870-026-08468-z","type":"published","date":"2026-02-28T15:57:51+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":80887292,"identity":"02189448-2bb6-44c7-9232-6ccba579d824","added_by":"auto","created_at":"2025-04-18 09:08:55","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":90421,"visible":true,"origin":"","legend":"\u003cp\u003eDistribution of resistance and susceptibility to three diseases in a subset of the sorghum mini core collection. Distribution of resistance (R) and susceptibility (S) phenotypes for anthracnose, head smut, and downy mildew in a subset of the sorghum mini-core collection. The 135 Senegalese lines were not phenotyped and were excluded from the graph. Not all accessions were evaluated for all three diseases. Sample sizes for each disease are: anthracnose (n = 216), head smut (n = 204), and downy mildew (n = 213). 
Numbers above the bars indicate the number of accessions in each category (R or S).\u003c/p\u003e","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-6222361/v1/08333c6e8339f5ce4c197267.png"},{"id":80887309,"identity":"89abf5ed-771c-4cd8-80b1-9ac0d14be13c","added_by":"auto","created_at":"2025-04-18 09:09:01","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":164970,"visible":true,"origin":"","legend":"\u003cp\u003eGlobal patterns of sorghum disease resistance and genetic diversity in the mini core collection. (a) \u003cem\u003et\u003c/em\u003e-SNE visualization of the phenotypic data (disease responses: resistant/susceptible), revealing four distinct clusters and potentially a fifth, more diffuse central group. (b) Loading plot showing the contribution of each disease resistance phenotype (resistance and susceptibility for anthracnose, head smut, and downy mildew) to the first two principal components (PC1 and PC2). Vector length and direction indicate the magnitude and direction of each variable's influence. PC1 and PC2 explain 36% and 32.3% of the total phenotypic variance. (c) Genetic distance among sorghum mini core accessions based on SNP data, displayed by disease resistance phenotype, country of origin, and geographic region (standard error means are shown).\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-6222361/v1/d1488c06a9f304011b3c32f0.png"},{"id":80887308,"identity":"dd0b80c6-d6a3-4bf6-9798-42e2901b28f0","added_by":"auto","created_at":"2025-04-18 09:09:01","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":271865,"visible":true,"origin":"","legend":"\u003cp\u003eGenetic relationships among sorghum mini core and Senegalese accessions revealed by hierarchical clustering and constellation plot based on SNP markers. 
(a) Hierarchical cluster dendrogram of 377 sorghum accessions, including the mini core collection and additional Senegalese lines, genotyped at SNP loci. Gray blocks highlight clusters predominantly composed of Senegalese lines. (b) Constellation plot visualizing the relationships among the 17 SNP-based clusters.\u003c/p\u003e","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-6222361/v1/ff00fed0f40b066e9327f7c7.png"},{"id":80887290,"identity":"b764d648-1508-4921-ad02-c1c2bfe7336c","added_by":"auto","created_at":"2025-04-18 09:08:55","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":289931,"visible":true,"origin":"","legend":"\u003cp\u003eGWAS of geographic origin in sorghum using Bootstrap Forest models. (a) Importance scores (portion) for SNPs associated with country of origin. (b) Importance scores for SNPs related to broader geographic regions of origin. Importance scores, shown on the y-axis, quantify the contribution of each SNP to the model's ability to correctly classify accessions based on their geographic origin; a higher score indicates a more significant contribution. The x-axis represents the physical position of each SNP along the ten sorghum chromosomes. The SNPs with the highest importance scores are labeled. 
Because Bootstrap Forest models do not rely on statistical significance testing like traditional GWAS methods, no \u003cem\u003ep\u003c/em\u003e-value-based significance threshold is shown.\u003c/p\u003e","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-6222361/v1/8e68ba30655c02ad1e9c1ac6.png"},{"id":80887302,"identity":"d64b7a28-63eb-4527-9a89-b9867f8055a3","added_by":"auto","created_at":"2025-04-18 09:08:59","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":406407,"visible":true,"origin":"","legend":"\u003cp\u003eGenome-wide association results for resistance to three sorghum diseases based on Bootstrap Forest models. \u003cstrong\u003e(a)\u003c/strong\u003e Anthracnose resistance. \u003cstrong\u003e(b)\u003c/strong\u003e Head smut resistance. \u003cstrong\u003e(c)\u003c/strong\u003e Downy mildew resistance. Each panel shows a Manhattan plot in which each point represents a single SNP. The x-axis indicates the chromosomal position of the SNP, and the y-axis represents the portion derived from the Bootstrap Forest model. This score reflects the importance of the SNP in predicting disease resistance. Higher scores are indicative of stronger associations. 
The most important SNPs are labeled with their identifiers.\u003c/p\u003e","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-6222361/v1/994e00f5aba38734e559b2fe.png"},{"id":103766373,"identity":"6b206cb1-60fa-4c3a-aa0f-0ff0c2ce49f3","added_by":"auto","created_at":"2026-03-02 16:14:14","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2050572,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6222361/v1/710e5fde-26c8-4ef0-b8b1-d22365294767.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"A Machine Learning Approach to Genome-Wide Association Mapping of Disease Resistance and Geographic Origin in Sorghum","fulltext":[{"header":"Introduction","content":"\u003cp\u003eSorghum (\u003cem\u003eSorghum bicolor\u003c/em\u003e L. Moench) is a staple cereal crop for over 500\u0026nbsp;million people worldwide, particularly in the semi-arid tropics of Africa and Asia, where it serves as a critical source of food, feed, and biofuel [\u003cspan additionalcitationids=\"CR2\" citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. Sorghum's high tolerance to drought and heat makes it an excellent candidate crop for ensuring future food security, particularly in regions vulnerable to climate change [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. However, its production is significantly constrained by both biotic and abiotic stresses [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. 
Significant yield losses in sorghum can result from fungal and oomycete diseases such as anthracnose, head smut, and downy mildew, placing severe pressure on farmers' livelihoods and the food security of dependent populations [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. Disease-resistant cultivars are often considered the best method for managing sorghum diseases [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. Hence, breeding sorghum varieties with improved disease resistance and better adaptation to various environments is critically important.\u003c/p\u003e \u003cp\u003eAnthracnose, caused by the fungal pathogen \u003cem\u003eColletotrichum sublineola\u003c/em\u003e, is a widespread and destructive sorghum disease, affecting all aerial parts of the plant and causing significant reductions in grain yield and quality [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. Yield losses exceeding 50% have been reported under severe anthracnose epidemics [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. Head smut, caused by the biotrophic fungus \u003cem\u003eSporisorium reilianum\u003c/em\u003e (K\u0026uuml;hn) Langdon \u0026amp; Fullerton, is another major constraint to sorghum production, particularly in humid and warmer growing regions [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. Infection by \u003cem\u003eS. reilianum\u003c/em\u003e replaces the sorghum panicle with a large, black sorus filled with fungal spores, resulting in complete grain yield loss in infected plants [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]. 
Downy mildew, brought on by the oomycete \u003cem\u003ePeronosclerospora sorghi\u003c/em\u003e, can also cause substantial losses, particularly in susceptible varieties grown under humid conditions [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. \u003cem\u003eP. sorghi\u003c/em\u003e infection manifests as localized leaf lesions and systemic infection, leading to stunting, reduced tillering, and the development of characteristic \"downy\" abaxial growth on infected leaves [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. Moreover, systemically infected seedlings turn pale yellow or have light-color streaking on the leaf, are chlorotic and stunted, and prematurely die [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. While some sources of resistance to these diseases have been identified and deployed in sorghum breeding programs, new pathogen races and the complex, polygenic nature of resistance pose ongoing challenges. Further investigation is needed into the genetic architecture of resistance to these three diseases to develop sorghum varieties with durable and broad-spectrum resistance.\u003c/p\u003e \u003cp\u003eSorghum is an ancient crop of Africa that spread across diverse environments, especially in the semiarid tropics of Africa and South Asia [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. The geographic origin of plants plays a pivotal role in shaping this diversity [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e], as geographically distant populations have adapted under different selective pressures, such as climate, soil types, and microbe populations. As an outcome, locally adapted landraces with unique trait combinations have evolved. 
Studies in sorghum and other crops have demonstrated that geographic patterns of genetic variation often mirror patterns of environmental variation, suggesting a strong link between adaptation and geographic origin [\u003cspan additionalcitationids=\"CR18\" citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]. For instance, accessions in arid regions may possess enhanced drought tolerance, while accessions from regions with high disease pressure have evolved unique and multiple resistance genes [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e, \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. Furthermore, humans' historical movement and exchange of plant germplasm have further contributed to the complex patterns of genetic diversity observed in sorghum [\u003cspan additionalcitationids=\"CR23\" citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]. A deep understanding of how geographic origin has shaped sorghum's genetic diversity is essential for uncovering its evolutionary history and crop improvement. By analyzing the genomes of sorghum accessions from different regions, we can identify genes and pathways that have been selected under diverse environmental pressures. This knowledge, in turn, can be directly applied to breeding more resilient and robust sorghum varieties.\u003c/p\u003e \u003cp\u003eThe availability of diverse germplasm is also fundamental to understanding the genetic basis of adaptive traits and making significant progress in crop improvement [\u003cspan additionalcitationids=\"CR26\" citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e]. 
Broadening the genetic base of breeding programs by including landraces and geographically diverse accessions can introduce novel alleles and allelic combinations currently not integrated into elite cultivars. This is particularly important for traits such as disease resistance, where pathogens continuously evolve and overcome existing resistance mechanisms [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e, \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eIn 2001, the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT) gene bank developed a core collection of 2,247 sorghum accessions from its larger germplasm collection of over 37,000 accessions [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e, \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e]. However, this core collection was too large for many replicated evaluation studies. Hence, a sorghum mini core collection consisting of 10% was developed to represent a global snapshot of sorghum diversity [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e, \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e]. This mini core collection has been widely used to study the genetic architecture of various traits such as disease resistance [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eBased on this mini core collection, our previous study used a GWAS of resistance to anthracnose, head smut, and downy mildew [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. The current study builds upon this foundation with two primary objectives. 
First, we characterized the genetic relationships among an expanded panel of 377 sorghum accessions, including the mini core collection and accessions from Senegal, to understand the population structure and identify potential sources of unique genetic variation. Second, we sought to identify genomic regions associated with both geographic origin and disease resistance using the machine learning model Bootstrap Forest (a variant of Random Forest) [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e] for GWAS. The machine learning algorithm allows us to capture complex relationships between genetic markers and traits, providing a more nuanced understanding of the genetic architecture underlying these complex traits [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e]. We hypothesized that this integrated approach, combining diverse germplasm, high-density SNP genotyping, and machine learning, would enable us to uncover novel genetic diversity, identify genomic regions underlying adaptation to different geographic origins, and pinpoint candidate genes associated with disease resistance. By leveraging the power of machine learning in GWAS, the results of this study are expected to provide valuable insights for developing improved sorghum varieties with enhanced resilience and productivity.\u003c/p\u003e"},{"header":"Materials and methods","content":"\u003cp\u003e\u003cstrong\u003ePhenotypic and genotypic data\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study utilizes phenotypic and genotypic data from a previous study [6] that included phenotypic evaluations of disease resistance for a subset of the sorghum mini core collection, as well as genotypic data (SNP markers) for 377 accessions (242 mini core accessions and an additional 135 accessions from Senegal). 
The 135 Senegalese accessions were not phenotyped for disease resistance and are included in the present study solely for analyses of genetic diversity and geographic origin. For the mini core accessions, resistance to three major sorghum diseases, anthracnose, head smut, and downy mildew, was assessed as follows. However, not all 242 accessions were evaluated for all three diseases: Anthracnose resistance was evaluated for 216 accessions using a spray inoculation method with a mixture of five \u003cem\u003eC. sublineola\u003c/em\u003e isolates as described by Ahn et al. [6]. Disease severity was rated on a scale of 1 to 5, where 1 indicated no symptoms and 5 indicated severe infection, with accessions classified as resistant or susceptible based on the presence of acervuli (fungal fruiting bodies) [6]. Downy mildew resistance was evaluated for 213 accessions using the sandwich inoculation technique with \u003cem\u003eP. sorghi\u003c/em\u003e pathotype 6, and accessions with 10% or less disease incidence were considered resistant [6]. Head smut resistance was assessed for 204 accessions using a syringe inoculation method with \u003cem\u003eS. reilianum\u003c/em\u003e sporidia, and accessions were scored as resistant if no infection was detected [6]. Genotypic data for these accessions comprised 297,876 SNP markers and was obtained from this previous study [6]. 
In short, the SNP data were based on the sorghum reference genome version 3.1.1, initially generated using genotyping-by-sequencing (GBS) [34\u0026ndash;37] and subsequently imputed using Beagle 4.1 [38] to address missing data.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMultivariate analysis of phenotypic and genotypic data\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003ePrincipal Component Analysis (PCA) and \u003cem\u003et\u003c/em\u003e-distributed Stochastic Neighbor Embedding (\u003cem\u003et\u003c/em\u003e-SNE) were performed to assess overall phenotypic variation and relationships among sorghum accessions based on disease resistance. PCA, conducted using JMP Pro 17 software (SAS Institute Inc., Cary, NC) [39], was used to determine the contribution of each disease resistance phenotype to this overall variation. \u003cem\u003et\u003c/em\u003e-SNE, a non-linear dimensionality reduction technique, was also employed using JMP Pro 17 with default parameters (output dimensions = 2, perplexity = 30, maximum iterations = 1,000, initial principal component dimensions = 50, convergence criterion = 1e\u0026minus;8, initial scale = 0.0001, Eta (learning rate) = 200, inflate iterations = 250, and random seed = 123) to further explore the underlying structure of the phenotypic variation and identify potential clusters of accessions with similar disease resistance profiles.\u003c/p\u003e\n\u003cp\u003eGenetic distances among accessions were calculated based on the 297,876 SNP markers using the Identity by State (IBS) method implemented in TASSEL 5 software [40]. Two-tailed \u003cem\u003et\u003c/em\u003e-tests were performed to assess differences in genetic distances between resistant and susceptible accessions for each disease. 
A one-way analysis of variance (ANOVA) was used to evaluate differences in genetic distances among groups based on their country and geographic region of origin.\u003c/p\u003e\n\u003cp\u003eWe also performed hierarchical clustering with the Ward method on the SNP markers, using the default settings in JMP Pro 17, to investigate the genetic relationships among the sorghum accessions.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eGenome-wide association analysis of geographic origin and disease resistance using Bootstrap Forest\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe performed a GWAS using a machine learning approach based on Bootstrap Forest models to identify genomic regions associated with geographic origin and disease resistance. These models used the same 297,876 SNP markers as predictors, with the target variable being either the geographic origin category or the binary resistant/susceptible classification for each disease. Unlike traditional GWAS methods that rely on linear or logistic regression and generate \u003cem\u003ep\u003c/em\u003e-values under a specified statistical model, Bootstrap Forest models do not directly produce \u003cem\u003ep\u003c/em\u003e-values. Instead, model performance is assessed through measures like accuracy, and the contribution of individual SNPs is quantified using importance scores. \u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eGeographic origin: \u003c/strong\u003eBootstrap Forest models were trained to predict the geographic origin of the sorghum accessions, using either country of origin or broader geographic region as the target variable. The models were trained on the SNP markers, with the dataset randomly split (random seed = 1 in JMP Pro 17) into 80% for training and 20% for validation. 
The data split was conducted using the default settings in JMP Pro 17, with the following parameters for the Bootstrap Forest models: number of trees in the forest = 100, number of terms sampled per split = 1, bootstrap sample rate = 1, minimum splits per tree = 10, maximum splits per tree = 2000, minimum size split = 5. We first assessed the prediction accuracy on the training and validation sets to ensure robust model performance. After that, we trained another model using the same settings but with 100% of the data for training, which allowed us to use the full dataset for identifying important SNPs while ensuring that the model had been adequately validated. The top SNPs associated with the country and region of origin were identified based on their importance scores (portion= the total contribution of an SNP to the model\u0026apos;s accuracy). To explore potential candidate genes involved in geographic adaptation, the nearest genes to these top SNPs were identified using the \u003cem\u003eSorghum bicolor\u003c/em\u003e reference genome v3.1.1 from Phytozome 12 (https://phytozome-next.jgi.doe.gov/) [41].\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDisease resistance: \u003c/strong\u003eThe genetic basis of disease resistance was investigated using the same Bootstrap Forest modeling approach described for geographic origin. Separate models were trained to predict resistance to each disease: anthracnose, head smut, and downy mildew. These models used the same SNP markers and the phenotypic data (resistant/susceptible) for each disease as the target variable. Model parameters, data splitting procedures, and variable importance assessments were identical to those used in the geographic origin analyses. 
The top SNPs associated with resistance to each disease were identified, and the nearest genes to these SNPs were identified using the \u003cem\u003eSorghum bicolor\u003c/em\u003e reference genome v3.1.1 from Phytozome 12 to explore potential genetic mechanisms underlying disease resistance [41]. A random seed of 1 was used for all machine learning model training to ensure reproducibility.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003e\u003cstrong\u003ePhenotypic and genetic diversity of disease resistance in the sorghum mini core collection\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe examined the distribution of resistance and susceptibility phenotypes for each of the three diseases within the subset of the mini core collection that was phenotyped (Fig. 1). \u0026nbsp;For anthracnose, 216 accessions were evaluated, with 105 classified as resistant and 111 as susceptible. \u0026nbsp;Head smut resistance was assessed in 204 accessions, revealing 92 resistant and 112 susceptible lines. Downy mildew showed a different pattern, with 213 accessions evaluated and a higher proportion of susceptible (163) than resistant (50) accessions.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eTo explore the phenotypic variation in disease resistance, a \u003cem\u003et\u003c/em\u003e-SNE analysis was performed (Fig. 2a). The \u003cem\u003et\u003c/em\u003e-SNE visualization revealed four distinct clusters and a fifth, more diffuse central group. While these clusters show some degree of structure, they do not correspond directly to geographic origin, indicating that phenotypic similarity in disease resistance is not solely driven by geographic proximity. 
This pattern suggests the existence of four primary resistance profiles within the mini core collection, along with a group of accessions with intermediate or mixed resistance phenotypes.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe contributions of each disease resistance phenotype to overall phenotypic variation were further examined using a loading plot derived from a PCA (Fig. 2b). The first two principal components, PC1 and PC2, explained 36% and 32.3% of the total phenotypic variation, respectively. The loading plot illustrated that resistance and susceptibility responses for each disease were negatively correlated. While all three diseases contributed significantly to the observed variation, downy mildew resistance strongly influenced PC2, whereas anthracnose resistance had the most substantial influence on PC1. This suggests that these two diseases have been major drivers of phenotypic divergence in the mini core collection.\u003c/p\u003e\n\u003cp\u003eGenetic diversity among the mini core accessions was assessed using SNP data. Genetic distances were calculated based on the IBS method (Fig. 2c). When grouped by disease resistance phenotype, susceptible accessions exhibited slightly greater genetic distances compared to resistant accessions for anthracnose and head smut (two-tailed \u003cem\u003et\u003c/em\u003e-test: \u003cem\u003ep\u003c/em\u003e = 0.0011 for anthracnose, \u003cem\u003ep\u003c/em\u003e = 0.0196 for head smut). However, no significant difference was observed for downy mildew (\u003cem\u003ep\u003c/em\u003e = 0.541). This suggests that anthracnose and head smut resistance may be more genetically conserved within the mini core collection. In contrast, downy mildew resistance might be more genetically diverse or influenced by fewer genes with more significant effects.\u003c/p\u003e\n\u003cp\u003eAnalysis of genetic distances by country of origin revealed significant variation (one-way ANOVA: \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.0001). 
Notably, accessions from Nicaragua (0.13), Lesotho (0.138), and Thailand (0.14) exhibited the lowest genetic distances, while those from China (0.197), Algeria (0.198), Mali (0.206), Gambia (0.275), and Sierra Leone (0.284) exhibited the highest. Genetic distances also differed significantly among geographic regions (ANOVA \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.0001): accessions from South America (0.15), Southern Africa (0.151), and Central America \u0026amp; Caribbean (0.153) showed the lowest genetic distances, and those from the Middle East (0.191), West Africa (0.193), and East Asia (0.196) displayed the highest. This pattern hints that geographic origin plays a role in shaping the genetic diversity and, consequently, the disease resistance response of the mini core collection. At the same time, the presence of accessions with similar resistance from geographically distant regions hints at the possibility of disease-driven convergent evolution or historical exchange of germplasm contributing to the observed patterns.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eGenetic relationships among mini core and Senegalese accessions based on SNP markers\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo investigate the genetic relationships among the mini core and additional Senegal accessions, we performed hierarchical clustering on a combined dataset of 377 accessions (including controls such as SC748-5) using all 297,876 SNP markers (Fig. 3a). Ward's method, which minimizes within-cluster variance, identified 17 distinct clusters based on the similarity of their SNP genotype profiles. Notably, the Senegalese accessions grouped into approximately six distinct clusters, as indicated by the gray blocks on the dendrogram, highlighting their genetic distinctiveness relative to the mini core collection. The relationships between these 17 clusters are also presented in a constellation plot (Fig. 
3b).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eGenome-wide association analysis of geographic origin using Bootstrap Forest models\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo further investigate the genetic basis of geographic origin within the sorghum mini core collection, we performed a GWAS employing a machine learning approach, specifically Bootstrap Forest models. This analysis aimed to identify specific genomic regions (SNPs) significantly associated with the geographic origin of the accessions, categorized by either country or broader region.\u003c/p\u003e\n\u003cp\u003eInitial model validation was performed using an 80/20 training/validation split (Table 1). The training set accuracies for the geographic origin models were high (0.9136 for Country and 0.9383 for Region). However, the validation set accuracies were considerably lower: 0.5 for Country and 0.6481 for Region. Critically, the Generalized R-square for the Country model was negative (-0.534), indicating poor model fit on the validation set. While the Region model showed a positive Generalized R-square (0.7866), the low validation accuracy for both geographic origin models suggests that predicting geographic origin, especially at the country level, is challenging with this dataset, likely due to factors such as small sample sizes, the complex history of sorghum dispersal, and potential misclassification of origin in the original germplasm collection. Training set accuracies for the disease resistance models were also high (anthracnose: 0.9753; head smut: 0.9675; downy mildew: 0.9814). However, validation set performance varied considerably across traits. The anthracnose resistance model showed a relatively low but potentially usable validation accuracy of 0.6296 and a low Generalized R-square (0.0548). 
Validation performance for the head smut and downy mildew resistance models was poor, with negative Generalized R-square values (-0.06 and -0.063, respectively). Validation accuracy was 0.52 for head smut; the higher value for downy mildew (0.846) should be read alongside its near-chance validation AUC (0.5653) and the predominance of susceptible accessions for that trait, so neither model is reliable for prediction. As noted above, the Country-level geographic origin model also validated poorly, whereas the Region-level model performed better but still fell short of our initial target.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 1.\u003c/strong\u003e Model performance metrics for Bootstrap Forest models predicting geographic origin and disease resistance. This table presents the results of the initial 80/20 split validation for the Bootstrap Forest models. Metrics are shown for the training and validation sets of each trait: country of origin, geographic region, anthracnose resistance, head smut resistance, and downy mildew resistance. 
The metrics reported are: number of samples, entropy R-square, misclassification rate, Area Under the ROC Curve (AUC), Root Average Squared Error (RASE), and generalized R-square.\u003c/p\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"672\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 114px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e# of sample\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003eEntropy RSquare\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003eMisclassification Rate\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003eAUC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003eRASE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 133px;\"\u003e\n \u003cp\u003eGeneralized RSquare\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"7\" valign=\"top\" style=\"width: 672px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eCountry\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 114px;\"\u003e\n \u003cp\u003eTraining\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e162\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.7044\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.0864\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n 
\u003cp\u003e0.60458\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 133px;\"\u003e\n \u003cp\u003e0.9922\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 114px;\"\u003e\n \u003cp\u003eValidation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e54\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e-0.073\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.8234\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.82237\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 133px;\"\u003e\n \u003cp\u003e-0.534\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"7\" valign=\"top\" style=\"width: 672px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eRegion\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 114px;\"\u003e\n \u003cp\u003eTraining\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e162\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.7819\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.0617\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.9999\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.38833\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 133px;\"\u003e\n \u003cp\u003e0.978\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n 
\u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 114px;\"\u003e\n \u003cp\u003eValidation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e54\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.3605\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.3519\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.8896\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.63063\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 133px;\"\u003e\n \u003cp\u003e0.7866\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"7\" valign=\"top\" style=\"width: 672px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eAnthracnose\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 114px;\"\u003e\n \u003cp\u003eTraining\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e162\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.6695\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.0247\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.9991\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.22873\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 133px;\"\u003e\n \u003cp\u003e0.8061\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 114px;\"\u003e\n \u003cp\u003eValidation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n 
\u003cp\u003e54\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.0303\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.3704\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.6598\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.48742\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 133px;\"\u003e\n \u003cp\u003e0.0548\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"7\" valign=\"top\" style=\"width: 672px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eHeadsmut\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 114px;\"\u003e\n \u003cp\u003eTraining\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e154\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.6538\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.0325\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.9966\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.23841\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 133px;\"\u003e\n \u003cp\u003e0.7935\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 114px;\"\u003e\n \u003cp\u003eValidation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e50\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e-0.032\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" 
style=\"width: 85px;\"\u003e\n \u003cp\u003e0.48\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.5737\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.51023\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 133px;\"\u003e\n \u003cp\u003e-0.06\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"7\" valign=\"top\" style=\"width: 672px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eDowny mildew\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 114px;\"\u003e\n \u003cp\u003eTraining\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e161\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.6096\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.0186\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.23597\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 133px;\"\u003e\n \u003cp\u003e0.7372\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 114px;\"\u003e\n \u003cp\u003eValidation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e52\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e-0.042\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.1538\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.5653\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003e0.37079\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 133px;\"\u003e\n \u003cp\u003e-0.063\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eGiven the inconsistent and, in some cases, weak validation performance, we proceeded with extreme caution. While the training set accuracies for all models were high (ranging from 0.9136 to 0.9814), indicating that the models can capture relationships between SNPs and the traits within the training data, the limited validation results suggest that these relationships may not generalize well to new data. Therefore, we focused primarily on the importance scores from Bootstrap Forest models trained on the full dataset (100% of the accessions) to identify potential candidate SNPs and genes. This approach allows us to leverage all available data. However, the results, especially for those traits with low validation performance, should be considered exploratory and require further validation in independent datasets. Based on their importance scores from this final model, the top SNPs associated with geographic origin and disease resistance are discussed below (Fig. 4 \u0026amp; Table 2 for geographic origin, Fig. 
5 and Table 3 for disease resistance).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eFor the country of origin model, the SNPs with the highest importance scores were: S10_5199488 (nearest gene: \u003cem\u003eSobic.010G065500\u003c/em\u003e, BHLH transcription factor PTF1-like), S1_16693424 (nearest gene: \u003cem\u003eSobic.001G188800\u003c/em\u003e, Cysteine-rich repeat secretory protein), S10_7769197 (nearest gene: \u003cem\u003eSobic.010G089700\u003c/em\u003e, EREBP-like factor), S10_7675418 (nearest gene: \u003cem\u003eSobic.010G089300\u003c/em\u003e, DUF6598 domain-containing protein), S6_53674050 and S6_53674064 (both nearest gene: \u003cem\u003eSobic.006G181900\u003c/em\u003e, Aluminium-activated malate transporter), and S10_8370408 (nearest gene: \u003cem\u003eSobic.010G094100\u003c/em\u003e, No annotation) (Fig. 4a, Table 2).\u003c/p\u003e\n\u003cp\u003eFor the broader geographic region model, a different set of SNPs emerged as most important: S10_7678887, S10_7680604, and S10_7675418 (all nearest gene: \u003cem\u003eSobic.010G089300\u003c/em\u003e, DUF6598 domain-containing protein), S9_53912320 and S9_53912321 (both nearest gene: \u003cem\u003eSobic.009G186100\u003c/em\u003e, No annotation), S10_7790015 and S10_7769197 (both nearest gene: \u003cem\u003eSobic.010G089700\u003c/em\u003e, EREBP-like factor), S10_7715727 (nearest gene: \u003cem\u003eSobic.010G089600\u003c/em\u003e, Trehalose 6-phosphate phosphatase), S5_3709735 (nearest gene: \u003cem\u003eSobic.005G040800\u003c/em\u003e, Protein of unknown function (DUF2921)), S10_5199488 (nearest gene: \u003cem\u003eSobic.010G065500\u003c/em\u003e, BHLH domain-containing protein), S9_53923701 and S9_53923509 (nearest gene: \u003cem\u003eSobic.009G186300\u003c/em\u003e, Tyrosine-protein kinase), S4_50278385 (nearest gene: \u003cem\u003eSobic.004G158100\u003c/em\u003e, Phosphatidylinositol transfer protein), and S3_20398659 (nearest gene: \u003cem\u003eSobic.003G163301\u003c/em\u003e, No annotation) (Fig. 
4b, Table 2).\u003c/p\u003e\n\u003cp\u003eThe prominence of SNPs on chromosome 10 in both the country-level and region-level analyses, particularly those near genes encoding a DUF6598 domain-containing protein and an EREBP-like factor, suggests that this chromosome may harbor genes or regulatory elements that have significantly influenced the adaptation and diversification of sorghum across different geographic regions.\u003c/p\u003e\n\u003cp\u003eRanked by importance, the model using country of origin as the target variable (Fig. 4a) placed S10_5199488 first, followed by S1_16693424, S10_7769197, S10_7675418, S6_53674050, S6_53674064, and S10_8370408. When the analysis used the broader geographic region as the target variable (Fig. 4b), a partly overlapping set of SNPs emerged as highly important. The most important SNPs for predicting region of origin were S10_7675418, S10_7678887, S10_7680604, S9_53912320, S10_7769197, and S10_7790015, followed by other SNPs on chromosomes 9 and 10 (S10_7715727, S9_53912321, S10_5199488) and on chromosomes 3, 4, 5, and 6 (S3_20398659, S4_50278385, S5_3709735, S6_51602713, respectively).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 2.\u003c/strong\u003e Details of the most important SNPs associated with the geographic origin (country and region) of sorghum mini core accessions identified by Bootstrap Forest models. The table includes the SNP identifier (SNP ID), the nearest gene and its putative function (if known), the distance in base pairs between the SNP and the nearest gene (0 indicates that the SNP is within the gene), and the importance score (portion) of the SNP in the model. 
SNPs are grouped by whether they were identified in the country-level or region-level analysis.\u003c/p\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"617\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eSNP ID\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003eNearest gene and function\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003eBase pairs away\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003eImportance score (portion)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"4\" valign=\"top\" style=\"width: 617px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eCountry\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS10_5199488\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.010G065500\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eBHLH transcription factor PTF1-like\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0753\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS1_16693424\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.001G188800\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eCysteine-rich repeat secretory protein\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e189\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" 
style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0186\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS10_7769197\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.010G089700\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eEREBP-like factor\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0185\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS10_7675418\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.010G089300\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eDUF6598 domain-containing protein\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e10,801\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0158\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS6_53674050 S6_53674064\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.006G181900\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eAluminium-activated malate transporter\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0126\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS10_8370408\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" 
style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.010G094100\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eNo annotation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e10,311\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.012\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"3\" valign=\"top\" style=\"width: 520px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eRegion\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS10_7678887\u003c/p\u003e\n \u003cp\u003eS10_7680604\u003c/p\u003e\n \u003cp\u003eS10_7675418\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.010G089300\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eDUF6598 domain-containing protein\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e7,332\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0652\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS9_53912320\u003c/p\u003e\n \u003cp\u003eS9_53912321\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.009G186100\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eNo annotation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e49\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0307\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd 
valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS10_7790015\u003c/p\u003e\n \u003cp\u003eS10_7769197\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.010G089700\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eEREBP-like factor\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0273\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS10_7715727\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.010G089600\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eTrehalose 6-phosphate phosphatase\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e8,601\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0103\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS5_3709735\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.005G040800\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eProtein of unknown function (DUF2921)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0089\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS10_5199488\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.010G065500\u003c/em\u003e\u003c/p\u003e\n 
\u003cp\u003eBHLH domain-containing protein\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0068\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS9_53923701\u003c/p\u003e\n \u003cp\u003eS9_53923509\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.009G186300\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eTyrosine-protein kinase\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e254\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0115\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS4_50278385\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.004G158100\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003ePhosphatidylinositol transfer protein\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e54\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0111\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS3_20398659\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.003G163301\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eNo annotation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n 
\u003cp\u003e0.0109\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u003cstrong\u003eIdentification of SNPs and candidate genes associated with disease resistance using Bootstrap Forest models\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eGenotypic and phenotypic data were used as input to Bootstrap Forest models to identify genomic regions strongly associated with resistance to the three major sorghum diseases. A separate model was trained for each disease using the same SNP markers, and each model identified SNPs with notably high importance scores, suggesting that these SNPs contribute substantially to predicting disease resistance. The importance scores and details of the top SNPs for each disease, including their nearest genes and associated putative functions, are presented in Fig. 5 \u0026amp; Table 3.\u003c/p\u003e\n\u003cp\u003eFor anthracnose resistance, the SNPs with the highest importance scores were S3_61650227 and S3_61650258 (nearest gene: \u003cem\u003eSobic.003G281450\u003c/em\u003e, Reverse transcriptase zinc-binding domain-containing protein), S1_6061658 (nearest gene: \u003cem\u003eSobic.001G078900\u003c/em\u003e, Putative MYB-like DNA-binding protein), S4_48780280 (nearest gene: \u003cem\u003eSobic.004G154200\u003c/em\u003e, Ankyrin repeat-containing domain), S3_69017597 (nearest gene: \u003cem\u003eSobic.003G375500\u003c/em\u003e, Zinc finger PHD-type domain-containing protein), S1_71075125 (nearest gene: \u003cem\u003eSobic.001G431800\u003c/em\u003e, Uncharacterized protein DUF292), and S10_6800171 (nearest gene: \u003cem\u003eSobic.010G079900\u003c/em\u003e, H15 domain-containing protein) (Fig. 
5a).\u003c/p\u003e\n\u003cp\u003eFor head smut resistance, the top SNPs were S1_73522544 (nearest gene: \u003cem\u003eSobic.001G459600\u003c/em\u003e, Leucine-rich repeats-containing protein), S1_73523267 and S1_73523586 (nearest gene: \u003cem\u003eSobic.001G459600\u003c/em\u003e, NB-ARC domain-containing protein), S1_73516778 (nearest gene: \u003cem\u003eSobic.001G459500\u003c/em\u003e, Leucine-rich repeats-containing protein), S2_719163 (nearest gene: \u003cem\u003eSobic.002G007700\u003c/em\u003e, Nodulin-like domain-containing protein), S6_38161717 (nearest gene: \u003cem\u003eSobic.006G051700\u003c/em\u003e, NB-ARC domain // WRKY DNA-binding domain), S5_1701005 (nearest gene: \u003cem\u003eSobic.005G018900\u003c/em\u003e, Phosphoinositide-specific phospholipase C), and S8_49187870 (nearest gene: \u003cem\u003eSobic.008G104801\u003c/em\u003e, DUF4220 domain-containing protein) (Fig. 5b).\u003c/p\u003e\n\u003cp\u003eFor downy mildew resistance, the top SNPs were S3_30166225 (nearest gene: \u003cem\u003eSobic.003G169966\u003c/em\u003e, 3-5 exonuclease), S8_56098111 (nearest gene: \u003cem\u003eSobic.008G133300\u003c/em\u003e, UNC-93 like protein), S10_8958624 (nearest gene: \u003cem\u003eSobic.010G098932\u003c/em\u003e, Uncharacterized protein), S2_54739711 (nearest gene: \u003cem\u003eSobic.002G173500\u003c/em\u003e, Glycosyltransferase), S8_58354598 (nearest gene: \u003cem\u003eSobic.008G150400\u003c/em\u003e, F-box and WD40 domain protein), S2_8823410 (nearest gene: \u003cem\u003eSobic.002G082500\u003c/em\u003e, Protein LURP-one-related 15), and S1_74545705 (nearest gene: \u003cem\u003eSobic.001G473300\u003c/em\u003e, Protein phosphatase) (Fig. 5c). 
The identification of multiple SNPs with high importance scores for each disease supports the hypothesis that resistance to anthracnose, head smut, and downy mildew in sorghum is likely controlled by multiple genes or genomic regions, exhibiting a polygenic inheritance pattern.\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;\u003cstrong\u003eTable 4.\u003c/strong\u003e List of top SNPs associated with resistance to three sorghum diseases, identified by Bootstrap Forest models. Details of the most important SNPs associated with resistance to anthracnose, head smut, and downy mildew in sorghum, as identified by Bootstrap Forest models. The table provides the SNP ID, which includes chromosome number and position; the nearest gene to the SNP, along with its putative function; the distance in base pairs between the SNP and the nearest gene; and the importance score of the SNP in the Bootstrap Forest model, which reflects the contribution of the SNP to predicting disease resistance.\u003c/p\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"617\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eSNP ID\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003eNearest gene and function\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003eBase pairs away\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003eImportance score (portion)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"4\" valign=\"top\" style=\"width: 617px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eAnthracnose\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS3_61650227\u003c/p\u003e\n \u003cp\u003eS3_61650258\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.003G281450\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eReverse transcriptase zinc-binding domain-containing protein\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e1,345\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0255\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS1_6061658\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.001G078900\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003ePutative MYB-like DNA-binding protein\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0101\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS4_48780280\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.004G154200\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eAnkyrin repeat-containing domain\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e63,053\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0091\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS3_69017597\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.003G375500\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eZinc finger PHD-type domain-containing protein\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0064\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS1_71075125\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.001G431800\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eUncharacterized protein DUF292\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0062\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS10_6800171\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.010G079900\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eH15 domain-containing protein\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e850\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0057\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"4\" valign=\"top\" style=\"width: 617px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eHead smut\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS1_73522544\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.001G459600\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eLeucine-rich repeats (LRR)-containing protein\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n 
\u003cp\u003e222\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.00138\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS1_73523267\u003c/p\u003e\n \u003cp\u003eS1_73523586\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.001G459600\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eNB-ARC domain-containing protein\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.00130\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS1_73516778\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.001G459500\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eLeucine-rich repeats (LRR)-containing protein\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e270\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0081\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS2_719163\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.002G007700\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eNodulin-like domain-containing protein\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0060\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd 
valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS6_38161717\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.006G051700\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eNB-ARC domain // WRKY DNA-binding domain\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0053\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS5_1701005\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.005G018900\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003ePhosphoinositide-specific phospholipase C\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0052\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS8_49187870\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.008G104801\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eDUF4220 domain-containing protein\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e466\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0045\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"4\" valign=\"top\" style=\"width: 617px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eDowny mildew\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n 
\u003cp\u003eS3_30166225\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.003G169966\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003e3-5 exonuclease\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e11,050\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0064\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS8_56098111\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.008G133300\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eUNC-93 like protein\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0062\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS10_8958624\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.010G098932\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eUncharacterized protein\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e2,458\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0055\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS2_54739711\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.002G173500\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eGlycosyltransferase\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 
105px;\"\u003e\n \u003cp\u003e18,134\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0051\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS8_58354598\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.008G150400\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eF-box and WD40 domain protein\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0044\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS2_8823410\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.002G082500\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eProtein LURP-one-related 15\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e2,370\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0044\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003eS1_74545705\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 277px;\"\u003e\n \u003cp\u003e\u003cem\u003eSobic.001G473300\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eProtein phosphatase\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 105px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0044\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n 
\u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u003cbr\u003e\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eThis study investigates the complex interplay of geographic origin, genetic diversity, and disease resistance in sorghum. We primarily analyzed the global mini core collection, which represents a broad spectrum of sorghum diversity, together with additional Senegalese lines [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e]. This collection has been valuable for assessing resistance to various diseases, specifically anthracnose, leaf blight, and rust [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e]. To enhance our assessment of genetic diversity, we included SNP data from an additional set of Senegalese accessions, which allowed for a more comprehensive evaluation of the relationship between geographic origin, genetic variation, and disease resistance in sorghum. Leveraging a combination of phenotypic evaluations for resistance to three major diseases, extensive genotypic data from 297,876 SNP markers, and an advanced machine learning approach, we aimed to gain a deeper understanding of the genetic architecture of sorghum disease resistance and to identify genomic regions associated with geographic adaptation.\u003cdiv class=\"BlockQuote\"\u003e\u003cp\u003eOur analyses revealed complex relationships among geographic origin, genetic diversity, and disease resistance profiles in sorghum. While the \u003cem\u003et\u003c/em\u003e-SNE analysis (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ea) of disease resistance phenotypes in mini core lines showed some clustering, this clustering did not directly correlate with geographic origin, as each cluster contained accessions from a mix of countries. 
This suggests that, although adaptation to local environments, including pathogen pressures [\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e], likely plays a role, the observed patterns of disease resistance are not solely explained by geographic proximity. The loading plot from the PCA (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eb) further showed the contribution of each disease trait, with resistance and susceptibility responses for each disease negatively correlated. While all three diseases contributed significantly to the observed variation, downy mildew resistance strongly influenced PC2, whereas anthracnose resistance had the most influence on PC1. Moreover, genetic distances differed significantly by country or geographic region of origin (ANOVA, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.0001) (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ec), indicating that geographic origin is likely a significant factor in shaping the genetic structure of the sorghum accessions [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. Furthermore, hierarchical clustering based on SNP data (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ea and b) revealed that the Senegalese accessions formed approximately six distinct clusters, largely separate from the mini core collection. 
This clear genetic separation between the Senegalese lines and most of the mini core accessions points to a unique evolutionary trajectory for sorghum in Senegal, likely driven by adaptation to specific local environmental pressures, such as unique pathogen populations or soil conditions, and potentially by different agricultural practices.\u003c/p\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eTo gain deeper insights into the genetic foundations of the observed geographic patterns, we conducted a GWAS employing Bootstrap Forest models, a powerful machine-learning approach capable of identifying complex associations between genotype and phenotype [\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e]. This analysis aimed to pinpoint specific genomic regions (SNPs) strongly associated with the geographic origin of the accessions by using both country (56 countries) and broader regional (13 regions) classifications. While the training set accuracies for these models were high (exceeding 90%), the validation set accuracies varied considerably and were, in some cases, low (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Notably, the model for predicting the country of origin showed poor validation performance, with a negative Generalized R-square. The model used to predict broader geographic regions showed better performance, although its validation accuracy was still notably lower than that of the training set. This discrepancy between training and validation set accuracy, particularly for the country-level model, suggests that predicting specific geographic origins from SNP data in this dataset is challenging and may require substantially more data. As regional predictions were comparatively more accurate, the difference may simply reflect the number of classification categories: 13 regions versus 56 countries. 
Although these models explain the training data well, we acknowledge that their predictive power on unseen data is limited. Consequently, we focused our interpretation on the SNP importance scores derived from Bootstrap Forest models trained on the full dataset. This approach allows us to leverage all available information to identify potential candidate genes.\u003c/p\u003e \u003cp\u003eThe analysis identified numerous SNPs with high importance scores, indicating their substantial contribution to distinguishing between accessions from different geographic locations (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e, Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). When using the country of origin as the target variable, the most important SNP was S10_5199488, located within a gene (\u003cem\u003eSobic.010G065500\u003c/em\u003e) encoding a BHLH transcription factor (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e) [\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e]. This family of transcription factors plays diverse roles in plant development and responses to environmental stimuli and may link this genomic region to adaptation to specific ecological conditions within different countries [\u003cspan additionalcitationids=\"CR45\" citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e]. Other important SNPs for the country of origin were found on chromosomes 1, 6, and 10 (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e), consistent with a polygenic basis for geographic adaptation. 
For instance, the SNP S1_16693424 is located near the gene \u003cem\u003eSobic.001G188800\u003c/em\u003e, which encodes a cysteine-rich repeat secretory protein (CRRSP). CRRSPs are a family of proteins known to be involved in various processes, including plant development [\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e], stress responses [\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e, \u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e], and signaling. In plants of the \u003cem\u003eArabidopsis\u003c/em\u003e genus, CRRSPs are induced by pathogen infection and by treatment with reactive oxygen species or salicylic acid [\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e]. Similarly, when the analysis was performed using the broader geographic region as the target variable, a distinct set of SNPs emerged as highly important, with several located on chromosome 10 (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eb, Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). The most important SNP for the region of origin, S10_7675418 (and two other nearby SNPs), is located near a gene (\u003cem\u003eSobic.010G089300\u003c/em\u003e) encoding a DUF6598 domain-containing protein. The precise function of this gene in sorghum is yet to be explored, but a gene encoding a DUF6598 domain-containing protein has been reported as a candidate for powdery mildew resistance in wheat [\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e]. 
Additionally, SNP S10_7790015, also found to be important for predicting the region of origin, is located near a gene encoding an EREBP-like factor, further highlighting the potential role of transcription factors in sorghum's adaptation to abiotic and biotic stresses, such as drought [\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e] and fungal pathogens [\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e], which can vary significantly across broad geographic regions.\u003c/p\u003e \u003cp\u003eThe recurring prominence of SNPs on chromosome 10 in both analyses strongly suggests that this chromosome, in particular, carries genes or regulatory elements that have played a central role in the adaptation and diversification of sorghum across different geographic regions. A similar pattern was observed in \u003cem\u003eMedicago\u003c/em\u003e, where SNP-based clustering and GWA analysis using machine learning highlighted the importance of chromosome 8 in distinguishing geographic origins [\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e]. Together, these results highlight the power of machine learning approaches like Bootstrap Forest for identifying complex genetic associations with plant origin and local adaptation [\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eBuilding upon the insights into geographic adaptation, we extended our investigation to explore the genetic architecture of resistance to three major sorghum diseases: anthracnose, head smut, and downy mildew. Employing the same Bootstrap Forest modeling approach, we again trained separate models for each disease using identical SNP markers. 
Numerous SNPs had high importance scores in each model, indicating their substantial contribution to predicting disease resistance phenotypes (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e, Table\u0026nbsp;3). The most important SNPs for anthracnose resistance were found on chromosomes 1, 3, 4, and 10, while those for head smut resistance were located on chromosomes 1, 2, 5, 6, and 8. Similarly, SNPs associated with downy mildew resistance were identified on chromosomes 1, 2, 3, 8, and 10. This wide distribution of important SNPs across chromosomes suggests that resistance to these diseases is a complex, polygenic trait in sorghum, involving many genes with minor to moderate phenotypic effects. The nearby genes are involved in a wide array of biological processes, suggesting that diverse molecular mechanisms contribute to the overall resistance phenotype in sorghum.\u003c/p\u003e \u003cp\u003eFor anthracnose resistance, the top SNPs, S3_61650227 and S3_61650258, are located near a gene (\u003cem\u003eSobic.003G281450\u003c/em\u003e) encoding a reverse transcriptase zinc-binding domain-containing protein. While the precise role of this gene in anthracnose resistance is unknown, its zinc-binding domain is of particular interest. Zinc-binding proteins have been implicated in plant defense responses; for instance, a zinc-binding citrus protein metallothionein can act as a plant defense factor [\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e]. In \u003cem\u003eArabidopsis\u003c/em\u003e, zinc has been shown to trigger signaling mechanisms and defense responses that promote resistance to \u003cem\u003eAlternaria brassicicola\u003c/em\u003e [\u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e]. 
Moreover, a zinc metalloprotease in \u003cem\u003eFusarium graminearum\u003c/em\u003e targets a wheat zinc-binding protein, contributing to the pathogen's overall virulence [\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e]. These findings suggest that zinc-binding proteins, such as the one encoded by \u003cem\u003eSobic.003G281450\u003c/em\u003e, may play a role in sorghum's defense against anthracnose. Interestingly, a linear mixed model identified this region as a top candidate for anthracnose resistance in our previous traditional GWAS analysis [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. This further strengthens the argument that this region, and potentially the zinc-binding domain-containing protein encoded by \u003cem\u003eSobic.003G281450\u003c/em\u003e (or nearby genes), is involved in sorghum's defense response. Other genes with diverse functions were found near top SNPs, including genes encoding a putative MYB-like DNA-binding protein, an ankyrin repeat-containing domain, a zinc finger PHD-type domain-containing protein, and others.\u003c/p\u003e \u003cp\u003eFor head smut resistance, the most important SNP, S1_73522544, is located near the gene \u003cem\u003eSobic.001G459600\u003c/em\u003e, which encodes a leucine-rich repeat (LRR)-containing protein. LRR-containing proteins are well-established components of plant immune systems, playing a substantial role in pathogen recognition and the activation of downstream defense responses [\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e, \u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e]. Notably, another LRR-containing protein encoded by the gene \u003cem\u003eSobic.001G459500\u003c/em\u003e was also identified as a top candidate for head smut resistance in our previous traditional GWAS analysis [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. 
This gene is located near the SNP S1_73516778, further supporting the importance of LRR-containing proteins in sorghum's defense against head smut. In addition to these LRR-containing proteins, other genes associated with head smut resistance in this study point to a complex interplay of defense signaling and recognition mechanisms. These include genes encoding an NB-ARC domain-containing protein and a protein with both NB-ARC and WRKY DNA-binding domains. NB-ARC domains are characteristic of intracellular immune receptors (R proteins) that recognize pathogen effectors and trigger defense responses, while WRKY transcription factors are key regulators of defense gene expression [\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e]. Notably, a recent study in chickpeas demonstrated a direct physical interaction between a CC-NB-ARC-LRR protein and a WRKY transcription factor, promoting resistance to Fusarium wilt [\u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eFinally, for downy mildew resistance, the top SNPs include S2_54739711, located near \u003cem\u003eSobic.002G173500\u003c/em\u003e, a gene encoding a glycosyltransferase. Glycosyltransferases are involved in the biosynthesis of various cell wall components, and modifications to the cell wall can influence pathogen penetration and spread [\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e]. 
We also identified an F-box protein as a potential candidate, which is noteworthy because F-box proteins are known to play roles in plant defense, and multiple studies have similarly found an F-box protein gene associated with fungal resistance in sorghum [\u003cspan additionalcitationids=\"CR63\" citationid=\"CR62\" class=\"CitationRef\"\u003e62\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e64\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThe diversity of these candidate genes suggests that resistance to each disease is not only polygenic but also involves a complex interplay of various defense mechanisms, potentially involving pathogen recognition [\u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e65\u003c/span\u003e], signal transduction [\u003cspan citationid=\"CR66\" class=\"CitationRef\"\u003e66\u003c/span\u003e], cell wall modification [\u003cspan citationid=\"CR67\" class=\"CitationRef\"\u003e67\u003c/span\u003e], and other metabolic processes. Additional research, such as gene expression studies and functional characterization, is needed to elucidate the precise roles of these candidate genes in conferring resistance to anthracnose, head smut, and downy mildew in sorghum.\u003c/p\u003e \u003cp\u003eHowever, it is important to acknowledge a limitation regarding the generalizability of the disease resistance findings. The phenotypic evaluations used in this study were based on pathogen isolates relevant to a specific geographic region (Southern U.S.). While this is valuable for understanding resistance, the prevalence, virulence, and genetic diversity of \u003cem\u003eC. sublineola\u003c/em\u003e (anthracnose), \u003cem\u003eS. reilianum\u003c/em\u003e (head smut), and \u003cem\u003eP. sorghi\u003c/em\u003e (downy mildew) can vary across different geographic regions. 
Still, the identified loci and genes will remain valuable candidates for further investigation.\u003c/p\u003e \u003cp\u003eThe candidate genes identified through the Bootstrap Forest models offer promising targets for marker-assisted selection (MAS) [\u003cspan citationid=\"CR68\" class=\"CitationRef\"\u003e68\u003c/span\u003e] and gene editing technologies such as CRISPR-Cas9 [\u003cspan citationid=\"CR69\" class=\"CitationRef\"\u003e69\u003c/span\u003e]. By focusing on these genes, previously known candidates, and their associated pathways, breeders can more efficiently select and pyramid resistance genes, developing varieties with durable and broad-spectrum resistance. For instance, identifying LRR-containing proteins as potential players in head smut resistance provides a clear avenue for targeted breeding, as these proteins are known to be involved in pathogen recognition. Similarly, the association of glycosyltransferases with downy mildew resistance suggests that modifying cell wall composition could be a promising strategy for enhancing resistance to this disease. However, follow-up studies are needed to validate the functional roles of these candidate genes. Furthermore, future studies should continue to explore the genetic diversity of underrepresented germplasm collections, as they likely contain a wealth of untapped genetic variation for disease resistance and other valuable traits.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eThis study employed a comprehensive approach, integrating phenotypic evaluations, high-density SNP genotyping, and machine learning analyses to investigate the genetic architecture of disease resistance and geographic origin in sorghum. Our findings underscore the profound influence of geographic origin in shaping the genetic diversity and disease resistance profiles of sorghum accessions. 
The inclusion of a genetically distinct collection of Senegalese accessions significantly expanded the diversity under investigation, highlighting the importance of exploring underrepresented germplasm for valuable traits.\u003c/p\u003e \u003cp\u003eWe identified numerous SNPs associated with geographic origin and disease resistance through genome-wide association analysis using Bootstrap Forest models. Notably, chromosome 10 emerged as a potential hotspot for genes and regulatory elements involved in adaptation to different geographic regions. We identified several candidate genes located near the most important SNPs, including those encoding transcription factors (bHLH and EREBP-like factors), a cysteine-rich repeat secretory protein, and a DUF6598 domain-containing protein. These findings suggest that diverse molecular mechanisms, such as transcriptional regulation, stress responses, and potentially novel pathways, contribute to sorghum's adaptation to varying environments.\u003c/p\u003e \u003cp\u003eFor disease resistance, our analysis revealed a complex, polygenic architecture with important SNPs distributed across multiple chromosomes. We identified candidate genes associated with resistance to anthracnose, head smut, and downy mildew, including genes encoding zinc-binding proteins, LRR-containing proteins, F-box proteins, glycosyltransferases, and others involved in various cellular processes. Interestingly, several candidate genes identified in this study were also identified in our previous traditional GWAS analysis, further strengthening the evidence for their involvement in disease resistance.\u003c/p\u003e \u003cp\u003eThe identification of these candidate genes and the characterization of genetically diverse accessions provide valuable resources for sorghum breeding programs. 
These resources can be used to develop improved varieties with enhanced resistance to multiple diseases and better adaptation to diverse environments through marker-assisted selection, gene editing, and other advanced breeding techniques. Ultimately, this research contributes to a deeper understanding of the genetic basis of disease resistance and geographic adaptation in sorghum, paving the way for the development of more resilient and productive varieties that can contribute to global food security in the face of evolving pathogen populations and changing environmental conditions.\u003c/p\u003e "},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eData availability\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eData from this study can be provided by the corresponding authors upon request.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe are grateful to the reviewers for their constructive feedback. Mention of any trade names or commercial products in this article is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture. USDA is an equal opportunity provider and employer, and all agency services are available without discrimination.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis work is supported by the U.S. Department of Agriculture, Agricultural Research Service, In-House Projects No. 8042-21220-258-000-D and 3091-22000-040-000-D. 
\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor information\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAuthors and Affiliations\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eSustainable Perennial Crops Laboratory, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD, 20705, USA\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEzekiel Ahn, Sunchung Park, Seunghyun Lim, Jae Hee Jang \u0026amp; Lyndel W. Meinhardt\u003c/strong\u003e\u003cstrong\u003e\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eEnvironmental Microbial and Food Safety Laboratory, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD, 20705, USA\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eInsuck Baek, Seok Min Hong \u0026amp; Moon S. Kim\u003c/strong\u003e\u003cstrong\u003e\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eInsect Control and Cotton Disease Research, Agricultural Research Service, Southern Plains Agricultural Research Center, United States Department of Agriculture, College Station, TX, 77845, USA\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eLouis K. 
Prom\u003c/strong\u003e\u003cstrong\u003e\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eDepartment of Civil Urban Earth and Environmental Engineering, Ulsan National Institute of Science and Technology, UNIST-gil 50, Ulsan, 44919, Republic of Korea\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSeok Min Hong\u003c/strong\u003e\u003cstrong\u003e\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eDepartment of Plant Pathology and Microbiology, Texas A\u0026amp;M University, College Station, TX, 77843, USA\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eClint Magill\u003c/strong\u003e\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eContributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eEzekiel Ahn conceived and designed the project.\u0026nbsp;Sunchung Park, Seunghyun Lim, Insuck\u0026nbsp;Baek, Seok Min Hong, Jae Hee Jang, and\u0026nbsp;Ezekiel Ahn performed computational work and analyzed data. Ezekiel Ahn wrote the manuscript. Louis K. Prom, Moon S. Kim, Lyndel W. Meinhardt, and Clint Magill provided resources and contributed to methodology development. All authors revised the manuscript. All authors read and approved the manuscript.\u003cstrong\u003e\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics declarations\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate \u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable. \u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable. \u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare no competing interests. 
\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eClinical Trial Number\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAdditional information\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ePublisher\u0026apos;s Note \u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eSpringer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eRather MA, Thakur R, Hoque M, Das RS, Miki KSL, Teixeira-Costa BE, et al. Sorghum (Sorghum bicolor). Nutri-Cereals Nutraceutical Techno-Funct Potential; 2023.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMwamahonje A, Mdindikasi Z, Mchau D, Mwenda E, Sanga D, Garcia-Oliveira AL, et al. Advances in Sorghum Improvement for Climate Resilience in the Global Arid and Semi-Arid Tropics: A Review. Agronomy. 2024;14:3025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMutalik Desai S, Vaidya PS, Pardo PA. Commercial sector breeding of sorghum: Current status and future prospects. Sorghum 21st Century Food\u0026ndash;fodder\u0026ndash;feed\u0026ndash;fuel Rapidly Chang World. 2020;:333\u0026ndash;54.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHadebe S, Modi A, Mabhaudhi T. Drought tolerance and water use of cereal crops: A focus on sorghum as a food security crop in sub-Saharan Africa. J Agron Crop Sci. 2017;203:177\u0026ndash;91.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAbreha KB, Enyew M, Carlsson AS, Vetukuri RR, Feyissa T, Motlhaodi T, et al. Sorghum in dryland: morphological, physiological, and molecular responses of sorghum under drought stress. Planta. 2022;255:1\u0026ndash;23.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAhn E, Hu Z, Perumal R, Prom LK, Odvody G, Upadhyaya HD, et al. 
Genome wide association analysis of sorghum mini core lines regarding anthracnose, downy mildew, and head smut. PLoS ONE. 2019;14:e0216671.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDas I, Rajendrakumar P. Disease resistance in sorghum. Biotic stress resistance in millets. Elsevier; 2016. pp. 23\u0026ndash;67.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFrederiksen RA. Compendium of sorghum diseases. 1986.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAcharya B, O\u0026rsquo;Quinn TN, Everman W, Mehl HL. Effectiveness of fungicides and their application timing for the management of sorghum foliar anthracnose in the mid-Atlantic United States. Plant Dis. 2019;103:2804\u0026ndash;11.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMengistu G, Shimelis H, Laing M, Lule D. Breeding for anthracnose (\u0026lsquo;Colletotrichum sublineolum\u0026rsquo; Henn.) resistance in sorghum: Challenges and opportunities. Aust J Crop Sci. 2018;12:1911\u0026ndash;20.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAhn E, Fall C, Botkin J, Curtin S, Prom LK, Magill C. Inoculation and screening methods for major sorghum diseases caused by fungal pathogens: Claviceps africana, Colletotrichum sublineola, Sporisorium reilianum, Peronosclerospora sorghi and Macrophomina phaseolina. Plants. 2023;12:1906.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLittle CR, Perumal R. The biology and control of sorghum diseases. Sorghum State Art Future Perspectives. 2019;58:297\u0026ndash;346.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRadwan GL, Perumal R, Isakeit T, Magill CW, Prom LK, Little CR. Screening exotic sorghum germplasm, hybrids, and elite lines for resistance to a new virulent pathotype (P6) of Peronosclerospora sorghi causing downy mildew. Plant Health Prog. 
2011;12:17.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTesso TT, Perumal R, Little CR, Adeyanju A, Radwan GL, Prom LK et al. Sorghum pathology and biotechnology-a fungal disease perspective: Part II. Anthracnose, stalk rot, and downy mildew. Eur J Plant Sci Biotechnol. 2012;6 Special Issue 1:31\u0026ndash;44.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVenkateswaran K, Elangovan M, Sivaraj N. Origin, domestication and diffusion of Sorghum bicolor. Breeding Sorghum for diverse end uses. Elsevier; 2019. pp. 15\u0026ndash;31.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCavender-Bares J, Ackerly DD, Hobbie SE, Townsend PA. Evolutionary legacy effects on ecosystems: biogeographic origins, plant traits, and implications for management in the era of global change. Annu Rev Ecol Evol Syst. 2016;47:433\u0026ndash;62.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMorris GP, Ramu P, Deshpande SP, Hash CT, Shah T, Upadhyaya HD, et al. Population genomic and genome-wide association studies of agroclimatic traits in sorghum. Proc Natl Acad Sci. 2013;110:453\u0026ndash;8.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCasa AM, Pressoir G, Brown PJ, Mitchell SE, Rooney WL, Tuinstra MR, et al. Community resources and strategies for association mapping in sorghum. Crop Sci. 2008;48:30\u0026ndash;40.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLinnen CR, Poh Y-P, Peterson BK, Barrett RD, Larson JG, Jensen JD, et al. Adaptive evolution of multiple traits through multiple mutations at a single gene. Science. 2013;339:1312\u0026ndash;6.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLasky JR, Upadhyaya HD, Ramu P, Deshpande S, Hash CT, Bonnette J, et al. Genome-environment associations in sorghum landraces predict adaptive traits. Sci Adv. 2015;1:e1400218.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVavilov NI, Dorofeev VF. 
Origin and geography of cultivated plants. Cambridge University Press; 1992.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHarlan JR. Crops and Man. American Society of Agronomy. Crop Sci Soc Am Madison Wis. 1992;16:63\u0026ndash;262.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGepts P. Crop domestication as a long-term selection experiment. Plant Breed Rev. 2004;24:1\u0026ndash;44.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBellon MR. The dynamics of crop infraspecific diversity: A conceptual framework at the farmer level. Econ Bot. 1996;:26\u0026ndash;39.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFrankel O. The conservation of plant biodiversity. Cambridge University Press; 1995.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTanksley SD, McCouch SR. Seed banks and molecular maps: unlocking genetic potential from the wild. Science. 1997;277:1063\u0026ndash;6.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHoisington D, Khairallah M, Reeves T, Ribaut J-M, Skovmand B, Taba S, et al. Plant genetic resources: what can they contribute toward increased crop productivity? Proc Natl Acad Sci. 1999;96:5937\u0026ndash;43.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJones JD, Dangl JL. The plant immune system. Nature. 2006;444:323\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMcDonald BA, Linde C. Pathogen population genetics, evolutionary potential, and durable resistance. Annu Rev Phytopathol. 2002;40:349\u0026ndash;79.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eUpadhyaya H, Pundir R, Dwivedi S, Gowda C, Reddy VG, Singh S. Developing a mini core collection of sorghum for diversified utilization of germplasm. Crop Sci. 2009;49:1769\u0026ndash;80.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSharma R, Upadhyaya H, Manjunatha S, Rao V, Thakur R. 
Resistance to foliar diseases in a mini-core collection of sorghum germplasm. Plant Dis. 2012;96:1629\u0026ndash;33.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBreiman L. Random forests. Mach Learn. 2001;45:5\u0026ndash;32.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBoulesteix A, Janitza S, Kruppa J, K\u0026ouml;nig IR. Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics. Wiley Interdiscip Rev Data Min Knowl Discov. 2012;2:493\u0026ndash;507.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eUpadhyaya HD, Wang Y-H, Gowda C, Sharma S. Association mapping of maturity and plant height using SNP markers with the sorghum mini core collection. Theor Appl Genet. 2013;126:2003\u0026ndash;15.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang Y-H, Upadhyaya HD, Burrell AM, Sahraeian SME, Klein RR, Klein PE. Genetic structure and linkage disequilibrium in a diverse, representative collection of the C4 model plant, Sorghum bicolor. G3 Genes Genomes Genet. 2013;3:783\u0026ndash;93.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eElshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, et al. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE. 2011;6:e19379.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHu Z, Olatoye MO, Marla S, Morris GP. An integrated genotyping-by‐sequencing polymorphism map for over 10,000 sorghum genotypes. Plant Genome. 2019;12:180044.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBrowning BL, Browning SR. Genotype imputation with millions of reference samples. Am J Hum Genet. 2016;98:116\u0026ndash;26.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKlimberg R. Fundamentals of predictive analytics with JMP. 
SAS Institute; 2023.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007;23:2633\u0026ndash;5.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGoodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, et al. Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res. 2012;40:D1178\u0026ndash;86.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eThrall PH, Burdon J, Bever JD. Local adaptation in the Linum marginale\u0026mdash;Melampsora lini host-pathogen interaction. Evolution. 2002;56:1340\u0026ndash;51.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBureau A, Dupuis J, Falls K, Lunetta KL, Hayward B, Keith TP, et al. Identifying SNPs predictive of phenotype using random forests. Genet Epidemiol Off Publ Int Genet Epidemiol Soc. 2005;28:171\u0026ndash;82.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eToledo-Ortiz G, Huq E, Quail PH. The Arabidopsis basic/helix-loop-helix transcription factor family. Plant Cell. 2003;15:1749\u0026ndash;70.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChinnusamy V, Schumaker K, Zhu J. Molecular genetic perspectives on cross-talk and specificity in abiotic stress signalling in plants. J Exp Bot. 2004;55:225\u0026ndash;36.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eManavella PA, Arce AL, Dezar CA, Bitton F, Renou J, Crespi M, et al. Cross-talk between ethylene and drought signalling pathways is mediated by the sunflower Hahb‐4 transcription factor. Plant J. 2006;48:125\u0026ndash;37.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang Y, Tian H, Chen D, Zhang H, Sun M, Chen S, et al. Cysteine-rich receptor-like protein kinases: emerging regulators of plant stress responses. Trends Plant Sci. 
2023;28:776\u0026ndash;94.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang Y, Teng Z, Li H, Wang W, Xu F, Sun K et al. An activated form of NB-ARC protein RLS1 functions with cysteine-rich receptor-like protein RMC to trigger cell death in rice. Plant Commun. 2023;4.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen Z. A superfamily of proteins with novel cysteine-rich repeats. Plant Physiol. 2001;126:473\u0026ndash;6.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKaur R, Vasistha NK, Ravat VK, Mishra VK, Sharma S, Joshi AK, et al. Genome-Wide Association Study Reveals Novel Powdery Mildew Resistance Loci in Bread Wheat. Plants. 2023;12:3864.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMaghraby A, Alzalaty M. Genome-wide identification and evolutionary analysis of the AP2/EREBP, COX and LTP genes in Zea mays L. under drought stress. Sci Rep. 2024;14:7610.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCao Y, Wu Y, Zheng Z, Song F. Overexpression of the rice EREBP-like gene OsBIERF3 enhances disease resistance and salt tolerance in transgenic tobacco. Physiol Mol Plant Pathol. 2005;67:202\u0026ndash;11.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLim S, Park S, Baek I, Botkin J, Jang JH, Hong SM, et al. Integrative analysis of seed morphology, geographic origin, and genetic structure in Medicago with implications for breeding and conservation. BMC Plant Biol. 2025;25:274.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNishimura S, Tatano S, Miyamoto Y, Ohtani K, Fukumoto T, Gomi K, et al. A zinc-binding citrus protein metallothionein can act as a plant defense factor by controlling host-selective ACR-toxin production. Plant Mol Biol. 2013;81:1\u0026ndash;11.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMartos S, Gallego B, Cabot C, Llugany M, Barcel\u0026oacute; J, Poschenrieder C. 
Zinc triggers signaling mechanisms and defense responses promoting resistance to Alternaria brassicicola in Arabidopsis thaliana. Plant Sci. 2016;249:13\u0026ndash;24.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang X, Liu K, Li Y, Ren Y, Li Q, Wang B. Zinc metalloprotease FgM35, which targets the wheat zinc-binding protein TaZnBP, contributes to the virulence of Fusarium graminearum. Stress Biol. 2024;4:1\u0026ndash;17.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJones DA, Jones JD. The role of leucine-rich repeat proteins in plant defences. Advances in botanical research. Elsevier; 1997. pp. 89\u0026ndash;167.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePadmanabhan M, Cournoyer P, Dinesh-Kumar S. The leucine‐rich repeat domain in plant innate immunity: a wealth of possibilities. Cell Microbiol. 2009;11:191\u0026ndash;8.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAfzal M, Alghamdi SS, Nawaz H, Migdadi HH, Altaf M, El-Harty E, et al. Genome-wide identification and expression analysis of CC-NB-ARC-LRR (NB-ARC) disease-resistant family members from soybean (Glycine max L.) reveal their response to biotic stress. J King Saud Univ-Sci. 2022;34:101758.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChakraborty J, Priya P, Dastidar SG, Das S. Physical interaction between nuclear accumulated CC-NB-ARC-LRR protein and WRKY64 promotes EDS1 dependent Fusarium wilt resistance in chickpea. Plant Sci. 2018;276:111\u0026ndash;33.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGibeaut DM. Nucleotide sugars and glycosyltransferases for synthesis of cell wall matrix polysaccharides. Plant Physiol Biochem. 2000;38:69\u0026ndash;80.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAhn E, Prom LK, Hu Z, Odvody G, Magill C. Genome-wide association analysis for response of Senegalese sorghum accessions to Texas isolates of anthracnose. Plant Genome. 
2021;14:e20097.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAhn E, Fall C, Prom LK, Magill C. Genome-wide association study of Senegalese sorghum seedlings responding to a Texas isolate of Colletotrichum sublineola. Sci Rep. 2022;12:13025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBirhanu C, Girma G, Mekbib F, Nida H, Tirfessa A, Lule D, et al. Exploring the genetic basis of anthracnose resistance in Ethiopian sorghum through a genome-wide association study. BMC Genomics. 2024;25:677.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eG\u0026oacute;mez-G\u0026oacute;mez L. Plant perception systems for pathogen recognition and defence. Mol Immunol. 2004;41:1055\u0026ndash;62.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBlumwald E, Aharon GS, Lam BC. Early signal transduction pathways in plant\u0026ndash;pathogen interactions. Trends Plant Sci. 1998;3:342\u0026ndash;6.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMiedes E, Vanholme R, Boerjan W, Molina A. The role of the secondary cell wall in plant resistance to pathogens. Front Plant Sci. 2014;5:358.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRibaut J-M, Hoisington D. Marker-assisted selection: new tools and strategies. Trends Plant Sci. 1998;3:236\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYin K, Gao C, Qiu J-L. Progress and prospects in plant genome editing. Nat Plants. 
2017;3:1\u0026ndash;6.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"bmc-plant-biology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"pbio","sideBox":"Learn more about [BMC Plant Biology](http://bmcplantbiol.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/pbio/default.aspx","title":"BMC Plant Biology","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Sorghum, Geographic origin, Genetic diversity, Machine learning, GWAS, Anthracnose, Head smut, Downy mildew","lastPublishedDoi":"10.21203/rs.3.rs-6222361/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6222361/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003e\u003cstrong\u003eBackground\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eSorghum, often considered the fifth most important cereal crop globally, faces significant production constraints caused by various fungal diseases. Understanding the genetic basis of disease resistance and adaptation to geographic origin is crucial for developing improved varieties. This study investigates these aspects in a diverse panel of 377 sorghum accessions using a machine learning-enabled genome-wide association study (GWAS).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eResults\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe study analyzed a panel of 377 sorghum accessions, including a mini core collection and additional accessions from Senegal. Phenotypic evaluation for resistance to anthracnose, head smut, and downy mildew was conducted on the mini core collection. Genotypic data comprising nearly 300,000 SNP markers were used for GWAS with Bootstrap Forest models. 
While phenotypic clustering based on disease resistance did not directly correlate with geographic origin, significant genetic differentiation was observed based on geographic origin. Machine learning-driven GWAS identified SNPs associated with geographic origin, particularly on chromosome 10, with candidate genes including transcription factors. SNPs near genes with known or predicted roles in plant defense were identified for disease resistance, such as zinc-binding proteins for anthracnose and LRR- and NB-ARC-containing proteins for head smut.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConclusions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis research provides insights into the complex genetic architecture of disease resistance and geographic adaptation in sorghum. In addition to previously known resistant genes through traditional GWAS, the identified candidate genes and associated SNPs offer valuable resources for enhancing disease resistance in sorghum breeding programs through marker-assisted selection and other advanced breeding techniques.\u003c/p\u003e","manuscriptTitle":"A Machine Learning Approach to Genome-Wide Association Mapping of Disease Resistance and Geographic Origin in Sorghum","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-04-18 09:08:46","doi":"10.21203/rs.3.rs-6222361/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision 
requested","date":"2025-07-02T10:33:40+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-04-16T01:45:16+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-04-15T00:16:18+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-04-12T14:52:46+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-04-08T00:44:57+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"100613741448835756908167280581854530025","date":"2025-04-07T15:37:18+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"101666688390877670491135376353789238791","date":"2025-04-06T19:09:11+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"212028204552047393720217032151464839688","date":"2025-04-06T08:40:58+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"214466632624584478016027312087178885613","date":"2025-04-06T03:46:44+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"129969832787571181671308080055509876914","date":"2025-04-05T03:26:54+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"101753864729489922057515861269925381411","date":"2025-04-04T19:11:34+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"70428052326511607629914836398464313398","date":"2025-03-31T00:48:14+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-03-29T20:35:01+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2025-03-20T23:29:42+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-03-17T03:48:24+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-03-17T03:46:38+00:00","index":"","fulltext":""},{"type":"submitted","content":"BMC Plant 
Biology","date":"2025-03-13T19:17:13+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"bmc-plant-biology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"pbio","sideBox":"Learn more about [BMC Plant Biology](http://bmcplantbiol.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/pbio/default.aspx","title":"BMC Plant Biology","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"3dec9219-ecaf-44c9-afab-8abaecf13726","owner":[],"postedDate":"April 18th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[],"tags":[],"updatedAt":"2026-03-02T16:10:59+00:00","versionOfRecord":{"articleIdentity":"rs-6222361","link":"https://doi.org/10.1186/s12870-026-08468-z","journal":{"identity":"bmc-plant-biology","isVorOnly":false,"title":"BMC Plant Biology"},"publishedOn":"2026-02-28 15:57:51","publishedOnDateReadable":"February 28th, 2026"},"versionCreatedAt":"2025-04-18 09:08:46","video":"","vorDoi":"10.1186/s12870-026-08468-z","vorDoiUrl":"https://doi.org/10.1186/s12870-026-08468-z","workflowStages":[]},"version":"v1","identity":"rs-6222361","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6222361","identity":"rs-6222361","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}