Nutritional Genomics of Tepary Bean (Phaseolus acutifolius): Genome-wide association analysis and genomic prediction of seed nutritional traits and size

Research Square | Research Article

Sri Kiran Reddy Alla, Benedict Analin, Vijay Joshi

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-8970665/v1 This work is licensed under a CC BY 4.0 License. Status: Posted (Version 1).

Abstract

Background Tepary bean [Phaseolus acutifolius] is a drought- and heat-tolerant, nitrogen-fixing legume that offers a promising low-input protein source. Nonetheless, the genetic factors influencing seed protein and amino acid profiles are not well understood.
We evaluated 206 diverse accessions along with four controls in organic fields, measuring hundred-seed weight [HSW], seed width, total soluble seed protein [%], and the profiles of nineteen free amino acids. Using genotyping-by-sequencing, we identified 49,384 high-quality SNPs and conducted GWAS with multiple models [GLM, MLM, BLINK, FarmCPU] on BLUPs, controlling population structure. Results We found genome-wide significant links to protein percentage on Chr08, including candidate genes like a WNK kinase and a 2-oxoglutarate/Fe [II]-dependent oxygenase. Additionally, trait-specific loci were identified for fifteen of the nineteen free amino acids, indicating a modular genetic architecture. Notably, the essential amino acids threonine, methionine, and lysine each had unique significant loci, marking the first tepary-specific markers for these nutritionally important traits. Fewer but stable associations related to seed size were observed on Chr02 [HSW; V-ATPase subunit] and Chr07 [seed width; Aux/IAA]. Genomic prediction models further revealed high predictive ability for seed size [r ≈ 0.90–0.96] and moderate accuracy for protein and amino acid traits [r ≈ 0.15–0.45], consistent with their polygenic and modular genetic structure. Conclusion By integrating GWAS with genomic prediction, we identify candidate genes, trait-specific genomic regions, and reliable benchmarks for predicting protein concentration, essential amino acids, and seed size in tepary bean. The alignment between association signals and prediction accuracy supports a dual-breeding approach that combines marker-assisted selection for key loci with genomic selection to leverage residual polygenic variation. This combined framework strengthens opportunities to enhance seed nutritional quality without negatively affecting seed size and offers synteny-based entry points for gene discovery and introgression across Phaseolus species. 
Keywords: genome-wide association (GWAS), genomic prediction, essential amino acids, threonine, methionine, lysine, seed protein, hundred-seed weight, seed width, Phaseolus acutifolius

Introduction

Tepary bean (Phaseolus acutifolius A. Gray) is native to the Sonoran Desert and has been cultivated for centuries by Indigenous communities of the southwestern United States and northern Mexico. Its domestication history is consistent with genomic adaptations to heat and drought, as revealed by a chromosome-scale reference genome [1]. Within Phaseolus, tepary is closely related to common bean (P. vulgaris), facilitating gene transfer and pre-breeding routes for improvement [2]. Its ability to thrive in hot, arid, low-input environments positions tepary as a climate-smart legume with promising food-ingredient properties [3]. Tepary seeds are rich in protein and dietary fiber and provide essential amino acids, giving them a favorable nutritional profile compared with common bean [2, 3]. However, their small seed size can limit market appeal by influencing water absorption, processing efficiency, and consumer acceptance [2, 4]. Therefore, improving both seed size and protein quality is crucial to increasing tepary's adoption. In Phaseolus, association genetics offers a useful comparative framework. In common bean, image-based GWAS has identified loci linked to seed shape and size [4], while combined GWAS and RNA-seq have identified candidates responsible for seed size, highlighting its polygenic nature [5]. For nutritional traits, GWAS have mapped mineral content in common bean [6], whereas in soybean, research on amino acids is more advanced, showing modular correlations and key loci [7].
Similar progress is seen in other pulses: pea GWAS/QTL studies identified regions related to seed protein composition [ 8 ]; chickpea GWAS connected seed protein and other nutrients [ 9 , 10 ]; lentil studies have mapped traits related to protein quality, including amino acids and digestibility [ 11 ]; and cowpea GWAS found a locus associated with seed protein content [ 12 ]. Within tepary, the reference genome shows high collinearity with P. vulgaris and evidence of domestication under heat stress, enabling genome-informed improvement. However, research on food ingredients supports the development of protein-rich flours and also notes amino acid limitations, such as sulfur-containing amino acids, which guide efforts to enhance protein content and amino acid profiles. Recent association mapping in tepary has utilized whole-genome resequencing, mainly focusing on yield traits rather than seed composition. As a result, the genetic basis of seed protein percentage and free amino acids, and their relationship with seed size, remain relatively understudied in P. acutifolius. Genotyping-by-sequencing (GBS)-based GWAS is particularly suitable for underutilized legume species, offering a cost-effective approach to SNP identification. Multi-locus models like BLINK and FarmCPU improve the detection of traits influenced by multiple genes, including seed composition. When aligned with the tepary bean reference genome, this method can generate hypotheses linking markers to genes associated with protein content, amino acid profiles, and seed size. These insights can guide direct improvements in tepary beans and facilitate interspecific transfer efforts to P. vulgaris. We performed a genome-wide association analysis of a diverse P. acutifolius panel to explore seed protein concentration and free amino acid profiles, with seed size and width evaluated as supporting traits. 
The study found significant variation in composition traits; demonstrated that protein percentage and individual amino acids are controlled by a largely modular, non-overlapping genetic architecture; and provided a set of marker-to-trait hypotheses ready for use in selecting nutrition-focused tepary breeding lines. Fewer signals related to seed size and width showed modest effects, suggesting these traits can be improved alongside protein and amino-acid composition. Overall, these results represent the first genome-scale map of nutritional trait architecture in tepary bean and offer practical strategies for optimizing protein quality while considering seed size in a crop adapted to hot, arid, low-input environments.

Materials and methods

Plant Material, Growth Conditions, and Experimental Design

A diversity panel of 206 Phaseolus acutifolius accessions from the USDA–NPGS, along with four commercial varieties included as a check set (CK1: Sacaton Brown, CK2: Sonoran White, CK3: Blue Speckled, CK4: Black) (see Additional file 1), was evaluated at the Texas A&M AgriLife Research and Extension Center in Uvalde, TX, during a certified organic field season from July to October 2024. The trial adhered to standard organic practices, using drip irrigation as needed and OMRI-certified bioinsecticides for pest management. Accessions were sown on 2024-07-25 and harvested at physiological maturity between 2024-09-26 and 2024-09-30. We employed an augmented randomized complete block design (augRCBD) with five blocks. All four checks were replicated in each block to help control for field variability and to support the calculation of adjusted accession means (or BLUPs) for genotypes.

Phenotyping

Seed Weight Measurement: Hundred seed weight (HSW)

At physiological maturity, pods were collected from each accession of the Tepary Bean Diversity Panel and commercial varieties. The 100-seed weight (HSW) was recorded from three independent 100-seed subsamples per plant.
Seed weights were recorded using an ML54T/00 analytical balance (Mettler-Toledo GmbH, Switzerland) with a precision of ± 0.001 g. Seed Size Measurement: Seed width Five fully dried seeds per accession and commercial variety were scanned individually using an HP LaserJet Pro MFP M125nw scanner with a scale bar. Images were imported into ImageJ V. 1.54p ( https://imagej.net/ij/ ), and the scale was set using the known distance per image unit. This scale setting in ImageJ converts pixel units into a known scale unit, ensuring accurate pixel-to-unit conversion for comparable quantification of seed traits. Seed morphological traits, including seed width, area, length, and perimeter, were measured for each seed to assess natural variation. Seed width was used for the GWAS analysis. Quantification of total soluble seed protein The total soluble seed protein was estimated using a pre-established method [ 20 ] with minor modifications. Freeze-dried seeds were ground to a fine powder using a mortar and pestle. Briefly, 15 mg of seed tissue per accession, including commercial varieties, was processed individually, suspended in 0.1 M NaOH, and sonicated for 15 min. The supernatant was collected by centrifugation at 14,000 × g for 15 min. Protein concentration was determined using the Pierce™ Bradford Plus Protein Assay Reagent (Catalog No. 23238, Thermo Fisher Scientific Inc., USA) according to the manufacturer’s instructions. Measurements were performed in triplicate in a microtiter plate, using bovine serum albumin (BSA) as an internal standard. Absorbance at 595 nm was recorded using the Varioskan LUX multimode microplate reader (Thermo Fisher Scientific Inc., USA). Protein concentration was calculated relative to the internal standard and expressed as mg g⁻¹ of seed tissue and as a percentage (%). Free amino acid extraction and analysis Free amino acids were profiled following Joshi et al [ 21 ] with instrument-specific settings. 
Approximately 15 mg of seed powder was derivatized using the AccQ.Tag™ 3X Ultra-Fluor kit (Waters) and quantified on a Waters ACQUITY H-Class UPLC coupled to an Xevo TQ mass spectrometer (ESI). Multiple reaction monitoring transitions, collision energies, and cone voltages were optimized in IntelliStart; data acquisition used MassLynx, and quantification used TargetLynx with external calibration.

DNA extraction, GBS library construction, and SNP calling

Leaf DNA was extracted from 3–4-week-old seedlings using the DNeasy Plant Mini Kit (Qiagen, Cat no. 69104) according to the manufacturer's instructions. DNA concentration was quantified with a DeNovix spectrophotometer (DS-11, DeNovix Inc., USA). Library preparation and genotyping-by-sequencing (GBS) were performed at the University of Minnesota Genomics Center using ApeKI. Reads were aligned to the Phaseolus acutifolius v1.0 reference genome, and a variant call format (VCF) file was generated and filtered for missingness, heterozygosity, and MAF. LD-kNNi imputation was applied to reduce missing data.

Population structure analysis

LD-pruned SNPs (PLINK; 50-SNP window, r² > 0.1) yielded 2,869 non-redundant markers for STRUCTURE 2.3.4 runs (K = 1–10; burn-in 50,000; MCMC 50,000; 10 iterations per K). ΔK was evaluated with Structure Harvester [22] to select the optimal K [23]. A Q-matrix was generated and used to visualize ancestry proportions. Principal components were computed from genome-wide SNPs, and the first five PCs were included as covariates in GWAS.
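The ΔK criterion used to choose K can be illustrated with a short calculation. The sketch below is a minimal Python example, not the Structure Harvester implementation: it assumes ΔK is computed from the mean log-likelihood of the replicate runs at each K, as |L(K+1) − 2L(K) + L(K−1)| divided by the standard deviation of L(K). The function name `evanno_delta_k` and all log-likelihood values are hypothetical and chosen only for illustration; they are not results from this study.

```python
from statistics import mean, stdev


def evanno_delta_k(loglik_by_k):
    """Return {K: delta K} for every K that has both K-1 and K+1 available.

    loglik_by_k maps K -> list of log-likelihoods, one per replicate run.
    """
    means = {k: mean(v) for k, v in loglik_by_k.items()}
    sds = {k: stdev(v) for k, v in loglik_by_k.items()}
    delta = {}
    for k in sorted(loglik_by_k):
        has_neighbours = (k - 1) in means and (k + 1) in means
        if has_neighbours and sds[k] > 0:
            # Absolute second-order difference of the mean log-likelihood.
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            delta[k] = second_diff / sds[k]
    return delta


# Hypothetical replicate log-likelihoods (three runs per K, for brevity).
runs = {
    1: [-5000.0, -5001.0, -4999.0],
    2: [-4500.0, -4502.0, -4498.0],
    3: [-4450.0, -4460.0, -4440.0],
    4: [-4430.0, -4450.0, -4410.0],
    5: [-4420.0, -4440.0, -4400.0],
}

delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(best_k, delta_k)
```

ΔK is undefined for the smallest and largest K tested, which is why the runs span a wider range of K than the value that is finally selected.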
Genomic prediction models, cross-validation, and accuracy metrics

To complement GWAS and quantify the predictability of seed traits, we implemented a genomic prediction (GP) pipeline spanning seven learners: Bayesian Alphabet (BA), BayesB (BB), Bayes LASSO (BL), Bayesian Ridge Regression (BRR), Random Forest (RF), ridge-regression BLUP (rrBLUP), and Support Vector Machine (SVM). Models were trained on the final GBS marker panel (49,384 SNPs) used for GWAS [24], with Stage-1 BLUPs from the augmented-RCBD mixed models as phenotypes [13–18, 25, 26]. We used 5-fold cross-validation (CV): ~80% of accessions for training and ~20% held out for testing, rotating folds until each accession was predicted exactly once. For transparency, the n-test per fold (number of accessions with non-missing phenotypes in each test partition) is shown beneath the x-axis in Fig. 6. Predictive ability was defined as the Pearson correlation (r) between observed BLUPs and genomic estimated breeding values (GEBVs). To benchmark learners, we computed Δr relative to rrBLUP (Δr = mean(r_model) − mean(r_rrBLUP)). To preserve readability in trait panels, Δr values are summarized in the caption of Figure S9 rather than annotated in each panel [7–11].

Statistical Analysis

Data processing and statistics were performed using JMP, R (v4.3.3), GAPIT, STRUCTURE 2.3.4, PLINK, and TASSEL 5.2.96. Figures were generated in R. GBS processing was performed by the University of Minnesota Genomics Center; imputation utilized LD-kNNi in TASSEL. Stage-1 mixed models produced adjusted means (BLUPs) and their estimation error variance–covariance (EEV), which were used as phenotypes in Stage-2 marker models [18, 19, 25, 26]. GWAS (GLM, MLM, BLINK, FarmCPU) was run with PCs (n = 5) and kinship (as applicable).
Bonferroni correction at α = 0.05 set the genome-wide threshold; with 49,384 markers, this corresponds to p ≈ 1.01 × 10⁻⁶, or −log₁₀(p) ≈ 6.

Linear mixed models for augmented RCBD: For each trait and environment, we fit linear mixed models appropriate for augmented RCBD to obtain genotype effects adjusted for blocks and checks. Checks and/or genotypes were treated as random to extract BLUPs when genotype variance was estimable. All BLUPs/BLUEs used for GWAS are given in Additional files 2 and 3.

Genome-wide association studies (GWAS): The filtered VCF file was converted to a numeric format for GWAS. GWAS were performed using GAPIT (R) with GLM, MLM, BLINK, and FarmCPU models. Population structure was controlled by PCs (n = 5) and, where applicable, by relatedness (K). Genome-wide significance was declared using a Bonferroni correction (α = 0.05 divided by the number of markers) [27]; for the current panel of 206 accessions and 49,384 high-quality markers, the −log₁₀(p) threshold was approximately 6.

Candidate-gene discovery

For each significant SNP, gene models and annotations within ±10 kb were retrieved from Phytozome v14 (https://phytozome-next.jgi.doe.gov/). Top candidates per trait are summarized in Table 1; the full list is provided in Additional file 4.

Table 1. The top candidate genes per trait identified by GWAS in tepary bean. For each trait, the top associations (lowest P-values within trait) are reported with model, SNP ID, chromosome (Chr), genomic position (Pos), P-value, minor allele frequency (MAF), and the candidate genes (gene model, functional annotation) within the ±10 kb window used for candidate discovery.
GWAS used Bayesian information and Linkage-disequilibrium Iteratively Nested Keyway (BLINK) and Fixed and random model Circulating Probability Unification (FarmCPU) on Stage-1 BLUPs, with population structure controlled by five principal components and, where applicable, kinship; the Bonferroni genome-wide threshold at α = 0.05 (49,384 SNPs) corresponds to −log₁₀p ≈ 6. Coordinates and gene models are based on the tepary bean reference genome (Phaseolus acutifolius v1.0). Abbreviations: Chr, chromosome; Pos, genomic position (bp); MAF, minor allele frequency; BLUP, best linear unbiased prediction. See Additional file 4 for all additional significant associations.

Trait | Model | SNP | Chromosome | Position | P-value | MAF | Gene Model | Functional Annotation
Total Soluble Protein% | BLINK | S08_8796692 | Chr08 | 8796692 | 1.38E-10 | 0.36585 | Phacu.CVR.008G093700 | WNK protein kinase
 | | | | | | | Phacu.CVR.008G093800 | 2-oxoglutarate (2OG) and Fe(II)-dependent oxygenase superfamily protein
 | | | | | | | Phacu.CVR.008G093900 | Unknown
 | | S01_17593283 | Chr01 | 17593283 | 2.59E-08 | 0.36829 | Phacu.CVR.001G119800 | alpha/beta-Hydrolases superfamily protein
 | | | | | | | Phacu.CVR.001G119900 | zinc finger (Ran-binding) family protein
 | | S08_8784174 | Chr08 | 8784174 | 6.95E-07 | 0.35366 | Phacu.CVR.008G093600 | L-arabinose isomerase
 | | | | | | | Phacu.CVR.008G093700 | WNK protein kinase
 | | S08_8761065 | Chr08 | 8761065 | 2.70E-06 | 0.38293 | Phacu.CVR.008G093400 | MATE efflux family protein
100-SeedWeight | BLINK | S02_41997902 | Chr02 | 41997902 | 7.24E-08 | 0.27622 | Phacu.CVR.002G331100 | 6-phosphogluconate dehydrogenase
 | | | | | | | Phacu.CVR.002G331300 | vacuolar H+-ATPase subunit E isoform 3
 | | | | | | | Phacu.CVR.002G331400 | Galactose mutarotase-like superfamily protein
 | | S01_49536594 | Chr01 | 49536594 | 7.57E-07 | 0.36713 | Phacu.CVR.001G230400 | CTR1-like protein kinase, putative, expressed
 | | | | | | | Phacu.CVR.001G230600 | ENTH/VHS family protein; ZOS9-20 - C2H2 zinc finger protein
Seed Width | FARMCPU | S07_13428624 | Chr07 | 13428624 | 3.16E-12 | 0.42958 | Phacu.CVR.007G121900 | indole-3-acetic acid inducible 14; Auxin-responsive Aux/IAA gene family member
 | | S01_17629015 | Chr01 | 17629015 | 2.20E-08 | 0.41725 | Phacu.CVR.001G120100 | Unknown
 | | S08_46951004 | Chr08 | 46951004 | 4.40E-08 | 0.32394 | Phacu.CVR.008G266400 | Protein of unknown function (DUF630 and DUF632)
 | | | | | | | Phacu.CVR.008G266500 | Unknown
 | | S08_2123448 | Chr08 | 2123448 | 1.86E-07 | 0.35211 | Phacu.CVR.008G025600 | SMAD/FHA domain-containing protein
 | | | | | | | Phacu.CVR.008G025700 | DNA/RNA polymerases superfamily protein
Methionine | BLINK | S02_10429456 | Chr02 | 10429456 | 6.85E-07 | 0.05446 | Phacu.CVR.002G092500 | TEOSINTE BRANCHED, cycloidea and PCF (TCP) 14
Threonine | BLINK | S08_49548543 | Chr08 | 49548543 | 6.95E-10 | 0.07921 | Phacu.CVR.008G290800 | Unknown
 | | | | | | | Phacu.CVR.008G290900 | mitochondrial carrier protein, putative
 | | | | | | | Phacu.CVR.008G291000 | LIM domain-containing protein, putative
 | | | | | | | Phacu.CVR.008G291100 | Ser/Thr protein phosphatase family protein
 | | S06_14126029 | Chr06 | 14126029 | 3.99E-08 | 0.18317 | Phacu.CVR.006G045800 | sucrose-phosphate synthase, putative
 | | S04_51739891 | Chr04 | 51739891 | 1.40E-07 | 0.14356 | Phacu.CVR.004G148800 | cytochrome P450, putative, expressed
 | | | | | | | Phacu.CVR.004G148900 | cytochrome P450, putative, expressed
 | | S02_5914294 | Chr02 | 5914294 | 2.21E-07 | 0.0495 | Phacu.CVR.002G063900 | Metallo-endoproteinase 1 precursor, putative
 | | | | | | | Phacu.CVR.002G064000 | Pentatricopeptide repeat (PPR) superfamily protein
 | | | | | | | Phacu.CVR.002G064100 | integral membrane protein DUF6 domain-containing protein
 | | | | | | | Phacu.CVR.002G064200 | Ubiquitin-like superfamily protein
 | | S08_52080067 | Chr08 | 52080067 | 6.22E-07 | 0.10891 | Phacu.CVR.008G318200 | Unknown
 | | | | | | | Phacu.CVR.008G318300 | expressed protein
 | | | | | | | Phacu.CVR.008G318400 | Ubiquitin-specific protease family C19-related protein
Lysine | BLINK | S07_22884181 | Chr07 | 22884181 | 4.97E-07 | 0.056931 | Phacu.CVR.007G151500 | Unknown
 | | S11_10209988 | Chr11 | 10209988 | 6.55E-07 | 0.059406 | Phacu.CVR.011G105200 | S-adenosyl-L-methionine-dependent methyltransferases superfamily protein
 | | S04_6376564 | Chr04 | 6376564 | 0.000001 | 0.311881 | Phacu.CVR.004G052100 | WUSCHEL-related homeobox 11; expressed

Results

Phenotypic variations across the diversity panel

Seed size and composition varied widely among accessions.
Hundred-seed weight (HSW) ranged from 2 to 16 g per 100 seeds across the 206 accessions and four checks (Fig. 1A). Several accessions, such as PI 653254, PI 502217, PI 666351, and PI 535227, showed high HSW of 14–16 g per 100 seeds. Checks averaged 10.5 g per 100 seeds. Seed width spanned 0.261–0.711 cm (Fig. 1B). The checks had a mean seed width of 0.577 cm and clustered within a narrow range of 0.536–0.640 cm, indicating phenotypic stability. PI 502217, W6 38693, and PI 440805 showed the highest seed widths, ranging from 0.65 to 0.71 cm. Total soluble seed protein ranged from 10 to 38%, with a significant overall genotype effect (one-way ANOVA, p < 0.0001) (Fig. 2). Checks averaged around 23% protein. Accessions with some of the highest total soluble seed protein content included W6 38850 (35.3%), PI 640957 (34.1%), PI 201268 (32%), PI 440802 (30%), PI 440806 (28.9%), PI 319439 (28.2%), PI 200902 (27.3%), and PI 321638 (27.1%). Across accessions, 19 free amino acids were detected; arginine (~28%), asparagine (~20%), aspartic acid (~18.5%), and glutamic acid (~12.4%) dominated the pool (Fig. 3A). PCA showed multivariate structure with clear groupings of essential, branched-chain, and nitrogen-rich amino acids: PC1 (24.3%) loaded most strongly on methionine, histidine, leucine, isoleucine, and valine, whereas PC2 (13.0%) was driven by allantoin, asparagine, and glutamic acid (Fig. 3B).

GBS, variant filtering, and marker set for association

GBS produced ~830 million raw reads (mean ~3.95 M per sample). Variant calling yielded ~720k raw markers, which, after quality control and LD-kNNi imputation, resulted in a final set of 49,384 high-quality SNPs distributed across the 11 tepary chromosomes for association analyses (see FigShare link https://doi.org/10.6084/m9.figshare.31158676).
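The three per-marker quality statistics named in the methods (missingness, heterozygosity, and MAF) can be sketched for a single SNP as follows. This is an illustrative Python example only: the source does not state the cutoffs that were applied, so the thresholds are required arguments and the values in the usage lines are hypothetical. Genotypes are assumed to be coded as the count of the alternate allele (0, 1, 2), with `None` for a missing call; the function names are likewise placeholders.

```python
def snp_summary(genotypes):
    """Summarise one SNP coded as alt-allele counts (0/1/2) or None if missing."""
    called = [g for g in genotypes if g is not None]
    n_total = len(genotypes)
    missing = (n_total - len(called)) / n_total
    if not called:
        return {"missing": 1.0, "het": 0.0, "maf": 0.0}
    alt_freq = sum(called) / (2 * len(called))
    return {
        "missing": missing,
        "het": sum(1 for g in called if g == 1) / len(called),
        "maf": min(alt_freq, 1 - alt_freq),
    }


def passes_filters(genotypes, max_missing, max_het, min_maf):
    """Apply the three filters; all cutoffs are supplied by the caller."""
    s = snp_summary(genotypes)
    return (s["missing"] <= max_missing
            and s["het"] <= max_het
            and s["maf"] >= min_maf)


# Hypothetical genotype vectors and hypothetical cutoffs.
example = [0, 0, 1, 2, None, 0, 0, 2, 0, 0]
rare = [0] * 19 + [1]

summary = snp_summary(example)
keep_example = passes_filters(example, max_missing=0.2, max_het=0.2, min_maf=0.05)
keep_rare = passes_filters(rare, max_missing=0.2, max_het=0.2, min_maf=0.05)
print(summary, keep_example, keep_rare)
```

In a predominantly self-pollinating crop, a heterozygosity filter of this kind is typically used to flag likely genotyping artifacts; the appropriate cutoff depends on the panel and is not specified here.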
Population structure and relatedness STRUCTURE indicated K = 2 subpopulations in the panel, corroborated by PCA (Fig. 4 ). PC1 (23.2%) and PC2 (7.3%) separated a subset of check lines from the broader germplasm; this structure was controlled in all GWAS models via principal components (PCA = 5) (see Additional file 5). GWAS for total soluble seed protein and free amino acids Association analyses (GLM, MLM, BLINK, FarmCPU) were performed on BLUPs with PCA = 5 (and K where applicable). Genome‑wide significance was set at Bonferroni α = 0.05 (49,384 markers; –log₁₀p ≈ 6). Four significant associations for total soluble seed protein percentage were identified on chromosomes 1 and 8: S08_8796692 (p = 1.38×10⁻¹⁰), S01_17593283 (p = 2.59×10⁻⁸), S08_8784174 (p = 6.95×10⁻⁷), and S08_8761065 (p = 2.70×10⁻⁶); the first three exceeded the Bonferroni threshold (− log₁₀p ≈ 6), whereas S08_8761065 was suggestive. Candidate genes included Phacu.CVR.008G093800 (2‑oxoglutarate (2OG)/Fe (II) oxygenase), Phacu.CVR.001G119900 (Ran‑binding zinc finger), and Phacu.CVR.008G093700 (WNK kinase) (Fig. 4 A, Table 1 , Additional file 4). Across the nineteen amino acids, fifteen traits showed at least one genome‑wide significant association (representative plots in Fig. 5 and Additional files 6, 7, and 8). Exemplars include glutamine on Chr09 (S09_37434888, p = 1.39 × 10⁻¹⁹) and threonine on Chr08 (S08_49548543, p = 6.95 × 10⁻¹⁰). Significant intervals for individual amino acids did not overlap, consistent with trait-specific genetic control. Both essential and non-essential amino acids exhibited distinct associations, suggesting differential genetic regulation of transport, storage, and biosynthetic pathways. Branched-chain amino acids, such as leucine, isoleucine, and valine, were associated with genetic loci linked to metabolic regulation and biosynthetic pathways. 
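The genome-wide threshold quoted above follows directly from the marker count, and the arithmetic can be checked in a few lines. The Python snippet below is a worked example rather than part of the original analysis pipeline; it uses the reported marker number and the four protein-associated p-values from Table 1, and the variable names are illustrative.

```python
import math

N_MARKERS = 49_384  # final SNP set reported in the text
ALPHA = 0.05        # family-wise error rate

# Bonferroni: divide alpha by the number of tests (markers).
p_threshold = ALPHA / N_MARKERS
neg_log10_threshold = -math.log10(p_threshold)

# P-values reported for total soluble seed protein percentage.
protein_snps = {
    "S08_8796692": 1.38e-10,
    "S01_17593283": 2.59e-08,
    "S08_8784174": 6.95e-07,
    "S08_8761065": 2.70e-06,
}

# True where the SNP passes the genome-wide threshold.
significant = {snp: p < p_threshold for snp, p in protein_snps.items()}

print(f"p threshold = {p_threshold:.3e}, -log10 = {neg_log10_threshold:.2f}")
print(significant)
```

The threshold works out to roughly 1.01 × 10⁻⁶ (−log₁₀p just under 6), so the first three protein SNPs pass and S08_8761065 does not, in agreement with its description as suggestive.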
Notably, none of the amino acid traits shared SNPs or genomic regions, suggesting highly modular, trait-specific genetic control of seed metabolite levels in P. acutifolius. Candidate-gene exploration within ±10 kb of all genome-wide significant SNPs identified 192 unique genes across fifteen seed amino acids, with functions relevant to seed metabolism, nutrient transport, and developmental regulation (Table 1, Additional file 4). Functional annotations showed strong enrichment for gene families involved in primary carbon and nitrogen metabolism and for membrane-associated and transport-related protein families, indicating roles in amino acid transport and storage during seed development. For the essential amino acid threonine, five significant SNPs were detected, and a plausible candidate gene on Chr08 encodes a Ser/Thr protein phosphatase family protein (Fig. 5A). Three genome-wide significant BLINK associations were detected for lysine: S07_22884181 (p = 4.97×10⁻⁷; MAF ≈ 0.057), S11_10209988 (p = 6.55×10⁻⁷; MAF ≈ 0.059), and S04_6376564 (p ≈ 1.0×10⁻⁶; MAF ≈ 0.312). Candidate-gene inspection identified Phacu.CVR.011G105200, encoding an S-adenosyl-L-methionine-dependent methyltransferase superfamily protein, and Phacu.CVR.004G052100, annotated as WUSCHEL-related homeobox 11 (WOX11) (Fig. 5C). The lack of shared candidate genes across seed weight, protein percentage, and amino acid levels is consistent with largely independent control of these traits and suggests that tepary bean's genetic architecture for seed size and nutritional traits is highly partitioned.

GWAS for hundred-seed weight and seed width

For 100-seed weight, two genome-wide significant associations were detected: S02_41997902 (p = 7.24×10⁻⁸) and S01_49536594 (p = 7.57×10⁻⁷) (Fig. 4B; Table 1).
Candidate gene analysis identified plausible loci, including Phacu.CVR.002G331300, which encodes a vacuolar H + -ATPase subunit involved in pH-dependent transport and endomembrane processes that influence plant growth and development, and Phacu.CVR.001G230400, which encodes a CTR1-like protein kinase. For seed width, five significant associations were identified on chromosomes 1, 3, 5, 7, and 8 (Fig. 4 C). Candidate gene analysis revealed Phacu.CVR.007G121900, which encodes the auxin-responsive Aux/IAA gene family. Genomic prediction Model accuracy across seed‑quality and seed‑size traits Across models, seed‑size traits were highly predictable: hundred‑seed weight (HSW) and seed width achieved r ≈ 0.90–0.96 for most learners, matching reports that morphological seed traits in pulses exhibit a strong, polygenic signal [ 4 , 8 ]. These high accuracies are consistent with the presence of robust loci (e.g., V‑ATPase / Aux/IAA) and a sizable ridge‑captured polygenic component. In contrast, total soluble protein (%) and free amino acids displayed moderate predictive abilities (r ≈ 0.15–0.45), which mirrors the modular, small‑effect genetic architecture observed in our GWAS and in other legumes where amino‑acid and protein‑quality traits are governed by dispersed, low‑effect loci [ 7 , 11 ]. Overall, rrBLUP and BRR performed on par with Bayesian sparsity learners (BA, BB, BL) and nonlinear models (RF, SVM), indicating that shrinkage‑based models remain robust baselines when the signal is broadly polygenic [ 7 , 9 – 11 ]. Because phenotype availability differed slightly by fold, n‑test per fold values are shown below each panel in Fig. 6 . To maintain figure clarity, Δr relative to rrBLUP is summarized outside the panels in Figure S9 ; averaged across traits, BRR showed a small positive Δr, whereas BA, BB, BL, RF, and SVM were near‑zero or slightly negative—again consistent with polygenic architectures where ridge‑based estimators are hard to beat [ 7 , 8 , 11 ]. 
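The accuracy metrics reported in this section (per-fold Pearson r under 5-fold cross-validation, and Δr relative to rrBLUP) can be sketched as below. This Python example covers only the evaluation scaffolding, not any of the seven learners; the helper names, the random seed, the panel size passed to the fold function, and the per-fold correlations are all hypothetical values used for illustration and are not results from this study.

```python
import random
from statistics import mean


def kfold_indices(n, k=5, seed=1):
    """Split indices 0..n-1 into k test folds; each index lands in exactly one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def pearson_r(observed, predicted):
    """Pearson correlation between observed BLUPs and predicted GEBVs."""
    mx, my = mean(observed), mean(predicted)
    sxy = sum((a - mx) * (b - my) for a, b in zip(observed, predicted))
    sxx = sum((a - mx) ** 2 for a in observed)
    syy = sum((b - my) ** 2 for b in predicted)
    return sxy / (sxx * syy) ** 0.5


def delta_r(r_model, r_rrblup):
    """Delta r = mean(r_model) - mean(r_rrBLUP), as defined in the methods."""
    return mean(r_model) - mean(r_rrblup)


# Hypothetical panel of 206 accessions split into five test folds.
folds = kfold_indices(206, k=5)
fold_sizes = [len(f) for f in folds]

# Hypothetical per-fold predictive abilities for two learners.
r_rrblup = [0.30, 0.28, 0.35, 0.25, 0.32]
r_brr = [0.31, 0.29, 0.35, 0.27, 0.33]
print(fold_sizes, round(delta_r(r_brr, r_rrblup), 3))
```

In practice each test fold would be predicted by a model trained on the other four folds, and accessions with missing phenotypes would be dropped before computing r, which is why the n-test per fold can differ slightly between traits.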
Discussion Seed protein and amino acids in tepary beans are controlled by modular, pathway-aware genetic loci. Across the panel, the percentages of total protein and of fifteen of nineteen free amino acids mapped to distinct loci, suggesting a modular genetic architecture rather than a single pleiotropic control point. This aligns with legume GWAS in soybean, which shows trait clusters for amino acids, and with studies in lentil and pea, where protein quality traits and storage protein composition are dispersed across different regions [ 7 , 8 , 11 ]. The association for protein% on Chr08 coincided with a WNK kinase and a 2‑oxoglutarate/Fe(II)‑dependent oxygenase, indicating a connection between regulatory signaling and the 2‑oxoglutarate node that links carbon and nitrogen metabolism during seed filling—consistent with the central roles of GS/GOGAT and transamination in amino acid synthesis [ 28 , 29 ]. This architecture supports a transport-and-sink model where biosynthesis, long-distance transport, and sink import each contribute trait-specific variance, which appears at different association windows in seeds [ 30 ]. Among essential amino acids, threonine and methionine are traditionally limiting in pulses and are key targets for improving dietary protein quality [ 31 , 32 ]. Threonine showed a significant genome-wide association on Chr08 within a phosphatase-rich region, providing the first tepary-specific marker to track flux in the aspartate family pathway, which also produces Met, Ile, and Lys. This provides a practical tool for marker-assisted selection to increase threonine levels while monitoring related amino acids [ 11 , 28 ]. Similarly, methionine produced a genome-wide signal in our association table, marking the first GWAS indicator in tepary for a nutritionally critical amino acid. 
Using the Met-associated SNP, along with the protein% interval, enables recurrent selection for improved protein quality without reducing total protein; genomic selection can also capture residual variance across multiple loci within the amino acid panel [7, 31]. In addition, two of the three lysine associations lay next to annotated candidates: S11_10209988, adjacent to Phacu.CVR.011G105200 (annotated as a SAM-dependent methyltransferase), and S04_6376564, near Phacu.CVR.004G052100 (WOX11). Together, these loci suggest complementary regulatory levers for amino-acid homeostasis. The Chr11 signal connects lysine to the SAM cycle and epigenetic control during seed filling, since SAM supply via MAT enzymes is known to influence DNA and histone methylation and transcriptional programs in plants [33, 34]. The Chr04 signal points to developmental regulation via a WUSCHEL-related homeobox factor that integrates auxin/cytokinin cues [35, 36]. Consistent with the coordinated control in the aspartate-family pathway, feedback at key branch-point enzymes (AK, HSD, DHDPS) links Lys–Thr–Met pools [37–40], providing a mechanistic context for the observed lysine associations and their potential effects on related essential amino acids. Taken together, tepary's suitability for low-input, heat- and drought-adapted systems makes it a resilient legume protein source in which protein quality can be improved for resource-limited environments [1, 41].

Seed-size genetics and deployment: from V-ATPase and Aux/IAA signals to marketable size under low inputs

Seed-size mapping in tepary revealed a few loci with modest effects: one on Chr02 influencing hundred-seed weight, involving a vacuolar H⁺-ATPase subunit E and 6-phosphogluconate dehydrogenase, and another on Chr07 affecting seed width near an Aux/IAA gene. This is consistent with the polygenic architecture reported in image-based GWAS of common bean [4].
The V‑ATPase plays a key role in energizing tonoplast transport and maintaining vacuolar pH, which are vital for nutrient storage and cell expansion, thus linking endomembrane functions to organ growth [ 42 ]. Aux/IAA proteins are vital repressors in the auxin regulatory pathway; auxin influences cell division and expansion in developing seeds, positioning the seed width signal within a well-established growth control pathway [ 43 , 44 ]. Breeders can use marker-assisted selection targeting the Chr02 HSW and Chr07 width regions to enhance seed size and width, incorporating an index that also considers protein percentage and essential amino acids to sustain nutritional improvements. Because tepary thrives with low inputs and fixes nitrogen naturally, modest genetic gains in seed size could boost marketability and consumer appeal without significant input costs, aligning with the crop’s ecological niche [ 1 , 41 ]. Compared with earlier GWAS using whole-genome resequencing that focused on seed yield and 100-seed weight across multiple environments, our GBS-based, composition-focused analysis at a single organic site identified a unique Chr02 HSW region enriched for growth-regulatory candidates. This suggests that seed size control in tepary involves both environment-responsive loci, evident in multi-environment trial models, and developmentally anchored loci detectable through standardized composition assessments [ 4 , 45 ]. Integrating genomic prediction with seed‑composition and seed‑size GWAS signals The genomic prediction (GP) analyses (Fig. 6 ; Fig. S9 ) complement the GWAS findings by quantifying the predictability of seed‑protein, amino‑acid, and seed‑size traits across models of varying complexity. 
The high predictive abilities for hundred‑seed weight and seed width (r ≈ 0.90–0.96) are consistent with findings in other pulses, such as common bean [4] and pea [8], where seed morphological traits exhibit a strong polygenic signal that is readily captured by shrinkage‑based models. These accuracies are consistent with both the major GWAS loci (e.g., V‑ATPase on Chr02; Aux/IAA on Chr07) and background polygenic variance contributing to seed‑size traits. In contrast, predictive abilities for total soluble protein (%) and amino‑acid traits were low to moderate (r ≈ 0.15–0.45), mirroring the modular genetic architecture detected in GWAS. Similar patterns are reported in soybean [7] and lentil [11], where amino acids and protein‑quality traits show dispersed, low‑effect loci and correspondingly modest genomic prediction accuracies. Across models, rrBLUP and BRR performed as well as or better than the other Bayesian and nonlinear learners, indicating that nutritional composition traits in tepary bean are governed largely by many small‑effect loci rather than by sparse large‑effect signals. Consistent with this architecture, differences in accuracy relative to rrBLUP (Δr) were small and similar across traits (Fig. S9): BRR provided slightly higher average accuracy, whereas BA, BB, BL, RF, and SVM showed modestly lower or near‑zero Δr. This behavior matches reports in soybean, lentil, chickpea, and pea [7, 9–11], where ridge‑based models frequently outperform more complex algorithms for nutritional‑quality traits dominated by polygenic variation. Collectively, the GWAS–GP integration supports a dual breeding strategy: (i) marker‑assisted selection (MAS) for the major protein‑quality and essential‑amino‑acid loci detected in this study (e.g., protein% WNK/2‑OG oxygenase locus on Chr08; threonine locus on Chr08; methionine locus on Chr02), and (ii) genomic selection (GS) to capture the remaining distributed genetic variance that influences overall protein quality and amino‑acid balance.
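As a rough illustration of how the predictive ability r and the Δr comparison discussed above can be computed, the following Python sketch correlates observed and predicted values within one cross-validation test fold and then subtracts the mean rrBLUP accuracy from that of another model. The function names (`pearson_r`, `delta_r`) and every number below are hypothetical placeholders chosen for illustration; they are not code or values from this study.

```python
import math
from statistics import mean


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values in one test fold."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined when either input is constant")
    return cov / math.sqrt(var_o * var_p)


def delta_r(fold_r_model, fold_r_rrblup):
    """Delta r = mean(r_model) - mean(r_rrBLUP) across matching CV folds."""
    if len(fold_r_model) != len(fold_r_rrblup):
        raise ValueError("both models need the same number of folds")
    return mean(fold_r_model) - mean(fold_r_rrblup)


# Hypothetical observed vs. predicted BLUPs for one test fold (illustration only)
observed = [10.2, 11.5, 9.8, 12.1]
predicted = [10.0, 11.0, 10.1, 11.9]
fold_r = pearson_r(observed, predicted)

# Hypothetical fold-wise predictive abilities for two models (5-fold CV)
rrblup_folds = [0.30, 0.25, 0.35, 0.28, 0.32]
brr_folds = [0.31, 0.26, 0.35, 0.29, 0.33]
example_delta = delta_r(brr_folds, rrblup_folds)

print(f"fold r = {fold_r:.3f}; delta r (BRR vs rrBLUP) = {example_delta:+.3f}")
```

A positive Δr means the compared model predicted slightly better than rrBLUP on average; a value near zero means the added model complexity bought little for that trait.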
Such MAS + GS pipelines are increasingly common in legume improvement [11], and they align well with tepary bean’s adaptation to low‑input, heat‑ and drought‑stressed systems [1, 3], positioning the crop for climate‑resilient, nutrition‑focused breeding.

Positioning tepary among pulses and delivering new tools for introgression

Across pulses, nutritional genetics now includes GWAS of amino acids and protein quality in soybean and lentil, a significant locus linked to seed protein in cowpea, and various composition-related loci in pea [7, 8, 11, 12]. In this context, this study provides first-generation tepary SNP tools targeting protein percentage, essential amino acids (threonine and methionine), and seed size. Coupled with a chromosome-scale tepary reference genome and the crop's known adaptation to heat and drought, these markers facilitate nutrition-focused breeding under low-input, nitrogen-fixing farming, an uncommon combination of resilience and improved protein content that bolsters tepary’s potential as a climate-smart legume [1, 3, 41]. Legume breeders can apply marker-assisted selection by combining SNPs associated with protein percentage, Thr, Met, and seed size; use synteny to identify orthologous regions in common bean for comparative mapping or backcrossing; and apply genomic selection to exploit the polygenic component of amino acid content. In addition, genome–environment association pipelines developed for common bean that account for heat and drought stress can help deploy these new alleles in difficult environments [7, 12, 27]. We recommend conducting comparative synteny studies among Phaseolus species to identify orthologous variants in common bean and interspecific materials.
Integrating selected SNPs or candidate loci identified in tepary in this study would accelerate the development of nutrition-focused ideotypes suited to arid conditions [1, 3, 4, 27–30, 41]. Further research should validate the associations for threonine, methionine, and protein content across heat- and drought-prone environments through multi-environment trials. In addition, in-depth studies using fine-mapping, RNA-seq, and metabolomics during seed filling are needed to clarify causal relationships and test hypotheses related to WNK signaling, 2-oxoglutarate oxygenases, amino acid transporters, and regulatory phosphatases. Converting key SNPs into cost-effective breeder assays, such as KASP, will aid index-based selection for protein quality and seed size, particularly in the organic or low-input systems typical of tepary bean production.

Conclusion

This study provides a genome-wide analysis of tepary bean nutritional and seed-size traits, identifying key associations with protein content, essential amino acids (threonine and methionine), and seed size from a single low-input, organic field screening of 206 accessions. It highlights a region on Chr08 associated with protein percentage that harbors a WNK kinase and a 2-oxoglutarate oxygenase, linking regulatory signaling and C–N metabolism to storage protein accumulation. The trait-specific patterns across fifteen amino acids suggest a modular genetic architecture conducive to marker stacking and genomic selection. The threonine (Chr08) and methionine (Chr02) markers are the first tepary-specific tools for optimizing these essential amino acids, and they may allow breeders to enhance protein quality without reducing total protein. For seed size, the HSW signal on Chr02 and the width signal on Chr07 correspond to growth-related candidates such as V‑ATPase and Aux/IAA, supporting index selection to improve size and nutrition simultaneously under low-input, nitrogen-fixing farming.
Overall, these loci serve as early-generation SNP tools for developing nutrition-oriented, climate-resilient tepary ideotypes and for transferring these advances to other Phaseolus crops through synteny-guided breeding and introgression. We recommend further studies that convert key SNPs into breeder-ready assays (e.g., KASP), test their effects across heat- and drought-stress environments, and fine-map causal variants using gene expression and metabolomics during seed fill. This study highlights tepary bean as a valuable protein-rich legume and offers genomic tools to accelerate improvements in seed protein yield, quality, and size with minimal inputs. In addition to the GWAS‑identified loci, the high predictive abilities for hundred‑seed weight and seed width indicate that these traits are well suited to genomic selection, whereas the lower predictive abilities for protein percentage and amino acids reflect their polygenic, modular architecture. Together, the GWAS–GP framework demonstrates the feasibility of a dual breeding strategy that pairs marker-assisted selection of large-effect loci with genomic selection to accumulate polygenic improvements in nutritional quality. Incorporating these prediction models should accelerate the development of climate‑resilient, nutrition‑focused tepary ideotypes and inform comparative breeding across Phaseolus species.

Abbreviations

AA: Amino acid; BSA: Bovine serum albumin; BLUP/BLUE: Best linear unbiased prediction/estimate; GBS: Genotyping‑by‑sequencing; GS: Genomic selection; GWAS: Genome‑wide association study; HSW: Hundred‑seed weight; KASP: Kompetitive allele‑specific PCR; LD‑kNNi: Linkage disequilibrium k‑nearest neighbors imputation; MAF: Minor allele frequency; MAS: Marker‑assisted selection; PCA: Principal component analysis; V‑ATPase: Vacuolar H⁺‑ATPase; WNK: With‑No‑Lysine (K) protein kinase.
GLM: Generalized linear model; MLM: Mixed linear model; BLINK: Bayesian-information and Linkage-disequilibrium Iteratively Nested Keyway; FarmCPU: Fixed and random model Circulating Probability Unification.

Declarations

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
All sequencing data for tepary bean accessions have been submitted to the National Center for Biotechnology Information (NCBI) under the BioProject accession PRJNA1416895. Filtered GBS SNP data are available on FigShare: https://doi.org/10.6084/m9.figshare.31158676
The list of USDA-NPGS tepary bean accessions (Additional file 1), the BLUPs/BLUEs used in this study (Additional files 2 and 3), and the significant SNPs/candidate gene annotations (Additional file 4) are available on FigShare: https://doi.org/10.6084/m9.figshare.31334209

Competing interests
The authors declare that they have no competing interests.

Funding
This research was supported by Sustainable Agriculture Research and Education (SARE) Graduate Student Grant 2025 (GS24-299), Texas Department of Agriculture (TDA), and Texas A&M AgriLife Research and Extension Centre, Uvalde, TX, USA.

Acknowledgements
The authors would like to thank Dalton Thompson, Research Technician, Systems Plant Physiology, Texas A&M AgriLife Research and Extension Centre, Uvalde, TX, for conducting amino acid quantifications. The authors acknowledge funding from the Sustainable Agriculture Research and Education (SARE), the Texas Department of Agriculture (TDA), and Texas A&M AgriLife Research.

References

1. Moghaddam SM, Oladzad A, Koh C, Ramsay L, Hart JP, Mamidi S, et al. The tepary bean genome provides insight into evolution and domestication under heat stress. Nat Commun 2021 May 11;12(1):2638.
2. Porch TG, Cichy K, Wang W, Brick M, Beaver JS, Santana-Morant D, et al. Nutritional composition and cooking characteristics of tepary bean (Phaseolus acutifolius Gray) in comparison with common bean (Phaseolus vulgaris L.). Genet Resour Crop Evol 2017;64(5):935–953.
3. López-Ibarra C, Ruiz-López FdJ, Bautista-Villarreal M, Báez-González JG, Rodríguez Romero BA, González-Martínez BE, et al. Protein Concentrates on Tepary Bean (Phaseolus acutifolius Gray) as a Functional Ingredient: In silico Docking of Tepary Bean Lectin to Peroxisome Proliferator-Activated Receptor Gamma. Frontiers in Nutrition 2021;8.
4. Giordani W, Gama HC, Chiorato AF, Garcia AAF, Vieira MLC. Genome-wide association studies dissect the genetic architecture of seed shape and size in common bean. G3 Genes|Genomes|Genetics 2022;12(4):jkac048.
5. Jurado M, García-Fernández C, Campa A, Ferreira JJ. Identification of consistent QTL and candidate genes associated with seed traits in common bean by combining GWAS and RNA-Seq. Theor Appl Genet 2024 May 27;137(6):143.
6. Gunjača J, Carović-Stanko K, Lazarević B, Vidak M, Petek M, Liber Z, et al. Genome-Wide Association Studies of Mineral Content in Common Bean. Frontiers in Plant Science 2021;12.
7. Qin J, Shi A, Song Q, Li S, Wang F, Cao Y, et al. Genome Wide Association Study and Genomic Selection of Amino Acid Concentrations in Soybean Seeds. Front Plant Sci 2019 Nov 15;10:1445.
8. Warsame AO, Balk J, Domoney C. Identification of significant genome-wide associations and QTL underlying variation in seed protein composition in pea (Pisum sativum L.). bioRxiv 2024:2024.07.04.602075.
9. Roorkiwal M, Bhandari A, Barmukh R, Bajaj P, Valluri VK, Chitikineni A, et al. Genome-wide association mapping of nutritional traits for designing superior chickpea varieties. Frontiers in Plant Science 2022;13.
10. Sari H, Uhdre R, Wallace L, Coyne CJ, Bourland B, Zhang Z, et al. Genome-wide association study in Chickpea (Cicer arietinum L.) for yield and nutritional components. Euphytica 2024;220(6):84.
11. Johnson N, Boatwright JL, Bridges W, Thavarajah P, Kumar S, Thavarajah D. Targeted improvement of plant-based protein: Genome-wide association mapping of a lentil (Lens culinaris Medik.) diversity panel. Plants People Planet 2024;6(3):640–655.
12. Chen Y, Xiong H, Ravelombola W, Bhattarai G, Barickman C, Alatawi I, et al. A Genome-Wide Association Study Reveals Region Associated with Seed Protein Content in Cowpea. Plants 2023;12(14).
13. Federer WT, Crossa J. I.4 Screening Experimental Designs for Quantitative Trait Loci, Association Mapping, Genotype-by Environment Interaction, and Other Investigations. Front Physiol 2012 Jun 1;3:156.
14. Möhring J, Williams E, Piepho H. Efficiency of augmented p-rep designs in multi-environmental trials. Theor Appl Genet 2014;127.
15. Piepho H, Möhring J, Schulz-Streeck T, Ogutu JO. A stage-wise approach for the analysis of multi-environment trials. Biom J 2012 Nov;54(6):844–860.
16. Smith A, Cullis B, Gilmour AR. The Analysis of Crop Variety Evaluation Data in Australia. Australian & New Zealand Journal of Statistics 2001;43:129–145.
17. Williams E, Piepho H, Whitaker D. Augmented p-rep designs. Biom J 2011 Feb;53(1):19–27.
18. Damesa T, Hartung J, Gowda M, Beyene Y, Das B, Semagn K, et al. Comparison of Weighted and Unweighted Stage‐Wise Analysis for Genome‐Wide Association Studies and Genomic Selection. Crop Sci 2019;59.
19. Fernández-González J, Isidro Y Sánchez J. Optimizing fully-efficient two-stage models for genomic selection using open-source software. Plant Methods 2025 Feb 4;21(1):9.
20. Deans CA, Sword GA, Lenhart PA, Burkness E, Hutchison WD, Behmer ST. Quantifying Plant Soluble Protein and Digestible Carbohydrate Content, Using Corn (Zea mays) As an Exemplar. J Vis Exp 2018 Aug 6;(138):58164. doi: 10.3791/58164.
21. Joshi V, Joshi M, Silwal D, Noonan K, Rodriguez S, Penalosa A. Systematized biosynthesis and catabolism regulate citrulline accumulation in watermelon. Phytochemistry 2019;162:129–140.
22. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics 2000 Jun;155(2):945–959.
23. Evanno G, Regnaut S, Goudet J. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol 2005;14(8):2611–2620.
24. Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, et al. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One 2011 May 4;6(5):e19379.
25. Endelman JB. Fully efficient, two-stage analysis of multi-environment trials with directional dominance and multi-trait genomic selection. Theor Appl Genet 2023 Mar 22;136(4):65.
26. Schulz-Streeck T, Ogutu JO, Piepho H. Comparisons of single-stage and two-stage approaches to genomic selection. Theor Appl Genet 2013 Jan;126(1):69–82.
27. López-Hernández F, Cortés AJ. Last-Generation Genome-Environment Associations Reveal the Genetic Basis of Heat Tolerance in Common Bean (Phaseolus vulgaris L.). Front Genet 2019 Nov 22;10:954.
28. Forde BG, Lea PJ. Glutamate in plants: metabolism, regulation, and signalling. J Exp Bot 2007;58(9):2339–2358.
29. Saddhe AA, Karle SB, Aftab T, Kumar K. With no lysine kinases: the key regulatory networks and phytohormone cross talk in plant growth, development and stress response. Plant Cell Rep 2021 Nov;40(11):2097–2109.
30. Tegeder M, Rentsch D. Uptake and partitioning of amino acids and peptides. Mol Plant 2010 Nov;3(6):997–1011.
31. Khazaei H, Subedi M, Nickerson M, Martínez-Villaluenga C, Frias J, Vandenberg A. Seed Protein of Lentils: Current Status, Progress, and Food Applications. Foods 2019 Sep 4;8(9):391. doi: 10.3390/foods8090391.
32. Nosworthy MG, Yu B, Zaharia LI, Medina G, Patterson N. Pulse protein quality and derived bioactive peptides. Frontiers in Plant Science 2025;16.
33. Lee Y, Ren D, Jeon B, Liu H. S-Adenosylmethionine: more than just a methyl donor. Nat Prod Rep 2023 Sep 20;40(9):1521–1549.
34. Meng J, Wang L, Wang J, Zhao X, Cheng J, Yu W, et al. METHIONINE ADENOSYLTRANSFERASE4 Mediates DNA and Histone Methylation. Plant Physiol 2018 Jun;177(2):652–670.
35. Zhao Y, Hu Y, Dai M, Huang L, Zhou D. The WUSCHEL-related homeobox gene WOX11 is required to activate shoot-borne crown root development in rice. Plant Cell 2009 Mar;21(3):736–748.
36. Zhou S, Jiang W, Long F, Cheng S, Yang W, Zhao Y, et al. Rice homeodomain protein WOX11 recruits a histone acetyltransferase complex to establish programs of cell proliferation of crown root meristem. Plant Cell 2017;29(5):1088–1104.
37. Galili G. Regulation of Lysine and Threonine Synthesis. Plant Cell 1995;7(7):899–906.
38. Azevedo RA, Arruda P, Turner WL, Lea PJ. The biosynthesis and metabolism of the aspartate derived amino acids in higher plants. Phytochemistry 1997;46(3):395–419.
39. Goto DB, Onouchi H, Naito S. Dynamics of methionine biosynthesis. Plant Biotechnology 2005;22(5):379–388.
40. Jander G, Joshi V. Aspartate-Derived Amino Acid Biosynthesis in Arabidopsis thaliana. Arabidopsis Book 2009;7:e0121.
41. Barrera S, Berny Mier y Teran JC, Aparicio J, Diaz J, Leon R, Beebe S, et al. Identification of drought and heat tolerant tepary beans in a multi-environment trial study. Crop Sci 2024;64(6):3399–3416.
42. Krebs M, Beyhl D, Görlich E, Al-Rasheid KAS, Marten I, Stierhof Y, et al. Arabidopsis V-ATPase activity at the tonoplast is required for efficient nutrient storage but not for sodium accumulation. Proc Natl Acad Sci U S A 2010 Feb 16;107(7):3251–3256.
43. Leyser O. Auxin Signaling. Plant Physiol 2018;176(1):465–479.
44. Wang R, Estelle M. Diversity and specificity: auxin perception and signaling through the TIR1/AFB pathway. Curr Opin Plant Biol 2014;21:51–58.
45. Ravelombola W, Manley A, Pham H, Brown M, Ruhl C, Ghosh P. Genome-Wide Association Study for Seed Yield of Tepary Bean Using Whole-Genome Resequencing. Int J Mol Sci 2024 Oct 21;25(20):11302. doi: 10.3390/ijms252011302.

Additional Declarations
No competing interests reported.
Supplementary Files

Additional file 1 (Additionalfile1.xlsx). List of Tepary bean accessions and commercial checks used in this study. Includes USDA-NPGS accession numbers, origin, and market-type descriptions for all evaluated materials.
Additional file 2 (Additionalfile2.xlsx). Adjusted phenotypic values (BLUPs/BLUEs) for seed traits generated from linear mixed models used as input for GWAS.
Additional file 3 (Additionalfile3.xlsx). Adjusted phenotypic values (BLUPs/BLUEs) for 19 seed-bound amino acids. These adjusted means were used for all amino acid GWAS analyses.
Additional file 4 (Additionalfile4.xlsx). Genome-wide significant SNPs associated with all seed traits, including candidate genes and functional annotations. Lists of SNPs, chromosome locations, raw and FDR-corrected p-values, associated genes, and predicted gene functions.
Additional file 5 (Additionalfile5.png). Population structure and genetic differentiation. (A) STRUCTURE bar plot at K = 2 showing ancestry proportions across 206 accessions and four checks; (B) principal coordinate analysis (PCoA) depicting separation of checks from broader germplasm. ΔK was evaluated with Structure Harvester to select the optimal K. PCs were included as covariates in all GWAS models (PCA = 5).
Additional file 6 (Additionalfile6.png). GWAS for alanine, allantoin, asparagine, and glutamine. Manhattan (left) and Q–Q (right) plots for (A) alanine, (B) allantoin, (C) asparagine, and (D) glutamine. The dashed line indicates the Bonferroni threshold (−log₁₀p ≈ 6). For glutamine, a lead association on Chr09 (S09_37434888) reached p = 1.39 × 10⁻¹⁹. See Additional file 4 for full candidate lists around the lead SNPs (±10 kb).
Additional file 7 (Additionalfile7.png). GWAS for glycine, histidine, isoleucine, and leucine. Manhattan (left) and Q–Q (right) plots for (A) glycine, (B) histidine, (C) isoleucine, and (D) leucine. Thresholds and models as in Fig. 4. Candidate genes within ±10 kb windows of significant SNPs are summarized in Additional file 4.
Additional file 8 (Additionalfile8.png). GWAS for proline, serine, tyrosine, and valine. Manhattan (left) and Q–Q (right) plots for (A) proline, (B) serine, (C) tyrosine, and (D) valine. Thresholds and models as in Fig. 4. The valine landscape includes significant signals on Chr09 with annotated regulatory candidates (Additional file 4).
Additional file 9 (Additionalfile9.png). Model performance vs rrBLUP. Relative model performance (Δr = mean(r_model) – mean(r_rrBLUP)) across six traits under 5‑fold cross‑validation. Δr values are summarized across traits as: BA −0.102 | BB −0.153 | BL −0.075 | BRR +0.007 | RF −0.009 | SVM −0.008. Panels show trait‑wise predictive ability without per‑panel Δr annotations to maximize clarity. A compact key indicates the statistical test used to compare models (paired t-test across folds; *** p<0.001; ** p<0.01; * p<0.05; • p<0.10; ns).
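The Bonferroni threshold quoted in the GWAS legends (−log₁₀p ≈ 6) can be checked against the marker count reported in the abstract. The following is a minimal Python sketch, assuming α = 0.05 and one test per SNP (49,384 tests); the function name `bonferroni_neg_log10` is an illustrative choice and is not part of the authors' pipeline.

```python
import math


def bonferroni_neg_log10(alpha, n_tests):
    """Return -log10 of the Bonferroni-corrected p-value cutoff alpha / n_tests."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return -math.log10(alpha / n_tests)


# 49,384 SNPs as reported in the abstract; alpha = 0.05 is assumed here
threshold = bonferroni_neg_log10(0.05, 49_384)

# Compare the glutamine lead SNP p-value quoted in the Additional file 6 legend
glutamine_neg_log10 = -math.log10(1.39e-19)
glutamine_passes = glutamine_neg_log10 > threshold

print(f"threshold (-log10 p) = {threshold:.2f}")
print(f"glutamine lead SNP exceeds threshold: {glutamine_passes}")
```

Under these assumptions the cutoff is p = 0.05 / 49,384, which is why the dashed line sits at roughly 6 on the −log₁₀ scale of the Manhattan plots.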
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-8970665","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":603043084,"identity":"1c48f204-f81c-42c9-ad31-8a1f9426fdfc","order_by":0,"name":"Sri Kiran Reddy Alla","email":"","orcid":"","institution":"Texas A\u0026M University","correspondingAuthor":false,"prefix":"","firstName":"Sri","middleName":"Kiran Reddy","lastName":"Alla","suffix":""},{"id":603043085,"identity":"09cba1d0-c56d-40c6-a5c7-30c694859720","order_by":1,"name":"Benedict Analin","email":"","orcid":"","institution":"Texas A\u0026M AgriLife Research","correspondingAuthor":false,"prefix":"","firstName":"Benedict","middleName":"","lastName":"Analin","suffix":""},{"id":603043086,"identity":"ce03567a-8b8a-4151-8e44-7acd3d918937","order_by":2,"name":"Vijay Joshi","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA6klEQVRIiWNgGAWjYBACg8MMCUBKjoFBgrHxAUyQGC1sIC3NBgeI0gJRBdLCwCZBlBbJdoaHn3kY2OTkZze3VX+o2ZbYwN68TQKvlmaGZGmgFmODOwfbbhw4djuxgedYGSEtCSAtiRskEoFa2IBaJHLM8GrhZ2ZI/g3SMn9GYlvBgX9ALfJvCGpJA9rilthwI7GN4WAbyBYe/FrYgFos5xjkGBvcSGyWONt327iNJ63YAq8W/jPJN95UWMjJz0h/+KHi223ZfvbDG2/g08LAwJPAxIMcEWz4lYMA+wHGH4RVjYJRMApGwUgGAODTSPlH1YOHAAAAAElFTkSuQmCC","orcid":"","institution":"Texas A\u0026M University","correspondingAuthor":true,"prefix":"","firstName":"Vijay","middleName":"","lastName":"Joshi","suffix":""}],"badges":[],"createdAt":"2026-02-25 
18:53:29","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-8970665/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-8970665/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":104344631,"identity":"1163988f-0c96-4500-a914-03202729c4da","added_by":"auto","created_at":"2026-03-10 17:25:30","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":1403393,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003ePhenotypic distributions of seed traits in tepary bean. \u003c/strong\u003e(A) Hundred‑seed weight (HSW, g 100‑seed⁻¹) and (B) seed width (cm) across a diversity panel of 206 \u003cem\u003ePhaseolus acutifolius\u003c/em\u003e accessions plus four replicated checks grown under certified organic management (Uvalde, TX, 2024). Histograms show the frequency of accession means; dashed vertical lines indicate check means. HSW was measured from three independent 100‑seed subsamples per accession; seed width is the plot mean of n = 5 intact seeds measured along the minor axis (ImageJ v1.54p)\u003cstrong\u003e.\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"Figure1.png","url":"https://assets-eu.researchsquare.com/files/rs-8970665/v1/75e3363a410681956b4614c5.png"},{"id":104344619,"identity":"47ef34db-261f-4b14-ad70-0654ea52d05a","added_by":"auto","created_at":"2026-03-10 17:25:26","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":711030,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eDistribution of total soluble seed protein (%). \u003c/strong\u003eTotal soluble protein (%) determined by Bradford assay on triplicate extracts (0.1 M NaOH; 10× dilution; 595 nm) for each accession; bars show the frequency of accession means, with dashed lines indicating check means. 
Each plot was used for downstream analysis (JMP Pro 18).\u003c/p\u003e","description":"","filename":"Figure2.png","url":"https://assets-eu.researchsquare.com/files/rs-8970665/v1/1c9bf270b86e8b75698bf328.png"},{"id":104344662,"identity":"e42af8c1-5d07-46cc-bd78-272d26aa54e7","added_by":"auto","created_at":"2026-03-10 17:25:43","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":6530712,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSeed amino‑acid composition and multivariate structure.\u003c/strong\u003e (A) Proportional distribution (%) of nineteen free amino acids in mature seeds. Arginine (~28%), asparagine (~20%), aspartic acid (~18.5%), and glutamic acid (~12.4%) dominate the pool. (B) Principal component analysis (PCA) of amino‑acid profiles across accessions; PC1 (24.3%) loads strongly on methionine, histidine, leucine, isoleucine, and valine, while PC2 (13.0%) is driven by allantoin, asparagine, and glutamic acid.\u003c/p\u003e","description":"","filename":"Figure3.png","url":"https://assets-eu.researchsquare.com/files/rs-8970665/v1/7c3f039a9813c5530027578e.png"},{"id":104344545,"identity":"d0c25134-6bf8-45b5-bbf6-28b957e2213c","added_by":"auto","created_at":"2026-03-10 17:25:00","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":18208234,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eGWAS for total soluble protein (%), hundred‑seed weight (HSW), and seed width. \u003c/strong\u003eManhattan (left) and Q–Q (right) plots for (A) total soluble protein%, (B) hundred‑seed weight (HSW), and (C) seed width (cm). GWAS used BLINK and FarmCPU on Stage‑1 BLUPs with population structure controlled by PCA (PCA = 5) and, where applicable, kinship. The horizontal dashed line marks the Bonferroni genome‑wide threshold at α = 0.05 (49,384 SNPs; −log₁₀p ≈ 6). 
Lead associations for protein% on Chr08 coincide with candidates Phacu.CVR.008G093700 (WNK) and Phacu.CVR.008G093800 (2‑oxoglutarate oxygenase). For HSW, the lead Chr02 signal is near Phacu.CVR.002G331300 (V‑ATPase subunit E), and for seed width, the Chr07 lead overlaps Phacu.CVR.007G121900 (Aux/IAA).\u003c/p\u003e","description":"","filename":"Figure4.png","url":"https://assets-eu.researchsquare.com/files/rs-8970665/v1/d33e4886bb015322939ab6bd.png"},{"id":104344616,"identity":"176f2905-06d5-491e-a546-d98a357bab73","added_by":"auto","created_at":"2026-03-10 17:25:24","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":19379877,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eGWAS for representative essential amino acids: threonine, methionine, and lysine. \u003c/strong\u003eManhattan (left) and Q–Q (right) plots for (A) threonine, (B) methionine, and (C) lysine. Dashed lines indicate the Bonferroni threshold (−log₁₀p ≈ 6). Threonine shows a lead Chr08 association (S08_49548543) with nearby candidates, including a Ser/Thr protein phosphatase and a mitochondrial carrier; methionine shows a lead Chr02 signal (S02_10429456) near Phacu.CVR.002G092500 (TCP14 homolog) and lysine lead SNPs S07_22884181, S11_10209988 (near a SAM‑dependent methyltransferase), and S04_6376564 (near WOX11). Full association details are in Table 1 and Additional file 4(all additional loci).\u003c/p\u003e","description":"","filename":"Figure5.png","url":"https://assets-eu.researchsquare.com/files/rs-8970665/v1/1bd0bd4a993a0ee2a20f031f.png"},{"id":104344630,"identity":"428992cf-3597-4ec3-aca1-79e2f70b62f9","added_by":"auto","created_at":"2026-03-10 17:25:30","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":2162259,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003ePredictive ability (r) of seven genomic prediction models\u003c/strong\u003e. 
Predictive ability (r) across 5‑fold cross‑validation for Bayesian Alphabet (BA), BayesB (BB), Bayes LASSO (BL), Bayesian Ridge Regression (BRR), Random Forest (RF), ridge‑regression BLUP (rrBLUP), and Support Vector Machine (SVM). Traits include total soluble protein (%), hundred‑seed weight (HSW), seed width, and free amino acids. Boxplots represent fold‑wise predictive ability, and n‑test values shown beneath each panel represent the number of accessions in each fold’s test partition with non‑missing BLUPs.\u003c/p\u003e","description":"","filename":"Figure6.png","url":"https://assets-eu.researchsquare.com/files/rs-8970665/v1/15b464126c7e9760ef5fcfaa.png"},{"id":109252459,"identity":"45b39ace-b446-4db1-940c-43b39eebad20","added_by":"auto","created_at":"2026-05-14 09:26:48","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":48195132,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8970665/v1/95501105-1a28-4411-ac0b-f0df80332ed1.pdf"},{"id":104344628,"identity":"3edd0441-324b-42a4-ad4b-a183d061d475","added_by":"auto","created_at":"2026-03-10 17:25:30","extension":"xlsx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":22260,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eAdditional file 1\u003c/strong\u003e. List of Tepary bean accessions and commercial checks used in this study. 
Includes USDA-NPGS accession numbers, origin, and market-type descriptions for all evaluated materials.\u003c/p\u003e","description":"","filename":"Additionalfile1.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-8970665/v1/0315b2c3efa4bb39ed86e9b3.xlsx"},{"id":104344615,"identity":"4c29b79f-07ad-45a0-8413-7409dd781bea","added_by":"auto","created_at":"2026-03-10 17:25:24","extension":"xlsx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":27929,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eAdditional file 2\u003c/strong\u003e. Adjusted phenotypic values (BLUPs/BLUEs) for seed traits generated from linear mixed models used as input for GWAS.\u003c/p\u003e","description":"","filename":"Additionalfile2.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-8970665/v1/c5bf1ff303c35e50b52e6d40.xlsx"},{"id":104344634,"identity":"c5e9ab2e-cf9a-46b4-bc02-68c6d24ec254","added_by":"auto","created_at":"2026-03-10 17:25:31","extension":"xlsx","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":69456,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eAdditional file 3\u003c/strong\u003e. Adjusted phenotypic values (BLUPs/BLUEs) for 19 seed-bound amino acids. These adjusted means were used for all amino acid GWAS analyses.\u003c/p\u003e","description":"","filename":"Additionalfile3.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-8970665/v1/25c775b3b27550eaa44187c1.xlsx"},{"id":104344620,"identity":"4073c3b5-8729-4669-8133-980433173f87","added_by":"auto","created_at":"2026-03-10 17:25:26","extension":"xlsx","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":40315,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eAdditional file 4\u003c/strong\u003e. Genome-wide significant SNPs associated with all seed traits, including candidate genes and functional annotations. 
Lists of SNPs, chromosome locations, raw and FDR-corrected p-values, associated genes, and predicted gene functions.\u003c/p\u003e","description":"","filename":"Additionalfile4.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-8970665/v1/518f39a31dd6354cb23eabc9.xlsx"},{"id":104344643,"identity":"9cb15ee3-fd2d-480b-a55a-151c5671dd11","added_by":"auto","created_at":"2026-03-10 17:25:36","extension":"png","order_by":5,"title":"","display":"","copyAsset":false,"role":"supplement","size":7381402,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eAdditional file 5. Population structure and genetic differentiation. \u003c/strong\u003e(A) STRUCTURE bar plot at K = 2 showing ancestry proportions across 206 accessions and four checks; (B) principal coordinate analysis (PCoA) depicting separation of checks from broader germplasm. ΔK was evaluated with Structure Harvester to select the optimal K. PCs were included as covariates in all GWAS models (PCA = 5)\u003c/p\u003e","description":"","filename":"Additionalfile5.png","url":"https://assets-eu.researchsquare.com/files/rs-8970665/v1/37185a15c7a7960e004d8179.png"},{"id":104344681,"identity":"aa419d9a-1d76-4293-842f-9dafbf013a53","added_by":"auto","created_at":"2026-03-10 17:25:49","extension":"png","order_by":6,"title":"","display":"","copyAsset":false,"role":"supplement","size":25313975,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eAdditional file 6. GWAS for alanine, allantoin, asparagine, and glutamine.\u003c/strong\u003eManhattan (left) and Q–Q (right) plots for (A) alanine, (B) allantoin, (C) asparagine, and (D) glutamine. The dashed line indicates the Bonferroni threshold (−log₁₀p ≈ 6). For glutamine, a lead association on Chr09 (S09_37434888) reached p = 1.39 × 10⁻¹⁹. 
See Additional file 4 for full candidate lists around the lead SNPs (±10 kb).\u003c/p\u003e","description":"","filename":"Additionalfile6.png","url":"https://assets-eu.researchsquare.com/files/rs-8970665/v1/5dd8a95a4476a4fbd605dc64.png"},{"id":104344621,"identity":"3edae66e-b209-4f60-bfb1-55c51da038c5","added_by":"auto","created_at":"2026-03-10 17:25:27","extension":"png","order_by":7,"title":"","display":"","copyAsset":false,"role":"supplement","size":25684976,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eAdditional file 7. GWAS for glycine, histidine, isoleucine, and leucine\u003c/strong\u003e. Manhattan (left) and Q–Q (right) plots for (A) glycine, (B) histidine, (C) isoleucine, and (D) leucine. Thresholds and models as in Fig. 4. Candidate genes within ±10 kb windows of significant SNPs are summarized in Additional file 4.\u003c/p\u003e","description":"","filename":"Additionalfile7.png","url":"https://assets-eu.researchsquare.com/files/rs-8970665/v1/06c38817bef565effc9a5cff.png"},{"id":104344603,"identity":"5ff07a36-4806-4986-8d0d-415fecc097dc","added_by":"auto","created_at":"2026-03-10 17:25:22","extension":"png","order_by":8,"title":"","display":"","copyAsset":false,"role":"supplement","size":27917674,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eAdditional file 8. GWAS for proline, serine, tyrosine, and valine.\u003c/strong\u003e Manhattan (left) and Q–Q (right) plots for (A) proline, (B) serine, (C) tyrosine, and (D) valine. Thresholds and models as in Fig. 4. 
The valine landscape includes significant signals on Chr09 with annotated regulatory candidates (Additional file 4).\u003c/p\u003e","description":"","filename":"Additionalfile8.png","url":"https://assets-eu.researchsquare.com/files/rs-8970665/v1/8a642657bbf9984fd9e45654.png"},{"id":104344666,"identity":"c8e00764-089b-4c51-8c3f-3d8e76d415e2","added_by":"auto","created_at":"2026-03-10 17:25:46","extension":"png","order_by":9,"title":"","display":"","copyAsset":false,"role":"supplement","size":651618,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eAdditional file 9. Model performance vs rrBLUPS\u003c/strong\u003e. Relative model performance (Δr = mean(r_model) – mean(r_rrBLUP)) across six traits under 5‑fold cross‑validation. Δr values are summarized across traits as: BA −0.102 | BB −0.153 | BL −0.075 | BRR +0.007 | RF −0.009 | SVM −0.008. Panels show trait‑wise predictive ability without per‑panel Δr annotations to maximize clarity. A compact key indicates the statistical test used to compare models (paired t-test across folds; *** p\u0026lt;0.001; ** p\u0026lt;0.01; * p\u0026lt;0.05; • p\u0026lt;0.10; ns).\u003c/p\u003e","description":"","filename":"Additionalfile9.png","url":"https://assets-eu.researchsquare.com/files/rs-8970665/v1/7c086bebb9d08e3074e8dae4.png"}],"financialInterests":"No competing interests reported.","formattedTitle":"Nutritional Genomics of Tepary Bean (Phaseolus acutifolius): Genome‑wide association analysis and genomic prediction of seed nutritional traits and size","fulltext":[{"header":"Introduction","content":"\u003cp\u003eTepary bean (\u003cem\u003ePhaseolus acutifolius\u003c/em\u003e A. Gray) is native to the Sonoran Desert and has been cultivated for centuries by Indigenous communities of the southwestern United States and northern Mexico. 
Its domestication history is consistent with genomic adaptations to heat and drought, as revealed by a chromosome‑scale reference genome [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. Within \u003cem\u003ePhaseolus\u003c/em\u003e, tepary is closely related to common bean (\u003cem\u003eP. vulgaris\u003c/em\u003e), facilitating gene transfer and pre‑breeding routes for improvement [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. Its ability to thrive in hot, arid, low‑input environments positions tepary as a climate‑smart legume with promising food‑ingredient properties [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]\u003c/p\u003e \u003cp\u003eTepary seeds are rich in protein and dietary fiber, providing essential amino acids, which indicates a beneficial nutritional profile compared to common beans [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. However, their small seed size can limit market appeal by influencing water absorption, processing efficiency, and consumer acceptance [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. Therefore, improving both seed size and protein quality is crucial to increasing tepary's adoption. In Phaseolus, association genetics offers a useful comparative framework. In common bean, image-based GWAS has identified loci linked to seed shape and size [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e], while combined GWAS and RNA-seq have identified candidates responsible for seed size, highlighting its polygenic nature [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. 
For nutritional traits, GWAS have mapped mineral content in common bean [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e], whereas in soybean, research on amino acids is more advanced, showing modular correlations and key loci [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. Similar progress is seen in other pulses: pea GWAS/QTL studies identified regions related to seed protein composition [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]; chickpea GWAS connected seed protein and other nutrients [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]; lentil studies have mapped traits related to protein quality, including amino acids and digestibility [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]; and cowpea GWAS found a locus associated with seed protein content [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eWithin tepary, the reference genome shows high collinearity with P. vulgaris and evidence of domestication under heat stress, enabling genome-informed improvement. However, research on food ingredients supports the development of protein-rich flours and also notes amino acid limitations, such as sulfur-containing amino acids, which guide efforts to enhance protein content and amino acid profiles. Recent association mapping in tepary has utilized whole-genome resequencing, mainly focusing on yield traits rather than seed composition. As a result, the genetic basis of seed protein percentage and free amino acids, and their relationship with seed size, remain relatively understudied in P. acutifolius.\u003c/p\u003e \u003cp\u003eGenotyping-by-sequencing (GBS)-based GWAS is particularly suitable for underutilized legume species, offering a cost-effective approach to SNP identification. 
Multi-locus models like BLINK and FarmCPU improve the detection of traits influenced by multiple genes, including seed composition. When aligned with the tepary bean reference genome, this method can generate hypotheses linking markers to genes associated with protein content, amino acid profiles, and seed size. These insights can guide direct improvements in tepary beans and facilitate interspecific transfer efforts to P. vulgaris. We performed a genome-wide association analysis of a diverse \u003cem\u003eP. acutifolius\u003c/em\u003e panel to explore seed protein concentration and free amino acid profiles, with seed size and width evaluated as supporting traits. The study found significant variation in composition traits; demonstrated that protein percentage and individual amino acids are controlled by a largely modular, non-overlapping genetic architecture; and provided a set of marker-to-trait hypotheses ready for use in selecting nutrition-focused tepary breeding lines. Fewer signals related to seed size and width showed modest effects, suggesting these traits can be improved alongside protein and amino-acid composition. 
Overall, these results represent the first genome-scale map of nutritional trait architecture in tepary bean and offer practical strategies for optimizing protein quality while considering seed size in a crop adapted to hot, arid, low-input environments.\u003c/p\u003e"},{"header":"Materials and methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003ePlant Material, Growth Conditions, and Experimental Design\u003c/h2\u003e \u003cp\u003eA diversity panel of 206 Phaseolus acutifolius accessions from the USDA\u0026ndash;NPGS, along with four commercial varieties included as checks (CK1: Sacaton Brown, CK2: Sonoran White, CK3: Blue Speckled, CK4: Black; see Additional file 1), was evaluated at the Texas A\u0026amp;M AgriLife Research and Extension Center in Uvalde, TX, during a certified organic field season, July to October 2024. The trial adhered to standard organic practices, using drip irrigation as needed and OMRI-certified bioinsecticides for pest management. Accessions were sown on 2024-07-25 and harvested at physiological maturity between 2024-09-26 and 2024-09-30. We employed an augmented randomized complete block design (augRCBD) with five blocks. All four checks were included in every block to help control for field variability and to support the calculation of adjusted accession means (BLUPs).\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003ePhenotyping\u003c/h3\u003e\n\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003eSeed Weight Measurement: Hundred seed weight (HSW)\u003c/h2\u003e \u003cp\u003eAt physiological maturity, pods were collected from each accession of the Tepary Bean Diversity Panel and commercial varieties. The 100‑seed weight (HSW) was recorded from three independent 100‑seed subsamples per plant. 
Seed weights were recorded using an ML54T/00 analytical balance (Mettler-Toledo GmbH, Switzerland) with a precision of \u0026plusmn;\u0026thinsp;0.001 g.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eSeed Size Measurement: Seed width\u003c/h3\u003e\n\u003cp\u003eFive fully dried seeds per accession and commercial variety were scanned individually using an HP LaserJet Pro MFP M125nw scanner with a scale bar. Images were imported into ImageJ v1.54p (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://imagej.net/ij/\u003c/span\u003e\u003cspan address=\"https://imagej.net/ij/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e), and the scale was set using the known distance per image unit, so that pixel measurements were converted to physical units and seed traits could be quantified comparably across images. Seed morphological traits, including seed width, area, length, and perimeter, were measured for each seed to assess natural variation. Seed width was used for the GWAS analysis.\u003c/p\u003e\n\u003ch3\u003eQuantification of total soluble seed protein\u003c/h3\u003e\n\u003cp\u003eTotal soluble seed protein was estimated using a previously established method [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e] with minor modifications. Freeze-dried seeds were ground to a fine powder using a mortar and pestle. Briefly, 15 mg of seed tissue per accession, including commercial varieties, was processed individually, suspended in 0.1 M NaOH, and sonicated for 15 min. The supernatant was collected by centrifugation at 14,000 \u0026times; g for 15 min. Protein concentration was determined using the Pierce\u0026trade; Bradford Plus Protein Assay Reagent (Catalog No. 23238, Thermo Fisher Scientific Inc., USA) according to the manufacturer\u0026rsquo;s instructions. 
Measurements were performed in triplicate in a microtiter plate, using bovine serum albumin (BSA) as an internal standard. Absorbance at 595 nm was recorded using the Varioskan LUX multimode microplate reader (Thermo Fisher Scientific Inc., USA). Protein concentration was calculated relative to the internal standard and expressed as mg g⁻\u0026sup1; of seed tissue and as a percentage (%).\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eFree amino acid extraction and analysis\u003c/h2\u003e \u003cp\u003eFree amino acids were profiled following Joshi et al. [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e] with instrument-specific settings. Approximately 15 mg of seed powder was derivatized using the AccQ.Tag\u0026trade; 3X Ultra‑Fluor kit (Waters) and quantified on a Waters ACQUITY H‑Class UPLC coupled to an Xevo TQ mass spectrometer (ESI). Multiple reaction monitoring transitions, collision energies, and cone voltages were optimized in IntelliStart; data acquisition used MassLynx, and quantification used TargetLynx with external calibration.\u003c/p\u003e \u003cp\u003e \u003cb\u003eDNA extraction, GBS library construction, and SNP calling.\u003c/b\u003e \u003c/p\u003e \u003cp\u003eLeaf DNA was extracted from 3\u0026ndash;4‑week‑old seedlings using the DNeasy Plant Mini Kit (Qiagen, Cat. no. 69104) according to the manufacturer\u0026rsquo;s instructions. DNA concentration was quantified with a DeNovix spectrophotometer (DS‑11, DeNovix Inc., USA). Library preparation and genotyping-by-sequencing (GBS) were performed at the University of Minnesota Genomics Center using the restriction enzyme ApeKI. Reads were aligned to the Phaseolus acutifolius v1.0 reference genome, and a variant call format (VCF) file was generated and filtered for missingness, heterozygosity, and minor allele frequency (MAF). 
LD‑kNNi imputation was applied to reduce missing data.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003ePopulation structure analysis\u003c/h3\u003e\n\u003cp\u003eLD‑pruned SNPs (PLINK; 50‑SNP window, r\u0026sup2; \u0026gt; 0.1) yielded 2,869 non‑redundant markers for STRUCTURE 2.3.4 runs (K\u0026thinsp;=\u0026thinsp;1\u0026ndash;10; burn‑in 50,000; MCMC 50,000; 10 iterations per K). ΔK was evaluated using Structure Harvester [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e], from which the optimal K was selected [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e]. A Q‑matrix was generated and used to visualize ancestry proportions. Principal components were computed from genome‑wide SNPs, and the first five PCs (PCA\u0026thinsp;=\u0026thinsp;5) were included as covariates in GWAS.\u003c/p\u003e\n\u003ch3\u003eGenomic prediction models, cross‑validation, and accuracy metrics\u003c/h3\u003e\n\u003cp\u003eTo complement GWAS and quantify the predictability of seed traits, we implemented a genomic prediction (GP) pipeline spanning seven learners: Bayesian Alphabet (BA), BayesB (BB), Bayes LASSO (BL), Bayesian Ridge Regression (BRR), Random Forest (RF), ridge‑regression BLUP (rrBLUP), and Support Vector Machine (SVM). Models were trained on the final GBS marker panel (49,384 SNPs) used for GWAS [24], with Stage‑1 BLUPs from the augmented‑RCBD mixed models as phenotypes [13\u0026ndash;18, 25, 26]. We used 5‑fold cross‑validation (CV): ~80% of accessions for training and ~20% held out for testing, rotating folds until each accession was predicted exactly once. For transparency, the n‑test per fold (number of accessions with non‑missing phenotypes in each test partition) is shown beneath the x‑axis in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e6\u003c/span\u003e. 
Predictive ability was defined as the Pearson correlation (r) between observed BLUPs and genomic estimated breeding values (GEBVs). To benchmark learners, we computed Δr relative to rrBLUP (Δr\u0026thinsp;=\u0026thinsp;mean(r_model) \u0026minus; mean(r_rrBLUP)). To preserve readability in trait panels, Δr values are summarized in the caption of Additional file \u003cspan refid=\"MOESM9\" class=\"InternalRef\"\u003e9\u003c/span\u003e rather than annotated in each panel [\u003cspan additionalcitationids=\"CR8 CR9 CR10\" citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eStatistical Analysis\u003c/h2\u003e \u003cp\u003eData processing and statistics were performed using JMP, R (v4.3.3), GAPIT, STRUCTURE 2.3.4, PLINK, and TASSEL 5.2.96. Figures were generated in R. GBS processing was performed by the University of Minnesota Genomics Center; imputation utilized LD-kNNi in TASSEL.\u003c/p\u003e \u003cp\u003eStage‑1 mixed models produced adjusted means (BLUPs) and their estimation error variance\u0026ndash;covariance (EEV), which were used as phenotypes in Stage‑2 marker models [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e, \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e, \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e]. GWAS (GLM, MLM, BLINK, FarmCPU) was run with PCs (n\u0026thinsp;=\u0026thinsp;5) and kinship (as applicable). 
Bonferroni correction at α\u0026thinsp;=\u0026thinsp;0.05 set the genome‑wide threshold; with 49,384 markers, this corresponded to \u0026minus;log₁₀(p)\u0026thinsp;\u0026asymp;\u0026thinsp;6.\u003c/p\u003e \u003cp\u003eLinear mixed models for augmented RCBD: For each trait and environment, we fit linear mixed models appropriate for augmented RCBD to obtain genotype effects adjusted for blocks and checks. Checks and/or genotypes were treated as random to extract BLUPs when genotype variance was estimable. All BLUPs/BLUEs used for GWAS are given in Additional files 2 and 3.\u003c/p\u003e \u003cp\u003eGenome-wide association studies (GWAS): The filtered VCF file was converted to a numeric format for GWAS. GWAS were performed using GAPIT (R) with GLM, MLM, BLINK, and FarmCPU models. Population structure was controlled by PCs (n\u0026thinsp;=\u0026thinsp;5) and, where applicable, by relatedness (K). We used a Bonferroni correction (α\u0026thinsp;=\u0026thinsp;0.05 divided by the number of markers) to declare genome-wide significance [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e]. For our current panel of 206 accessions and 49,384 high-quality markers, the \u0026minus;log₁₀(p) threshold for genome-wide significance was approximately 6 (p\u0026thinsp;\u0026asymp;\u0026thinsp;1.01 \u0026times; 10⁻⁶).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eCandidate-gene discovery\u003c/h2\u003e \u003cp\u003eFor each significant SNP, gene models and annotations within \u0026plusmn;\u0026thinsp;10 kb were retrieved from Phytozome v14 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://phytozome-next.jgi.doe.gov/\u003c/span\u003e\u003cspan address=\"https://phytozome-next.jgi.doe.gov/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). 
Top candidates per trait are summarized in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e; the full list is provided in Additional file 4.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003e\u003cb\u003eThe top candidate genes per trait identified by GWAS in tepary bean.\u003c/b\u003e For each trait, the top three associations (lowest P‑values within trait) are reported with model, SNP ID, chromosome (Chr), genomic position (Pos), minor allele frequency (MAF), effect sign, and the nearest candidate gene (gene model, functional annotation) within the \u0026plusmn;\u0026thinsp;10 kb window used for candidate discovery. GWAS used Bayesian information and Linkage-disequilibrium Iteratively Nested Keyway (BLINK) and Fixed and random model Circulating Probability Unification (FarmCPU) on Stage‑1 BLUPs, with population structure controlled by PCA\u0026thinsp;=\u0026thinsp;5 and, where applicable, kinship; the Bonferroni genome‑wide threshold at α\u0026thinsp;=\u0026thinsp;0.05 (49,384 SNPs) corresponds to \u0026minus;\u0026thinsp;log₁₀p\u0026thinsp;\u0026asymp;\u0026thinsp;6. Coordinates and gene models are based on the tepary bean reference genome (\u003cem\u003ePhaseolus acutifolius\u003c/em\u003e v1.0). Abbreviations: Chr, chromosome; Pos, genomic position (bp); minor allele frequency (MAF); BLUP, best linear unbiased prediction. 
See Additional file 4 for all additional significant associations.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"9\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTrait\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSNP\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eChromosome\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003ePosition\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eP.value\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eMAF\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eGene Model\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003eFunctional Annotation\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd 
align=\"left\" colname=\"c1\" morerows=\"7\" rowspan=\"8\"\u003e \u003cp\u003eTotal Soluble Protein%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"7\" rowspan=\"8\"\u003e \u003cp\u003eBLINK\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eS08_8796692\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eChr08\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003e8796692\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003e1.38E-10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003e0.36585\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.008G093700\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eWNK protein kinase\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.008G093800\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e2-oxoglutarate [2OG] and Fe [II]-dependent oxygenase superfamily protein\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.008G093900\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eUnknown\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eS01_17593283\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eChr01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\" 
morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e17593283\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e2.59E-08\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e0.36829\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.001G119800\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ealpha/beta-Hydrolases superfamily protein\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.001G119900\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ezinc finger [Ran-binding] family protein\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eS08_8784174\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eChr08\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e8784174\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e6.95E-07\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e0.35366\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.008G093600\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eL-arabinose isomerase\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.008G093700\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eWNK protein kinase\u003c/p\u003e 
\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eS08_8761065\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eChr08\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e8761065\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e2.70E-06\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.38293\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.008G093400\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eMATE efflux family protein\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"4\" rowspan=\"5\"\u003e \u003cp\u003e100-SeedWeight\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"4\" rowspan=\"5\"\u003e \u003cp\u003eBLINK\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eS02_41997902\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eChr02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003e41997902\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003e7.24E-08\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003e0.27622\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.002G331100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e6-phosphogluconate dehydrogenase\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.002G331300\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003evacuolar H+-ATPase subunit E isoform 3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.002G331400\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eGalactose mutarotase-like superfamily protein\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eS01_49536594\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eChr01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e49536594\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e7.57E-07\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e0.36713\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.001G230400\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eCTR1-like protein kinase, putative, expressed\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.001G230600\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eENTH/VHS family protein; ZOS9-20 - C2H2 zinc finger protein,\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"5\" rowspan=\"6\"\u003e \u003cp\u003eSeed Width\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"5\" rowspan=\"6\"\u003e \u003cp\u003eFARMCPU\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c3\"\u003e \u003cp\u003eS07_13428624\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eChr07\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e13428624\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e3.16E-12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.42958\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.007G121900\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eindole-3-acetic acid inducible 14; Auxin-responsive Aux/IAA gene family member,\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eS01_17629015\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eChr01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e17629015\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e2.20E-08\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.41725\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.001G120100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eUnknown\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eS08_46951004\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eChr08\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e46951004\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\" morerows=\"1\" 
rowspan=\"2\"\u003e \u003cp\u003e4.40E-08\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e0.32394\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.008G266400\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eProtein of unknown function [DUF630 and DUF632]\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.008G266500\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eUnknown\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eS08_2123448\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eChr08\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e2123448\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e1.86E-07\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e0.35211\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.008G025600\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eSMAD/FHA domain-containing protein\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.008G025700\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eDNA/RNA polymerases superfamily protein\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMethionine\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBLINK\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eS02_10429456\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eChr02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e10429456\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e6.85E-07\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.05446\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.002G092500\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eTEOSINTE BRANCHED, cycloidea and PCF [TCP] 14\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"13\" rowspan=\"14\"\u003e \u003cp\u003eThreonine\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"13\" rowspan=\"14\"\u003e \u003cp\u003eBLINK\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003eS08_49548543\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003eChr08\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003e49548543\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003e6.95E-10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003e0.07921\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.008G290800\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eUnknown\u003c/p\u003e \u003c/td\u003e 
\u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.008G290900\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003emitochondrial carrier protein, putative\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.008G291000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eLIM domain-containing protein, putative,\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.008G291100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eSer/Thr protein phosphatase family protein,\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eS06_14126029\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eChr06\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e14126029\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e3.99E-08\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.18317\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.006G045800\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003esucrose-phosphate synthase, putative\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eS04_51739891\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eChr04\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e51739891\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e1.40E-07\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e0.14356\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.004G148800\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ecytochrome P450, putative, expressed\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.004G148900\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ecytochrome P450, putative, expressed\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003eS02_5914294\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003eChr02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003e5914294\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003e2.21E-07\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003e0.0495\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.002G063900\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eMetallo-endoproteinase 1 precursor, putative,\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.002G064000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ePentatricopeptide repeat [PPR] superfamily protein\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e 
\u003ctr\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.002G064100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eintegral membrane protein DUF6 domain-containing protein\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.002G064200\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eUbiquitin-like superfamily protein\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eS08_52080067\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eChr08\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003e52080067\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003e6.22E-07\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003e0.10891\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.008G318200\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eUnknown\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.008G318300\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eexpressed protein\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.008G318400\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eUbiquitin-specific protease family C19-related protein\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd 
align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eLysine\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eBLINK\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eS07_22884181\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eChr07\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e22884181\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e4.97E-07\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.056931\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.007G151500\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eUnknown\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eS11_10209988\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eChr11\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e10209988\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e6.55E-07\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.059406\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.011G105200\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eS-adenosyl-L-methionine-dependent methyltransferases superfamily protein\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eS04_6376564\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eChr04\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e6376564\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.000001\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.311881\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePhacu.CVR.004G052100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eWUSCHEL-related homeobox 11; expressed\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003ePhenotypic variation across the diversity panel\u003c/h2\u003e \u003cp\u003eSeed size and composition varied widely among accessions. Hundred‑seed weight (HSW) ranged from 2 to 16 g per 100 seeds across the 206 accessions and four checks (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e1\u003c/span\u003eA). Several accessions, including PI 653254, PI 502217, PI 666351, and PI 535227, had high HSW values of 14\u0026ndash;16 g per 100 seeds, whereas the checks averaged 10.5 g per 100 seeds. Seed width spanned 0.261\u0026ndash;0.711 cm across the panel (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e1\u003c/span\u003eB). The checks averaged 0.577 cm and clustered within a narrow range of 0.536\u0026ndash;0.640 cm, indicating phenotypic stability. PI 502217, W6 38693, and PI 440805 showed the greatest seed widths, ranging from 0.65 to 0.71 cm. 
Total soluble seed protein ranged from 10 to 38%, with a significant overall genotype effect (one‑way ANOVA, p\u0026thinsp;\u0026lt;\u0026thinsp;0.0001) (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e2\u003c/span\u003e). The checks averaged about 23% protein. Accessions with the highest total soluble seed protein content included W6 38850 (35.3%), PI 640957 (34.1%), PI 201268 (32%), PI 440802 (30%), PI 440806 (28.9%), PI 319439 (28.2%), PI 200902 (27.3%), and PI 321638 (27.1%).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eAcross accessions, 19 free amino acids were detected; arginine (~\u0026thinsp;28%), asparagine (~\u0026thinsp;20%), aspartic acid (~\u0026thinsp;18.5%), and glutamic acid (~\u0026thinsp;12.4%) dominated the pool (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e3\u003c/span\u003eA). PCA showed multivariate structure with clear groupings of essential, branched‑chain, and nitrogen‑rich amino acids: PC1 (24.3%) loaded most strongly on methionine, histidine, leucine, isoleucine, and valine, whereas PC2 (13.0%) was driven by allantoin, asparagine, and glutamic acid (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e3\u003c/span\u003eB).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eGBS, variant filtering, and marker set for association\u003c/h2\u003e \u003cp\u003eGBS produced\u0026thinsp;~\u0026thinsp;830\u0026nbsp;million raw reads (mean\u0026thinsp;~\u0026thinsp;3.95 M per sample). 
Variant calling yielded ~\u0026thinsp;720k raw markers; after quality control and LD‑kNNi imputation, a final set of 49,384 high‑quality SNPs distributed across the 11 tepary chromosomes was retained for association analyses (see Figshare link \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.6084/m9.figshare.31158676\u003c/span\u003e\u003cspan address=\"10.6084/m9.figshare.31158676\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003ePopulation structure and relatedness\u003c/h2\u003e \u003cp\u003eSTRUCTURE indicated K\u0026thinsp;=\u0026thinsp;2 subpopulations in the panel, corroborated by PCA (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e4\u003c/span\u003e). PC1 (23.2%) and PC2 (7.3%) separated a subset of check lines from the broader germplasm; this structure was controlled in all GWAS models by including the first five principal components as covariates (PCA\u0026thinsp;=\u0026thinsp;5) (see Additional file 5).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003eGWAS for total soluble seed protein and free amino acids\u003c/h2\u003e \u003cp\u003eAssociation analyses (GLM, MLM, BLINK, FarmCPU) were performed on BLUPs with five principal components as covariates (PCA\u0026thinsp;=\u0026thinsp;5) and, where applicable, a kinship matrix (K). Genome‑wide significance was set at a Bonferroni‑corrected α\u0026thinsp;=\u0026thinsp;0.05 (0.05/49,384 markers; p\u0026thinsp;\u0026asymp;\u0026thinsp;1.0\u0026times;10⁻⁶, \u0026minus;log₁₀p\u0026thinsp;\u0026asymp;\u0026thinsp;6). 
Four associations for total soluble seed protein percentage were identified on chromosomes 1 and 8: S08_8796692 (p\u0026thinsp;=\u0026thinsp;1.38\u0026times;10⁻\u0026sup1;⁰), S01_17593283 (p\u0026thinsp;=\u0026thinsp;2.59\u0026times;10⁻⁸), S08_8784174 (p\u0026thinsp;=\u0026thinsp;6.95\u0026times;10⁻⁷), and S08_8761065 (p\u0026thinsp;=\u0026thinsp;2.70\u0026times;10⁻⁶); the first three exceeded the Bonferroni threshold (\u0026minus;log₁₀p\u0026thinsp;\u0026asymp;\u0026thinsp;6), whereas S08_8761065 fell just below it and was considered suggestive. Candidate genes included Phacu.CVR.008G093800 (2‑oxoglutarate (2OG)/Fe(II) oxygenase), Phacu.CVR.001G119900 (Ran‑binding zinc finger), and Phacu.CVR.008G093700 (WNK kinase) (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e4\u003c/span\u003eA, Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, Additional file 4).\u003c/p\u003e \u003cp\u003eAcross the nineteen amino acids, fifteen traits showed at least one genome‑wide significant association (representative plots in Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e5\u003c/span\u003e and Additional files 6, 7, and 8). Examples include glutamine on Chr09 (S09_37434888, p\u0026thinsp;=\u0026thinsp;1.39 \u0026times; 10⁻\u0026sup1;⁹) and threonine on Chr08 (S08_49548543, p\u0026thinsp;=\u0026thinsp;6.95 \u0026times; 10⁻\u0026sup1;⁰). Significant intervals for individual amino acids did not overlap, consistent with trait-specific genetic control. Both essential and non-essential amino acids exhibited distinct associations, suggesting differential genetic regulation of transport, storage, and biosynthetic pathways. Branched-chain amino acids, such as leucine, isoleucine, and valine, were associated with genetic loci linked to metabolic regulation and biosynthetic pathways. 
Notably, none of the amino acid traits shared SNPs or genomic regions, suggesting highly modular, trait-specific genetic control of seed metabolite levels in P. acutifolius.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eCandidate-gene exploration within \u0026plusmn;\u0026thinsp;10 kb of all genome-wide significant SNPs identified 192 unique genes across fifteen seed amino acids, with functions relevant to seed metabolism, nutrient transport, and developmental regulation (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, Additional file 4). Functional annotations showed strong enrichment for gene families involved in primary carbon and nitrogen metabolism and for membrane-associated and transport-related protein families, indicating roles in amino acid transport and storage during seed development. For the essential amino acid threonine, five significant SNPs were detected; the lead SNP on chromosome 8 (S08_49548543) lies near a plausible candidate gene, Phacu.CVR.008G291100, which encodes a Ser/Thr protein phosphatase family protein (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e5\u003c/span\u003eA). Three genome‑wide significant BLINK associations were detected for lysine: S07_22884181 (p\u0026thinsp;=\u0026thinsp;4.97\u0026times;10⁻⁷; MAF\u0026thinsp;\u0026asymp;\u0026thinsp;0.057), S11_10209988 (p\u0026thinsp;=\u0026thinsp;6.55\u0026times;10⁻⁷; MAF\u0026thinsp;\u0026asymp;\u0026thinsp;0.059), and S04_6376564 (p\u0026thinsp;\u0026asymp;\u0026thinsp;1.0\u0026times;10⁻⁶; MAF\u0026thinsp;\u0026asymp;\u0026thinsp;0.312). Candidate-gene inspection identified Phacu.CVR.011G105200, encoding an S-adenosyl-L-methionine-dependent methyltransferase superfamily protein, and Phacu.CVR.004G052100, annotated as WUSCHEL-related homeobox 11 (WOX11) (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e5\u003c/span\u003eC). 
The lack of shared candidate genes across traits such as seed weight, protein percentage, and amino acid levels underscores the independence of these physiological processes and suggests that tepary bean's genetic architecture for seed yield and nutritional traits is highly partitioned.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003eGWAS for hundred-seed weight and seed width\u003c/h2\u003e \u003cp\u003eFor 100‑seed weight, two genome‑wide significant associations were detected: S02_41997902 (p\u0026thinsp;=\u0026thinsp;7.24\u0026times;10⁻⁸) and S01_49536594 (p\u0026thinsp;=\u0026thinsp;7.57\u0026times;10⁻⁷) (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e4\u003c/span\u003eB; Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Candidate gene analysis identified plausible loci, including Phacu.CVR.002G331300, which encodes a vacuolar H\u003csup\u003e+\u003c/sup\u003e-ATPase subunit involved in pH-dependent transport and endomembrane processes that influence plant growth and development, and Phacu.CVR.001G230400, which encodes a CTR1-like protein kinase. For seed width, five significant associations were identified on chromosomes 1, 3, 5, 7, and 8 (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e4\u003c/span\u003eC). 
Candidate gene analysis revealed Phacu.CVR.007G121900, which encodes a member of the auxin-responsive Aux/IAA gene family (indole-3-acetic acid inducible 14).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003eGenomic prediction model accuracy across seed‑quality and seed‑size traits\u003c/h2\u003e \u003cp\u003eAcross models, seed‑size traits were highly predictable: hundred‑seed weight (HSW) and seed width achieved r\u0026thinsp;\u0026asymp;\u0026thinsp;0.90\u0026ndash;0.96 for most learners, matching reports that morphological seed traits in pulses exhibit a strong, polygenic signal [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. These high accuracies are consistent with the presence of robust loci (e.g., the V‑ATPase and Aux/IAA regions) and a sizable ridge‑captured polygenic component. In contrast, total soluble protein (%) and free amino acids displayed moderate predictive abilities (r\u0026thinsp;\u0026asymp;\u0026thinsp;0.15\u0026ndash;0.45), which mirrors the modular, small‑effect genetic architecture observed in our GWAS and in other legumes where amino‑acid and protein‑quality traits are governed by dispersed, low‑effect loci [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. Overall, rrBLUP and BRR performed on par with Bayesian sparsity learners (BA, BB, BL) and nonlinear models (RF, SVM), indicating that shrinkage‑based models remain robust baselines when the signal is broadly polygenic [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan additionalcitationids=\"CR10\" citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. 
Because phenotype availability differed slightly by fold, the test‑set size (n‑test) for each fold is shown below each panel in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e6\u003c/span\u003e. To maintain figure clarity, Δr relative to rrBLUP is summarized outside the panels in Fig. \u003cspan refid=\"MOESM9\" class=\"InternalRef\"\u003eS9\u003c/span\u003e; averaged across traits, BRR showed a small positive Δr, whereas BA, BB, BL, RF, and SVM were near‑zero or slightly negative, again consistent with polygenic architectures where ridge‑based estimators are hard to beat [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003e \u003cb\u003eSeed protein and amino acids in tepary beans are controlled by modular, pathway-aware genetic loci.\u003c/b\u003e \u003c/p\u003e \u003cp\u003eAcross the panel, total soluble protein percentage and fifteen of the nineteen free amino acids mapped to distinct loci, suggesting a modular genetic architecture rather than a single pleiotropic control point. This aligns with GWAS in soybean, which show trait clusters for amino acids, and with studies in lentil and pea, where protein quality traits and storage protein composition are dispersed across different regions [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. 
The association for protein% on Chr08 coincided with a WNK kinase and a 2‑oxoglutarate/Fe(II)‑dependent oxygenase, indicating a connection between regulatory signaling and the 2‑oxoglutarate node that links carbon and nitrogen metabolism during seed filling\u0026mdash;consistent with the central roles of GS/GOGAT and transamination in amino acid synthesis [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e, \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e]. This architecture supports a transport-and-sink model where biosynthesis, long-distance transport, and sink import each contribute trait-specific variance, which appears at different association windows in seeds [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e]. Among essential amino acids, threonine and methionine are traditionally limiting in pulses and are key targets for improving dietary protein quality [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e, \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e]. Threonine showed a significant genome-wide association on Chr08 within a phosphatase-rich region, providing the first tepary-specific marker to track flux in the aspartate family pathway, which also produces Met, Ile, and Lys. This provides a practical tool for marker-assisted selection to increase threonine levels while monitoring related amino acids [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e, \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e]. Similarly, methionine produced a genome-wide signal in our association table, marking the first GWAS indicator in tepary for a nutritionally critical amino acid. 
Using the Met-associated SNP together with the protein% interval could enable recurrent selection for improved protein quality without reducing total protein; genomic selection can also capture residual variance across multiple loci within the amino acid panel [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e]. In addition, two of the three lysine‑associated SNPs lay near annotated candidates: S11_10209988, adjacent to Phacu.CVR.011G105200 (annotated as a SAM‑dependent methyltransferase), and S04_6376564, near Phacu.CVR.004G052100 (WOX11). Together, these loci suggest complementary regulatory levers for amino‑acid homeostasis: epigenetic control via the SAM cycle and methylation [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e, \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e] and developmental regulation via a WUSCHEL‑related homeobox factor that integrates auxin/cytokinin cues [\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e, \u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e]. Consistent with the coordinated control in the aspartate‑family pathway, feedback at key branch‑point enzymes (AK, HSD, DHDPS) links Lys\u0026ndash;Thr\u0026ndash;Met pools [\u003cspan additionalcitationids=\"CR38 CR39\" citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e], providing a mechanistic context for the observed lysine associations and their potential effects on related essential amino acids. 
The Chr11 signal colocalizes with a SAM‑dependent methyltransferase, connecting lysine to the SAM cycle and epigenetic control during seed filling; SAM supply via MAT enzymes is known to influence DNA and histone methylation and transcriptional programs in plants [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e, \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e]. Taken together, tepary's suitability for low-input, heat- and drought-adapted systems makes it a resilient legume protein source that can improve protein quality in resource-limited environments [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e].\u003c/p\u003e \u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003eSeed‑size genetics and deployment: from V‑ATPase and Aux/IAA signals to\u003c/h2\u003e \u003cdiv id=\"Sec22\" class=\"Section3\"\u003e \u003ch2\u003emarketable size under low inputs\u003c/h2\u003e \u003cp\u003eSeed-size mapping in tepary revealed a few loci with modest effects: one on Chr02 influencing hundred-seed weight, involving a vacuolar H⁺‑ATPase subunit E and 6‑phosphogluconate dehydrogenase, and another on Chr07 affecting seed width near an Aux/IAA gene. This reflects polygenic architecture, as evidenced by image-based GWAS findings in common bean [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. The V‑ATPase plays a key role in energizing tonoplast transport and maintaining vacuolar pH, which are vital for nutrient storage and cell expansion, thus linking endomembrane functions to organ growth [\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e]. 
Aux/IAA proteins are vital repressors in the auxin regulatory pathway; auxin influences cell division and expansion in developing seeds, positioning the seed width signal within a well-established growth control pathway [\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e, \u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e]. Breeders can use marker-assisted selection targeting the Chr02 HSW and Chr07 width regions to enhance seed size and width, incorporating an index that also considers protein percentage and essential amino acids to sustain nutritional improvements. Because tepary thrives with low inputs and fixes nitrogen naturally, modest genetic gains in seed size could boost marketability and consumer appeal without significant input costs, aligning with the crop\u0026rsquo;s ecological niche [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e]. Compared with earlier GWAS using whole-genome resequencing that focused on seed yield and 100-seed weight across multiple environments, our GBS-based, composition-focused analysis at a single organic site identified a unique Chr02 HSW region enriched for growth-regulatory candidates. 
This suggests that seed size control in tepary involves both environment-responsive loci, evident in multi-environment trial models, and developmentally anchored loci detectable through standardized composition assessments [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec23\" class=\"Section3\"\u003e \u003ch2\u003eIntegrating genomic prediction with seed‑composition and seed‑size GWAS signals\u003c/h2\u003e \u003cp\u003eThe genomic prediction (GP) analyses (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e6\u003c/span\u003e; Fig. \u003cspan refid=\"MOESM9\" class=\"InternalRef\"\u003eS9\u003c/span\u003e) complement the GWAS findings by quantifying the predictability of seed‑protein, amino‑acid, and seed‑size traits across models of varying complexity. The high predictive abilities for hundred‑seed weight and seed width (r\u0026thinsp;\u0026asymp;\u0026thinsp;0.90\u0026ndash;0.96) are consistent with findings in other pulses\u0026mdash;such as common bean [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e] and pea [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]\u0026mdash;where seed morphological traits exhibit a strong polygenic signal that is readily captured by shrinkage‑based models. These accuracies confirm that both major GWAS loci (e.g., V‑ATPase on Chr02; Aux/IAA on Chr07) and background polygenic variance contribute consistently to seed‑size traits. In contrast, predictive abilities for total soluble protein (%) and amino‑acid traits were moderate (r\u0026thinsp;\u0026asymp;\u0026thinsp;0.15\u0026ndash;0.45), mirroring the modular genetic architecture detected in GWAS. 
Similar patterns are reported in soybean [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e] and lentil [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e], where amino acids and protein‑quality traits show dispersed, low‑effect loci and correspondingly modest genomic prediction accuracies. Across models, rrBLUP and BRR performed comparably to Bayesian and nonlinear learners\u0026mdash;indicating that nutritional composition traits in tepary bean are governed largely by many small‑effect loci, and not by sparse large‑effect signals. Consistent with this architecture, model improvements relative to rrBLUP (Δr) were small and trait‑consistent (Fig. \u003cspan refid=\"MOESM9\" class=\"InternalRef\"\u003eS9\u003c/span\u003e): BRR provided slightly higher average accuracy, whereas BA, BB, BL, RF, and SVM showed modestly lower or near‑zero Δr. This behavior matches reports in soybean, lentil, chickpea, and pea [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan additionalcitationids=\"CR10\" citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e], where ridge‑based models frequently outperform more complex algorithms for nutritional‑quality traits dominated by polygenic variation. Collectively, the GWAS\u0026ndash;GP integration supports a dual breeding‑strategy model: (i) Marker‑assisted selection (MAS) for the major protein‑quality and essential‑amino‑acid loci detected in this study (e.g., protein% WNK/2‑OG oxygenase locus on Chr08; threonine locus on Chr08; methionine locus on Chr02), and (ii) Genomic selection (GS) to capture the remaining distributed genetic variance that influences overall protein quality and amino‑acid balance. 
Such MAS\u0026thinsp;+\u0026thinsp;GS pipelines are increasingly common in legume improvement [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e], and they align well with tepary bean\u0026rsquo;s adaptation to low‑input, heat‑ and drought‑stressed systems [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e], positioning the crop for climate‑resilient, nutrition‑focused breeding.\u003c/p\u003e \u003cp\u003e \u003cb\u003ePositioning tepary among pulses and delivering new tools for introgression.\u003c/b\u003e \u003c/p\u003e \u003cp\u003eAcross pulses, the field of nutrition-genetics now includes GWAS studies on amino acids and protein quality in soybean and lentil, a significant locus linked to seed protein in cowpea, and various loci related to composition in pea. In this context, this study provides first-generation tepary SNP tools targeting protein percentage, essential amino acids (threonine and methionine), and seed size [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e, \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]. Coupled with a chromosome-scale tepary reference genome and known adaptation to heat and drought, these markers facilitate nutrition-focused breeding under low-input, nitrogen-fixing farming, an uncommon combination of resilience and improved protein content that bolsters tepary\u0026rsquo;s potential as a climate-smart legume [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e, \u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e]. 
Legume breeders can apply marker-assisted selection by combining the SNPs associated with protein percentage, Thr, Met, and seed size. They can also use synteny to identify orthologous regions in common bean for comparative mapping or backcrossing. Genomic selection can then capture the polygenic variation underlying amino acid content. Additionally, genome\u0026ndash;environment pipelines developed for common bean that account for heat and drought stress can support the deployment of these new alleles in difficult environments [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e, \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eWe recommend conducting comparative synteny studies among \u003cem\u003ePhaseolus\u003c/em\u003e species to identify orthologous variants in common bean and interspecific materials. Integrating selected SNPs or candidate loci identified from tepary in this study would accelerate the development of nutrition-focused ideotypes suited for arid conditions [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan additionalcitationids=\"CR28 CR29\" citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e, \u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e]. Further research should validate the threonine, methionine, and protein-content associations in multi-environment trials across heat- and drought-prone environments. 
Additionally, in-depth studies using fine-mapping, RNA-seq, and metabolomics during seed filling are needed to clarify causal relationships and test hypotheses related to WNK signaling, 2-oxoglutarate oxygenases, amino acid transporters, and regulatory phosphatases. Converting key SNPs into cost-effective breeder assays, such as KASP, will aid index-based selection for protein quality and seed size, particularly in organic or low-input systems typical of tepary beans.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"Conclusion","content":"\u003cp\u003eThis study provides a genome-wide analysis of tepary bean nutritional and size traits, identifying key associations with protein content, essential amino acids (threonine and methionine), and seed size from a single low-input, organic field screening of 206 accessions. It highlights a region on Chr08 associated with protein percentage and linked to WNK kinases and a 2-oxoglutarate oxygenase, connecting regulatory signaling and C\u0026ndash;N metabolism to storage protein accumulation. The trait-specific patterns across fifteen amino acids suggest a modular genetic architecture conducive to marker stacking and genomic selection. For protein quality, the threonine (Chr08) and methionine (Chr02) markers are the first tepary-specific tools for optimizing these essential amino acids, allowing breeders to enhance protein quality without reducing total protein. For seed size, the HSW locus on Chr02 and the width locus on Chr07 correspond to growth-related candidates such as V‑ATPase and Aux/IAA, supporting index selection to improve size and nutrition simultaneously under low-input, nitrogen-fixing farming. Overall, these loci serve as early-generation SNP tools for developing nutrition-oriented, climate-resilient tepary ideotypes and for applying these advances to \u003cem\u003ePhaseolus\u003c/em\u003e crops through synteny-guided breeding and introgression. 
Further studies that convert key SNPs into breeder-ready assays (e.g., KASP), test their effects across heat- and drought-stress environments, and fine-map causal variants using gene expression and metabolomics during seed fill are recommended. This study highlights tepary bean as a valuable protein-rich legume and offers genomic tools to accelerate significant improvements in seed protein yield, quality, and size with minimal inputs. In addition to GWAS‑identified loci, the high predictive abilities for hundred‑seed weight and seed width indicate that these traits are well suited to genomic selection, whereas the moderate predictive abilities for protein percentage and amino acids reflect their polygenic, modular architecture. Together, the GWAS\u0026ndash;GP framework demonstrates the feasibility of a dual breeding strategy that pairs marker-assisted selection of large-effect loci with genomic selection to accumulate polygenic improvements in nutritional quality. Incorporating these prediction models will accelerate the development of climate‑resilient, nutrition‑focused tepary ideotypes and inform comparative breeding across \u003cem\u003ePhaseolus\u003c/em\u003e species.\u003c/p\u003e"},{"header":"Abbreviations","content":"\u003cdiv class=\"DefinitionList\"\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eAA\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eAmino acid\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eBSA\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eBovine serum albumin\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eBLUP/BLUE\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eBest linear unbiased prediction/estimate\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e 
\u003cdiv class=\"Term\"\u003eGBS\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eGenotyping‑by‑sequencing\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eGS\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eGenomic selection\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eGWAS\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eGenome‑wide association study\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eHSW\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eHundred‑seed weight\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eKASP\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eKompetitive allele‑specific PCR\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eLD‑kNNi\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eLinkage disequilibrium k‑nearest neighbors imputation\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eMAF\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eMinor allele frequency\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eMAS\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eMarker‑assisted selection\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003ePCA\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003ePrincipal component analysis\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv 
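class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eV‑ATPase\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eVacuolar H⁺‑ATPase\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eWNK\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eWith‑No‑Lysine(K) protein kinase\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eGLM\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eGeneralized linear model\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eMLM\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eMixed linear model\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eBLINK\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eBayesian-information and Linkage-disequilibrium Iteratively Nested Keyway\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eFarmCPU\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eFixed and random model Circulating Probability Unification\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAvailability of data and materials\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll sequencing data for tepary bean accessions have been submitted to the National Center for Biotechnology Information (NCBI) under the BioProject accession PRJNA1416895. 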
Filtered GBS SNP data are available on FigShare: https://doi.org/10.6084/m9.figshare.31158676\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe list of USDA-NPGS tepary bean accessions (Additional file 1), the BLUPs/BLUEs used in this study (Additional files 2 and 3), and the significant SNPs/candidate gene annotations (Additional file 4) are available on FigShare: https://doi.org/10.6084/m9.figshare.31334209\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no competing interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis research was supported by Sustainable Agriculture Research and Education (SARE) Graduate Student Grant 2025 (GS24-299), Texas Department of Agriculture (TDA), and Texas A\u0026amp;M AgriLife Research and Extension Centre, Uvalde, TX, USA.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors would like to thank Dalton Thompson, Research Technician, Systems Plant Physiology, Texas A\u0026amp;M AgriLife Research and Extension Centre, Uvalde, TX, for conducting amino acid quantifications. The authors acknowledge funding from the Sustainable Agriculture Research and Education (SARE), the Texas Department of Agriculture (TDA), and Texas A\u0026amp;M AgriLife Research.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eMoghaddam SM, Oladzad A, Koh C, Ramsay L, Hart JP, Mamidi S, et al. The tepary bean genome provides insight into evolution and domestication under heat stress. Nat Commun 2021 May 11;12(1):2638.\u003c/li\u003e\n\u003cli\u003ePorch TG, Cichy K, Wang W, Brick M, Beaver JS, Santana-Morant D, et al. Nutritional composition and cooking characteristics of tepary bean (Phaseolus acutifolius Gray) in comparison with common bean (Phaseolus vulgaris L.). 
Genet Resour Crop Evol 2017;64(5):935\u0026ndash;953.\u003c/li\u003e\n\u003cli\u003eL\u0026oacute;pez-Ibarra C, Ruiz-L\u0026oacute;pez FdJ, Bautista-Villarreal M, B\u0026aacute;ez-Gonz\u0026aacute;lez JG, Rodr\u0026iacute;guez Romero BA, Gonz\u0026aacute;lez-Mart\u0026iacute;nez BE, et al. Protein Concentrates on Tepary Bean (Phaseolus acutifolius Gray) as a Functional Ingredient: In silico Docking of Tepary Bean Lectin to Peroxisome Proliferator-Activated Receptor Gamma. Frontiers in Nutrition 2021;8.\u003c/li\u003e\n\u003cli\u003eGiordani W, Gama HC, Chiorato AF, Garcia AAF, Vieira MLC. Genome-wide association studies dissect the genetic architecture of seed shape and size in common bean. G3 Genes|Genomes|Genetics 2022;12(4):jkac048.\u003c/li\u003e\n\u003cli\u003eJurado M, Garc\u0026iacute;a-Fern\u0026aacute;ndez C, Campa A, Ferreira JJ. Identification of consistent QTL and candidate genes associated with seed traits in common bean by combining GWAS and RNA-Seq. Theor Appl Genet 2024 May 27;137(6):143.\u003c/li\u003e\n\u003cli\u003eGunjača J, Carović-Stanko K, Lazarević B, Vidak M, Petek M, Liber Z, et al. Genome-Wide Association Studies of Mineral Content in Common Bean. Frontiers in Plant Science 2021;12.\u003c/li\u003e\n\u003cli\u003eQin J, Shi A, Song Q, Li S, Wang F, Cao Y, et al. Genome Wide Association Study and Genomic Selection of Amino Acid Concentrations in Soybean Seeds. Front Plant Sci 2019 Nov 15;10:1445.\u003c/li\u003e\n\u003cli\u003eWarsame AO, Balk J, Domoney C. Identification of significant genome-wide associations and QTL underlying variation in seed protein composition in pea (\u003cem\u003ePisum sativum\u003c/em\u003e L.). bioRxiv 2024:2024.07.04.602075.\u003c/li\u003e\n\u003cli\u003eRoorkiwal M, Bhandari A, Barmukh R, Bajaj P, Valluri VK, Chitikineni A, et al. Genome-wide association mapping of nutritional traits for designing superior chickpea varieties. 
Frontiers in Plant Science 2022;13.\u003c/li\u003e\n\u003cli\u003eSari H, Uhdre R, Wallace L, Coyne CJ, Bourland B, Zhang Z, et al. Genome-wide association study in Chickpea (Cicer arietinum L.) for yield and nutritional components. Euphytica 2024;220(6):84.\u003c/li\u003e\n\u003cli\u003eJohnson N, Boatwright JL, Bridges W, Thavarajah P, Kumar S, Thavarajah D. Targeted improvement of plant-based protein: Genome-wide association mapping of a lentil (Lens culinaris Medik.) diversity panel. Plants People Planet 2024;6(3):640\u0026ndash;655.\u003c/li\u003e\n\u003cli\u003eChen Y, Xiong H, Ravelombola W, Bhattarai G, Barickman C, Alatawi I, et al. A Genome-Wide Association Study Reveals Region Associated with Seed Protein Content in Cowpea. Plants 2023;12(14).\u003c/li\u003e\n\u003cli\u003eFederer WT, Crossa J. I.4 Screening Experimental Designs for Quantitative Trait Loci, Association Mapping, Genotype-by Environment Interaction, and Other Investigations. Front Physiol 2012 Jun 1;3:156.\u003c/li\u003e\n\u003cli\u003eM\u0026ouml;hring J, Williams E, Piepho H. Efficiency of augmented p-rep designs in multi-environmental trials. Theor Appl Genet 2014;127.\u003c/li\u003e\n\u003cli\u003ePiepho H, M\u0026ouml;hring J, Schulz-Streeck T, Ogutu JO. A stage-wise approach for the analysis of multi-environment trials. Biom J 2012 Nov;54(6):844\u0026ndash;860.\u003c/li\u003e\n\u003cli\u003eSmith A, Cullis B, Gilmour AR. The Analysis of Crop Variety Evaluation Data in Australia. Australian \u0026amp; New Zealand Journal of Statistics 2001;43:129\u0026ndash;145.\u003c/li\u003e\n\u003cli\u003eWilliams E, Piepho H, Whitaker D. Augmented p-rep designs. Biom J 2011 Feb;53(1):19\u0026ndash;27.\u003c/li\u003e\n\u003cli\u003eDamesa T, Hartung J, Gowda M, Beyene Y, Das B, Semagn K, et al. Comparison of Weighted and Unweighted Stage‐Wise Analysis for Genome‐Wide Association Studies and Genomic Selection. 
Crop Sci 2019;59.\u003c/li\u003e\n\u003cli\u003eFern\u0026aacute;ndez-Gonz\u0026aacute;lez J, Isidro Y S\u0026aacute;nchez J. Optimizing fully-efficient two-stage models for genomic selection using open-source software. Plant Methods 2025 Feb 4;21(1):9.\u003c/li\u003e\n\u003cli\u003eDeans CA, Sword GA, Lenhart PA, Burkness E, Hutchison WD, Behmer ST. Quantifying Plant Soluble Protein and Digestible Carbohydrate Content, Using Corn (Zea mays) As an Exemplar. J Vis Exp 2018 Aug 6;(138):58164. doi: 10.3791/58164.\u003c/li\u003e\n\u003cli\u003eJoshi V, Joshi M, Silwal D, Noonan K, Rodriguez S, Penalosa A. Systematized biosynthesis and catabolism regulate citrulline accumulation in watermelon. Phytochemistry 2019;162:129\u0026ndash;140.\u003c/li\u003e\n\u003cli\u003ePritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics 2000 Jun;155(2):945\u0026ndash;959.\u003c/li\u003e\n\u003cli\u003eEvanno G, Regnaut S, Goudet J. Detecting the number of clusters of individuals using the software structure: a simulation study. Mol Ecol 2005;14(8):2611\u0026ndash;2620.\u003c/li\u003e\n\u003cli\u003eElshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, et al. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One 2011 May 4;6(5):e19379.\u003c/li\u003e\n\u003cli\u003eEndelman JB. Fully efficient, two-stage analysis of multi-environment trials with directional dominance and multi-trait genomic selection. Theor Appl Genet 2023 Mar 22;136(4):65.\u003c/li\u003e\n\u003cli\u003eSchulz-Streeck T, Ogutu JO, Piepho H. Comparisons of single-stage and two-stage approaches to genomic selection. Theor Appl Genet 2013 Jan;126(1):69\u0026ndash;82.\u003c/li\u003e\n\u003cli\u003eL\u0026oacute;pez-Hern\u0026aacute;ndez F, Cort\u0026eacute;s AJ. 
Last-Generation Genome-Environment Associations Reveal the Genetic Basis of Heat Tolerance in Common Bean (Phaseolus vulgaris L.). Front Genet 2019 Nov 22;10:954.\u003c/li\u003e\n\u003cli\u003eForde BG, Lea PJ. Glutamate in plants: metabolism, regulation, and signalling. J Exp Bot 2007;58(9):2339\u0026ndash;2358.\u003c/li\u003e\n\u003cli\u003eSaddhe AA, Karle SB, Aftab T, Kumar K. With no lysine kinases: the key regulatory networks and phytohormone cross talk in plant growth, development and stress response. Plant Cell Rep 2021 Nov;40(11):2097\u0026ndash;2109.\u003c/li\u003e\n\u003cli\u003eTegeder M, Rentsch D. Uptake and partitioning of amino acids and peptides. Mol Plant 2010 Nov;3(6):997\u0026ndash;1011.\u003c/li\u003e\n\u003cli\u003eKhazaei H, Subedi M, Nickerson M, Mart\u0026iacute;nez-Villaluenga C, Frias J, Vandenberg A. Seed Protein of Lentils: Current Status, Progress, and Food Applications. Foods 2019 Sep 4;8(9):391. doi: 10.3390/foods8090391.\u003c/li\u003e\n\u003cli\u003eNosworthy MG, Yu B, Zaharia LI, Medina G, Patterson N. Pulse protein quality and derived bioactive peptides. Frontiers in Plant Science 2025;16.\u003c/li\u003e\n\u003cli\u003eLee Y, Ren D, Jeon B, Liu H. S-Adenosylmethionine: more than just a methyl donor. Nat Prod Rep 2023 Sep 20;40(9):1521\u0026ndash;1549.\u003c/li\u003e\n\u003cli\u003eMeng J, Wang L, Wang J, Zhao X, Cheng J, Yu W, et al. METHIONINE ADENOSYLTRANSFERASE4 Mediates DNA and Histone Methylation. Plant Physiol 2018 Jun;177(2):652\u0026ndash;670.\u003c/li\u003e\n\u003cli\u003eZhao Y, Hu Y, Dai M, Huang L, Zhou D. The WUSCHEL-related homeobox gene WOX11 is required to activate shoot-borne crown root development in rice. Plant Cell 2009 Mar;21(3):736\u0026ndash;748.\u003c/li\u003e\n\u003cli\u003eZhou S, Jiang W, Long F, Cheng S, Yang W, Zhao Y, et al. Rice homeodomain protein WOX11 recruits a histone acetyltransferase complex to establish programs of cell proliferation of crown root meristem. 
Plant Cell 2017;29(5):1088\u0026ndash;1104.\u003c/li\u003e\n\u003cli\u003eGalili G. Regulation of Lysine and Threonine Synthesis. Plant Cell 1995;7(7):899\u0026ndash;906.\u003c/li\u003e\n\u003cli\u003eAzevedo RA, Arruda P, Turner WL, Lea PJ. The biosynthesis and metabolism of the aspartate derived amino acids in higher plants. Phytochemistry 1997;46(3):395\u0026ndash;419.\u003c/li\u003e\n\u003cli\u003eGoto DB, Onouchi H, Naito S. Dynamics of methionine biosynthesis. Plant Biotechnology 2005;22(5):379\u0026ndash;388.\u003c/li\u003e\n\u003cli\u003eJander G, Joshi V. Aspartate-Derived Amino Acid Biosynthesis in Arabidopsis thaliana. Arabidopsis Book 2009;7:e0121.\u003c/li\u003e\n\u003cli\u003eBarrera S, Berny Mier y Teran JC, Aparicio J, Diaz J, Leon R, Beebe S, et al. Identification of drought and heat tolerant tepary beans in a multi-environment trial study. Crop Sci 2024;64(6):3399\u0026ndash;3416.\u003c/li\u003e\n\u003cli\u003eKrebs M, Beyhl D, G\u0026ouml;rlich E, Al-Rasheid KAS, Marten I, Stierhof Y, et al. Arabidopsis V-ATPase activity at the tonoplast is required for efficient nutrient storage but not for sodium accumulation. Proc Natl Acad Sci U S A 2010 Feb 16;107(7):3251\u0026ndash;3256.\u003c/li\u003e\n\u003cli\u003eLeyser O. Auxin Signaling. Plant Physiol 2018;176(1):465\u0026ndash;479.\u003c/li\u003e\n\u003cli\u003eWang R, Estelle M. Diversity and specificity: auxin perception and signaling through the TIR1/AFB pathway. Curr Opin Plant Biol 2014;21:51\u0026ndash;58.\u003c/li\u003e\n\u003cli\u003eRavelombola W, Manley A, Pham H, Brown M, Ruhl C, Ghosh P. Genome-Wide Association Study for Seed Yield of Tepary Bean Using Whole-Genome Resequencing. Int J Mol Sci 2024 Oct 21;25(20):11302. 
doi: 10.3390/ijms252011302.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"genome‑wide association (GWAS), genomic prediction, essential amino acids, threonine, methionine, lysine, seed protein, hundred‑seed weight, seed width, Phaseolus acutifolius","lastPublishedDoi":"10.21203/rs.3.rs-8970665/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8970665/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBackground\u003c/h2\u003e \u003cp\u003eTepary bean [\u003cem\u003ePhaseolus acutifolius\u003c/em\u003e] is a drought- and heat-tolerant, nitrogen-fixing legume that offers a promising low-input protein source. Nonetheless, the genetic factors influencing seed protein and amino acid profiles are not well understood. We evaluated 206 diverse accessions along with four controls in organic fields, measuring hundred-seed weight [HSW], seed width, total soluble seed protein [%], and the profiles of nineteen free amino acids. Using genotyping-by-sequencing, we identified 49,384 high-quality SNPs and conducted GWAS with multiple models [GLM, MLM, BLINK, FarmCPU] on BLUPs, controlling population structure.\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e \u003cp\u003eWe found genome-wide significant links to protein percentage on Chr08, including candidate genes like a WNK kinase and a 2-oxoglutarate/Fe [II]-dependent oxygenase. Additionally, trait-specific loci were identified for fifteen of the nineteen free amino acids, indicating a modular genetic architecture. 
Notably, the essential amino acids threonine, methionine, and lysine each had unique significant loci, marking the first tepary-specific markers for these nutritionally important traits. Fewer but stable associations related to seed size were observed on Chr02 [HSW; V-ATPase subunit] and Chr07 [seed width; Aux/IAA]. Genomic prediction models further revealed high predictive ability for seed size [r\u0026thinsp;\u0026asymp;\u0026thinsp;0.90\u0026ndash;0.96] and moderate accuracy for protein and amino acid traits [r\u0026thinsp;\u0026asymp;\u0026thinsp;0.15\u0026ndash;0.45], consistent with their polygenic and modular genetic structure.\u003c/p\u003e\u003ch2\u003eConclusion\u003c/h2\u003e \u003cp\u003eBy integrating GWAS with genomic prediction, we identify candidate genes, trait-specific genomic regions, and reliable benchmarks for predicting protein concentration, essential amino acids, and seed size in tepary bean. The alignment between association signals and prediction accuracy supports a dual-breeding approach that combines marker-assisted selection for key loci with genomic selection to leverage residual polygenic variation. This combined framework strengthens opportunities to enhance seed nutritional quality without negatively affecting seed size and offers synteny-based entry points for gene discovery and introgression across \u003cem\u003ePhaseolus\u003c/em\u003e species.\u003c/p\u003e","manuscriptTitle":"Nutritional Genomics of Tepary Bean (Phaseolus acutifolius): Genome‑wide association analysis and genomic prediction of seed nutritional traits and size","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-03-10 17:24:13","doi":"10.21203/rs.3.rs-8970665/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"d4fa5ed8-61ea-4e63-9a4e-f1e603548577","owner":[],"postedDate":"March 10th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2026-05-04T17:38:44+00:00","versionOfRecord":[],"versionCreatedAt":"2026-03-10 17:24:13","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8970665","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8970665","identity":"rs-8970665","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}