2NPLGBM: A genomic model that merges the strengths of classical and machine learning methods in genomic prediction

doi:10.21203/rs.3.rs-8094183/v1

2025 · doi:10.21203/rs.3.rs-8094183/v1

preprint OA: closed

Research Article

Bright Enogieru Osatohanmwen, Indalécio Cunha Vieira Júnior, Ahmad Reza Sharifi, and 1 more

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-8094183/v1. This work is licensed under a CC BY 4.0 License. Status: Under Review. Version 1 posted 11

Abstract

Background

Genomic prediction (GP) is a central component of modern plant breeding. It enables the early selection of superior genotypes based on genomic marker data. Classical GP models, such as genomic best linear unbiased prediction (GBLUP), operate within the data modeling culture and typically assume additive genetic effects. This assumption limits their performance in hybrid breeding, where dominance and epistasis also play a role.
In contrast, machine learning (ML) models from the algorithmic modeling culture can model non-additive genetic effects but often lack biological grounding and interpretability. To bridge these paradigms, we propose 2NPLGBM, a hybrid genomic prediction approach that integrates quantitative genetics with ML. The method introduces a two-matrix (2NP) genotype representation that concatenates the additive (Z) and dominance (W) matrices. This representation serves as input to a Light Gradient Boosting Machine (LGBM), enabling the simultaneous modeling of additive effects, dominance effects, and higher-order genetic interactions (additive × additive, AA; additive × dominance, AD; dominance × dominance, DD).

Results

The 2NPLGBM model was evaluated using six years of hybrid maize trial data for four agronomic traits: grain yield, plant height, days to silking, and days to anthesis. Evaluation used five cross-validation schemes. Two simulated temporal generalization: Leave-One-Year-Out (LOYO) and Rolling Window (RW). Three simulated genetic generalization: Five-Fold and the tester-based schemes Tester CV0 and Tester CV00. Compared with GBLUP, 2NPLGBM achieved an average 5% improvement in predictive accuracy under temporal validation and gains of over 15% under the tester-based schemes, particularly for the flowering traits (days to silking and days to anthesis). It also consistently improved selection efficiency, indicating that the model captures complex genetic signals relevant for ranking and hybrid selection. Feature interpretation using SHapley Additive exPlanations (SHAP) confirmed that non-additive interactions contributed substantially to prediction accuracy for highly heritable traits. It also revealed trait-specific architectures: additive effects dominated the flowering traits, whereas dominance effects contributed substantially to plant height and yield. Classical variance component analysis supported these findings, estimating dominance contributions of 17.3% for yield and 8.2% for plant height.
Conclusion

2NPLGBM is a biologically informed ML framework that bridges the classical quantitative genetics and algorithmic modeling cultures. By jointly modeling additive and non-additive effects, it enhances predictive accuracy, interpretability, and selection efficiency in hybrid breeding programs. Future work should explore multi-trait and multi-environment extensions, the integration of environmental covariates, and the inclusion of multi-omics data to further strengthen predictive power and biological interpretability.

Keywords: Genomic prediction; Hybrid breeding; LightGBM; Dominance; Non-additive effects; Machine learning; 2NP matrix; SHAP; Maize; Additive–dominance modeling; GBLUP; Selection efficiency; Temporal validation

INTRODUCTION

Plant breeding has advanced significantly with the introduction of genomic selection (GS). This methodology uses genome-wide molecular markers to predict the genetic potential of individuals within a breeding population. This process, known as genomic prediction, enables breeders to estimate breeding values from dense marker data. Superior individuals can therefore be selected even before phenotypic data are available [1, 2]. Traditional marker-assisted selection [3] focuses on a few quantitative trait loci (QTLs). GS instead captures the cumulative effects of all available markers across the genome, leading to more accurate and efficient selection of elite genotypes [4–6]. As plant breeding programs increasingly incorporate GS, further research has been conducted to optimize the methods used for genomic prediction.

Genomic prediction, like other applications of statistical modeling, is characterized by two dominant modeling cultures: the data model culture and the algorithmic model culture [7]. These cultures reflect distinct approaches to handling genetic data and extracting predictive signals.
The Data Model Culture: Structured Statistical Modeling

The data model culture is primarily associated with classical statistical approaches. It assumes an explicit, often linear, relationship between genetic markers and phenotypic traits. Classical quantitative genetics often adopts this culture, using biologically informed models to estimate additive and dominance effects for selection. Parametric models such as genomic best linear unbiased prediction (GBLUP) are widely used [1, 2]. GBLUP uses the genomic relationship matrix (G) and assumes that marker effects are normally distributed when calculating genomic breeding values. This provides a robust and interpretable framework for genomic selection.

Several extensions of the data model culture have been developed. They aim to improve prediction accuracy, accommodate variable genetic architectures, integrate different data sources, and enhance computational efficiency. These include:

GBLUP with additive, dominance, and/or epistatic effects: extends the standard GBLUP model with additional relationship matrices. A dominance relationship matrix (D) captures dominance deviations, and relationship matrices built from Hadamard products of genomic matrices capture epistatic effects. The model can be applied flexibly to include only dominance effects, only epistasis, or both, providing a more comprehensive approach to genomic prediction [8–17].
Bayesian models (e.g., BayesA, BayesB, BayesCπ): introduce marker-specific shrinkage to allow for variable effect sizes and sparsity [18, 19].
Ridge regression best linear unbiased prediction (rrBLUP): a widely used method that applies ridge regression to estimate marker effects while controlling for multicollinearity among markers. It provides a computationally efficient alternative to GBLUP [14, 20].
Genomic selection index models: these models extend the classical selection index by integrating genomic estimated breeding values (GEBVs) for multiple traits to optimize selection decisions.
By combining information across correlated traits, they enable simultaneous improvement of several breeding objectives while accounting for genetic correlations and economic weights [21]. This approach enhances selection efficiency, particularly in genomic breeding programs that target multiple complex traits.
Multivariate GBLUP (MV-GBLUP): these models extend the traditional GBLUP to jointly model multiple correlated traits, capturing both within- and between-trait genetic covariances. By leveraging genetic correlations among traits, MV-GBLUP improves prediction accuracy, particularly for traits with limited phenotypic data or lower heritability [22]. It provides a robust method for multi-trait genomic prediction and selection in plant and animal breeding programs.

These methods provide structured and interpretable frameworks for genomic prediction. However, they are limited in their ability to capture complex, non-linear genetic interactions such as dominance and epistasis. This limitation arises from their reliance on predefined genetic structures and linear assumptions. Moreover, the assumption that marker effects follow a normal distribution may not always hold. This can lead to the underestimation of non-additive effects that contribute significantly to phenotypic variation.

The Algorithmic Model Culture: Machine Learning and Deep Learning

The algorithmic model culture is driven primarily by machine learning (ML) and deep learning (DL). In contrast to the data model culture, it focuses on non-parametric, data-driven approaches that learn complex genetic patterns directly from data. Examples of ML methods applied to genomic prediction include:

Random forests (RF): an ensemble learning method that captures non-linear interactions through decision trees.
Random forests have been applied in plant breeding with varying levels of success [23, 20, 24–26].
Support vector machines (SVMs): a kernel-based approach that maps genomic data into higher-dimensional spaces to capture complex relationships [27, 28].
Boosting machines: gradient boosting machines (GBM) and their variants, such as LightGBM and XGBoost, are widely used in plant breeding because they handle complex datasets and can improve prediction accuracy. These methods sequentially combine weak learners into a strong predictive model, making them well suited to genomic selection and trait prediction [29–32].

Deep learning, particularly neural networks, has also gained significant traction owing to its ability to model high-dimensional, non-linear interactions. Examples of deep learning methods applied to genomic prediction in plant breeding include:

Deep neural networks: learn hierarchical feature representations from genomic data without prior assumptions about genetic architecture [33–38].
Convolutional neural networks (CNNs): originally developed for grid-like data such as images, CNNs have been adapted for genomic prediction to exploit the sequential organization of genetic markers along chromosomes. In this context, “spatial patterns” refer to local dependencies among adjacent markers, such as linkage disequilibrium blocks or haplotype segments, rather than physical spatial positions. By applying convolutional filters across marker sequences, CNNs can automatically extract these locally correlated genomic features. This improves the prediction of complex traits in structured genomic data [39–42].

Success in plant breeding applications has varied. In addition, ML and DL models often require large datasets for effective training and suffer from limited interpretability.
Furthermore, their reliance on black-box optimization makes it challenging to incorporate prior biological knowledge.

A New Hybrid Genomic Modeling Culture

Despite their successes in plant breeding, both the data and algorithmic model cultures have limitations that restrict how fully they can leverage genomic information.

One such limitation concerns how extensions of genomic data models incorporate different sources of variation through multiple kernels. Depending on the chosen model, genomic data models can capture additive, dominance, and epistatic effects, individually or in combination. They do so by using specific genomic relationship matrices or covariance structures as input. In such models, the genetic effects associated with the different matrices are assumed to be independent, which is not biologically accurate.

In genomic algorithmic models, interactions among loci are captured implicitly from the marker data, without constructing separate relationship matrices or kernels. However, the process by which these models identify, represent, and use such interactions is not fully understood. This limits their biological interpretability, despite sometimes high predictive accuracy.

Here, we introduce a hybrid genomic modeling culture that integrates the data model and algorithmic approaches (see Materials and Methods). It aims to harness their complementary strengths in two ways: relaxing the assumption of independence among multiple kernels in traditional genomic models, and enhancing the biological interpretability of machine learning results. This culture combines structured statistical modeling with the flexibility of machine learning. Central to this approach is the 2NP matrix, a novel representation that concatenates the additive-centered matrix (Z) and the dominance-deviation matrix (W) to explicitly capture additive and dominance effects, respectively.
The hybrid genomic modeling culture integrates the 2NP matrix with a machine learning algorithm. This provides a powerful means of modeling both additive and non-additive genetic effects, including their interactions. Concatenating both matrices within the 2NP structure exposes the dependencies between the additive and dominance components, and the machine learning model identifies and exploits these interactions. This unified approach holds significant promise for enhancing selection accuracy and understanding complex genetic architectures in hybrid breeding programs.

Efficiency of Selection

In a breeding program, selection is the final component of the process, in which genotypes with desirable traits are identified and retained from a population. In the context of GS, the efficiency of selection becomes a critical focus, as various models are employed to improve the accuracy of predicting genetic merit [43]. Traditionally, GS is framed as the task of predicting an individual’s genomic estimated breeding value (GEBV) for a target trait [1, 2, 44]. Prediction accuracy is commonly used to evaluate model performance. Breeders, however, are often more concerned with practical outcomes, specifically how well a model ranks and identifies the top-performing genotypes. Therefore, beyond assessing prediction accuracy, this study also addresses a key breeding question: how efficiently does the model enable the selection of the best performers? To answer this, we evaluate the efficiency of selection, defined as the model's ability to correctly identify the individuals with the highest true performance. By bridging structured statistical approaches and machine learning methods, we aim to establish a hybrid culture (described above) that improves selection efficiency while preserving biological relevance.
MATERIAL AND METHODS

2NP Matrix Theory: Bridging the Gap in the Application of Machine Learning Methods in Genomic Prediction

The standard genomic best linear unbiased prediction (GBLUP) model incorporates both additive and dominance genetic variance. It models the phenotypes as:

$$y = Xb + Z_A a + Z_D d + e$$

The terms are defined as follows:

- $y$ is the vector of phenotypes.
- $X$, $Z_A$, and $Z_D$ are the incidence matrices for $b$, $a$, and $d$, respectively.
- $b$ is the solution vector of fixed effects, including the overall mean.
- $a \sim N(0, A\sigma_a^2)$ is the vector of additive genetic effects.
- $d \sim N(0, D\sigma_d^2)$ is the vector of dominance genetic effects.
- $e \sim N(0, I\sigma_e^2)$ is the vector of residual effects.
- $A$ and $D$ are the additive and dominance genomic relationship matrices, and $I$ is the identity matrix.
- $\sigma_a^2$, $\sigma_d^2$, and $\sigma_e^2$ are the additive genetic, dominance genetic, and residual variances, respectively.

Genomic Relationship Matrices in Standard GBLUP

The additive genomic relationship matrix ($A$) is estimated following [2]:

$$A = \frac{ZZ'}{\sum_{i=1}^{m} 2p_i(1-p_i)}$$

where $Z = M - P$ is the centered genotype matrix, $M$ is the marker matrix coded 0, 1, 2 according to the number of copies of the alternative allele, $P$ is the matrix whose $i$-th column has all elements equal to $2p_i$, and $p_i$ is the frequency of the alternative allele at locus $i$.
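The sketch below shows how $Z$ and $A$ can be computed with NumPy. It is not the authors' code and rests on these assumptions:

- Markers are coded 0/1/2 as counts of the alternative allele.
- The allele frequencies $p_i$ are estimated from the same sample.
- The helper name `additive_grm` is hypothetical.

```python
import numpy as np


def additive_grm(M):
    """Return (Z, A) for an n x m marker matrix M coded 0/1/2.

    Hypothetical helper: allele frequencies p_i are estimated from M itself.
    """
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0                  # frequency of the counted allele at each locus
    Z = M - 2.0 * p                           # Z = M - P, where every row of P equals 2p
    scale = np.sum(2.0 * p * (1.0 - p))       # sum over loci of 2 p_i (1 - p_i)
    A = Z @ Z.T / scale
    return Z, A


# Small random example: 6 individuals, 50 biallelic markers
rng = np.random.default_rng(1)
M = rng.integers(0, 3, size=(6, 50))
Z, A = additive_grm(M)
print(A.shape)  # (6, 6)
```

Because $p_i$ is estimated from the same individuals, every column of $Z$ has mean zero. With a single locus at $p = 0.5$ coded as 0 and 2, the two individuals get self-relationships of 2 and a relationship of -2 with each other.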
The dominance genomic relationship matrix ($D$) is estimated following [9]:

$$D = \frac{WW'}{\sum_{i=1}^{m} 4p_i^2 q_i^2}$$

where $W$ is a matrix of heterozygosity coefficients and $q_i = 1 - p_i$. The elements of the $i$-th column of $W$ are:

- $-2q_i^2$ for individuals homozygous for the allele with frequency $p_i$;
- $2p_iq_i$ for heterozygotes;
- $-2p_i^2$ for individuals homozygous for the allele with frequency $q_i$ [9].

Limitations of Standard GBLUP and the 2NP Matrix Approach with Gradient Boosting Integration

The GBLUP model assumes that additive and dominance effects are independent. This is not biologically accurate, because their interactions play an important role in trait expression. To address this limitation, we propose the 2NP matrix. It concatenates the additive-centered matrix ($Z$) and the dominance-deviation matrix ($W$) column-wise into a single combined input matrix:

$$G_{2NP} = \left[\, Z \mid W \,\right]$$

where $Z$ captures additive genetic effects and $W$ captures dominance deviations. For $n$ individuals and $m$ markers, $G_{2NP}$ therefore has dimensions $n \times 2m$. This combined structure maintains the individual contributions of additive and dominance variance while allowing interactions to emerge naturally.

The 2NP matrix captures both additive and dominance effects. Gradient boosting methods (e.g., LGBM and XGBoost) are then used to model complex gene interactions. The prediction model takes the form:

$$y = f(G_{2NP}) + e$$

where $f(G_{2NP})$ is a gradient boosting machine. It captures:

- additive effects ($g_A$);
- dominance effects ($g_D$);
- additive × additive interactions ($g_{AA}$);
- additive × dominance interactions ($g_{AD}$);
- dominance × dominance interactions ($g_{DD}$).

The gradient boosting model learns a non-linear function of $G_{2NP}$. This captures higher-order interactions between loci that are typically ignored in standard GBLUP.
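The dominance coefficients, the matrix $D$, and the 2NP input $[Z \mid W]$ can be sketched the same way. This illustrative NumPy version is not the authors' implementation and assumes:

- Genotypes are coded 0/1/2 as counts of the allele with frequency $p_i$. A count of 2 therefore receives $-2q_i^2$, a heterozygote $2p_iq_i$, and a count of 0 receives $-2p_i^2$.
- Frequencies are estimated from the sample.
- The helper name `two_np_inputs` is hypothetical.

```python
import numpy as np


def two_np_inputs(M):
    """Return (Z, W, D, G2NP) for an n x m marker matrix M coded 0/1/2.

    Hypothetical helper; p_i is the frequency of the counted allele, q_i = 1 - p_i.
    """
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0
    q = 1.0 - p
    Z = M - 2.0 * p                                   # additive (centered) coding
    W = np.where(M == 2, -2.0 * q**2,                 # homozygous for the counted allele
                 np.where(M == 1, 2.0 * p * q,        # heterozygote
                          -2.0 * p**2))               # homozygous for the other allele
    D = W @ W.T / np.sum(4.0 * p**2 * q**2)           # dominance relationship matrix
    G2NP = np.hstack([Z, W])                          # n x 2m: additive block, then dominance block
    return Z, W, D, G2NP


rng = np.random.default_rng(2)
M = rng.integers(0, 3, size=(8, 40))
Z, W, D, G2NP = two_np_inputs(M)
print(G2NP.shape)  # (8, 80)
```

$Z$ and $W$ are kept as separate column blocks rather than being summarized into separate kernels. This means a tree-based learner can, in principle, split on the additive and dominance codings of the same or different loci within a single tree.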
Phenotypic and genotypic data

We used data from the Genomes to Fields (G2F) 2024 Maize Genotype by Environment Prediction Competition [45]. The data span six years (2018 to 2023) and comprise 2,925 unique maize (Zea mays L.) hybrids evaluated in multiple environments across the United States, Canada, and Germany. The trials used a modified randomized complete block design (RCBD), mostly with two replications per environment. Our analysis covers four traits: Plant Height (cm), Days to Anthesis (days), Days to Silking (days), and Grain Yield (Mg ha⁻¹).

The genotypic data are described in [45]. For the G2F materials from 2014 to 2023, variants were called using the Practical Haplotype Graph (PHG) [46]. Hybrid genotypes were generated by combining information from their parent lines using the CreateHybridGenotypes plugin in TASSEL 5 [47], yielding 5,899 individuals. SNPs with a minor allele frequency (MAF) below 1% were then filtered out, resulting in 2,425 high-quality variant positions. We then retained the data from the years 2018, 2019, 2020, 2021, 2022, and 2023. Because the SNP markers had already been filtered for extreme values, no further marker filtering was needed.

Outliers were identified and removed by fitting a linear model within each unique environment, defined by field location and year, with hybrid and replicate as fixed effects, as described by [48]. To reduce computational time, a two-step analysis was used to calculate the best linear unbiased estimates (BLUEs) for each hybrid, as described in [48].
To obtain the BLUEs for each hybrid in each environment, the BLUEs from the first of the two steps were adjusted with a linear mixed model. This model treated hybrid as a fixed effect and field location (FL) as a random effect:

$$y_{if} = \mu + H_i + FL_f + e_{if}$$

where $y_{if}$ is the BLUE of the $i$-th hybrid in field location $f$ obtained in the first step; $\mu$ is the overall mean; $H_i$ is the fixed effect of the $i$-th hybrid; $FL_f$ is the random effect of the $f$-th field location; and $e_{if}$ is the residual term associated with $y_{if}$. Variance components and heritability were estimated as described by [48].

Genomic prediction models

Classical models: Two GBLUP models were chosen for this study. The first uses a single genomic relationship matrix for additive effects [2]. The second uses genomic relationship matrices for both additive and dominance effects, with a separate kernel for each [9].

GBLUP model with additive effects only (GBLUP_ADD): The model with additive effects can be expressed as:

$$y = Xb + Z_a a + e$$

The terms are defined as follows:

- $y$ is the vector of phenotypes.
- $X$ and $Z_a$ are the incidence matrices for $b$ and $a$, respectively.
- $b$ is the solution vector of fixed effects.
- $a$ is the vector of additive genetic effects, assumed to be distributed as $N(0, A\sigma_a^2)$.
- $e$ is the vector of random residual effects, assumed to be distributed as $N(0, I\sigma_e^2)$.
- $A$ is the additive genomic relationship matrix, and $I$ is the identity matrix.
- $\sigma_a^2$ is the additive genetic variance, and $\sigma_e^2$ is the residual variance.
- $N(\cdot,\cdot)$ denotes a normal distribution.
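For comparison, GBLUP predictions for unphenotyped hybrids can be written in kernel (covariance) form. The sketch below is a simplification, not the software used in the study. It assumes:

- The only fixed effect is the overall mean, estimated by generalized least squares.
- The variance components are already known; in practice they would be estimated, e.g., by REML.
- An optional dominance kernel can be added, so the same hypothetical helper `gblup_predict` also mirrors GBLUP_ADDOM.

```python
import numpy as np


def gblup_predict(y_train, A, train, test, var_a, var_e, D=None, var_d=0.0):
    """Predict phenotypes of `test` individuals from phenotyped `train` individuals.

    A (and optionally D) are n x n relationship matrices covering all individuals.
    With D=None this corresponds to GBLUP_ADD; with D it corresponds to GBLUP_ADDOM.
    """
    y_train = np.asarray(y_train, dtype=float)
    K = var_a * A                                     # genetic covariance matrix
    if D is not None:
        K = K + var_d * D
    V = K[np.ix_(train, train)] + var_e * np.eye(len(train))   # Var(y_train)
    ones = np.ones(len(train))
    mu = (ones @ np.linalg.solve(V, y_train)) / (ones @ np.linalg.solve(V, ones))  # GLS mean
    alpha = np.linalg.solve(V, y_train - mu)
    g_test = K[np.ix_(test, train)] @ alpha           # BLUP of the test genetic values
    return mu + g_test


# Simulated example: additive relationship matrix from random markers
rng = np.random.default_rng(3)
M = rng.integers(0, 3, size=(20, 100)).astype(float)
p = M.mean(axis=0) / 2.0
Z = M - 2.0 * p
A = Z @ Z.T / np.sum(2.0 * p * (1.0 - p))
y = Z[:, :5].sum(axis=1) + rng.normal(0.0, 1.0, 20)
train, test = np.arange(15), np.arange(15, 20)
pred = gblup_predict(y[train], A, train, test, var_a=1.0, var_e=1.0)
print(pred.shape)  # (5,)
```

Setting `var_a=0` removes the genetic covariance, so every prediction collapses to the mean of the training phenotypes. This is a quick sanity check on the formula.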
The additive genomic relationship matrix $A$ was estimated as [2]:

$$A = \frac{ZZ'}{\sum_{i=1}^{m} 2p_i(1-p_i)}$$

The terms are defined as follows:

- $Z = M - P$ is the centered genotype matrix.
- $M$ is the $n \times m$ marker matrix, with genotypes coded 0, 1, and 2 according to the number of alternative alleles; $n$ is the number of individuals and $m$ is the number of loci.
- $P$ is the matrix whose $i$-th column has all elements equal to $2p_i$, where $p_i$ is the frequency of the alternative allele at locus $i$.
- $\sum_{i=1}^{m} 2p_i(1-p_i)$ is the sum of the expected marker genotype variances across loci. It scales $A$ to be analogous to the pedigree-based numerator relationship matrix [2].

GBLUP model with additive and dominance effects (GBLUP_ADDOM): The model with both additive and dominance effects can be expressed as:

$$y = Xb + Z_a a + Z_d d + e$$

where the terms shared with GBLUP_ADD are defined as above, $d$ is the vector of dominance genetic effects, and $Z_d$ is the incidence matrix for $d$. The vector $d$ is assumed to be distributed as $N(0, D\sigma_d^2)$, where $D$ is the dominance genomic relationship matrix and $\sigma_d^2$ is the dominance genetic variance. The dominance genomic relationship matrix $D$ was estimated as [9]:

$$D = \frac{WW'}{\sum_{i=1}^{m} 4p_i^2 q_i^2}$$

where $W$ is a matrix of heterozygosity coefficients. The coefficients in the $i$-th column of $W$ are $-2p_i^2$ for $A_1A_1$, $2p_iq_i$ for $A_1A_2$, and $-2q_i^2$ for $A_2A_2$. Here $q_i$ and $p_i$ are the frequencies of allele 1 ($A_1$) and allele 2 ($A_2$) at locus $i$, respectively [9].

Machine Learning Models

The 2NP genomic prediction model was fitted using a gradient boosting machine (GBM) implemented in the LightGBM framework [49] within Python 3.8.
The model was trained using a structured workflow that combined automated Bayesian hyperparameter optimization with the LightGBM algorithm. Specifically, BayesSearchCV from the Scikit-Optimize (skopt) library [50] was used to perform Bayesian optimization of the hyperparameters. This enabled efficient exploration of the parameter space while reducing the risk of overfitting. Model interpretability and biological insight were obtained using SHapley Additive exPlanations (SHAP) [51], which quantify the contribution of each genomic feature to the predictions.

The results of 2NPLGBM were compared with those of the two GBLUP models described above. GBLUP_ADD uses a single genomic relationship matrix for additive effects [2]. GBLUP_ADDOM uses separate genomic relationship matrices (kernels) for additive and dominance effects [9].

Validation Schemes

To evaluate model performance under realistic scenarios, we implemented five validation schemes. Two were designed to simulate unseen years: Leave-One-Year-Out (LOYO) and Rolling Window (RW). Three were designed to capture genetic relationships: Five-Fold, Tester CV0, and Tester CV00. For the schemes based on genetic relationships, we conducted 10 repetitions of five-fold cross-validation. In each repetition, the phenotypic data were partitioned into five subsets; each subset served as the validation set once, while the remaining four were used for training [52, 53].

LOYO: In the LOYO scheme, data from all but one year were used for training, and the left-out year served as the test set. We used data from all six years (2018–2023), resulting in six validations, one for each year.

RW: In the Rolling Window scheme, a fixed window of three consecutive years was used as the training set, and the following year was used as the test set.
This window was then shifted forward by one year at a time, and the procedure was repeated until the last year (2023) served as the test set.

Five-Fold: In the Five-Fold scheme, hybrid genotypes were randomly divided into five equal-sized folds. In each round of cross-validation, four folds were used for training and the remaining fold for testing. This process was repeated 10 times to ensure robustness.

Tester CV0: The Tester CV0 scheme focused on predicting hybrids with known testers in a new year. As in the standard CV0 scenario, models were trained using trials from 2018 to 2022 and tested on trials from 2023. In each fold, 20% of the testers evaluated in 2023 were sampled to create the test set, and these testers were retained in the training set.

Tester CV00: The Tester CV00 scheme aimed to predict the performance of hybrids involving unknown testers in a new year. Models were trained on trials from 2018 to 2022 and tested on trials from 2023. In each fold, 20% of the testers evaluated in 2023 were randomly sampled to form the test set, and these testers were excluded from the training data across all training years.

Model Performance Metrics

Accurate prediction of genetic merit is a cornerstone of genomic selection (GS), enabling breeders to make informed decisions about which genotypes to advance. Several statistical and machine learning models have been developed to enhance predictive power. Their utility, however, ultimately depends on how well they capture the relationship between genotypic and phenotypic variation. In this study, we evaluated model performance using two metrics: Pearson’s correlation coefficient and selection efficiency. Pearson’s correlation measures the linear association between observed and predicted phenotypes, providing a standard indicator of prediction accuracy. However, in practical breeding applications, accurate ranking of individuals is often more critical than raw prediction accuracy.
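Both metrics can be computed from vectors of observed and predicted values. The sketch below makes its own assumptions explicit:

- The top 20% is taken as the individuals with the highest values; for traits where lower values are preferred, the sign would be flipped.
- $N$ is the number of individuals selected.
- The chance overlap is $C = N^2 / n$ for $n$ evaluated individuals, i.e., the expected overlap between two independent random selections of size $N$.
- The function names are hypothetical.

```python
import numpy as np


def pearson_accuracy(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(observed, predicted)[0, 1])


def selection_efficiency(observed, predicted, top_fraction=0.2):
    """(I - C) / (N - C) for the top `top_fraction` of individuals."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    n = observed.size
    N = int(round(top_fraction * n))                     # number of individuals selected
    top_obs = set(np.argsort(-observed)[:N].tolist())    # best by observed value
    top_pred = set(np.argsort(-predicted)[:N].tolist())  # best by predicted value
    I = len(top_obs & top_pred)                          # overlap of the two top sets
    C = N * N / n                                        # expected overlap under random selection
    return (I - C) / (N - C)


obs = np.arange(10.0)
print(round(selection_efficiency(obs, obs), 3))    # perfect ranking: 1.0
print(round(selection_efficiency(obs, -obs), 3))   # reversed ranking: -0.25
```

Under this definition, a value of 0 means the model does no better than random selection, and 1 means the predicted and observed top sets coincide.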
To address this, we also used selection efficiency [54, 55, 53], which evaluates how well a model identifies top-performing genotypes. We measured selection efficiency for the top 20% of hybrid genotypes, calculated as:

$$\text{Selection efficiency} = \frac{I - C}{N - C}$$

where $N$ is the number of individuals selected (the top 20% of the individuals evaluated), $I$ is the number of individuals common to the observed and predicted top 20% sets, and $C$ is the number of individuals expected to be common to both sets by random chance.

RESULTS

Population structure of the hybrids from 2018 to 2023

After data cleaning, 2,425 SNP markers were retained for downstream analyses. Between 2018 and 2023, a total of 2,925 unique hybrids were evaluated, with the highest number in 2021 (1,180) and the lowest in 2023 (546) (Table 1). During this period, 1,495 genotypes were used as female parents and 38 genotypes as male parents. The number of female parent genotypes varied across years, peaking in 2019 (603 genotypes) and remaining lower in all subsequent years. In contrast, the number of testers remained at 18 per year from 2020 to 2023, following a peak of 27 testers in 2018 (Table 1). The major testers among the male parents were LH195 and PHT69 in 2018 and 2019; PHZ51, PHK76, and PHP02 in 2020 and 2021; and LH244 in 2022 and 2023.

A principal component analysis (PCA) of the SNP data revealed clear genetic structure within the hybrid population, with clusters primarily reflecting the tester used to develop the hybrids (Fig. 1a). The first 10 principal components together explained just over 50% of the total marker variance (Fig. 1b), indicating substantial underlying structure among the hybrids across years. Based on the PCA, seven distinct genetic groups were identified, with the first two principal components accounting for 23.6% of the variation.
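A marker-based PCA of the kind summarized above can be sketched with a singular value decomposition of the column-centered marker matrix. The preprocessing here is an illustrative assumption: markers are centered but not standardized, and the text does not state the software or scaling used. The helper name `marker_pca` is hypothetical.

```python
import numpy as np


def marker_pca(M, n_components=10):
    """Return PC scores and the proportion of variance explained by each PC."""
    M = np.asarray(M, dtype=float)
    Mc = M - M.mean(axis=0)                              # center every marker
    U, s, Vt = np.linalg.svd(Mc, full_matrices=False)
    explained = s**2 / np.sum(s**2)                      # variance share per component
    scores = U[:, :n_components] * s[:n_components]      # coordinates of each individual
    return scores, explained[:n_components]


rng = np.random.default_rng(4)
M = rng.integers(0, 3, size=(30, 200))
scores, explained = marker_pca(M)
print(f"first 10 PCs explain {explained.sum():.1%} of marker variance")
```

The cumulative sum of `explained` is the quantity behind a statement such as "the first 10 principal components explained just over 50% of the variance". The first two columns of `scores` give the coordinates for a PC1 versus PC2 scatter plot.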
Table 1 Summary of the hybrid maize population evaluated across six years (2018–2023). For each year, the number of hybrids, parental lines, and testers used is shown.

Year   Hybrid Parent 1 (female lines)   Hybrid Parent 2 (testers)   Number of Hybrids   Major Testers
2018   578                              27                          1,039               LH195, PHT69
2019   601                              17                          1,158               LH195, PHT69
2020   403                              18                          1,175               PHZ51, PHK76, PHP02
2021   408                              18                          1,180               PHZ51, PHK76, PHP02
2022   525                              18                          549                 LH244
2023   522                              18                          546                 LH244

Comparison of model performance across four traits under two CV scenarios simulating unseen years

Unseen years were simulated in two ways: Leave-One-Year-Out (LOYO) and Rolling Window (RW).

When the objective was to predict hybrid performance in a new year (LOYO), 2NPLGBM outperformed both GBLUP models across all four traits. The average prediction accuracies achieved by 2NPLGBM were 0.507 for Grain Yield, 0.737 for Days to Silking, 0.754 for Days to Anthesis, and 0.801 for Plant Height (Fig. 2 & Supplementary Table 1). A similar trend was observed for selection efficiency, where 2NPLGBM also performed best, with average values of 0.332 for Grain Yield, 0.562 for Days to Silking, 0.629 for Days to Anthesis, and 0.615 for Plant Height. Across traits, 2NPLGBM achieved a 2% to 14% increase in prediction accuracy compared with GBLUP_ADD, and an 11% to 20% increase in selection efficiency.

The RW results did not match those from LOYO. For average prediction accuracy, GBLUP_ADDOM outperformed 2NPLGBM and GBLUP_ADD for Grain Yield (0.600) and Plant Height (0.732). For Days to Silking and Days to Anthesis, 2NPLGBM was superior, with accuracies of 0.764 and 0.760, respectively (Fig. 3 & Supplementary Table 1).
In terms of selection efficiency under RW, 2NPLGBM performed best for Days to Anthesis, with a 15% increase (Fig. 3 and Supplementary Table 1). It also achieved higher selection efficiency for Days to Silking and Plant Height, with average values of 0.551 and 0.493, respectively. Grain Yield was the only trait for which GBLUP_ADDOM achieved higher selection efficiency (Fig. 3 and Supplementary Table 1).

Comparison of model performance across four traits and two different CV scenarios, simulating different genetic relationships

Five-fold

Under the five-fold cross-validation scheme, 2NPLGBM did not outperform either GBLUP model (GBLUP_ADD or GBLUP_ADDOM) for any of the four traits. GBLUP_ADDOM consistently achieved the highest prediction accuracy and selection efficiency for Plant Height and Grain Yield. GBLUP_ADD performed best for Days to Anthesis and Days to Silking. This result suggests that, at least under this cross-validation scheme, the GBLUP_ADD model favors traits with high heritability and low dominance, even though it uses only additive genetic relationships (Fig. 4; Table 1 and Supplementary Table 1).

Tester CV0

Across the four traits evaluated, 2NPLGBM consistently outperformed the GBLUP models for the flowering traits. For Days to Anthesis, 2NPLGBM achieved a Pearson correlation of 0.902, a 27.7% improvement over GBLUP_ADD (0.710). Similarly, for Days to Silking, 2NPLGBM reached 0.895, an 18.4% increase over GBLUP_ADD (0.755). In contrast, for Plant Height, GBLUP_ADD (0.860) and GBLUP_ADDOM (0.867) slightly outperformed 2NPLGBM (0.852), which showed a marginal decline of 0.95% relative to GBLUP_ADD.
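The percentage gains quoted for Tester CV0 are relative changes with GBLUP_ADD as the baseline. The sketch below applies that formula to the rounded mean correlations reported above; it is an illustration, not the study's calculation.

```python
def relative_change_pct(new: float, baseline: float) -> float:
    """Percentage change of `new` relative to `baseline`."""
    return 100.0 * (new - baseline) / baseline


# Mean Pearson correlations under Tester CV0 reported above: (2NPLGBM, GBLUP_ADD).
tester_cv0 = {
    "Days to Anthesis": (0.902, 0.710),
    "Days to Silking": (0.895, 0.755),
    "Plant Height": (0.852, 0.860),
}
for trait, (nplgbm, gblup_add) in tester_cv0.items():
    print(f"{trait}: {relative_change_pct(nplgbm, gblup_add):+.1f}%")
```

On these rounded means the sketch gives about +27.0%, +18.5%, and -0.9%. These are close to, but not identical with, the reported 27.7%, 18.4%, and -0.95%. The reported figures may have been computed from unrounded or per-replicate correlations (an assumption).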
Under Tester CV0, 2NPLGBM (0.721) also showed a slight increase (2.08%) over GBLUP_ADD (0.704) for Grain Yield, while GBLUP_ADDOM performed best (0.729; +3.98% over GBLUP_ADD) (Fig. 5; Supplementary Table 1). 2NPLGBM also achieved better selection efficiency for Plant Height and the flowering traits. Relative to GBLUP_ADD, it improved selection efficiency by 6.60% for Plant Height, 75.8% for Days to Anthesis, and 50.00% for Days to Silking. For Grain Yield, it showed a 6.68% increase. Notably, GBLUP_ADDOM outperformed both GBLUP_ADD and 2NPLGBM for Grain Yield, with a modest 6.67% improvement in selection efficiency over GBLUP_ADD (Fig. 5; Supplementary Table 1).

Tester CV00

Under this challenging scenario, all models showed lower prediction accuracy and selection efficiency across all traits. For Days to Anthesis and Days to Silking, 2NPLGBM achieved higher correlations than the GBLUP models (0.55 and 0.56, respectively). For Plant Height and Grain Yield, GBLUP_ADDOM performed better, with Pearson correlations of 0.54 and 0.40, respectively. Selection efficiency showed a similar pattern. 2NPLGBM was more efficient for Days to Anthesis and Days to Silking, while GBLUP_ADDOM was more efficient for Plant Height and Grain Yield. This continues the trend observed in the other cross-validation schemes (Fig. 6; Supplementary Table 1).

Trait Heritability and the Role of Dominance Effects

The partitioning of genetic variance revealed important differences in the relative contribution of additive and dominance effects across traits. Narrow-sense heritability ($h^2$) was highest for Days to Silking (0.900) and Days to Anthesis (0.895), followed by Plant Height (0.780), and lowest for Grain Yield (0.476). The corresponding broad-sense heritability ($H^2$) values were moderately higher, indicating small non-additive genetic components.
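The heritability ratios above, and the $d^2$ and PDV ratios defined in Table 2, are simple functions of estimated variance components. The sketch below assumes a simplified model with only additive, dominance, and residual variance (no epistasis). It uses hypothetical component values, not the study's estimates. In this simplified model $H^2 - h^2$ equals $d^2$ by construction; estimates from separately fitted models need not satisfy that identity.

```python
from dataclasses import dataclass


@dataclass
class VarianceComponents:
    """Hypothetical variance components for one trait."""

    additive: float   # sigma^2_A
    dominance: float  # sigma^2_D
    residual: float   # sigma^2_E

    @property
    def genetic(self) -> float:
        return self.additive + self.dominance

    @property
    def phenotypic(self) -> float:
        return self.genetic + self.residual

    def h2_narrow(self) -> float:
        """Narrow-sense heritability: additive / phenotypic variance."""
        return self.additive / self.phenotypic

    def h2_broad(self) -> float:
        """Broad-sense heritability: genetic / phenotypic variance."""
        return self.genetic / self.phenotypic

    def d2(self) -> float:
        """Dominance variance / total phenotypic variance."""
        return self.dominance / self.phenotypic

    def pdv(self) -> float:
        """Proportion of dominance variance: dominance / total genetic variance."""
        return self.dominance / self.genetic


example = VarianceComponents(additive=6.0, dominance=2.0, residual=2.0)
print(example.h2_narrow(), example.h2_broad(), example.d2(), example.pdv())  # 0.6 0.8 0.2 0.25
```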
Dominance variance accounted for 8.2% of the total genetic variance in Plant Height, 11.1% in Days to Anthesis, 9.7% in Days to Silking, and 17.3% in Grain Yield. Grain Yield also showed the largest dominance variance relative to total phenotypic variance ($d^2$ = 0.121) (Table 2).

Table 2 Narrow-sense heritability ($h^2$), broad-sense heritability ($H^2$), dominance variance proportion ($d^2$ = dominance variance / total phenotypic variance), and proportion of dominance variance (PDV = dominance variance / total genetic variance) for the four agronomic traits evaluated in the hybrid maize population.

| Trait | $h^2$ | $H^2$ | $d^2$ | PDV |
|---|---|---|---|---|
| Plant Height (cm) | 0.780 | 0.862 | 0.072 | 0.082 |
| Days to Anthesis (days) | 0.895 | 0.923 | 0.098 | 0.111 |
| Days to Silking (days) | 0.900 | 0.934 | 0.087 | 0.097 |
| Grain Yield (Mg ha⁻¹) | 0.476 | 0.656 | 0.121 | 0.173 |

These patterns help explain the performance of the different prediction models. The superior performance of 2NPLGBM for the flowering traits (Days to Anthesis and Days to Silking) is consistent with their high heritability and meaningful, yet moderate, dominance contributions. In contrast, Grain Yield, despite its lower narrow-sense heritability, exhibited the highest proportion of dominance variance. This suggests that incorporating non-additive effects is especially relevant for this trait. Nevertheless, except under LOYO, 2NPLGBM did not outperform the GBLUP models for Grain Yield. This indicates that dominance alone may not be sufficient and that other sources of complexity, such as environmental interactions, could further influence prediction accuracy. Together, these results indicate that the advantages of 2NPLGBM for genomic prediction are trait-specific. They are most pronounced when dominance variance is present and heritability is high, as is the case for the flowering-time traits.

Variable Importance

We conducted a SHAP analysis to evaluate feature importance within the 2NPLGBM model.
Specifically, for each trait we identified the 20 variables from the 2NP matrix that contributed most to the model's predictions. For Grain Yield, Days to Anthesis, and Days to Silking, the top-ranked variables were predominantly additive. In contrast, Plant Height had a greater proportion of dominance variables, with seven of the top 20 features originating from the dominance component of the 2NP matrix (Figs. 7 and 8). These findings are consistent with the overall decomposition of genetic effect contributions obtained from the GBLUP model.

By aggregating SHAP values for the additive and dominance variables of the 2NP matrix, we quantified the relative contributions of additive and dominance genetic effects to model predictions. Plant Height showed the largest relative dominance contribution (Fig. 7), while Days to Anthesis showed the smallest (Fig. 7). The ratio of additive to dominance variables among the top-ranked features aligns with these trait-specific contributions of genetic effect types. Correspondingly, variance analysis using the GBLUP method indicated that the dominance genetic contribution in this population was lowest for Days to Anthesis (0.098) and Days to Silking (0.087), as shown in Table 2.

To assess the genetic contributions to the 2NPLGBM model, we examined interactions between SNP variables representing additive and dominance effects. Overall, the results were inconclusive. In the GBLUP models, the additive and dominance main effects are the main contributors to model performance (Table 2). For 2NPLGBM, by contrast, the main contributors were interaction effects (Supplementary Figs. S1 and S2 in the Appendix).
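The SHAP aggregation described above can be sketched as follows, under stated assumptions. The input is a SHAP matrix of shape (samples, 2p). Its first p columns are taken to be the additive block and its last p the dominance block of the 2NP matrix; this column order is an assumption. An array of this shape is what `shap.TreeExplainer(model).shap_values(X)` typically returns for a single-output tree regressor. Mean absolute SHAP value serves as the global importance measure, which may differ from the study's exact aggregation.

```python
import numpy as np


def group_contributions(shap_values: np.ndarray, n_snps: int) -> dict[str, float]:
    """Share of total mean |SHAP| attributed to the additive and dominance blocks."""
    importance = np.abs(shap_values).mean(axis=0)  # global importance per feature
    additive = importance[:n_snps].sum()
    dominance = importance[n_snps:2 * n_snps].sum()
    total = additive + dominance
    return {"additive": float(additive / total), "dominance": float(dominance / total)}


def dominance_in_top_k(shap_values: np.ndarray, n_snps: int, k: int = 20) -> int:
    """Number of dominance-block features among the k most important features."""
    importance = np.abs(shap_values).mean(axis=0)
    top_k = np.argsort(importance)[::-1][:k]
    return int(np.sum(top_k >= n_snps))


# Toy SHAP matrix (hypothetical): 2 samples x (2 additive + 2 dominance) features.
toy_shap = np.array([[1.0, -1.0, 0.5, 0.5],
                     [1.0, 1.0, -0.5, 0.5]])
print(group_contributions(toy_shap, n_snps=2))
print(dominance_in_top_k(toy_shap, n_snps=2, k=3))
```

With k = 20 and the real SHAP matrix, `dominance_in_top_k` corresponds to counts such as "seven of the top 20 features" for Plant Height.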
DISCUSSION

The 2NPLGBM Model: A Hybrid Approach to Genomic Prediction

The integration of genomic selection into plant breeding schemes has been shaped by two primary modeling cultures: the data modeling culture and the algorithmic modeling culture [7]. Data models, most notably GBLUP [1, 2] and RR-BLUP [20, 56], have dominated for many years owing to their simplicity, interpretability, and effective handling of additive genetic effects. Algorithmic models based on machine learning [20, 23, 24, 26, 35] and deep learning [33–38] have emerged over the last decade. They can capture non-linear and complex relationships without assumptions about the underlying genetic architecture. The expression of quantitative traits, however, is influenced by both additive and non-additive components, including dominance and epistasis [57]. In data models, these effects are often modeled via multi-kernel extensions, whereas algorithmic models tend to capture them intrinsically.

In this study, we provide a comprehensive assessment of a hybrid genomic prediction model, 2NPLGBM, across multiple traits, validation schemes, and years in hybrid maize. The model integrates a biologically informed genotype matrix (2NP) with a gradient boosting algorithm (LGBM). With it, we demonstrate how structured genomic information can enhance predictive performance and selection efficiency, offering new perspectives for selection in hybrid breeding. The 2NP matrix is a genotype representation obtained by concatenating the additive and dominance matrix representations. The 2NPLGBM model assumes dependency between additive and dominance effects. This more closely reflects biological reality, where these effects often interact to shape complex trait expression.
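The concatenation that defines the 2NP matrix can be sketched as follows. The exact additive and dominance codings used in the study are not restated in this section. The sketch therefore assumes a common choice: additive coding −1/0/1 derived from 0/1/2 allele dosages, and dominance coded 1 for heterozygotes and 0 otherwise. Treat it as illustrative only.

```python
import numpy as np


def build_2np_matrix(genotypes: np.ndarray) -> np.ndarray:
    """Concatenate assumed additive and dominance codings of a 0/1/2 SNP matrix.

    Returns an (n, 2p) matrix laid out as [additive | dominance].
    """
    genotypes = np.asarray(genotypes, dtype=float)
    additive = genotypes - 1.0                      # hypothetical coding: 0/1/2 -> -1/0/1
    dominance = (genotypes == 1.0).astype(float)    # hypothetical coding: heterozygote -> 1
    return np.hstack([additive, dominance])


# Two toy hybrids x three SNPs (hypothetical data).
X = build_2np_matrix(np.array([[0, 1, 2],
                               [2, 1, 1]]))
print(X.shape)  # (2, 6)
print(X)
```

A matrix like this could then be passed as the feature matrix to a gradient-boosting regressor such as `lightgbm.LGBMRegressor`. Concatenation doubles the number of marker features, which is the source of the extra computational demand noted under Limitations.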
Modeling this dependency may account for the improved predictive performance and selection efficiency observed in this study, particularly for highly heritable traits, where dominance and interaction effects can contribute to phenotypic variance [57, 58, 17]. In addition, we incorporated additive and dominance features through the 2NP matrix and applied SHapley Additive exPlanations (SHAP). This allowed us to dissect the relative contributions of additive effects, dominance effects, and their interactions to model predictions. Traits for which 2NPLGBM outperformed classical methods were highly heritable and tended to show higher interaction contributions, though this was not conclusive. This has an important implication: the advantage of 2NPLGBM is largely trait-dependent, emerging most strongly when non-additive effects (epistasis and other genetic interactions) play a major role in phenotypic variation. In such contexts, our model provides a robust framework for predicting total genetic values rather than only additive breeding values, thereby enabling more informed parental selection and hybrid combination design.

Genomic Predictive Ability and Selection Efficiency

Previous research has shown that non-additive effects, including dominance and epistasis, play a modest, trait-dependent role in plant breeding. Classical genomic prediction models that incorporated dominance effects [9, 12, 15] generally produced little to no improvement in predictive ability over additive-only models. Similarly, studies investigating genetic interactions [11, 58, 17] reported modest or negligible gains from modeling non-additive effects. The impact depended largely on the genetic architecture of the trait and on the population structure.
With the increasing adoption of machine learning in genomic selection, recent studies have explored transforming genomic data for ML- or DL-based prediction. These studies focused on either dominance effects [59, 60, 47] or epistatic interactions [61, 62]. Although these models achieved modest improvements for traits influenced by non-additive effects, most were designed to capture only one type of non-additive genetic effect at a time. In contrast, 2NPLGBM uses a biologically informed genomic representation that simultaneously captures additive effects, dominance effects, and higher-order interactions (additive × additive, additive × dominance, and dominance × dominance). In a hybrid maize population, 2NPLGBM increased genomic predictive ability by over 5% in the temporal validation schemes (LOYO and RW). In the tester-based validation schemes (Tester CV0 and Tester CV00), the increase exceeded 15%, with the largest gains for the flowering traits (DTA and DTS). These improvements likely arise because the model directly integrates and exploits non-additive genetic signals. Such signals are particularly relevant for traits where dominance and epistatic interactions contribute substantially to phenotypic variance.

Importantly, 2NPLGBM achieved higher selection efficiency for the flowering traits in all validation schemes except five-fold CV. This suggests that non-linear models can detect subtle genetic signals that matter for selection decisions, even when their overall prediction metrics, such as accuracy or correlation, appear less favorable than those of linear models. Several studies have likewise reported that non-linear approaches can uncover complex genetic patterns or interactions that would otherwise be missed [62–66]. Therefore, incorporating these interaction components may enhance model stability and indirectly support improved ranking performance.
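One way to split SHAP interaction values into main effects and the additive × additive, additive × dominance, and dominance × dominance components named above is sketched below. It is a hypothetical decomposition, not necessarily the procedure behind Supplementary Figs. S1 and S2. It makes two assumptions. First, the interaction array has shape (samples, 2p, 2p) with main effects on the diagonal, the layout produced by `shap.TreeExplainer(model).shap_interaction_values(X)`. Second, the first p features are additive and the remaining p are dominance, an assumed column order.

```python
import numpy as np


def partition_interactions(interactions: np.ndarray, n_snps: int) -> dict[str, float]:
    """Share of mean |SHAP interaction| in main effects and each interaction type."""
    strength = np.abs(interactions).mean(axis=0)  # (2p, 2p) global interaction matrix
    n_features = strength.shape[0]
    is_dom = np.arange(n_features) >= n_snps      # True for dominance-block features
    off_diagonal = ~np.eye(n_features, dtype=bool)
    masks = {
        "main": ~off_diagonal,
        "additive x additive": off_diagonal & ~is_dom[:, None] & ~is_dom[None, :],
        "additive x dominance": off_diagonal & (is_dom[:, None] != is_dom[None, :]),
        "dominance x dominance": off_diagonal & is_dom[:, None] & is_dom[None, :],
    }
    totals = {name: strength[mask].sum() for name, mask in masks.items()}
    grand_total = sum(totals.values())
    return {name: float(value / grand_total) for name, value in totals.items()}


# Toy interaction array (hypothetical): 1 sample, 2 additive + 2 dominance features.
toy = np.ones((1, 4, 4))
print(partition_interactions(toy, n_snps=2))
```

Off-diagonal entries are summed over both (i, j) and (j, i) because SHAP splits each pairwise interaction symmetrically across the two cells.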
The strength of 2NPLGBM lies not only in modeling non-linear and dominance effects but also in delivering superior ranking performance, arguably the most critical metric in early-generation selection and resource allocation. The overall contribution of non-additive effects to phenotypic variation remains trait-dependent [67]. For traits strongly associated with hybrid vigor, models that explicitly capture these effects, such as 2NPLGBM, can yield superior predictive performance. Thus, aligning genomic prediction models with the underlying genetic architecture of the target trait is essential for maximizing total genetic gain.

From a practical breeding perspective, 2NPLGBM offers a robust alternative for predicting total genetic values, because it jointly models additive and non-additive effects in a unified framework. A breeding strategy guided by 2NPLGBM predictions could improve commercial hybrid populations by predicting cross-specific performance without direct phenotypic evaluation. Furthermore, through SHAP-based or other feature-importance analyses, breeders can quantify genetic contributions and uncover the dominance and interaction effects that drive hybrid performance. This interpretability facilitates more informed parental selection decisions, accelerates hybrid development cycles, and ultimately improves the efficiency and precision of genomics-assisted breeding pipelines.

Limitations

Deep learning architectures (e.g., DNNs, CNNs) were not fully explored in this study because of time constraints and suboptimal preliminary results with CNNs. Future work should investigate integrating 2NP matrices with deep neural architectures, or ensemble frameworks that combine 2NPLGBM with DL models. Another limitation is the increased feature dimensionality resulting from matrix concatenation, which in turn increases computational demand.
Additionally, while SHAP enhances interpretability, it does not directly quantify causal effects. Its results should be interpreted as indicative rather than definitive evidence of interaction architecture.

CONCLUSIONS AND FUTURE DIRECTIONS

This study proposes a method that integrates biologically informed genotype matrices with non-linear machine learning algorithms. It demonstrates the value of this integration for improving genomic prediction and selection outcomes in hybrid breeding. Future research should focus on:

- incorporating environmental covariates to improve model transferability across sites and years;
- developing multi-trait and multi-environment versions of 2NPLGBM to jointly leverage correlated traits; and
- integrating multi-omics data (e.g., transcriptomic or epigenetic features) to capture additional biological layers of regulation.

Such extensions will help further bridge the gap between classical quantitative genetics and machine learning, advancing the predictive and explanatory power of machine learning in genomic selection.

ABBREVIATIONS

- GBLUP: Genomic best linear unbiased prediction
- ML: Machine learning
- DTA: Days to Anthesis
- DTS: Days to Silking
- QTL: Quantitative trait loci
- SNP: Single-nucleotide polymorphism
- MAS: Marker-assisted selection
- RCBD: Randomized complete block design
- PHG: Practical Haplotype Graph
- BLUE: Best linear unbiased estimate
- PDV: Proportion of dominance variance, calculated as the dominance variance divided by the total genotypic variance
- $d^2$: Proportion of dominance variance, calculated as the dominance variance divided by the total phenotypic variance
- $h^2$: Narrow-sense heritability
- $H^2$: Broad-sense heritability
- d: Estimated degree of dominance
- GBDT: Gradient-boosting decision tree
- CV: Cross-validation

Declarations

SUPPLEMENTARY INFORMATION

A Google Sheet containing all supplementary data is available at https://docs.google.com/spreadsheets/d/1gHy7Eo-EiGULKyM7kaPmfcmRlXlC6QwY51cS205ojJQ/edit?gid=547868992#gid=547868992.
ETHICS APPROVAL AND CONSENT TO PARTICIPATE

Not applicable.

CONSENT FOR PUBLICATION

Not applicable.

CONFLICT OF INTEREST

On behalf of all authors, the corresponding author states that there is no conflict of interest.

FUNDING

KWS SAAT SE provided financial support for BO through a PhD fellowship. The University of Göttingen provided additional financial support. We acknowledge support from the Open Access Publication Funds of Göttingen University.

Author Contributions

BO analyzed the data and wrote the manuscript. BO and TB designed the research. IJ, RS, and TB supervised the study. BO, IJ, RS, and TB participated in interpreting the results and contributed to the discussion. All authors contributed to the article and approved the submitted version.

Acknowledgement

The authors acknowledge the committee of the Genomes to Fields 2022 Maize Genotype by Environment Prediction Competition for providing the maize hybrid datasets. The authors acknowledge support from the Computing Center of the University of Göttingen (GWDG) through its High-Performance Computing resources.

Data Availability

We obtained the G2F dataset from the committee of the Genomes to Fields 2022 Maize Genotype by Environment Prediction Competition; it is accessible on CyVerse at https://doi.org/10.25739/78mn-4394. A GitHub repository is available at https://github.com/BrightGuru/2NP_Matrix-for-Genomic-Prediction. It contains the bash, R, and Python scripts used for phenotypic and genotypic analysis and for all genomic predictions.

References

1. Meuwissen THE, Hayes BJ, Goddard ME. Prediction of Total Genetic Value Using Genome-Wide Dense Marker Maps. Genetics. 2001;157:1819–29. https://doi.org/10.1093/genetics/157.4.1819
2. VanRaden PM. Efficient Methods to Compute Genomic Predictions. J Dairy Sci. 2008;91:4414–23. https://doi.org/10.3168/jds.2007-0980
3. Lande R, Thompson R. Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics. 1990;124:743–56. https://doi.org/10.1093/genetics/124.3.743
4. Bhat JA, Ali S, Salgotra RK, Mir ZA, Dutta S, Jadon V, et al. Genomic Selection in the Era of Next Generation Sequencing for Complex Traits in Plant Breeding. Front Genet. 2016;7. https://doi.org/10.3389/fgene.2016.00221
5. Kumar R, Das SP, Choudhury BU, Kumar A, Prakash NR, Verma R, et al. Advances in genomic tools for plant breeding: harnessing DNA molecular markers, genomic selection, and genome editing. Biol Res. 2024;57:80. https://doi.org/10.1186/s40659-024-00562-6
6. Alemu A, Åstrand J, Montesinos-López OA, Isidro y Sánchez J, Fernández-Gónzalez J, Tadesse W, et al. Genomic selection in plant breeding: Key factors shaping two decades of progress. Mol Plant. 2024;17:552–78. https://doi.org/10.1016/j.molp.2024.03.007
7. Breiman L. Statistical modeling: The two cultures. Qual Control Appl Stat. 2003;48:81–2.
8. Su G, Christensen OF, Ostersen T, Henryon M, Lund MS. Estimating additive and non-additive genetic variances and predicting genetic merits using genome-wide dense single nucleotide polymorphism markers. Public Library of Science; 2012.
9. Vitezica ZG, Varona L, Legarra A. On the Additive and Dominant Variance and Covariance of Individuals Within the Genomic Selection Scope. Genetics. 2013;195:1223–30. https://doi.org/10.1534/genetics.113.155176
10. Nishio M, Satoh M. Including Dominance Effects in the Genomic BLUP Method for Genomic Evaluation. PLoS One. 2014;9:e85792. https://doi.org/10.1371/journal.pone.0085792
11. Jiang Y, Reif JC. Modeling Epistasis in Genomic Selection. Genetics. 2015;201:759–68. https://doi.org/10.1534/genetics.115.177907
12. Varona L, Legarra A, Toro MA, Vitezica ZG. Non-additive Effects in Genomic Selection. Front Genet. 2018;9:78. https://doi.org/10.3389/fgene.2018.00078
13. Chen Z-Q, Baison J, Pan J, Westin J, Gil MRG, Wu HX. Increased prediction ability in Norway spruce trials using a marker × environment interaction and non-additive genomic selection model. J Hered. 2019;110:830–43.
14. Amadeu RR, Ferrão LFV, Oliveira IDB, Benevenuto J, Endelman JB, Munoz PR. Impact of dominance effects on autotetraploid genomic prediction. Crop Sci. 2020;60:656–65. https://doi.org/10.1002/csc2.20075
15. González-Diéguez D, Legarra A, Charcosset A, Moreau L, Lehermeier C, Teyssèdre S, et al. Genomic prediction of hybrid crops allows disentangling dominance and epistasis. Genetics. 2021;218:iyab026. https://doi.org/10.1093/genetics/iyab026
16. Vojgani E. Accounting for Epistasis in Genomic Phenotype Prediction. Dissertation, Georg-August-Universität Göttingen; 2021.
17. Kristensen PS, Sarup P, Fé D, Orabi J, Snell P, Ripa L, et al. Prediction of additive, epistatic, and dominance effects using models accounting for incomplete inbreeding in parental lines of hybrid rye and sugar beet. Front Plant Sci. 2023;14. https://doi.org/10.3389/fpls.2023.1193433
18. Gianola D, de los Campos G, Hill WG, Manfredi E, Fernando R. Additive Genetic Variability and the Bayesian Alphabet. Genetics. 2009;183:347–63. https://doi.org/10.1534/genetics.109.103952
19. Pérez P, de los Campos G. Genome-Wide Regression and Prediction with the BGLR Statistical Package. Genetics. 2014;198:483–95. https://doi.org/10.1534/genetics.114.164442
20. Heslot N, Yang H-P, Sorrells ME, Jannink J-L. Genomic Selection in Plant Breeding: A Comparison of Models. Crop Sci. 2012;52:146–60. https://doi.org/10.2135/cropsci2011.06.0297
21. Dekkers JCM. Prediction of response to marker-assisted and genomic selection using selection index theory. J Anim Breed Genet. 2007;124:331–41. https://doi.org/10.1111/j.1439-0388.2007.00701.x
22. Jia Y, Jannink J-L. Multiple-Trait Genomic Selection Methods Increase Genetic Value Prediction Accuracy. Genetics. 2012;192:1513–22. https://doi.org/10.1534/genetics.112.144246
23. Goldstein BA, Polley EC, Briggs FBS. Random Forests for Genetic Association Studies. Stat Appl Genet Mol Biol. 2011;10:32. https://doi.org/10.2202/1544-6115.1691
24. Parmley KA, Higgins RH, Ganapathysubramanian B, Sarkar S, Singh AK. Machine learning approach for prescriptive plant breeding. Sci Rep. 2019;9:17132.
25. Montesinos López OA, Montesinos López A, Crossa J. Random Forest for Genomic Prediction. In: Montesinos López OA, Montesinos López A, Crossa J, editors. Multivariate Statistical Machine Learning Methods for Genomic Prediction. Cham: Springer International Publishing; 2022. pp. 633–81. https://doi.org/10.1007/978-3-030-89010-0_15
26. Zhang Q, Zhao X, Han Y, Yang F, Pan S, Liu Z, et al. Maize yield prediction using federated random forest. Comput Electron Agric. 2023;210:107930. https://doi.org/10.1016/j.compag.2023.107930
27. Zhao W, Lai X, Liu D, Zhang Z, Ma P, Wang Q, et al. Applications of Support Vector Machine in Genomic Prediction in Pig and Maize Populations. Front Genet. 2020;11. https://doi.org/10.3389/fgene.2020.598318
28. Khan M, Hooda BK, Gaur A, Singh V, Jindal Y, Tanwar H, et al. Ensemble and optimization algorithm in support vector machines for classification of wheat genotypes. Sci Rep. 2024;14:22728. https://doi.org/10.1038/s41598-024-72056-0
29. Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Stat. 2001;29:1189–232.
30. Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: Association for Computing Machinery; 2016. pp. 785–94. https://doi.org/10.1145/2939672.2939785
31. Li W, Yin Y, Quan X, Zhang H. Gene Expression Value Prediction Based on XGBoost Algorithm. Front Genet. 2019;10:1077. https://doi.org/10.3389/fgene.2019.01077
32. Westhues CC, Mahone GS, da Silva S, Thorwarth P, Schmidt M, Richter J-C, et al. Prediction of Maize Phenotypic Traits With Genomic and Environmental Predictors Using Gradient Boosting Frameworks. Front Plant Sci. 2021;12. https://doi.org/10.3389/fpls.2021.699589
33. Montesinos-López A, Crespo-Herrera L, Dreisigacker S, Gerard G, Vitale P, Saint Pierre C, et al. Deep learning methods improve genomic prediction of wheat breeding. Front Plant Sci. 2024;15. https://doi.org/10.3389/fpls.2024.1324090
34. Montesinos-López OA, Montesinos-López A, Crossa J, Gianola D, Hernández-Suárez CM, Martín-Vallejo J. Multi-trait, Multi-environment Deep Learning Modeling for Genomic-Enabled Prediction of Plant Traits. G3 (Bethesda). 2018;8:3829–40. https://doi.org/10.1534/g3.118.200728
35. Montesinos-López OA, Martín-Vallejo J, Crossa J, Gianola D, Hernández-Suárez CM, Montesinos-López A, et al. New Deep Learning Genomic-Based Prediction Model for Multiple Traits with Binary, Ordinal, and Continuous Phenotypes. G3 (Bethesda). 2019;9:1545–56. https://doi.org/10.1534/g3.119.300585
36. Sandhu KS, Lozada DN, Zhang Z, Pumphrey MO, Carter AH. Deep Learning for Predicting Complex Traits in Spring Wheat Breeding Program. Front Plant Sci. 2021;11. https://doi.org/10.3389/fpls.2020.613325
37. Wang X, Zeng H, Lin L, Huang Y, Lin H, Que Y. Deep learning-empowered crop breeding: intelligent, efficient and promising. Front Plant Sci. 2023;14. https://doi.org/10.3389/fpls.2023.1260089
38. Montesinos-López A, Rivera C, Pinto F, Piñera F, Gonzalez D, Reynolds M, et al. Multimodal deep learning methods enhance genomic prediction of wheat breeding. G3 (Bethesda). 2023;13:jkad045. https://doi.org/10.1093/g3journal/jkad045
39. Ma W, Qiu Z, Song J, Li J, Cheng Q, Zhai J, et al. A deep convolutional neural network approach for predicting phenotypes from genotypes. Planta. 2018;248:1307–18. https://doi.org/10.1007/s00425-018-2976-9
40. Liu Y, Wang D, He F, Wang J, Joshi T, Xu D. Phenotype Prediction and Genome-Wide Association Study Using Deep Convolutional Neural Network of Soybean. Front Genet. 2019;10. https://doi.org/10.3389/fgene.2019.01091
41. Pook T, Freudenthal J, Korte A, Simianer H. Using Local Convolutional Neural Networks for Genomic Prediction. Front Genet. 2020;11. https://doi.org/10.3389/fgene.2020.561497
42. Zingaretti LM, Gezan SA, Ferrão LFV, Osorio LF, Monfort A, Muñoz PR, et al. Exploring deep learning for complex trait genomic prediction in polyploid outcrossing species. Front Plant Sci. 2020;11:25.
43. Heffner EL, Sorrells ME, Jannink J-L. Genomic selection for crop improvement. Crop Sci. 2009;49:1–12.
44. Crossa J, Pérez-Rodríguez P, Cuevas J, Montesinos-López O, Jarquín D, de los Campos G, et al. Genomic selection in plant breeding: methods, models, and perspectives. Trends Plant Sci. 2017;22:961–75.
45. Genomes To Fields. Genomes to Fields 2024 Maize Genotype by Environment Prediction Competition. CyVerse Data Commons; 2025. https://doi.org/10.25739/78MN-4394
46. Bradbury PJ, Casstevens T, Jensen SE, Johnson LC, Miller ZR, Monier B, et al. The Practical Haplotype Graph, a platform for storing and using pangenomes for imputation. Bioinformatics. 2022;38:3698–702. https://doi.org/10.1093/bioinformatics/btac410
47. Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007;23:2633–5.
48. Osatohanmwen BE, Júnior ICV, Gholami M, Westhues CC, Sharifi R, Beissinger T. Predicting Maize Hybrid Performance with Machine Learning and a Locus-Specific Degree of Dominance Transformation. Preprint. In Review; 2025. https://doi.org/10.21203/rs.3.rs-6002495/v1
49. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: A highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst. 2017;30.
50. Head T, Kumar M, Nahrstaedt H, Louppe G, Shcherbatyi I. scikit-optimize/scikit-optimize. Zenodo; 2021.
51. Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;30.
52. Sukumaran S, Jarquin D, Crossa J, Reynolds M. Genomic-enabled prediction accuracies increased by modeling genotype × environment interaction in durum wheat. Plant Genome. 2018;11:170112.
53. Fernandes IK, Vieira CC, Dias KOG, Fernandes SB. Using machine learning to combine genetic and environmental data for maize grain yield predictions across multi-environment trials. Theor Appl Genet. 2024;137:189. https://doi.org/10.1007/s00122-024-04687-w
54. de Oliveira Zimmermann MJ. Breeding for yield, in mixtures of common beans (Phaseolus vulgaris L.) and maize (Zea mays L.). Springer; 1997. pp. 143–8.
55. Hamblin J, de Zimmermann MJ. Breeding common bean for yield in mixtures. Plant Breed Rev. 1986;4:245–72.
56. Endelman JB. Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP. Plant Genome. 2011;4. https://doi.org/10.3835/plantgenome2011.08.0024
57. Radoev M, Becker HC, Ecke W. Genetic analysis of heterosis for yield and yield components in rapeseed (Brassica napus L.) by quantitative trait locus mapping. Genetics. 2008;179:1547–58.
58. de Almeida Filho JE, Guimarães JFR, e Silva FF, de Resende MDV, Muñoz P, Kirst M, et al. The contribution of dominance to phenotype prediction in a pine breeding and simulated population. Heredity. 2016;117:33–41. https://doi.org/10.1038/hdy.2016.23
59. Calleja-Rodriguez A, Chen Z, Suontama M, Pan J, Wu HX. Genomic Predictions With Nonadditive Effects Improved Estimates of Additive Effects and Predictions of Total Genetic Values in Pinus sylvestris. Front Plant Sci. 2021;12. https://doi.org/10.3389/fpls.2021.666820
60. Mathew B, Hauptmann A, Léon J, Sillanpää MJ. NeuralLasso: Neural Networks Meet Lasso in Genomic Prediction. Front Plant Sci. 2022;13. https://doi.org/10.3389/fpls.2022.800161
61. Sharma S, Partap A, Balaguer MA, de Malvar L, Chandra S. R. DeepG2P: Fusing Multi-Modal Data to Improve Crop Production. arXiv; 2022. https://doi.org/10.48550/arXiv.2211.05986
62. Gianola D, Cecchinato A, Naya H, Schön C-C. Prediction of complex traits: robust alternatives to best linear unbiased prediction. Front Genet. 2018;9:195.
63. Crossa J, Martini JW, Gianola D, Pérez-Rodríguez P, Jarquin D, Juliana P, et al. Deep kernel and deep learning for genome-based prediction of single traits in multienvironment breeding trials. Front Genet. 2019;10:1168.
64. Montesinos-López OA, Montesinos-López A, Hernandez-Suarez CM, Barrón-López JA, Crossa J. Deep-learning power and perspectives for genomic selection. Plant Genome. 2021;14:e20122. https://doi.org/10.1002/tpg2.20122
65. Montesinos-López A, Montesinos-López OA, Ramos-Pulido S, Mosqueda-González BA, Guerrero-Arroyo EA, Crossa J, et al. Artificial intelligence meets genomic selection: comparing deep learning and GBLUP across diverse plant datasets. Front Genet. 2025;16:1568705.
66. Crossa J, Montesinos-Lopez OA, Costa-Neto G, Vitale P, Martini JW, Runcie D, et al. Machine learning algorithms translate big data into predictive breeding accuracy. Trends Plant Sci. 2025;30:167–84.
67. Xu S. Mapping Quantitative Trait Loci by Controlling Polygenic Background Effects. Genetics. 2013;195:1209–22. https://doi.org/10.1534/genetics.113.157032

Additional Declarations

No competing interests reported.

Supplementary Files

APPENDIX.docx

Editorial history: Editorial decision: Revision requested 02 Feb, 2026; Reviews received at journal 29 Jan, 2026; Reviewers agreed at journal 08 Jan, 2026; Reviews received at journal 12 Dec, 2025; Reviewers agreed at journal 20 Nov, 2025; Reviewers invited by journal 20 Nov, 2025; Editor assigned by journal 14 Nov, 2025; Submission checks completed at journal 14 Nov, 2025; First submitted to journal 12 Nov, 2025.
© Research Square 2026 | ISSN 2693-5015 (online)

Authors and affiliations: Bright Enogieru Osatohanmwen (University of Goettingen; corresponding author); Indalécio Cunha Vieira Júnior (KWS SAAT SE & Co. KGaA); Ahmad Reza Sharifi (University of Goettingen); Timothy Beissinger (Heritable Ag).
Figure 1. Principal component analysis (PCA) of SNP data in the hybrid maize population. (A) Genetic structure showing clusters primarily associated with the tester used in hybrid development. (B) Cumulative variance explained by the first 20 principal components; the first 10 PCs account for just over 50% of the total genetic variance. The first two PCs explain 23.6% of the variation and reveal seven distinct genetic groups.

Figure 2. Predictive ability and selection efficiency for four agronomic traits (Grain Yield, Plant Height, Days to Anthesis, and Days to Silking) under the LOYO validation scheme. Three genomic prediction models are compared: the GBLUP additive model (GBLUP_ADD), the GBLUP additive + dominance model (GBLUP_ADDOM), and the 2NP matrix-based LightGBM model (2NPLGBM). Predictive ability is expressed as the mean Pearson correlation between observed and predicted phenotypes, with error bars representing the standard error across years. Selection efficiency, measured as the accuracy of identifying the top 20% performing hybrids, is shown with standard error bars in the same way.

Figure 3. Predictive ability and selection efficiency for four agronomic traits (Grain Yield, Plant Height, Days to Anthesis, and Days to Silking) under the RW validation scheme. Three genomic prediction models are compared: the GBLUP additive model (GBLUP_ADD), the GBLUP additive + dominance model (GBLUP_ADDOM), and the 2NP matrix-based LightGBM model (2NPLGBM). Predictive ability is expressed as the mean Pearson correlation between observed and predicted phenotypes, with error bars representing the standard error across years. Selection efficiency, measured as the accuracy of identifying the top 20% performing hybrids, is shown with standard error bars in the same way.

Figure 4. Predictive ability and selection efficiency for four agronomic traits (Grain Yield, Plant Height, Days to Anthesis, and Days to Silking) under the 5-fold cross-validation scheme. Three genomic prediction models are compared: the GBLUP additive model (GBLUP_ADD), the GBLUP additive + dominance model (GBLUP_ADDOM), and the 2NP matrix-based LightGBM model (2NPLGBM). Predictive ability is expressed as the mean Pearson correlation between observed and predicted phenotypes, with error bars representing the standard error across replicates. Selection efficiency, measured as the accuracy of identifying the top 20% performing hybrids, is shown with standard error bars in the same way.

Figure 5. Predictive ability and selection efficiency for four agronomic traits (Grain Yield, Plant Height, Days to Anthesis, and Days to Silking) under the Tester CV0 cross-validation scheme. Three genomic prediction models are compared: the GBLUP additive model (GBLUP_ADD), the GBLUP additive + dominance model (GBLUP_ADDOM), and the 2NP matrix-based LightGBM model (2NPLGBM). Predictive ability is expressed as the mean Pearson correlation between observed and predicted phenotypes, with error bars representing the standard error across replicates. Selection efficiency, measured as the accuracy of identifying the top 20% performing hybrids, is shown with standard error bars in the same way.

Figure 6. Predictive ability and selection efficiency for four agronomic traits (Grain Yield, Plant Height, Days to Anthesis, and Days to Silking) under the Tester CV00 cross-validation scheme. Three genomic prediction models are compared: the GBLUP additive model (GBLUP_ADD), the GBLUP additive + dominance model (GBLUP_ADDOM), and the 2NP matrix-based LightGBM model (2NPLGBM). Predictive ability is expressed as the mean Pearson correlation between observed and predicted phenotypes, with error bars representing the standard error across replicates. Selection efficiency, measured as the accuracy of identifying the top 20% performing hybrids, is shown with standard error bars in the same way.

Figure 7. Relative contribution of additive and dominance SNP markers to model predictions for each of the four agronomic traits evaluated: Grain Yield, Plant Height, Days to Anthesis, and Days to Silking. The bar plot shows the proportion of the total absolute Shapley Additive Explanations (SHAP) values attributed to each effect type (additive in blue, dominance in purple), reflecting their relative influence on the model output.

Figure 8. SHAP summary plots showing the absolute contribution of the top 20 SNP markers for the four agronomic traits evaluated in this study. Each panel corresponds to one trait (Grain Yield, Plant Height, Days to Anthesis, and Days to Silking), with SNP markers ranked along the y-axis by their relative importance in model predictions. The x-axis shows the magnitude of the SHAP values, indicating each marker's contribution to the model output. Blue points represent the additive component of the 2NP matrix, and purple points represent the dominance component. This visualization illustrates how both additive and dominance effects contribute to trait prediction in the hybrid maize population.

INTRODUCTION

Plant breeding has advanced significantly with the introduction of genomic selection (GS). This methodology leverages genome-wide molecular markers to predict the genetic potential of individuals within a breeding population.
This process, known as genomic prediction, enables breeders to estimate breeding values from dense marker data and to select superior individuals even before phenotypic data are available [1, 2]. Unlike traditional marker-assisted selection [3], which focuses on a few quantitative trait loci (QTLs), GS captures the cumulative effects of all available markers across the genome, leading to more accurate and efficient selection of elite genotypes [4–6]. As plant breeding programs increasingly incorporate GS, further research has been conducted to optimize the methods used for genomic prediction. Like other applications of statistical modeling, genomic prediction is characterized by two dominant modeling cultures: the data model culture and the algorithmic model culture [7]. These cultures reflect distinct approaches to handling genetic data and extracting predictive signals.

The Data Model Culture: Structured Statistical Modeling

The data model culture, primarily associated with classical statistical approaches, assumes an explicit, often linear relationship between genetic markers and phenotypic traits. Classical quantitative genetics often adopts this culture, using biologically informed models to estimate additive and dominance effects for selection.
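The following NumPy sketch illustrates one way to estimate additive and dominance effects. It codes 0/1/2 genotypes into an additive matrix (allele count minus one) and a dominance matrix (heterozygote indicator), and concatenates the two into an n × 2p design. It then estimates both sets of marker effects jointly by ridge regression. Ridge regression coincides with BLUP of marker effects under normal priors when the penalty equals the residual-to-marker variance ratio. Several choices here are illustrative assumptions: the coding, the fixed penalty λ = 10, the simulated data and the function names `additive_dominance_codes` and `ridge_marker_effects`. The concatenated layout is not claimed to be the authors' 2NP matrix.

```python
import numpy as np


def additive_dominance_codes(M):
    """Split an n x p matrix of 0/1/2 allele counts into additive and
    dominance design matrices (one common coding, assumed here)."""
    M = np.asarray(M, dtype=float)
    X_add = M - 1.0                    # -1 / 0 / +1 allele-count scale
    X_dom = (M == 1.0).astype(float)   # 1 for heterozygotes, 0 for homozygotes
    return X_add, X_dom


def ridge_marker_effects(X, y, lam):
    """Ridge estimates of marker effects, solving (X'X + lam*I) b = X'y on
    centred X and y (centring absorbs an unpenalized intercept). This equals
    the BLUP of b under b ~ N(0, s2_b I) when lam = s2_e / s2_b."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)


# Hypothetical simulated data: 200 hybrids, 50 markers.
rng = np.random.default_rng(1)
n, p = 200, 50
M = rng.integers(0, 3, size=(n, p))
X_add, X_dom = additive_dominance_codes(M)

true_a = rng.normal(0.0, 0.5, p)   # additive marker effects
true_d = rng.normal(0.0, 0.3, p)   # dominance marker effects
y = X_add @ true_a + X_dom @ true_d + rng.normal(0.0, 1.0, n)

# Fit both effect types jointly on an n x 2p design (additive block first).
X = np.hstack([X_add, X_dom])
b = ridge_marker_effects(X, y, lam=10.0)
a_hat, d_hat = b[:p], b[p:]

print("corr(a_hat, true_a):", np.corrcoef(a_hat, true_a)[0, 1])
print("corr(d_hat, true_d):", np.corrcoef(d_hat, true_d)[0, 1])
```

In a real analysis, λ would come from estimated variance components rather than being set by hand. Fixing it here only keeps the sketch short.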
Parametric models, such as genomic best linear unbiased prediction (GBLUP), are widely used [1, 2]. GBLUP uses the genomic relationship matrix (G), under the assumption that marker effects are normally distributed, to calculate genomic estimated breeding values, providing a robust and interpretable framework for genomic selection.

Several extensions of the data model culture have been developed to improve prediction accuracy, accommodate different genetic architectures, integrate different data sources, and enhance computational efficiency. These include:

GBLUP with additive, dominance, and/or epistatic effects: Extends the standard GBLUP model by incorporating a dominance relationship matrix (D) to capture dominance deviations, and epistatic relationship matrices built from Hadamard products of genomic relationship matrices to capture epistasis.
The model can be applied flexibly to add only dominance effects, only epistatic effects, or both, providing a more comprehensive approach to genomic prediction [8–17].

Bayesian models (e.g., BayesA, BayesB, BayesCπ): Introduce marker-specific shrinkage that allows for variable effect sizes and sparsity [18, 19].

Ridge Regression Best Linear Unbiased Prediction (rrBLUP): A widely used method that applies ridge regression to estimate marker effects while controlling for multicollinearity among markers, providing a computationally efficient alternative to GBLUP [14, 20].

Genomic Selection Index Models: These models extend the classical selection index by integrating genomic estimated breeding values (GEBVs) for multiple traits to optimize selection decisions. By combining information across correlated traits, they enable simultaneous improvement of several breeding objectives while accounting for genetic correlations and economic weights [21].
This approach enhances selection efficiency, particularly in genomic breeding programs where multiple complex traits are under consideration.\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eMultivariate GBLUP (MV-GBLUP)\u003c/b\u003e: These models extend the traditional GBLUP to jointly model multiple correlated traits, capturing both within- and between-trait genetic covariances. By leveraging genetic correlations among traits, MV-GBLUP can improve prediction accuracy, particularly for traits with limited phenotypic data or lower heritability [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e]. It provides a robust method for multi-trait genomic prediction and selection in plant and animal breeding programs.\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e\u003cp\u003eWhile these methods provide structured and interpretable frameworks for genomic prediction, they are limited in their ability to capture complex, non-linear genetic interactions: non-additive effects such as dominance and epistasis enter these models only through kernels that must be specified in advance. This limitation arises from their reliance on predefined genetic structures and linear assumptions. 
Moreover, the assumption that marker effects follow a normal distribution may not always hold, potentially leading to the underestimation of non-additive effects that contribute significantly to phenotypic variation.\u003c/p\u003e\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\u003ch2\u003eThe Algorithmic Model Culture: Machine Learning and Deep Learning\u003c/h2\u003e\u003cp\u003eIn contrast to the data model culture, the algorithmic model culture, primarily driven by machine learning (ML) and deep learning (DL), focuses on non-parametric, data-driven approaches that learn complex genetic patterns directly from data.\u003c/p\u003e\u003cp\u003eExamples of ML methods applied to genomic prediction include:\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eRandom Forests (RF)\u003c/b\u003e: An ensemble learning method that captures non-linear interactions through decision trees. Random forests have been applied in plant breeding with varying levels of success [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e, \u003cspan additionalcitationids=\"CR25\" citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e].\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eSupport Vector Machines (SVMs)\u003c/b\u003e: A kernel-based approach that maps genomic data into higher-dimensional spaces to capture complex relationships [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e, \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e].\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eBoosting Machines\u003c/b\u003e: Gradient Boosting Machines (GBM) and their variants, such as LightGBM and XGBoost, are widely used in plant breeding due to their 
ability to handle complex datasets and improve prediction accuracy. These methods add weak learners sequentially, each fitted to correct the errors of the current ensemble, to build strong predictive models, making them well-suited for genomic selection and trait prediction [\u003cspan additionalcitationids=\"CR30 CR31\" citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e].\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e\u003cp\u003eDeep learning, particularly neural networks, has also gained significant traction due to its ability to model high-dimensional, non-linear interactions. Examples of deep learning methods applied to genomic prediction in plant breeding include:\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eDeep Neural Networks\u003c/b\u003e: Learn hierarchical feature representations from genomic data without prior assumptions about genetic architecture [\u003cspan additionalcitationids=\"CR34 CR35 CR36 CR37\" citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e].\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eConvolutional Neural Networks (CNNs)\u003c/b\u003e: Originally developed for analyzing grid-like data such as images, CNNs have been adapted for genomic prediction to exploit the sequential organization of genetic markers along chromosomes. In this context, \u0026ldquo;spatial patterns\u0026rdquo; refer to local dependencies among adjacent markers (such as linkage disequilibrium blocks or haplotype segments) rather than physical spatial positions. 
By applying convolutional filters across marker sequences, CNNs can automatically extract these locally correlated genomic features, which can improve the prediction of complex traits in structured genomic data [\u003cspan additionalcitationids=\"CR40 CR41\" citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e].\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e\u003cp\u003eDespite varying levels of success in plant breeding, ML and DL models often require large datasets for effective training and suffer from limited interpretability. Furthermore, their reliance on black-box optimization makes it challenging to incorporate prior biological knowledge.\u003c/p\u003e\u003c/div\u003e\n\u003ch3\u003eA New Hybrid Genomic Modeling Culture\u003c/h3\u003e\n\u003cp\u003eDespite the successes of the data and algorithmic model cultures in plant breeding, both have limitations that restrict their ability to fully leverage genomic information. One limitation concerns how genomic data models incorporate different sources of genetic variation through multiple kernels. Depending on the chosen model, genomic data models can capture additive, dominance, and epistatic effects, either individually or in combination, by using specific genomic relationship matrices or covariance structures as input. These models treat the effects associated with the different matrices as independent, an assumption that is not biologically accurate. In genomic algorithmic models, by contrast, interactions among loci are captured implicitly from the marker data, without constructing separate relationship matrices or kernels. 
However, the process by which these models identify, represent, and use such interactions is not fully understood, which limits their biological interpretability even when their predictive accuracy is high.\u003c/p\u003e\u003cp\u003eHere, we introduce a hybrid genomic modeling culture that integrates the data and algorithmic modeling approaches (see \u003cem\u003eMaterials and Methods\u003c/em\u003e), aiming to harness their complementary strengths to relax the assumption of independence among multiple kernels in traditional genomic models and to enhance the biological interpretability of machine learning results. This culture combines structured statistical modeling with the flexibility of machine learning. Central to this approach is the 2NP matrix, a novel representation that concatenates the additive-centered matrix (Z) and the dominance-deviation matrix (W) to explicitly capture additive and dominance effects, respectively. The hybrid genomic modeling culture, which integrates 2NP with a machine learning algorithm, provides a powerful means of modeling both additive and non-additive genetic effects, including their interactions. Concatenating both matrices within the 2NP structure places the additive and dominance components in a single input, so that dependencies between them can be represented, while the machine learning model identifies and uses these interactions. This unified approach holds significant promise for enhancing selection accuracy and understanding complex genetic architectures in hybrid breeding programs.\u003c/p\u003e\n\u003ch3\u003eEfficiency of Selection\u003c/h3\u003e\n\u003cp\u003eIn a breeding program, selection is the final step of the process, in which genotypes with desirable traits are identified and retained from a population. 
In the context of GS, the efficiency of selection becomes a critical focus, as various models are employed to enhance the accuracy of predicting genetic merit [\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e]. Traditionally, GS is framed as the task of predicting an individual\u0026rsquo;s genomic estimated breeding value (GEBV) for a target trait [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e]. While prediction accuracy is commonly used to evaluate model performance, breeders are often more concerned with practical outcomes, specifically, how well a model ranks and identifies the top-performing genotypes. Therefore, in this study, beyond assessing model performance through prediction accuracy, we also address a key breeding question: How efficiently does the model enable the selection of the best performers? To answer this, we evaluate the efficiency of selection, defined as the model's ability to correctly identify individuals with the highest true performance. 
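As an illustration only, one common way to turn this definition into a number is the overlap between the individuals a model ranks in its top fraction and those that are truly in the top fraction. The sketch below assumes that larger trait values are better and uses a hypothetical 10% selection fraction; neither the function name nor these choices come from the study, and for traits where smaller is better (such as days to anthesis) both vectors would be negated first.

```python
import numpy as np


def selection_efficiency(y_true, y_pred, top_fraction=0.1):
    # Hypothetical metric: fraction of the truly best individuals (top_fraction
    # of y_true) that also fall in the top_fraction ranked by y_pred.
    # Assumes larger values are better.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    k = max(1, int(round(top_fraction * y_true.size)))
    # argsort of the negated values gives indices in descending order
    true_top = set(np.argsort(-y_true)[:k].tolist())
    pred_top = set(np.argsort(-y_pred)[:k].tolist())
    return len(true_top & pred_top) / k


rng = np.random.default_rng(1)
y_true = rng.normal(size=200)
y_pred = y_true + rng.normal(scale=0.5, size=200)  # an imperfect predictor
print(selection_efficiency(y_true, y_pred, top_fraction=0.1))
```

Unlike a correlation, this score ignores how well the model orders individuals outside the selected fraction, which is closer to the breeder's question of whether the right genotypes are kept.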
By bridging the gap between structured statistical approaches and machine learning methods, we aim to establish a hybrid culture that improves selection efficiency while preserving biological relevance.\u003c/p\u003e"},{"header":"MATERIAL AND METHODS","content":"\u003cp\u003e\u003cb\u003e2NP Matrix Theory: Bridging the Gap in the Application of Machine Learning Methods in Genomic Prediction\u003c/b\u003e\u003c/p\u003e\u003cp\u003eThe standard genomic best linear unbiased prediction (GBLUP) model, incorporating both additive and dominance genetic variance, models the vector of phenotypic values as:\u003cdiv id=\"Equa\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equa\" name=\"EquationSource\"\u003e\n$$\\:y=Xb+{Z}_{A}a+{Z}_{D}d+e$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:y\$\u003c/span\u003e\u003c/span\u003e is the vector of phenotypes; \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:X\$\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{Z}_{A}\$\u003c/span\u003e\u003c/span\u003e, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{Z}_{D}\$\u003c/span\u003e\u003c/span\u003e are incidence matrices for \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:b\$\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:a\$\u003c/span\u003e\u003c/span\u003e, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:d\$\u003c/span\u003e\u003c/span\u003e, respectively. 
\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:b\$\u003c/span\u003e\u003c/span\u003e is the vector of fixed effects (including the overall mean), \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:a\\sim\\:N\\left(0,A{\\sigma\\:}_{a}^{2}\\right)\$\u003c/span\u003e\u003c/span\u003e is the vector of additive genetic effects, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:d\\:\\sim\\:N\\left(0,D{\\sigma\\:}_{d}^{2}\\right)\$\u003c/span\u003e\u003c/span\u003e is the vector of dominance genetic effects, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:e\\:\\sim\\:N\\left(0,I{\\sigma\\:}_{e}^{2}\\right)\$\u003c/span\u003e\u003c/span\u003e is the vector of residual effects, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:A\$\u003c/span\u003e\u003c/span\u003e is the additive genomic relationship matrix, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:D\$\u003c/span\u003e\u003c/span\u003e is the dominance genomic relationship matrix, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:I\$\u003c/span\u003e\u003c/span\u003e is the identity matrix, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{\\sigma\\:}_{a}^{2}\$\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{\\sigma\\:}_{d}^{2}\$\u003c/span\u003e\u003c/span\u003e, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{\\sigma\\:}_{e}^{2}\$\u003c/span\u003e\u003c/span\u003e are the additive genetic variance, dominance genetic variance, and residual variance, respectively.\u003c/p\u003e\n\u003ch3\u003eGenomic Relationship Matrices in Standard GBLUP\u003c/h3\u003e\n\u003cp\u003eThe additive genomic relationship matrix (\u003cspan 
class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:A\$\u003c/span\u003e\u003c/span\u003e) is estimated following [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]:\u003cdiv id=\"Equb\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equb\" name=\"EquationSource\"\u003e\n$$\\:A=ZZ{\\prime\\:}{\\left(\\sum\\:_{i=1}^{m}2{p}_{i}\\left(1-{p}_{i}\\right)\\right)}^{-1}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:Z=M-P\$\u003c/span\u003e\u003c/span\u003e is the centered genotype matrix, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:M\$\u003c/span\u003e\u003c/span\u003e is the marker matrix coded 0, 1, or 2 according to the number of copies of the alternative allele, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:P\$\u003c/span\u003e\u003c/span\u003e is the matrix whose \u003cem\u003ei\u003c/em\u003e-th column contains \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:2{p}_{i}\$\u003c/span\u003e\u003c/span\u003e, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{p}_{i}\$\u003c/span\u003e\u003c/span\u003e is the frequency of the alternative allele at locus \u003cem\u003ei\u003c/em\u003e.\u003c/p\u003e\u003cp\u003eThe dominance genomic relationship matrix (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:D\$\u003c/span\u003e\u003c/span\u003e) is estimated following [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]:\u003cdiv id=\"Equc\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equc\" name=\"EquationSource\"\u003e\n$$\\:D=WW{\\prime\\:}{\\left(\\sum\\:_{i=1}^{m}4{p}_{i}^{2}{q}_{i}^{2}\\right)}^{-1}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan 
class=\"mathinline\"\u003e\$\\:W\$\u003c/span\u003e\u003c/span\u003e is a matrix of heterozygosity coefficients; the entries of the \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:i\$\u003c/span\u003e\u003c/span\u003e-th column of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:W\$\u003c/span\u003e\u003c/span\u003e are \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:-2{q}_{i}^{2}\$\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:2{p}_{i}{q}_{i}\$\u003c/span\u003e\u003c/span\u003e, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:-2{p}_{i}^{2}\$\u003c/span\u003e\u003c/span\u003e for individuals carrying two, one, and zero copies, respectively, of the allele with frequency \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{p}_{i}\$\u003c/span\u003e\u003c/span\u003e at locus \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:i\$\u003c/span\u003e\u003c/span\u003e, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{q}_{i}=1-{p}_{i}\$\u003c/span\u003e\u003c/span\u003e [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e].\u003c/p\u003e\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\u003ch2\u003eLimitations of Standard GBLUP and the 2NP Matrix Approach with Gradient Boosting Integration\u003c/h2\u003e\u003cp\u003eThe GBLUP model assumes that additive and dominance effects are independent, which is not biologically accurate: interactions between them can play an important role in trait expression. 
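To make the independence assumption concrete, the following NumPy sketch computes A and D as defined above and forms a GBLUP_ADDOM-style prediction in which the genetic covariance is A·σ_a² + D·σ_d², with no additive-by-dominance covariance term. It is a minimal sketch under stated assumptions: the variance components are taken as known (in practice they would be estimated, for example by REML), the overall mean is the only fixed effect, and the function names are hypothetical.

```python
import numpy as np


def additive_dominance_matrices(M):
    # M: n x m marker matrix coded 0/1/2 (copies of the alternative allele)
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0           # alternative-allele frequency per locus
    q = 1.0 - p
    Z = M - 2.0 * p                    # centered additive coding, Z = M - P
    # Dominance coding: -2q^2, 2pq, -2p^2 for 2, 1, 0 copies of the allele
    W = np.where(M == 2, -2.0 * q**2, np.where(M == 1, 2.0 * p * q, -2.0 * p**2))
    A = Z @ Z.T / np.sum(2.0 * p * q)
    D = W @ W.T / np.sum((2.0 * p * q) ** 2)
    return Z, W, A, D


def gblup_addom_predict(y, A, D, var_a, var_d, var_e):
    # BLUP of the total genetic value a + d when Var(a + d) = A*var_a + D*var_d,
    # i.e. with no covariance term between additive and dominance effects
    y = np.asarray(y, dtype=float)
    n = y.size
    G = A * var_a + D * var_d
    V_inv = np.linalg.inv(G + np.eye(n) * var_e)
    ones = np.ones(n)
    mu = (ones @ V_inv @ y) / (ones @ V_inv @ ones)   # GLS estimate of the mean
    return mu, G @ V_inv @ (y - mu)


rng = np.random.default_rng(0)
M = rng.integers(0, 3, size=(50, 200))
Z, W, A, D = additive_dominance_matrices(M)
y = rng.normal(size=50)
mu, g_hat = gblup_addom_predict(y, A, D, var_a=0.4, var_d=0.1, var_e=0.5)
```

The two kernels enter the covariance only as a weighted sum, which is exactly the structure the 2NP approach sets out to relax.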
To address this limitation, we propose the 2NP matrix, which concatenates the additive-centered matrix (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:Z\$\u003c/span\u003e\u003c/span\u003e) and the dominance deviation matrix (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:W\$\u003c/span\u003e\u003c/span\u003e) column-wise into a single combined input matrix:\u003cdiv id=\"Equd\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equd\" name=\"EquationSource\"\u003e\n$$\\:{G}_{2NP}=\\left[Z\\mid\\:W\\right]$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:Z\$\u003c/span\u003e\u003c/span\u003e encodes additive genetic effects and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:W\$\u003c/span\u003e\u003c/span\u003e encodes dominance deviations. This combined structure keeps the additive and dominance codings as separate columns, so their individual contributions are retained, while leaving the model free to learn interactions between them from the data. The 2NP matrix supplies the additive and dominance codings, and gradient boosting methods (e.g., LightGBM and XGBoost) are used to model more complex gene interactions. 
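A minimal sketch of how G_2NP can be assembled from a 0/1/2 marker matrix, assuming the Z and W codings given in the GBLUP section (the function name is hypothetical). The last line illustrates one property worth keeping in mind: used directly as a linear kernel, the concatenation gives G_2NP G_2NP′ = ZZ′ + WW′, an unweighted sum of the two unscaled kernels, so any modelling of interactions has to come from the non-linear learner applied to G_2NP rather than from the concatenation itself.

```python
import numpy as np


def build_g2np(M):
    # M: n x m marker matrix coded 0/1/2; returns the n x 2m matrix [Z | W]
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0
    q = 1.0 - p
    Z = M - 2.0 * p
    W = np.where(M == 2, -2.0 * q**2, np.where(M == 1, 2.0 * p * q, -2.0 * p**2))
    return np.hstack([Z, W]), Z, W


rng = np.random.default_rng(42)
M = rng.integers(0, 3, size=(6, 4))
G2NP, Z, W = build_g2np(M)
print(G2NP.shape)  # (6, 8)
# As a linear kernel, the concatenation equals the sum of the two unscaled kernels
print(np.allclose(G2NP @ G2NP.T, Z @ Z.T + W @ W.T))  # True
```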
The prediction model then takes the form:\u003cdiv id=\"Eque\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Eque\" name=\"EquationSource\"\u003e\n$$\\:y=f\\left({G}_{2NP}\\right)+e$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:f\\left({G}_{2NP}\\right)\$\u003c/span\u003e\u003c/span\u003e is a function learned by a gradient boosting machine and intended to capture additive effects \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{g}_{A}\$\u003c/span\u003e\u003c/span\u003e, dominance effects \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{g}_{D}\$\u003c/span\u003e\u003c/span\u003e, additive-additive interactions \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{g}_{AA}\$\u003c/span\u003e\u003c/span\u003e, additive-dominance interactions \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{g}_{AD}\$\u003c/span\u003e\u003c/span\u003e, and dominance-dominance interactions \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{g}_{DD}\$\u003c/span\u003e\u003c/span\u003e. The gradient boosting model learns a non-linear function of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{G}_{2NP}\$\u003c/span\u003e\u003c/span\u003e and can therefore capture higher-order interactions between loci that standard GBLUP does not model.\u003c/p\u003e\u003c/div\u003e\n\u003ch3\u003ePhenotypic and genotypic data\u003c/h3\u003e\n\u003cp\u003eWe used data from the Genomes to Fields (G2F) 2024 Maize Genotype by Environment Prediction Competition [\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e]. 
We used data spanning six years, from 2018 to 2023, comprising 2,925 unique maize (\u003cem\u003eZea mays L.\u003c/em\u003e) hybrids evaluated in multiple environments across the United States, Canada, and Germany. The trials followed a modified randomized complete block design (RCBD), mostly with two replications per environment. Our analysis covers four traits: Plant Height (cm), Days to Anthesis (days), Days to Silking (days), and Grain Yield (Mg ha⁻¹).\u003c/p\u003e\u003cp\u003eThe genotypic data were described in [\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e]. For the G2F materials from 2014 to 2023, variant calls were performed using the Practical Haplotype Graph (PHG) [\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e]. Hybrid genotypes were generated by combining information from their parent lines using the CreateHybridGenotypes plugin available in TASSEL 5 [\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e], yielding 5,899 individuals. SNPs with a minor allele frequency (MAF) below 1% were then filtered out, resulting in 2,425 high-quality variant positions. We then retained the data from the years 2018 to 2023. Because the SNP markers had already been quality-filtered, no further marker filtering was applied.\u003c/p\u003e\u003cp\u003eTo identify and remove outliers, a linear model with hybrid and replicate as fixed effects was fitted in each unique environment, defined by field location and year, as described by [\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e]. To reduce computational time, a two-step analysis was used to calculate the best linear unbiased estimates (BLUEs) for each hybrid, as described in [\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e]. 
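The second step (hybrid as a fixed effect, field location as a random effect, with the model given next) can be sketched by solving Henderson's mixed model equations. This is a minimal illustration, not the authors' pipeline: the variance ratio λ = σ_e²/σ_FL² is assumed known rather than estimated, μ + H_i is estimated as a single coefficient per hybrid, and the function name is hypothetical.

```python
import numpy as np


def stage2_hybrid_blues(y, hybrid, location, lam):
    # y: first-step BLUEs; hybrid, location: integer codes starting at 0.
    # lam: assumed-known variance ratio sigma_e^2 / sigma_FL^2.
    y = np.asarray(y, dtype=float)
    hybrid, location = np.asarray(hybrid), np.asarray(location)
    n = y.size
    X = np.zeros((n, hybrid.max() + 1))     # fixed: one column per hybrid (mu + H_i)
    X[np.arange(n), hybrid] = 1.0
    U = np.zeros((n, location.max() + 1))   # random: field location
    U[np.arange(n), location] = 1.0
    # Henderson's mixed model equations; lam is added to the random-effect block
    C = np.block([
        [X.T @ X, X.T @ U],
        [U.T @ X, U.T @ U + lam * np.eye(U.shape[1])],
    ])
    rhs = np.concatenate([X.T @ y, U.T @ y])
    sol = np.linalg.solve(C, rhs)
    k = X.shape[1]
    return sol[:k], sol[k:]   # hybrid BLUEs (mu + H_i), field-location BLUPs


# Balanced toy example: 3 hybrids, each observed in 4 field locations
hyb = np.repeat(np.arange(3), 4)
loc = np.tile(np.arange(4), 3)
y = np.array([5.0, 6.5, 4.0])[hyb] + np.array([0.3, -0.2, 0.5, -0.6])[loc]
h_hat, fl_hat = stage2_hybrid_blues(y, hyb, loc, lam=2.0)
```

In a balanced design such as this toy example, differences between hybrid BLUEs do not depend on λ; in the unbalanced multi-environment G2F data the shrinkage of the field-location effects matters.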
In the second step, to obtain the BLUEs for each hybrid in each environment, the BLUEs from the first step were analyzed with a linear mixed model considering hybrid as a fixed effect and field location (FL) as a random effect:\u003cdiv id=\"Equf\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equf\" name=\"EquationSource\"\u003e\n$$\\:{y}_{if}=\\mu\\:+{H}_{i}+{FL}_{f}+{e}_{if}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{y}_{if}\$\u003c/span\u003e\u003c/span\u003e is the first-step BLUE of the \u003cem\u003ei\u003c/em\u003e-th hybrid in the \u003cem\u003ef\u003c/em\u003e-th field location; \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\mu\\:\$\u003c/span\u003e\u003c/span\u003e is the overall mean; \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{H}_{i}\$\u003c/span\u003e\u003c/span\u003e is the fixed effect of the \u003cem\u003ei\u003c/em\u003e-th hybrid; \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{FL}_{f}\$\u003c/span\u003e\u003c/span\u003e is the random effect of the \u003cem\u003ef\u003c/em\u003e-th field location; and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{e}_{if}\$\u003c/span\u003e\u003c/span\u003e is the residual term associated with the observation \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{y}_{if}\$\u003c/span\u003e\u003c/span\u003e. Variance components and heritabilities were estimated as described in [\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e].\u003c/p\u003e\n\u003ch3\u003eGenomic prediction models\u003c/h3\u003e\n\u003cp\u003e\u003cb\u003eClassical models\u003c/b\u003e: Two GBLUP models were chosen for this study. 
These are a GBLUP model that uses a single genomic relationship matrix for additive effects [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e] and a GBLUP model that uses separate genomic relationship matrices (kernels) for additive and dominance effects [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e].\u003c/p\u003e\u003cp\u003e\u003col\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eGBLUP model with additive effects only (GBLUP_ADD): The model with additive effects only can be expressed as:\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003c/ol\u003e\u003cdiv id=\"Equg\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equg\" name=\"EquationSource\"\u003e\n$$\\:y=Xb+{Z}_{a}a+e$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003ewhere \u003cb\u003ey\u003c/b\u003e is the vector of phenotypes, and \u003cb\u003eX\u003c/b\u003e and \u003cb\u003eZ\u003c/b\u003e\u003csub\u003e\u003cb\u003ea\u003c/b\u003e\u003c/sub\u003e are the incidence matrices for \u003cb\u003eb\u003c/b\u003e and \u003cb\u003ea\u003c/b\u003e, respectively. 
\u003cb\u003eb\u003c/b\u003e is the vector of fixed effects (including the overall mean); \u003cb\u003ea\u003c/b\u003e is the vector of additive genetic effects, assumed to be normally distributed as \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:a\\sim\\:N\\left(0,A{\\sigma\\:}_{a}^{2}\\right)\$\u003c/span\u003e\u003c/span\u003e; \u003cb\u003ee\u003c/b\u003e is the vector of random residual effects, assumed to be distributed as \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:e\\sim\\:N\\left(0,I{\\sigma\\:}_{e}^{2}\\right)\$\u003c/span\u003e\u003c/span\u003e; \u003cb\u003eA\u003c/b\u003e is the additive genomic relationship matrix; \u003cb\u003eI\u003c/b\u003e is the identity matrix; \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{\\sigma\\:}_{a}^{2}\$\u003c/span\u003e\u003c/span\u003e is the additive genetic variance; \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{\\sigma\\:}_{e}^{2}\$\u003c/span\u003e\u003c/span\u003e is the residual variance; and N(·,·) denotes the normal distribution.\u003c/p\u003e\u003cp\u003eThe additive genomic relationship matrix \u003cb\u003eA\u003c/b\u003e was estimated as [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]:\u003cdiv id=\"Equh\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equh\" name=\"EquationSource\"\u003e\n$$\\:A=\\frac{ZZ{\\prime\\:}}{\\sum\\:_{i=1}^{m}2{p}_{i}\\left(1-{p}_{i}\\right)}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003ewhere \u003cb\u003eZ\u003c/b\u003e is the centered genotype matrix obtained by subtracting \u003cb\u003eP\u003c/b\u003e from \u003cb\u003eM\u003c/b\u003e (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:Z=M-P\$\u003c/span\u003e\u003c/span\u003e); \u003cb\u003eM\u003c/b\u003e is the \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:n\\times\\:m\$\u003c/span\u003e\u003c/span\u003e marker matrix with genotypes coded as 0, 1, and 2 according to the number of alternative alleles, where \u003cb\u003en\u003c/b\u003e is the number of individuals and \u003cb\u003em\u003c/b\u003e is the number of loci; \u003cb\u003eP\u003c/b\u003e is the matrix whose \u003cem\u003ei\u003c/em\u003e-th column contains \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:2{p}_{i}\$\u003c/span\u003e\u003c/span\u003e, where \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{p}_{i}\$\u003c/span\u003e\u003c/span\u003e is the frequency of the alternative allele at locus \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:i\$\u003c/span\u003e\u003c/span\u003e; and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\sum\\:_{i=1}^{m}2{p}_{i}\\left(1-{p}_{i}\\right)\$\u003c/span\u003e\u003c/span\u003e is the sum across loci of the marker variances expected under Hardy-Weinberg equilibrium, which scales \u003cb\u003eA\u003c/b\u003e to be analogous to the pedigree-based numerator relationship matrix [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e].\u003c/p\u003e\u003cp\u003e\u003col start=\"2\"\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eGBLUP 
model with additive and dominance effects (GBLUP_ADDOM): The model with both additive and dominance effects can be expressed as:\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003c/ol\u003e\u003cdiv id=\"Equi\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equi\" name=\"EquationSource\"\u003e\n$$\\:y=Xb+{Z}_{a}a+{Z}_{d}d+e$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003ewhere the terms shared with GBLUP_ADD are as defined above, \u003cb\u003ed\u003c/b\u003e is the vector of dominance genetic effects, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{Z}_{d}\$\u003c/span\u003e\u003c/span\u003e is the incidence matrix for \u003cb\u003ed\u003c/b\u003e. The dominance effects are assumed to be normally distributed as \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:d\\sim\\:N\\left(0,D{\\sigma\\:}_{d}^{2}\\right)\$\u003c/span\u003e\u003c/span\u003e, where \u003cb\u003eD\u003c/b\u003e is the dominance genomic relationship matrix and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{\\sigma\\:}_{d}^{2}\$\u003c/span\u003e\u003c/span\u003e is the dominance genetic variance.\u003c/p\u003e\u003cp\u003eThe dominance genomic relationship matrix \u003cb\u003eD\u003c/b\u003e was estimated as [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]:\u003cdiv id=\"Equj\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equj\" name=\"EquationSource\"\u003e\n$$\\:D=\\frac{WW{\\prime\\:}}{\\sum\\:_{i=1}^{m}4{p}_{i}^{2}{q}_{i}^{2}}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003ewhere \u003cb\u003eW\u003c/b\u003e is a matrix of heterozygosity coefficients; the coefficients in the \u003cem\u003ei\u003c/em\u003e-th column of \u003cb\u003eW\u003c/b\u003e are \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:-2{p}_{i}^{2}\$\u003c/span\u003e\u003c/span\u003e for \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{A}_{1}{A}_{1}\$\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:2{p}_{i}{q}_{i}\$\u003c/span\u003e\u003c/span\u003e for \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{A}_{1}{A}_{2}\$\u003c/span\u003e\u003c/span\u003e, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:-2{q}_{i}^{2}\$\u003c/span\u003e\u003c/span\u003e for \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{A}_{2}{A}_{2}\$\u003c/span\u003e\u003c/span\u003e, where \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{q}_{i}\$\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{p}_{i}\$\u003c/span\u003e\u003c/span\u003e are the frequencies of allele 1 (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{A}_{1}\$\u003c/span\u003e\u003c/span\u003e) and allele 2 (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{A}_{2}\$\u003c/span\u003e\u003c/span\u003e) at locus \u003cem\u003ei\u003c/em\u003e, respectively [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e].\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eMachine Learning Models\u003c/strong\u003e\u003cp\u003eThe 2NP genomic prediction model was fitted using a Gradient Boosting Machine (GBM) implemented in the LightGBM framework [\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e] in Python 3.8. The model was trained using a workflow that combined automated Bayesian hyperparameter optimization with the LightGBM algorithm. 
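LightGBM itself is not reproduced here. To illustrate what the learned function f(G_2NP) in y = f(G_2NP) + e does, the sketch below swaps in a toy least-squares gradient boosting in plain NumPy with depth-2 regression trees, so that a single tree can split on one locus and then on another and thereby represent a two-locus interaction. It differs from LightGBM, which grows trees leaf-wise with histogram-based split finding and many regularization options, and its settings (100 rounds, learning rate 0.1, toy phenotype) are arbitrary rather than the tuned hyperparameters of the study; all function names are hypothetical.

```python
import numpy as np


def g2np(M):
    # [Z | W] from a 0/1/2 marker matrix, using the Z and W codings defined above
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0
    q = 1.0 - p
    W = np.where(M == 2, -2.0 * q**2, np.where(M == 1, 2.0 * p * q, -2.0 * p**2))
    return np.hstack([M - 2.0 * p, W])


def best_split(X, r):
    # Exhaustive search for the (feature, threshold) pair that most reduces the
    # squared error of the residuals r; returns (None, None) if no split helps
    best_j, best_t, best_sse = None, None, np.sum((r - r.mean()) ** 2)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left = np.less_equal(X[:, j], t)
            sse = (np.sum((r[left] - r[left].mean()) ** 2)
                   + np.sum((r[~left] - r[~left].mean()) ** 2))
            if best_sse > sse:
                best_j, best_t, best_sse = j, t, sse
    return best_j, best_t


def fit_tree(X, r, depth):
    # Regression tree as nested tuples: ('leaf', value) or ('split', j, t, left, right)
    j, t = best_split(X, r) if depth else (None, None)
    if j is None:
        return ('leaf', r.mean())
    mask = np.less_equal(X[:, j], t)
    return ('split', j, t,
            fit_tree(X[mask], r[mask], depth - 1),
            fit_tree(X[~mask], r[~mask], depth - 1))


def predict_tree(node, X):
    if node[0] == 'leaf':
        return np.full(X.shape[0], node[1])
    _, j, t, left, right = node
    mask = np.less_equal(X[:, j], t)
    out = np.empty(X.shape[0])
    out[mask] = predict_tree(left, X[mask])
    out[~mask] = predict_tree(right, X[~mask])
    return out


def fit_boosting(X, y, n_rounds=100, learning_rate=0.1, depth=2):
    # Least-squares gradient boosting: each tree is fitted to the current
    # residuals (the negative gradient of squared-error loss) and added with shrinkage
    f0 = float(np.mean(y))
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_rounds):
        tree = fit_tree(X, y - pred, depth)
        pred += learning_rate * predict_tree(tree, X)
        trees.append(tree)
    return f0, learning_rate, trees


def predict_boosting(model, X):
    f0, learning_rate, trees = model
    return f0 + learning_rate * sum(predict_tree(tree, X) for tree in trees)


rng = np.random.default_rng(7)
M = rng.integers(0, 3, size=(80, 15))
X = g2np(M)
# Toy phenotype: additive effect at locus 0 plus an interaction between loci 1 and 2
y = (0.8 * (M[:, 0] - 1) + 1.0 * ((M[:, 1] == 1) & (M[:, 2] == 2))
     + rng.normal(scale=0.3, size=80))
model = fit_boosting(X, y)
print(np.mean((y - predict_boosting(model, X)) ** 2), np.var(y))
```

The printed pair compares training error with the variance of y; it says nothing about predictive ability, which in the study is assessed with the validation schemes below.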
Specifically, BayesSearchCV from the Scikit-Optimize (skopt) library [\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e] was used for Bayesian optimization of the hyperparameters. This explores the parameter space efficiently and, because candidate settings are scored by internal cross-validation, reduces the risk of overfitting. Model interpretation was supported by SHapley Additive exPlanations (SHAP) [\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e], which quantify the contribution of each genomic feature to the predictions.\u003c/p\u003e\u003c/p\u003e\u003cp\u003eThe predictions of 2NPLGBM were compared with those of two GBLUP models: GBLUP_ADD, which uses a single genomic relationship matrix for additive effects [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e], and GBLUP_ADDOM, which fits separate genomic relationship matrices (kernels) for additive and dominance effects [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e].\u003c/p\u003e\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\u003ch2\u003eValidation Schemes\u003c/h2\u003e\u003cp\u003eTo evaluate model performance under realistic breeding scenarios, we implemented five validation schemes. Two simulate prediction in unseen years: Leave-One-Year-Out (LOYO) and Rolling Window (RW). Three vary the genetic relationship between the training and test sets: Five-Fold, Tester CV0, and Tester CV00. For the schemes based on genetic relationships, we conducted 10 repetitions of five-fold cross-validation.
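The repeated five-fold procedure can be sketched in a few lines of NumPy. The sketch assumes that hybrids are shuffled independently in each repetition and split into five near-equal folds. repeated_kfold is a hypothetical helper; scikit-learn's RepeatedKFold(n_splits=5, n_repeats=10) produces equivalent splits.

```python
import numpy as np

def repeated_kfold(n, k=5, repeats=10, seed=0):
    # Yields (repeat, fold, train_idx, test_idx) for k-fold CV repeated with new shuffles
    rng = np.random.default_rng(seed)
    for r in range(repeats):
        folds = np.array_split(rng.permutation(n), k)   # k near-equal random folds
        for f in range(k):
            train_idx = np.concatenate([folds[j] for j in range(k) if j != f])
            yield r, f, train_idx, folds[f]

splits = list(repeated_kfold(n=103))
print(len(splits))   # prints 50 (10 repetitions x 5 folds)
```

Each repetition reshuffles the hybrids. Every hybrid is therefore predicted exactly once per repetition and ten times in total.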
In each repetition, the phenotypic data were partitioned into five subsets; each subset served as the validation set once, while the remaining four were used for training [\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e, \u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e].\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e\u003ch2\u003eLOYO\u003c/h2\u003e\u003cp\u003eIn the LOYO scheme, data from all but one year were used for training, and the left-out year served as the test set. Using data from all six years (2018\u0026ndash;2023), this yielded six validations, one for each year.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\u003ch2\u003eRW\u003c/h2\u003e\u003cp\u003eIn the Rolling Window scheme, a fixed window of three consecutive years was used as the training set, and the following year was used as the test set. The window was then shifted forward one year at a time until the last year (2023) had served as the test set. This gave three validations, with test years 2021, 2022, and 2023.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\u003ch2\u003eFive-Fold\u003c/h2\u003e\u003cp\u003eFor the Five-Fold scheme, hybrid genotypes were randomly divided into five equal-sized folds. In each round of cross-validation, four folds were used for training, and the remaining fold was used for testing. This process was repeated 10 times with different random partitions to obtain robust estimates.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e\u003ch2\u003eTester CV0\u003c/h2\u003e\u003cp\u003eThe Tester CV0 scheme focused on predicting hybrids with known testers in a new year. As in the standard CV0 scenario, in which the target environment (here, the year) is untested, models were trained on trials from 2018 to 2022 and tested on trials from 2023.
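The year-based splits can be written down explicitly as lists of training and test years. The sketch below is a minimal illustration with hypothetical helper names. It enumerates years only and leaves the mapping from years to trial records to the surrounding data pipeline.

```python
YEARS = [2018, 2019, 2020, 2021, 2022, 2023]

def loyo_splits(years):
    # Leave-One-Year-Out: each year is held out once as the test year
    return [([y for y in years if y != held_out], [held_out]) for held_out in years]

def rolling_window_splits(years, window=3):
    # Rolling Window: a block of consecutive years trains, the following year tests
    return [(years[i:i + window], [years[i + window]])
            for i in range(len(years) - window)]

# Fixed split used by the tester schemes: train on 2018-2022, test on 2023
TESTER_SPLIT = (YEARS[:-1], YEARS[-1:])

for train, test in rolling_window_splits(YEARS):
    print(train, 'predicts', test)
# prints:
# [2018, 2019, 2020] predicts [2021]
# [2019, 2020, 2021] predicts [2022]
# [2020, 2021, 2022] predicts [2023]
```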
In each fold, 20% of the testers evaluated in 2023 were sampled, and the 2023 hybrids involving these testers formed the test set. Hybrids of the same testers from earlier years were retained in the training set, so these testers were known to the model.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e\u003ch2\u003eTester CV00\u003c/h2\u003e\u003cp\u003eThe Tester CV00 scheme aimed to predict the performance of hybrids involving unknown testers in a new year. Models were trained on trials from 2018 to 2022 and tested on trials from 2023. In each fold, 20% of the testers evaluated in 2023 were randomly sampled, and the 2023 hybrids involving these testers formed the test set. All hybrids involving these testers were excluded from the training data in every year.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec17\" class=\"Section2\"\u003e\u003ch2\u003eModel Performance Metrics\u003c/h2\u003e\u003cp\u003eAccurate prediction of genetic merit is a cornerstone of genomic selection (GS), enabling breeders to make informed decisions about which genotypes to advance. While several statistical and machine learning models have been developed to enhance predictive power, their utility ultimately depends on how well they capture the relationship between genotypic and phenotypic variation.\u003c/p\u003e\u003cp\u003eIn this study, we evaluated model performance using two metrics: Pearson\u0026rsquo;s correlation coefficient and selection efficiency. Pearson\u0026rsquo;s correlation measures the linear association between observed and predicted phenotypes and is a standard indicator of prediction accuracy. However, in practical breeding applications, correctly ranking individuals, particularly the best ones, is often more important than overall prediction accuracy.
To address this, we also used selection efficiency [\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e, \u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e, \u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e], which measures how well a model identifies the top-performing genotypes. We computed selection efficiency for the top 20% of the hybrid genotypes as:\u003cdiv id=\"Equk\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equk\" name=\"EquationSource\"\u003e\n$$\\:\\text{Selection efficiency}=\\frac{I-C}{N-C}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003ewhere:\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003eN is the number of individuals selected, i.e., the size of the top 20% set (so that a perfect ranking gives a value of 1),\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eI is the number of individuals common to both the observed and the predicted top 20% sets, and\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eC is the number of individuals expected in both sets by random chance (for a 20% selection fraction, C = 0.2N).\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e\u003c/div\u003e"},{"header":"RESULTS","content":"\u003cdiv id=\"Sec19\" class=\"Section2\"\u003e\u003ch2\u003ePopulation structure of the Hybrids from 2018 to 2023\u003c/h2\u003e\u003cp\u003eAfter data cleaning, 2,425 SNP markers were retained for downstream analyses. Between 2018 and 2023, a total of 2,925 unique hybrids were evaluated, with the highest number recorded in 2021 (1,180) and the lowest in 2023 (546) (see Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). During this period, 1,495 genotypes were used as female parents and 38 as male parents. The number of female parents varied across years, peaking in 2019 (603 genotypes) and gradually declining thereafter.
In contrast, the number of testers remained relatively stable from 2020 to 2023, at 18 per year, following a peak of 27 testers in 2018 (see Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Among the male parents, the major testers were LH195 and PHT69 in 2018 and 2019; PHZ51, PHK76, and PHP02 in 2020 and 2021; and LH244 in 2022 and 2023.\u003c/p\u003e\u003cp\u003eA Principal Component Analysis (PCA) of the SNP data revealed clear genetic structure within the hybrid population, with clusters primarily reflecting the tester used to produce each hybrid (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003ea). The first 10 principal components together explained just over 50% of the total variance in the marker data (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eb), indicating substantial underlying structure among the hybrids across years. Based on the PCA, seven distinct genetic groups were identified; the first two principal components accounted for 23.6% of the variation.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eSummary of the hybrid maize population evaluated across six years (2018\u0026ndash;2023).
For each year, the table shows the number of female parental lines (Parent 1), testers (Parent 2), and hybrids, together with the major testers.\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"5\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eYear\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eParent 1 (female lines)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eParent 2 (testers)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eNumber of hybrids\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003eMajor testers\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e2018\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e578\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e27\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e1039\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003eLH195, PHT69\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e2019\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e601\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c3\"\u003e\u003cp\u003e17\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e1158\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003eLH195, PHT69\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e2020\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e403\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e18\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e1175\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003ePHZ51, PHK76, PHP02\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e2021\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e408\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e18\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e1180\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003ePHZ51, PHK76, PHP02\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e2022\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e525\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e18\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e549\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003eLH244\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e2023\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c2\"\u003e\u003cp\u003e522\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e18\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e546\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003eLH244\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec20\" class=\"Section2\"\u003e\u003ch2\u003eComparison of model performance across four traits in two CV scenarios simulating unseen years\u003c/h2\u003e\u003cp\u003eTwo validation schemes were used to simulate prediction in unseen years: Leave-One-Year-Out (LOYO) and Rolling Window (RW).\u003c/p\u003e\u003cp\u003eWhen the objective was to predict hybrid performance in a new year (LOYO), the 2NPLGBM model outperformed both GBLUP models for all four traits. The average prediction accuracies of 2NPLGBM were 0.507 for Grain Yield, 0.737 for Days to Silking, 0.754 for Days to Anthesis, and 0.801 for Plant Height (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e \u0026amp; Supplementary Table\u0026nbsp;1). A similar trend was observed for selection efficiency, for which 2NPLGBM also performed best, with average values of 0.332 for Grain Yield, 0.562 for Days to Silking, 0.629 for Days to Anthesis, and 0.615 for Plant Height. Relative to GBLUP_ADD, 2NPLGBM increased prediction accuracy by 2% to 14% and selection efficiency by 11% to 20% across traits.\u003c/p\u003e\u003cp\u003eThe results of the RW scheme differed from those of the LOYO scheme.
For average prediction accuracy, GBLUP_ADDOM outperformed 2NPLGBM and GBLUP_ADD for Grain Yield (0.600) and Plant Height (0.732), whereas 2NPLGBM was best for Days to Silking (0.764) and Days to Anthesis (0.760) (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e \u0026amp; Supplementary Table\u0026nbsp;1). For selection efficiency, 2NPLGBM performed best for Days to Anthesis, with a 15% increase (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e and Supplementary Table\u0026nbsp;1). It was also best for Days to Silking and Plant Height, with average values of 0.551 and 0.493, respectively. Grain Yield was therefore the only trait for which GBLUP_ADDOM achieved higher selection efficiency (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e \u0026amp; Supplementary Table\u0026nbsp;1).\u003c/p\u003e\u003cp\u003e\u003cb\u003eComparison of model performance across four traits in three CV scenarios simulating different genetic relationships\u003c/b\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec21\" class=\"Section2\"\u003e\u003ch2\u003eFive-Fold\u003c/h2\u003e\u003cp\u003eUnder the Five-Fold cross-validation scheme, 2NPLGBM was not the best-performing model for any of the four traits. GBLUP_ADDOM consistently achieved the highest prediction accuracy and selection efficiency for Plant Height and Grain Yield, whereas GBLUP_ADD performed best for Days to Anthesis and Days to Silking.
This result suggests that, at least under this cross-validation scheme, the GBLUP_ADD model, which uses only additive genomic relationships, is well suited to traits with high heritability and low dominance variance (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e; Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e and Supplementary Table\u0026nbsp;1).\u003c/p\u003e\u003cdiv id=\"Sec22\" class=\"Section3\"\u003e\u003ch2\u003eTester CV0\u003c/h2\u003e\u003cp\u003eAmong the four traits evaluated, 2NPLGBM outperformed the GBLUP models for both flowering traits. For Days to Anthesis, 2NPLGBM achieved a Pearson correlation of 0.902, a 27.7% improvement over GBLUP_ADD (0.710). Similarly, for Days to Silking, 2NPLGBM had a Pearson correlation of 0.895, an 18.4% increase over GBLUP_ADD (0.755). In contrast, for Plant Height, GBLUP_ADD (0.860) and GBLUP_ADDOM (0.867) slightly outperformed 2NPLGBM (0.852), which showed a marginal decline of -0.95% relative to GBLUP_ADD. For Grain Yield, 2NPLGBM (0.721) showed a slight increase (2.08%) over GBLUP_ADD (0.704), while GBLUP_ADDOM performed best with 0.729 (+\u0026thinsp;3.98% over GBLUP_ADD) (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e; Supplementary Table\u0026nbsp;1).\u003c/p\u003e\u003cp\u003eThe 2NPLGBM model also achieved better selection efficiency for Plant Height and the flowering traits. Its improvements over GBLUP_ADD were 6.60% for Plant Height, 75.8% for Days to Anthesis, and 50.00% for Days to Silking. For Grain Yield, it showed a 6.68% increase.
Notably, GBLUP_ADDOM outperformed both GBLUP_ADD and 2NPLGBM for Grain Yield, with a modest 6.67% improvement in selection efficiency over GBLUP_ADD (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e; Supplementary Table\u0026nbsp;1).\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Sec23\" class=\"Section2\"\u003e\u003ch2\u003eTester CV00\u003c/h2\u003e\u003cp\u003eIn this more challenging scenario, all models showed reduced prediction accuracy and selection efficiency for all traits. For Days to Anthesis and Days to Silking, 2NPLGBM achieved higher correlations than the GBLUP models (0.55 and 0.56, respectively). For Plant Height and Grain Yield, GBLUP_ADDOM performed best, with Pearson correlations of 0.54 and 0.40, respectively. Selection efficiency showed the same pattern: 2NPLGBM was more efficient for Days to Anthesis and Days to Silking, and GBLUP_ADDOM for Plant Height and Grain Yield. This continues the trend observed in the other cross-validation schemes (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e; Supplementary Table\u0026nbsp;1).\u003c/p\u003e\u003cdiv id=\"Sec24\" class=\"Section3\"\u003e\u003ch2\u003eTrait Heritability and the Role of Dominance Effects\u003c/h2\u003e\u003cp\u003eThe partitioning of genetic variance revealed important differences in the relative contributions of additive and dominance effects across traits. Narrow-sense heritability (h\u003csup\u003e2\u003c/sup\u003e) was highest for Days to Silking (0.900) and Days to Anthesis (0.895), followed by Plant Height (0.780), and lowest for Grain Yield (0.476). The corresponding broad-sense heritability (H\u003csup\u003e2\u003c/sup\u003e) values were moderately higher, indicating relatively small non-additive genetic components.
Notably, dominance variance accounted for 8.2% of the total genetic variance in Plant Height, 11.1% in Days to Anthesis, 9.7% in Days to Silking, and 17.3% in Grain Yield. Grain Yield also showed the largest dominance variance relative to the total phenotypic variance (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{d}^{2}\$\u003c/span\u003e\u003c/span\u003e = 0.121) (see Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e).\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eNarrow-sense heritability (h\u0026sup2;), broad-sense heritability (H\u0026sup2;), dominance variance proportion (d\u0026sup2; = dominance variance / total phenotypic variance), and proportion of dominance variance (PDV\u0026thinsp;=\u0026thinsp;dominance variance / total genetic variance) for the four agronomic traits evaluated in the hybrid maize population.\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"5\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eTrait\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan 
class=\"mathinline\"\u003e\$\\:{h}^{2}\$\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{H}^{2}\$\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{d}^{2}\$\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003ePDV\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003ePlant Height (cm)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.780\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.862\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.072\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.082\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDays to Anthesis (days)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.895\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.923\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.098\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.111\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDays to Silking (days)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.900\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" 
char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.934\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.087\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.097\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eGrain Yield (Mg ha\u003csup\u003e\u0026minus;1\u003c/sup\u003e)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.476\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.656\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.121\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.173\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eThese patterns help explain the performance of the different prediction models. The superior performance of 2NPLGBM in predicting flowering traits (Days to Anthesis and Days to Silking) is consistent with their high heritability and meaningful, yet moderate, dominance contributions. In contrast, Grain Yield, while having lower additive heritability, exhibited the highest proportion of dominance variance, suggesting that incorporating non-additive effects is especially relevant for this trait.
Nevertheless, 2NPLGBM did not outperform the GBLUP models for Grain Yield in any validation scheme except LOYO. This suggests that dominance alone may not be sufficient and that other sources of complexity, such as environmental interactions, could further influence prediction accuracy.\u003c/p\u003e\u003cp\u003eTogether, these results indicate that the advantages of 2NPLGBM for genomic prediction are trait-specific and most pronounced when dominance variance is present and heritability is high, as is the case for the flowering time traits.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec25\" class=\"Section3\"\u003e\u003ch2\u003eVariable Importance\u003c/h2\u003e\u003cp\u003eWe conducted a SHAP analysis to evaluate feature importance within the 2NPLGBM model. Specifically, we identified the 20 variables derived from the 2NP matrix that contributed most to the model's predictions for each trait. For Grain Yield, Days to Anthesis, and Days to Silking, the top-ranked variables were predominantly additive. In contrast, Plant Height had a greater proportion of dominance variables, with seven of the top 20 features originating from the dominance component of the 2NP matrix (Figs.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e and \u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003e). These findings are consistent with the overall decomposition of genetic effect contributions obtained from the GBLUP model.\u003c/p\u003e\u003cp\u003eBy aggregating SHAP values for the additive and dominance variables of the 2NP matrix, we quantified the relative contributions of additive and dominance genetic effects to the model's predictions.
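The aggregation of SHAP values by effect type can be sketched as follows. The section does not state the aggregation rule. This sketch therefore assumes the mean absolute SHAP value per feature, summed within the additive and dominance blocks of the 2NP matrix, with the additive columns placed first. In practice the SHAP matrix would come from an explainer such as shap.TreeExplainer(model).shap_values(X). Here a small hand-made matrix stands in for it, and the helper names are hypothetical.

```python
import numpy as np

def group_shares(shap_values, n_additive):
    # Mean |SHAP| per feature, then the share of the total held by each block
    importance = np.abs(shap_values).mean(axis=0)
    additive = importance[:n_additive].sum()
    dominance = importance[n_additive:].sum()
    total = additive + dominance
    return additive / total, dominance / total

def dominance_count_in_top(shap_values, n_additive, top=20):
    # How many of the top-ranked features come from the dominance block
    importance = np.abs(shap_values).mean(axis=0)
    ranked = np.argsort(importance)[::-1][:top]
    is_dominance = np.zeros(importance.size, dtype=bool)
    is_dominance[n_additive:] = True
    return int(is_dominance[ranked].sum())

# Hypothetical SHAP matrix: 2 hybrids x (2 additive + 2 dominance) features
phi = np.array([[1.0, -2.0, 0.5, 0.0],
                [-1.0, 2.0, -0.5, 1.0]])
add_share, dom_share = group_shares(phi, n_additive=2)
print(add_share, dom_share)                    # prints 0.75 0.25
print(dominance_count_in_top(phi, 2, top=3))   # prints 1
```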
Plant Height showed the largest relative contribution of dominance variables (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e), whereas Days to Anthesis showed the smallest (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e). The ratio of additive to dominance variables among the top-ranked features thus aligns with the trait-specific contributions of the two types of genetic effect. Correspondingly, the GBLUP variance analysis indicated relatively low dominance contributions in this population for Days to Anthesis (0.098) and Days to Silking (0.087), as shown in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e.\u003c/p\u003e\u003cp\u003eTo assess the genetic contributions to the 2NPLGBM model, we examined interactions between SNP variables representing additive and dominance effects. Overall, the results were inconclusive. However, in the GBLUP models the additive and dominance main effects account for most of the model's performance (see Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e), whereas the main contributors to the 2NPLGBM model were the interaction effects (see Supplementary Figures SFig1 and SFig2 in the Appendix).\u003c/p\u003e"},{"header":"DISCUSSION","content":"\u003cdiv id=\"Sec27\" class=\"Section2\"\u003e\u003ch2\u003eThe 2NPLGBM Model: A Hybrid Approach to Genomic Prediction\u003c/h2\u003e\u003cp\u003eThe integration of genomic selection into plant breeding schemes has been shaped by two primary modeling cultures: the data model culture and the algorithmic model culture [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e].
Data models, most notably GBLUP [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e] and RR-BLUP [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e, \u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e], have dominated for many years because of their simplicity, interpretability, and effective handling of additive genetic effects. In contrast, algorithmic models based on machine learning [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e, \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e, \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e] and deep learning [\u003cspan additionalcitationids=\"CR35 CR36 CR37\" citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e–\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e, \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e] have emerged over the last decade. They can capture non-linear and complex relationships without strong assumptions about the underlying genetic architecture. Quantitative traits, however, are influenced by both additive and non-additive components, including dominance and epistasis [\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e]. In data models, these effects are often modeled through multi-kernel extensions, whereas algorithmic models tend to capture them implicitly.\u003c/p\u003e\u003cp\u003eIn this study, we provide a comprehensive assessment of a hybrid genomic prediction model, 2NPLGBM, across multiple traits, validation schemes, and years in hybrid maize.
By integrating a biologically informed genotype matrix (2NP) with a gradient boosting algorithm (LGBM), we show how structured genomic information can enhance predictive performance and selection efficiency, offering new perspectives for selection in hybrid breeding. The 2NP matrix is a genotype representation obtained by concatenating the additive and dominance matrix representations. GBLUP_ADDOM treats additive and dominance effects as independent random effects. By contrast, 2NPLGBM allows the two sets of features to interact, which more closely reflects biological reality, where these effects often interact to shape complex trait expression. This may account for the improved predictive performance and selection efficiency observed in this study, particularly for highly heritable traits, where dominance and interaction effects can contribute to phenotypic variance [\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e, \u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e, \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eIn addition, we fed additive and dominance features to the machine learning model through the 2NP matrix and applied SHapley Additive exPlanations (SHAP). This allowed us to dissect the relative contributions of additive effects, dominance effects, and their interactions to model predictions. Traits for which 2NPLGBM outperformed the classical methods tended to show larger interaction contributions (although this was not conclusive) and were highly heritable. This has an important implication: the advantage of 2NPLGBM is largely trait-dependent and emerges most clearly when non-additive effects (epistasis and other genetic interactions) play a major role in phenotypic variation.
In such contexts, our model provides a robust framework for predicting total genetic values rather than only additive breeding values, thereby enabling more informed parental selection and hybrid combination design.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec28\" class=\"Section2\"\u003e\u003ch2\u003eGenomic Predictive Ability and Selection Efficiency\u003c/h2\u003e\u003cp\u003ePrevious research has shown that non-additive effects, including dominance and epistasis, play a modest yet trait-dependent role in plant breeding. Classical genomic prediction models that incorporate dominance effects [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e] have generally yielded little to no improvement in predictive ability over additive-only models. Similarly, studies modeling genetic interactions [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e, \u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e, \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e] reported modest or negligible gains from including non-additive effects, with the observed impact largely dependent on the genetic architecture of the trait and on population structure.\u003c/p\u003e\u003cp\u003eWith the increasing adoption of machine learning in genomic selection, recent studies have explored transforming genomic data for ML- or DL-based prediction, focusing on either dominance effects [\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e, \u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e, \u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e] or epistatic interactions [\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e, \u003cspan citationid=\"CR62\" class=\"CitationRef\"\u003e62\u003c/span\u003e]. Although these models achieved modest improvements for traits influenced by non-additive effects, most were designed to capture only one type of non-additive effect at a time.\u003c/p\u003e\u003cp\u003eIn contrast, 2NPLGBM uses a biologically informed genomic representation that allows additive effects, dominance effects, and their higher-order interactions (additive × additive, additive × dominance, and dominance × dominance) to be captured simultaneously. Evaluations in a hybrid maize population showed that 2NPLGBM increased genomic predictive ability by more than 5% in temporal validation schemes (LOYO and RW) and by more than 15% in tester-based validation schemes (Tester CV0 and Tester CV00), with the largest gains for the flowering traits (DTA and DTS). These improvements likely arise because the model directly exploits non-additive genetic signals, which are particularly relevant for traits where dominance and epistatic interactions contribute substantially to phenotypic variance. Importantly, 2NPLGBM achieved higher selection efficiency for the flowering traits in all validation schemes except five-fold CV. This suggests that non-linear models can detect subtle genetic signals that matter for selection decisions, even when their overall prediction metrics, such as accuracy or correlation, appear less favorable than those of linear models. Several studies have likewise reported that non-linear approaches can uncover complex genetic patterns or interactions that would otherwise be missed [\u003cspan additionalcitationids=\"CR63 CR64 CR65\" citationid=\"CR62\" class=\"CitationRef\"\u003e62\u003c/span\u003e–\u003cspan citationid=\"CR66\" class=\"CitationRef\"\u003e66\u003c/span\u003e]. Therefore, incorporating these interaction components within the model may enhance model stability and indirectly support improved ranking performance. 
The strength of 2NPLGBM thus lies not only in modeling non-linear and dominance effects but also in its superior ranking performance, arguably the most critical criterion in early-generation selection and resource allocation.\u003c/p\u003e\u003cp\u003eThe overall contribution of non-additive effects to phenotypic variation remains trait-dependent [\u003cspan citationid=\"CR67\" class=\"CitationRef\"\u003e67\u003c/span\u003e]. For traits strongly associated with hybrid vigor, models that explicitly capture these effects, such as 2NPLGBM, can yield superior predictive performance. Aligning the genomic prediction model with the genetic architecture of the target trait is therefore essential for maximizing genetic gain.\u003c/p\u003e\u003cp\u003eFrom a practical breeding perspective, 2NPLGBM offers a robust alternative for predicting total genetic values because it models additive and non-additive effects jointly in a unified framework. A breeding strategy guided by 2NPLGBM predictions could improve the performance of commercial hybrid populations by predicting the performance of specific crosses before they are evaluated phenotypically. Furthermore, through SHAP-based or feature-importance analyses, breeders can quantify the contributions of additive, dominance, and interaction features and identify those associated with hybrid performance. This interpretability can support more informed parental selection, shorten hybrid development cycles, and ultimately improve the efficiency and precision of genomics-assisted breeding pipelines.\u003c/p\u003e\u003c/div\u003e\n\u003ch3\u003eLimitations\u003c/h3\u003e\n\u003cp\u003eDeep learning architectures (e.g., DNNs and CNNs) were not fully explored in this study because of time constraints and suboptimal preliminary results with CNNs. Future work should investigate combining 2NP matrices with deep neural architectures, or ensemble frameworks that pair 2NPLGBM with DL models. 
Another limitation is the higher feature dimensionality that results from concatenating the additive and dominance matrices, which increases computational demand. Additionally, although SHAP enhances interpretability, it does not directly quantify causal effects, so SHAP-based results should be treated as indicative rather than definitive evidence of the interaction architecture.\u003c/p\u003e"},{"header":"CONCLUSIONS AND FUTURE DIRECTIONS","content":"\u003cp\u003eIn conclusion, this study proposes 2NPLGBM and demonstrates the value of integrating biologically informed genotype matrices with non-linear machine learning algorithms to improve genomic prediction and selection outcomes in hybrid breeding. Future research should focus on:\u003c/p\u003e\u003col\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eIncorporating environmental covariates to improve model transferability across sites and years,\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eDeveloping multi-trait and multi-environment versions of 2NPLGBM to jointly exploit correlated traits and environments, and\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eIntegrating multi-omics data (e.g., transcriptomic or epigenetic features) to capture additional biological layers of regulation.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003c/ol\u003e\u003cp\u003eSuch extensions could further bridge the gap between classical quantitative genetics and machine learning, advancing the predictive and explanatory power of machine learning in genomic selection.\u003c/p\u003e"},{"header":"ABBREVIATIONS","content":"\u003cp\u003eGBLUP Genomic best linear unbiased prediction\u003c/p\u003e\u003cp\u003eML Machine learning\u003c/p\u003e\u003cp\u003eDTA Days to anthesis\u003c/p\u003e\u003cp\u003eDTS Days to silking\u003c/p\u003e\u003cp\u003eQTL Quantitative trait loci\u003c/p\u003e\u003cp\u003eSNP Single-nucleotide polymorphism\u003c/p\u003e\u003cp\u003eMAS Marker-assisted 
selection\u003c/p\u003e\u003cp\u003eRCBD Randomized complete block design\u003c/p\u003e\u003cp\u003ePHG Practical Haplotype Graph\u003c/p\u003e\u003cp\u003eBLUE Best linear unbiased estimate\u003c/p\u003e\u003cp\u003ePDV Proportion of dominance variance, calculated as the dominance variance divided by the total genotypic variance\u003c/p\u003e\u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{d}^{2}\$\u003c/span\u003e\u003c/span\u003e Proportion of dominance variance, calculated as the dominance variance divided by the total phenotypic variance\u003c/p\u003e\u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{h}^{2}\$\u003c/span\u003e\u003c/span\u003e Narrow-sense heritability\u003c/p\u003e\u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{H}^{2}\$\u003c/span\u003e\u003c/span\u003e Broad-sense heritability\u003c/p\u003e\u003cp\u003ed Estimated degree of dominance\u003c/p\u003e\u003cp\u003eGBDT Gradient-boosting decision tree\u003c/p\u003e\u003cp\u003eCV Cross-validation\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eSUPPLEMENTARY INFORMATION\u003c/h2\u003e\u003cp\u003eA Google Sheets spreadsheet containing all supplementary data is available at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://docs.google.com/spreadsheets/d/1gHy7Eo-EiGULKyM7kaPmfcmRlXlC6QwY51cS205ojJQ/edit?gid=547868992#gid=547868992\u003c/span\u003e\u003cspan address=\"https://docs.google.com/spreadsheets/d/1gHy7Eo-EiGULKyM7kaPmfcmRlXlC6QwY51cS205ojJQ/edit?gid=547868992#gid=547868992\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e\u003ch2\u003eETHICS APPROVAL AND CONSENT TO PARTICIPATE\u003c/h2\u003e\u003cp\u003eNot applicable.\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eCONSENT FOR PUBLICATION\u003c/strong\u003e\u003c/p\u003e\u003cp\u003eNot 
applicable.\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eCONFLICT OF INTEREST\u003c/strong\u003e\u003c/p\u003e\u003cp\u003eOn behalf of all authors, the corresponding author states that there is no conflict of interest.\u003c/p\u003e\u003ch2\u003eFUNDING\u003c/h2\u003e\u003cp\u003eKWS SAAT SE provided financial support for BO through a PhD fellowship. The University of Göttingen provided additional financial support. We acknowledge support from the Open Access Publication Funds of the University of Göttingen.\u003c/p\u003e\u003ch2\u003eAuthor Contributions\u003c/h2\u003e\u003cp\u003eBO analyzed the data and wrote the manuscript. BO and TB designed the research. IJ, RS, and TB supervised the study. BO, IJ, RS, and TB participated in interpreting the results and contributed to the discussion. All authors contributed to the article and approved the submitted version.\u003c/p\u003e\u003ch2\u003eAcknowledgements\u003c/h2\u003e\u003cp\u003eThe authors acknowledge the committee of the Genomes to Fields 2022 Maize Genotype by Environment Prediction Competition for providing the maize hybrid datasets. The authors acknowledge support from the Computing Center of the University of Göttingen (GWDG) through its high-performance computing resources.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe G2F dataset was obtained from the committee of the Genomes to Fields 2022 Maize Genotype by Environment Prediction Competition and is available on CyVerse at https://doi.org/10.25739/78mn-4394. 
A GitHub repository containing the Bash, R, and Python scripts used for the phenotypic and genotypic analyses and for all genomic predictions is available at https://github.com/BrightGuru/2NP_Matrix-for-Genomic-Prediction.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eMeuwissen THE, Hayes BJ, Goddard ME. Prediction of Total Genetic Value Using Genome-Wide Dense Marker Maps. Genetics [Internet]. 2001;157:1819\u0026ndash;29. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/genetics/157.4.1819\u003c/span\u003e\u003cspan address=\"10.1093/genetics/157.4.1819\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eVanRaden PM. Efficient Methods to Compute Genomic Predictions. J Dairy Sci [Internet]. 2008 [cited 2025 Aug 13];91:4414\u0026ndash;23. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3168/jds.2007-0980\u003c/span\u003e\u003cspan address=\"10.3168/jds.2007-0980\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLande R, Thompson R. Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics [Internet]. 1990;124:743\u0026ndash;56. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/genetics/124.3.743\u003c/span\u003e\u003cspan address=\"10.1093/genetics/124.3.743\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBhat JA, Ali S, Salgotra RK, Mir ZA, Dutta S, Jadon V, et al. Genomic Selection in the Era of Next Generation Sequencing for Complex Traits in Plant Breeding. 
Front Genet [Internet]. 2016 [cited 2025 Mar 31];7. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fgene.2016.00221\u003c/span\u003e\u003cspan address=\"10.3389/fgene.2016.00221\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKumar R, Das SP, Choudhury BU, Kumar A, Prakash NR, Verma R, et al. Advances in genomic tools for plant breeding: harnessing DNA molecular markers, genomic selection, and genome editing. Biol Res [Internet]. 2024 [cited 2025 Mar 31];57:80. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s40659-024-00562-6\u003c/span\u003e\u003cspan address=\"10.1186/s40659-024-00562-6\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAlemu A, \u0026Aring;strand J, Montesinos-L\u0026oacute;pez OA, Isidro y S\u0026aacute;nchez J, Fern\u0026aacute;ndez-G\u0026oacute;nzalez J, Tadesse W, et al. Genomic selection in plant breeding: Key factors shaping two decades of progress. Mol Plant [Internet]. 2024 [cited 2025 Mar 31];17:552\u0026ndash;78. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.molp.2024.03.007\u003c/span\u003e\u003cspan address=\"10.1016/j.molp.2024.03.007\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBreiman L. Statistical modeling: The two cultures. Qual Control Appl Stat Exec Sci Inst. 2003;48:81\u0026ndash;2.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSu G, Christensen OF, Ostersen T, Henryon M, Lund MS. Estimating additive and non-additive genetic variances and predicting genetic merits using genome-wide dense single nucleotide polymorphism markers. 
Public Library of Science San Francisco, USA; 2012.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eVitezica ZG, Varona L, Legarra A. On the Additive and Dominant Variance and Covariance of Individuals Within the Genomic Selection Scope. Genetics [Internet]. 2013;195:1223\u0026ndash;30. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1534/genetics.113.155176\u003c/span\u003e\u003cspan address=\"10.1534/genetics.113.155176\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eNishio M, Satoh M. Including Dominance Effects in the Genomic BLUP Method for Genomic Evaluation. PLOS ONE [Internet]. 2014;9:e85792. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1371/journal.pone.0085792\u003c/span\u003e\u003cspan address=\"10.1371/journal.pone.0085792\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eJiang Y, Reif JC. Modeling Epistasis in Genomic Selection. Genetics [Internet]. 2015 [cited 2025 Apr 1];201:759\u0026ndash;68. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1534/genetics.115.177907\u003c/span\u003e\u003cspan address=\"10.1534/genetics.115.177907\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eVarona L, Legarra A, Toro MA, Vitezica ZG. Non-additive Effects in Genomic Selection. Front Genet [Internet]. 2018 [cited 2024 Mar 14];9:78. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fgene.2018.00078\u003c/span\u003e\u003cspan address=\"10.3389/fgene.2018.00078\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChen Z-Q, Baison J, Pan J, Westin J, Gil MRG, Wu HX. Increased prediction ability in Norway spruce trials using a marker \u0026times; environment interaction and non-additive genomic selection model. J Hered. 2019;110:830\u0026ndash;43.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAmadeu RR, Ferr\u0026atilde;o LFV, Oliveira IDB, Benevenuto J, Endelman JB, Munoz PR. Impact of dominance effects on autotetraploid genomic prediction. Crop Sci [Internet]. 2020 [cited 2025 Aug 13];60:656\u0026ndash;65. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/csc2.20075\u003c/span\u003e\u003cspan address=\"10.1002/csc2.20075\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGonz\u0026aacute;lez-Di\u0026eacute;guez D, Legarra A, Charcosset A, Moreau L, Lehermeier C, Teyss\u0026egrave;dre S, et al. Genomic prediction of hybrid crops allows disentangling dominance and epistasis. Genetics [Internet]. 2021;218:iyab026. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/genetics/iyab026\u003c/span\u003e\u003cspan address=\"10.1093/genetics/iyab026\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eVojgani E. Accounting for Epistasis in Genomic Phenotype Prediction. 
Dissertation, Georg-August-Universit\u0026auml;t G\u0026ouml;ttingen; 2021.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKristensen PS, Sarup P, F\u0026eacute; D, Orabi J, Snell P, Ripa L, et al. Prediction of additive, epistatic, and dominance effects using models accounting for incomplete inbreeding in parental lines of hybrid rye and sugar beet. Front Plant Sci [Internet]. 2023 [cited 2025 Nov 9];14. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fpls.2023.1193433\u003c/span\u003e\u003cspan address=\"10.3389/fpls.2023.1193433\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGianola D, de los Campos G, Hill WG, Manfredi E, Fernando R. Additive Genetic Variability and the Bayesian Alphabet. Genetics [Internet]. 2009 [cited 2025 Apr 1];183:347\u0026ndash;63. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1534/genetics.109.103952\u003c/span\u003e\u003cspan address=\"10.1534/genetics.109.103952\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eP\u0026eacute;rez P, De Los Campos G. Genome-Wide Regression and Prediction with the BGLR Statistical Package. Genetics [Internet]. 2014 [cited 2025 Apr 1];198:483\u0026ndash;95. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1534/genetics.114.164442\u003c/span\u003e\u003cspan address=\"10.1534/genetics.114.164442\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHeslot N, Yang H-P, Sorrells ME, Jannink J-L. Genomic Selection in Plant Breeding: A Comparison of Models. Crop Sci [Internet]. 2012 [cited 2025 Apr 1];52:146\u0026ndash;60. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2135/cropsci2011.06.0297\u003c/span\u003e\u003cspan address=\"10.2135/cropsci2011.06.0297\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDekkers JCM. Prediction of response to marker-assisted and genomic selection using selection index theory. J Anim Breed Genet [Internet]. 2007 [cited 2025 Apr 1];124:331\u0026ndash;41. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/j.1439-0388.2007.00701.x\u003c/span\u003e\u003cspan address=\"10.1111/j.1439-0388.2007.00701.x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eJia Y, Jannink J-L. Multiple-Trait Genomic Selection Methods Increase Genetic Value Prediction Accuracy. Genetics [Internet]. 2012 [cited 2025 Apr 1];192:1513\u0026ndash;22. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1534/genetics.112.144246\u003c/span\u003e\u003cspan address=\"10.1534/genetics.112.144246\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGoldstein BA, Polley EC, Briggs FBS. Random Forests for Genetic Association Studies. Stat Appl Genet Mol Biol [Internet]. 2011 [cited 2025 Apr 1];10:32. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2202/1544-6115.1691\u003c/span\u003e\u003cspan address=\"10.2202/1544-6115.1691\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eParmley KA, Higgins RH, Ganapathysubramanian B, Sarkar S, Singh AK. Machine learning approach for prescriptive plant breeding. Sci Rep. 
2019;9:17132.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMontesinos L\u0026oacute;pez OA, Montesinos L\u0026oacute;pez A, Crossa J. Random Forest for Genomic Prediction. In: Montesinos L\u0026oacute;pez OA, Montesinos L\u0026oacute;pez A, Crossa J, editors. Multivar Stat Mach Learn Methods Genomic Predict [Internet]. Cham: Springer International Publishing; 2022 [cited 2025 Apr 1]. pp. 633\u0026ndash;81. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/978-3-030-89010-0_15\u003c/span\u003e\u003cspan address=\"10.1007/978-3-030-89010-0_15\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZhang Q, Zhao X, Han Y, Yang F, Pan S, Liu Z, et al. Maize yield prediction using federated random forest. Comput Electron Agric [Internet]. 2023 [cited 2025 Apr 1];210:107930. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.compag.2023.107930\u003c/span\u003e\u003cspan address=\"10.1016/j.compag.2023.107930\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZhao W, Lai X, Liu D, Zhang Z, Ma P, Wang Q, et al. Applications of Support Vector Machine in Genomic Prediction in Pig and Maize Populations. Front Genet [Internet]. 2020;11. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fgene.2020.598318\u003c/span\u003e\u003cspan address=\"10.3389/fgene.2020.598318\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKhan M, Hooda BK, Gaur A, Singh V, Jindal Y, Tanwar H, et al. Ensemble and optimization algorithm in support vector machines for classification of wheat genotypes. Sci Rep [Internet]. 2024 [cited 2025 Apr 1];14:22728. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41598-024-72056-0\u003c/span\u003e\u003cspan address=\"10.1038/s41598-024-72056-0\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eFriedman JH. Greedy function approximation: a gradient boosting machine. Ann Stat. 2001;29:1189\u0026ndash;232.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. Proc 22nd ACM SIGKDD Int Conf Knowl Discov Data Min [Internet]. New York, NY, USA: Association for Computing Machinery; 2016. pp. 785\u0026ndash;94. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1145/2939672.2939785\u003c/span\u003e\u003cspan address=\"10.1145/2939672.2939785\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLi W, Yin Y, Quan X, Zhang H. Gene Expression Value Prediction Based on XGBoost Algorithm. Front Genet [Internet]. 2019 [cited 2024 Jan 29];10:1077. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fgene.2019.01077\u003c/span\u003e\u003cspan address=\"10.3389/fgene.2019.01077\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWesthues CC, Mahone GS, da Silva S, Thorwarth P, Schmidt M, Richter J-C, et al. Prediction of Maize Phenotypic Traits With Genomic and Environmental Predictors Using Gradient Boosting Frameworks. Front Plant Sci [Internet]. 2021;12. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fpls.2021.699589\u003c/span\u003e\u003cspan address=\"10.3389/fpls.2021.699589\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMontesinos-L\u0026oacute;pez A, Crespo-Herrera L, Dreisigacker S, Gerard G, Vitale P, Saint Pierre C, et al. Deep learning methods improve genomic prediction of wheat breeding. Front Plant Sci [Internet]. 2024 [cited 2025 Apr 2];15. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fpls.2024.1324090\u003c/span\u003e\u003cspan address=\"10.3389/fpls.2024.1324090\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMontesinos-L\u0026oacute;pez OA, Montesinos-L\u0026oacute;pez A, Crossa J, Gianola D, Hern\u0026aacute;ndez-Su\u0026aacute;rez CM, Mart\u0026iacute;n-Vallejo J. Multi-trait, Multi-environment Deep Learning Modeling for Genomic-Enabled Prediction of Plant Traits. G3 GenesGenomesGenetics [Internet]. 2018 [cited 2023 Dec 6];8:3829\u0026ndash;40. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1534/g3.118.200728\u003c/span\u003e\u003cspan address=\"10.1534/g3.118.200728\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMontesinos-L\u0026oacute;pez OA, Mart\u0026iacute;n-Vallejo J, Crossa J, Gianola D, Hern\u0026aacute;ndez-Su\u0026aacute;rez CM, Montesinos-L\u0026oacute;pez A, et al. New Deep Learning Genomic-Based Prediction Model for Multiple Traits with Binary, Ordinal, and Continuous Phenotypes. G3 GenesGenomesGenetics [Internet]. 2019 [cited 2024 May 28];9:1545\u0026ndash;56. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1534/g3.119.300585\u003c/span\u003e\u003cspan address=\"10.1534/g3.119.300585\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSandhu KS, Lozada DN, Zhang Z, Pumphrey MO, Carter AH. Deep Learning for Predicting Complex Traits in Spring Wheat Breeding Program. Front Plant Sci [Internet]. 2021 [cited 2025 Apr 2];11. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fpls.2020.613325\u003c/span\u003e\u003cspan address=\"10.3389/fpls.2020.613325\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWang X, Zeng H, Lin L, Huang Y, Lin H, Que Y. Deep learning-empowered crop breeding: intelligent, efficient and promising. Front Plant Sci [Internet]. 2023 [cited 2025 Apr 2];14. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fpls.2023.1260089\u003c/span\u003e\u003cspan address=\"10.3389/fpls.2023.1260089\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMontesinos-L\u0026oacute;pez A, Rivera C, Pinto F, Pi\u0026ntilde;era F, Gonzalez D, Reynolds M, et al. Multimodal deep learning methods enhance genomic prediction of wheat breeding. G3 GenesGenomesGenetics [Internet]. 2023 [cited 2025 Apr 2];13:jkad045. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/g3journal/jkad045\u003c/span\u003e\u003cspan address=\"10.1093/g3journal/jkad045\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMa W, Qiu Z, Song J, Li J, Cheng Q, Zhai J, et al. 
A deep convolutional neural network approach for predicting phenotypes from genotypes. Planta [Internet]. 2018 [cited 2025 Apr 2];248:1307\u0026ndash;18. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s00425-018-2976-9\u003c/span\u003e\u003cspan address=\"10.1007/s00425-018-2976-9\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLiu Y, Wang D, He F, Wang J, Joshi T, Xu D. Phenotype Prediction and Genome-Wide Association Study Using Deep Convolutional Neural Network of Soybean. Front Genet [Internet]. 2019 [cited 2025 Apr 2];10. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fgene.2019.01091\u003c/span\u003e\u003cspan address=\"10.3389/fgene.2019.01091\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePook T, Freudenthal J, Korte A, Simianer H. Using Local Convolutional Neural Networks for Genomic Prediction. Front Genet [Internet]. 2020 [cited 2025 Apr 2];11. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fgene.2020.561497\u003c/span\u003e\u003cspan address=\"10.3389/fgene.2020.561497\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZingaretti LM, Gezan SA, Ferr\u0026atilde;o LFV, Osorio LF, Monfort A, Mu\u0026ntilde;oz PR, et al. Exploring deep learning for complex trait genomic prediction in polyploid outcrossing species. Front Plant Sci. 2020;11:25.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHeffner EL, Sorrells ME, Jannink J. Genomic selection for crop improvement. Crop Sci. 
2009;49:1\u0026ndash;12.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCrossa J, P\u0026eacute;rez-Rodr\u0026iacute;guez P, Cuevas J, Montesinos-L\u0026oacute;pez O, Jarqu\u0026iacute;n D, De Los Campos G, et al. Genomic selection in plant breeding: methods, models, and perspectives. Trends Plant Sci Elsevier. 2017;22:961\u0026ndash;75.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGenomes To Fields. Genomes to Fields 2024 Maize Genotype by Environment Prediction Competition [Internet]. CyVerse Data Commons; 2025 [cited 2025 Nov 11]. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.25739/78MN-4394\u003c/span\u003e\u003cspan address=\"10.25739/78MN-4394\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBradbury PJ, Casstevens T, Jensen SE, Johnson LC, Miller ZR, Monier B, et al. The Practical Haplotype Graph, a platform for storing and using pangenomes for imputation. Bioinf [Internet]. 2022;38:3698\u0026ndash;702. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/bioinformatics/btac410\u003c/span\u003e\u003cspan address=\"10.1093/bioinformatics/btac410\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. Volume 23. Oxford University Press; 2007. pp. 2633\u0026ndash;5.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eOsatohanmwen BE, J\u0026uacute;nior ICV, Gholami M, Westhues CC, Sharifi R, Beissinger T. Predicting Maize Hybrid Performance with Machine Learning and a Locus-Specific Degree of Dominance Transformation [Internet]. In Review; 2025 [cited 2025 Apr 29]. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.21203/rs.3.rs-6002495/v1\u003c/span\u003e\u003cspan address=\"10.21203/rs.3.rs-6002495/v1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKe G, Meng Q, Finley T, Wang T, Chen W, Ma W et al. Lightgbm: A highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst. 2017;30.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHead T, Kumar M, Nahrstaedt H, Louppe G, Shcherbatyi I. Scikit-optimize/scikit-optimize. Zenodo. 2021.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLundberg SM, Lee S-I. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;30.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSukumaran S, Jarquin D, Crossa J, Reynolds M. Genomic-enabled prediction accuracies increased by modeling genotype\u0026times; environment interaction in durum wheat. Plant Genome Wiley Online Libr. 2018;11:170112.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eFernandes IK, Vieira CC, Dias KOG, Fernandes SB. Theor Appl Genet [Internet]. 2024;137:189. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s00122-024-04687-w\u003c/span\u003e\u003cspan address=\"10.1007/s00122-024-04687-w\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. [cited 2025 Mar 11];. Using machine learning to combine genetic and environmental data for maize grain yield predictions across multi-environment trials.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ede Oliveira Zimmermann MJ. Breeding for yield, in mixtures of common beans (Phaseolus vulgaris L.) and maize (Zea mays L). Springer; 1997. pp. 143\u0026ndash;8.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHamblin J, de Zimmermann MJ. 
Breeding common bean for yield in mixtures. Plant Breed Rev Wiley Online Libr. 1986;4:245\u0026ndash;72.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eEndelman JB. Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP. Plant Genome [Internet]. 2011 [cited 2025 Apr 1];4. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3835/plantgenome2011.08.0024\u003c/span\u003e\u003cspan address=\"10.3835/plantgenome2011.08.0024\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRadoev M, Becker HC, Ecke W. Genetic analysis of heterosis for yield and yield components in rapeseed (Brassica napus L.) by quantitative trait locus mapping. Volume 179. Genetics: Oxford University Press; 2008. pp. 1547\u0026ndash;58.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ede Almeida Filho JE, Guimar\u0026atilde;es JFR, e Silva FF, de Resende MDV, Mu\u0026ntilde;oz P, Kirst M, et al. The contribution of dominance to phenotype prediction in a pine breeding and simulated population. Heredity [Internet]. 2016;117:33\u0026ndash;41. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/hdy.2016.23\u003c/span\u003e\u003cspan address=\"10.1038/hdy.2016.23\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCalleja-Rodriguez A, Chen Z, Suontama M, Pan J, Wu HX. Genomic Predictions With Nonadditive Effects Improved Estimates of Additive Effects and Predictions of Total Genetic Values in Pinus sylvestris. Front Plant Sci [Internet] Front. 2021. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fpls.2021.666820\u003c/span\u003e\u003cspan address=\"10.3389/fpls.2021.666820\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. [cited 2025 Nov 9];12.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMathew B, Hauptmann A, L\u0026eacute;on J, Sillanp\u0026auml;\u0026auml; MJ, NeuralLasso. Neural Networks Meet Lasso in Genomic Prediction. Front Plant Sci [Internet]. 2022. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fpls.2022.800161\u003c/span\u003e\u003cspan address=\"10.3389/fpls.2022.800161\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. 13-2022.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSharma S, Partap A, Balaguer MA, de Malvar L, Chandra S. R. DeepG2P: Fusing Multi-Modal Data to Improve Crop Production [Internet]. arXiv; 2022 [cited 2025 Nov 9]. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.2211.05986\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.2211.05986\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGianola D, Cecchinato A, Naya H, Sch\u0026ouml;n C-C. Prediction of complex traits: robust alternatives to best linear unbiased prediction. Front Genet Front Media SA. 2018;9:195.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCrossa J, Martini JW, Gianola D, P\u0026eacute;rez-Rodr\u0026iacute;guez P, Jarquin D, Juliana P, et al. Deep kernel and deep learning for genome-based prediction of single traits in multienvironment breeding trials. Front Genet Front Media SA. 
2019;10:1168.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMontesinos-L\u0026oacute;pez OA, Montesinos‐L\u0026oacute;pez A, Hernandez‐Suarez CM, Barr\u0026oacute;n‐L\u0026oacute;pez JA, Crossa J. Deep‐learning power and perspectives for genomic selection. Plant Genome [Internet]. 2021;14:e20122. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/tpg2.20122\u003c/span\u003e\u003cspan address=\"10.1002/tpg2.20122\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. [cited 2024 Mar 14];.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMontesinos-L\u0026oacute;pez A, Montesinos-L\u0026oacute;pez OA, Ramos-Pulido S, Mosqueda-Gonz\u0026aacute;lez BA, Guerrero-Arroyo EA, Crossa J, et al. Artificial intelligence meets genomic selection: comparing deep learning and GBLUP across diverse plant datasets. Front Genet Front Media SA. 2025;16:1568705.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCrossa J, Montesinos-Lopez OA, Costa-Neto G, Vitale P, Martini JW, Runcie D, et al. Machine learning algorithms translate big data into predictive breeding accuracy. Trends Plant Sci Elsevier. 2025;30:167\u0026ndash;84.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eXu S. Mapping Quantitative Trait Loci by Controlling Polygenic Background Effects. Genetics [Internet]. 2013 [cited 2025 Nov 9];195:1209\u0026ndash;22. 
https://doi.org/10.1534/genetics.113.157032

Keywords: Genomic prediction, Hybrid breeding, LightGBM, Dominance, Non-additive effects, Machine learning, 2NP matrix, SHAP, Maize, Additive–dominance modeling, GBLUP, Selection efficiency, Temporal validation

Abstract

Background

Genomic prediction (GP) is a central component of modern plant breeding, enabling the early selection of superior genotypes based on genomic marker data.
Classical GP models, such as genomic best linear unbiased prediction (GBLUP), operate within the data modeling culture and typically assume additive genetic effects. This assumption limits their performance in hybrid breeding, where dominance and epistatic effects play a role. In contrast, machine learning (ML) models from the algorithmic modeling culture can capture non-additive genetic effects but often lack biological grounding and interpretability. To bridge these paradigms, we propose 2NPLGBM, a hybrid genomic prediction approach that integrates quantitative genetics with ML. The method introduces a two-matrix (2NP) genotype representation, formed by concatenating the additive (Z) and dominance (W) matrices, which serves as input to a Light Gradient Boosting Machine (LGBM). This enables the simultaneous modeling of additive and dominance effects and of higher-order genetic interactions (additive × additive, AA; additive × dominance, AD; dominance × dominance, DD).

Results

The 2NPLGBM model was evaluated using six years of hybrid maize trial data for four agronomic traits (grain yield, plant height, days to silking, and days to anthesis) under five cross-validation schemes: two temporal schemes, Leave-One-Year-Out (LOYO) and Rolling Window (RW), and three genetic-generalization schemes, Five-Fold, Tester CV0, and Tester CV00. Compared to GBLUP, 2NPLGBM improved predictive accuracy by about 5% on average under the temporal validations and by more than 15% under the tester-based schemes, particularly for the flowering traits (days to silking and days to anthesis). It also consistently improved selection efficiency, indicating that the model captures complex genetic signals relevant for ranking and hybrid selection. Feature interpretation using SHapley Additive exPlanations (SHAP) confirmed that non-additive interactions contributed substantially to prediction accuracy for highly heritable traits.
SHAP also revealed trait-specific genetic architectures: additive effects dominated the flowering traits, whereas dominance effects contributed substantially to plant height and yield. Classical variance component analysis supported these findings, indicating dominance contributions of 17.3% for yield and 8.2% for plant height.

Conclusion

2NPLGBM is a biologically informed ML framework that bridges the classical quantitative genetics and algorithmic modeling cultures. By jointly modeling additive and non-additive effects, it improves predictive accuracy, interpretability, and selection efficiency in hybrid breeding programs. Future work should explore multi-trait and multi-environment extensions, the integration of environmental covariates, and the inclusion of multi-omics data to further strengthen predictive power and biological interpretability.

Submitted to Plant Methods on November 12th, 2025; Version 1 posted on November 27th, 2025. Status: under review (latest editorial decision: revision requested, February 2nd, 2026).
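
The two-matrix (2NP) input described in the abstract can be illustrated with a short sketch. This is not the authors' code: it assumes genotypes coded 0/1/2 (copies of one allele), an additive coding of −1/0/1 and a heterozygote-indicator dominance coding of 0/1/0, whereas the paper's Z and W may be centred or scaled differently. The helper names (`build_2np`, `feature_names`, `block_share`) and the `_A`/`_D` column suffixes are hypothetical. The last helper sums mean absolute per-feature attributions by block, which is one simple way to compare additive and dominance contributions after a SHAP analysis.

```python
# Minimal sketch (not the authors' implementation): build a concatenated
# [Z | W] genotype matrix and summarise per-feature attributions by block.
# Assumptions: genotypes coded 0/1/2; additive coding -1/0/1; dominance
# coding is a heterozygote indicator 0/1/0. The paper may centre or scale
# Z and W differently.

def additive_code(g):
    """Additive coding: 0/1/2 -> -1/0/1."""
    return g - 1


def dominance_code(g):
    """Dominance coding: heterozygotes (g == 1) -> 1, homozygotes -> 0."""
    return 1 if g == 1 else 0


def build_2np(genotypes):
    """Return one row [z_1..z_P, w_1..w_P] per individual (N x 2P)."""
    rows = []
    for row in genotypes:
        if any(g not in (0, 1, 2) for g in row):
            raise ValueError("genotypes must be coded 0, 1 or 2")
        z = [additive_code(g) for g in row]
        w = [dominance_code(g) for g in row]
        rows.append(z + w)  # additive block first, then dominance block
    return rows


def feature_names(marker_ids):
    """Hypothetical naming scheme: '<marker>_A' and '<marker>_D' columns."""
    return [f"{m}_A" for m in marker_ids] + [f"{m}_D" for m in marker_ids]


def block_share(attributions, names):
    """Share of summed mean |attribution| on additive vs dominance columns.

    `attributions` is an N x 2P list of per-feature values, e.g. SHAP values
    computed for a model fitted on the build_2np() matrix.
    """
    totals = {"A": 0.0, "D": 0.0}
    n = len(attributions)
    if n == 0:
        return totals
    for j, name in enumerate(names):
        mean_abs = sum(abs(r[j]) for r in attributions) / n
        totals[name.rsplit("_", 1)[1]] += mean_abs  # suffix selects the block
    grand = totals["A"] + totals["D"]
    if grand == 0:
        return totals
    return {k: v / grand for k, v in totals.items()}


if __name__ == "__main__":
    G = [[0, 1, 2],   # toy genotypes: 2 individuals x 3 markers
         [1, 1, 0]]
    X = build_2np(G)
    print(feature_names(["m1", "m2", "m3"]))
    print(X)
```

With the real `lightgbm` and `shap` packages, a matrix like `X` could be passed to `lightgbm.LGBMRegressor().fit(X, y)`, and `shap.TreeExplainer(model).shap_values(X)` would give per-feature attributions to feed into `block_share`. Keeping the additive block before the dominance block, with traceable column names, is what allows the attributions to be regrouped into additive and dominance contributions afterwards.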