A Non-Linear Game for Two: Genetic Parameters and Prediction of Fertilization Success using Bayesian and Machine Learning Frameworks

Research Article (Research Square preprint)

Fotis Pappas, Paul Vincent Debes, Martin Johnsson, Christos Palaiokostas

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-8078842/v1 This work is licensed under a CC BY 4.0 License. Status: Posted (Version 1).

Abstract

Fertility is an important but often cryptic intrinsic characteristic of domesticated animals. Predicting reproductive potential is of great importance for the industry but assessment through indirect proxies is laborious and often impractical. Among other biological factors, genetic effects are expected to play a crucial role in shaping male and female fertility.
In cases where heritable components are strong, polygenic merit could be a valuable tool for decision-making in breeding schemes. Here we estimate sex-specific variance components affecting fertilization success by analysing outcomes of over 3,000 controlled mating events in an Arctic charr breeding nucleus from Iceland. Furthermore, a machine learning framework using relationships-to-founders vectors as input and a two-tower neural network architecture is proposed and tested for prediction of fertilization success. Both approaches seem to capture meaningful biological signals and offer alternative tools for ranking, selecting or even allocating matings between breeding candidates.

Background

Reproductive success is a fundamental prerequisite of sustainable and profitable animal production. In salmonid fish farming, the relative economic value of each breeder is particularly high due to the high reproductive potential of these species [1,2]. However, external fertilization is an environmentally sensitive process [3] and thus requires substantial investment in resources and labour. Combined with the typically narrow spawning windows, especially for females [4], fertilization failure becomes extremely costly, leading not only to wasted resources but also to potential reductions in population size and genetic diversity. Ensuring consistent and reliable fertility is therefore critical to the long-term viability of breeding programs. Traditionally, reproductive capacity in breeders has been assessed using proxy traits such as gamete counts and quality measures (e.g., egg size, sperm motility) [5]. While informative, the collection of such phenotypes is often impractical, given the associated costs and labour intensity. An interesting alternative is to predict fertility through biological predictors.
More specifically, estimating the genetic merit for fertility in candidate breeders without phenotypic records of their own could improve fertilization rates in the short term and, at the same time, drive long-term genetic gain: minimizing the fertility-related loss of families enables more effective selection, which in turn shifts the population mean. Interestingly, familial fertilization rates are governed by intrinsic biological factors contributed by (at least) two individuals acting in a non-additive manner. A probabilistic, multiplicative framework that accounts for the non-linear effects of sires and dams on familial fertilization rates may therefore provide the most realistic description of mating outcomes [6]. Among potential analytical approaches, machine learning offers a suitable means to efficiently model such complex interactions [7]. In this study, we analyze more than 3,000 mating events from the Icelandic Arctic charr breeding program (Hólar University, Iceland), focusing on the genetic variance components of latent male and female fertility, as well as the applicability of machine learning architectures for predicting fertilization success and failure based on genetic merit. Our work introduces two key analytical novelties. First, we propose the use of relationships-to-founders as input features, offering an alternative to previously used data sources such as genotypic information, the full additive genetic relationship matrix (A) or its low-dimensional representations through eigenvector analysis [7–12]. Second, we develop a two-tower multilayer perceptron (MLP) architecture with probability multiplication fusion, specifically designed to predict fertilization outcomes.

Methods

Background information of the studied population – Collected data

Fertilization success data were recorded over 17 years (2008-2024) from the Icelandic Arctic charr breeding nucleus in Hólar, northern Iceland [13].
Records included a total of 3,087 artificial mating events between 1,863 sires and 2,937 dams (males often contributed to two families whereas dams usually contributed to one). The number of viable fertilized eggs was recorded with drastic right-censoring, as at most 400 eyed eggs were counted per full-sib family. For our analysis, the fertilization outcome was therefore treated as binary, with families reaching the upper bound of 400 eyed eggs considered a success. This definition corresponds to a success rate of ~88% across all years.

Estimation of genetic parameters for latent fertility traits

A sex-specific, bivariate model was constructed for the binary fertilization success trait described above using a Bayesian hierarchical framework [6]. Under a Generalized Linear Mixed Model (GLMM) with probit links, fertilization success requires both a female and a male latent liability to exceed zero. Each sex-limited, latent phenotype had its own intercept and year random effects, while the additive genetic effects were modelled on the liability scale using the additive genetic relationship matrix built with R/AGHmatrix [14,15]. Additionally, a permanent-environmental effect was considered for sires, while both sexes shared a correlated residual term. For each record, the success probability is defined as the product of the standard normal cumulative distribution function (CDF, denoted as \(\Phi\)) evaluated at the female and male liabilities:

$$p = F_f \cdot F_m = \Phi\left(\eta_{female}\right) \cdot \Phi\left(\eta_{male}\right)$$

where \(\eta_{female}\) and \(\eta_{male}\) are the latent fertility liabilities for dams and sires, respectively (Fig. 1A visualizes a fertilization rate equivalent). Weakly informative priors were specified (https://github.com/pappasfotios/AC_Iceland_fertility/blob/main/Stan/FertLiabilities.stan), and the model, scripted in Stan via R/rstan [14,16], was run in 3 chains for 10,000 iterations (4,000 warmup).
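As a minimal illustration of the multiplicative success probability defined above, the following sketch evaluates \(p = \Phi(\eta_{female}) \cdot \Phi(\eta_{male})\) for a few liability pairs. The function names (`phi`, `success_probability`) and the liability values are assumptions chosen for illustration; they are not taken from the authors' Stan code. \(\Phi\) is computed from the error function in the Python standard library.

```python
import math


def phi(x: float) -> float:
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def success_probability(eta_female: float, eta_male: float) -> float:
    """Product of the female and male probit-transformed liabilities."""
    return phi(eta_female) * phi(eta_male)


# Hypothetical liabilities (illustrative values only, not estimates from the study).
pairs = [(0.0, 0.0), (2.0, 2.0), (2.0, -1.0)]
for eta_f, eta_m in pairs:
    print(eta_f, eta_m, round(success_probability(eta_f, eta_m), 3))
```

Because the two probabilities are multiplied, a low liability in either partner caps the success probability of the pair regardless of the other partner, which is the non-linearity the model is meant to describe.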
Convergence was assessed using \(\widehat{R}\), effective sample size, and visual inspection of trace plots. Liability-scale additive variances, heritabilities [17], repeatability for male fertility and the genetic correlation between male and female fertility were derived, along with per-animal Estimated Breeding Values (EBVs).

Two-tower MLP model

The same underlying multiplicative system, yielding fertilization success probabilities from female and male contributions, was considered for our ML model. The proposed two-tower structure is inspired by models widely adopted in retrieval and recommendation systems, first popularized by Google® for YouTube® recommendations [18]. The portion of the additive genetic relationship matrix containing relationships to the founders of the pedigree was retrieved, and the vectors corresponding to each breeder were used as input to train sire- and dam-specific multi-layer perceptrons (towers) consisting of three hidden layers each, with rectified linear unit (ReLU) activations (architecture described in Fig. 1B). Training was carried out using the Adam optimizer [19], with early stopping to control training time and overfitting. Observations were partitioned using stratification based on spawning year to prevent parent leakage (i.e., to avoid having the same breeder present in both training and validation/testing sets), combined with target stratification to maintain class balance. An initial 80%/20% split was performed to separate a dataset for 4-fold cross-validation (80%, used for dropout tuning) from a final test-set partition (20%). Dropout was applied to the first two hidden layers, with rates between 0.0 and 0.5 evaluated via grid search in the cross-validation phase of our analysis, meaning that between 0% and 50% of the neurons in these hidden layers were randomly disabled during training as a means of regularization.
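The two-tower design described above can be sketched in PyTorch roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the hidden-layer sizes, learning rate, toy input dimensions and random inputs and labels are all made up, and early stopping and the dropout grid search are omitted. Computing the product of the two probabilities as the exponential of summed logarithms follows the Figure 1 legend.

```python
import math

import torch
from torch import nn


class Tower(nn.Module):
    """Sex-specific MLP: three hidden ReLU layers, dropout after the first two."""

    def __init__(self, n_in, hidden=(64, 32, 16), drop1=0.0, drop2=0.5):
        super().__init__()
        h1, h2, h3 = hidden  # hidden sizes are assumptions, not from the paper
        self.net = nn.Sequential(
            nn.Linear(n_in, h1), nn.ReLU(), nn.Dropout(drop1),
            nn.Linear(h1, h2), nn.ReLU(), nn.Dropout(drop2),
            nn.Linear(h2, h3), nn.ReLU(),
            nn.Linear(h3, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)  # one liability score per animal


def probit(z):
    """Standard normal CDF, mapping a liability score to a probability."""
    return 0.5 * (1.0 + torch.erf(z / math.sqrt(2.0)))


class TwoTower(nn.Module):
    """Dam and sire towers fused by multiplying their probabilities."""

    def __init__(self, n_dam_features, n_sire_features):
        super().__init__()
        self.dam_tower = Tower(n_dam_features)
        self.sire_tower = Tower(n_sire_features)

    def forward(self, x_dam, x_sire):
        eps = 1e-6  # keeps the logarithms finite
        p_f = probit(self.dam_tower(x_dam)).clamp(eps, 1.0 - eps)
        p_m = probit(self.sire_tower(x_sire)).clamp(eps, 1.0 - eps)
        # Product of probabilities, computed as exp(log p_f + log p_m).
        return torch.exp(torch.log(p_f) + torch.log(p_m))


torch.manual_seed(0)
n_founders, n_matings = 20, 32                  # toy dimensions
x_dam = torch.rand(n_matings, n_founders)       # stand-ins for relationship-to-founder vectors
x_sire = torch.rand(n_matings, n_founders)
y = (torch.rand(n_matings) < 0.88).float()      # random labels, roughly the reported success rate

model = TwoTower(n_founders, n_founders)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

model.train()
for _ in range(5):                              # a few steps only, for illustration
    optimizer.zero_grad()
    loss = loss_fn(model(x_dam, x_sire), y)
    loss.backward()
    optimizer.step()

model.eval()                                    # disables dropout for prediction
with torch.no_grad():
    p_hat = model(x_dam, x_sire)
```

Since each tower takes its own input size, the two towers need not share a feature space; only their scalar liability outputs are combined.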
Finally, model training was performed on all observations of the 80% partition and tested on the held-out 20% test set. The neural networks were implemented in PyTorch [22], while cross-validation and evaluation metrics (ROC AUC, accuracy, Precision-Recall AUC and F1-score) were computed using the scikit-learn library in Python [23].

Results & Discussion

Genetic parameters of latent fertility factors

Heritability estimates on the respective latent scales were 0.36 ([0.20, 0.56] 95% CI) for female fertility and 0.15 ([0.00, 0.43] 95% CI) for male fertility (trace plots in Figure S1 and diagnostics in Table S1). The posterior mean for the genetic correlation between the two traits was 0.14 but was highly uncertain ([-0.64, 0.78] 95% CI). The posterior mean of repeatability for male reproductive potential was 0.38 ([0.14, 0.62] 95% CI). Besides being more heritable, female liabilities also showed a higher point estimate of phenotypic variance (posterior mean = 2.80, [1.74, 4.96] 95% CI) than male liabilities (posterior mean = 2.25, [1.20, 4.56] 95% CI), while the opposite was true for intercept posteriors (Table S1). Despite the levels of uncertainty expressed by posterior variances, the compiled results indicate that dam fertility is more variable and significantly affects fertility outcomes in this population. Furthermore, the heritability posteriors were of similar magnitude to those of a recent study in the Swedish National Breeding Program but showed the opposite sex-specific pattern: in that study male fertility heritability was higher than that of females, although the trait definitions differed considerably [6]. At the same time, the posteriors in the current study appear narrower, yielding more confident parameter estimates, probably reflecting the larger sample size.

Tuning and performance of two-tower MLP

Grid-search (Table S1) indicated dropout rates of 0.0 and 0.5 as the best-performing values for the first and second hidden layers, respectively.
This configuration yielded an average ROC AUC of 0.645 across the four cross-validation folds, indicating that stronger regularization was beneficial in the second hidden layer. Overall, training with this architecture achieved an ROC AUC of 0.654 (Figure 2A) and a PR-AUC of 0.945 on the final held-out test set. From a practical breeding management standpoint, the expected accuracy was highest when selecting the top 40% of predicted outcomes (Figure 2B). Given the low (sire) to moderate (dam) heritability of the traits, these predictive values are encouraging, even if the trait definition was not optimal. Importantly, the relative gain compared to random selection is valuable in a breeding context, where false positives can result in inefficient on-farm selection and misallocation of resources. Unfortunately, the exact signals and interactions offering predictive ability to the model remain cryptic, since the real fertility status of individual breeders is naturally masked. However, the tower outputs generally seem to be linked to the true combined labels (Figure 2C). Variation in liability output almost collapses to constants for more recent generations (Figure 2D), probably due to limited downstream information.

An additional strength of the two-tower design is its flexibility. Different data types and dimensionalities can be assigned to the male and female towers. For instance, female inputs could consist of SNP genotypes while male inputs might comprise sperm CpG methylation markers [25], enabling the model to integrate heterogeneous molecular signals. This makes the architecture adaptable and multi-purpose across species and data modalities. Furthermore, mate compatibility could also be modelled through more complex architectures, resulting in deeper refinement of the biological background.
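The top-fraction evaluation reported earlier in this section (Figure 2B) can be illustrated with a small sketch. It ranks matings by predicted success probability and reports the share of true successes among the top-ranked fraction, compared with the unranked baseline. The function name, the toy labels and the toy predictions are assumptions for illustration, and the exact metric definition used for Figure 2B may differ.

```python
def top_fraction_success_rate(y_true, p_pred, fraction):
    """Share of true successes among the top-ranked `fraction` of matings."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Indices sorted from highest to lowest predicted probability.
    order = sorted(range(len(p_pred)), key=lambda i: p_pred[i], reverse=True)
    k = max(1, round(fraction * len(order)))
    top = order[:k]
    return sum(y_true[i] for i in top) / k


# Toy data (made up): 1 = successful fertilization, 0 = failure.
y_true = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1]
p_pred = [0.95, 0.91, 0.90, 0.88, 0.85, 0.80, 0.70, 0.65, 0.40, 0.30]

baseline = sum(y_true) / len(y_true)  # success rate without any ranking
for frac in (0.2, 0.4, 1.0):
    print(frac, top_fraction_success_rate(y_true, p_pred, frac), baseline)
```

Comparing each top-fraction value with the baseline gives the relative gain over random selection of matings.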
Finally, if combined with interpretability techniques, the framework could also be employed for feature selection and biomarker discovery, helping to identify molecular signatures of fertility and infertility.

Relationship of proofs from hierarchical Bayesian framework and two-tower MLPs

To assess ranking similarities between our two analytical approaches, the relationship between Estimated Breeding Values (EBVs) and the liabilities produced by our ML model was investigated. These seem to be associated, as expected (Figure 3). More specifically, correlations of 0.42 and 0.44 were estimated for females and males, respectively, both with low p-values. Although the true genetic merit for fertility remains latent and cannot be directly assessed, obtaining correlated proofs from two conceptually independent approaches supports consistency. This suggests that both models are capturing effects of heritable components despite their distinct assumptions and data.

Conclusion

This study suggests that on-farm fertilization success can be modelled as a non-linear combination of latent fertility probabilities in a multiplicative fashion. Variance components were estimated under a Bayesian hierarchical framework, suggesting moderate heritability for female fertility and a low-to-moderate estimate for male fertility in Icelandic Arctic charr. Additionally, we propose a flexible and easily implementable alternative based on a two-tower MLP architecture. Each tower models sex-specific liabilities, which are then integrated to predict cross outcomes (ROC AUC = 0.654). The framework is easily extendable to utilize multiple data modalities, including omics and diverse sources of metadata. From a practical standpoint, such models could support decision-making in animal production systems by providing accurate predictions of reproductive success amongst pairs and by ranking breeding candidates based on tower outputs.
In this proof-of-concept study, we evaluated the predictive ability of our ML system using only pedigree information as input. Future extensions could include genotypic information, compatibility effects between breeders and genotype-by-environment interactions (G×E) that are expected to be important for such traits.

Declarations

The study utilized retrospective data collected during normal operation of the Icelandic Arctic charr breeding program (https://www.holaraquatic.is/breeding-program.html). No experimental procedures took place.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
Data and code available at: https://github.com/pappasfotios/AC_Iceland_fertility

Competing interests
The authors declare that they have no competing interests.

Funding
The authors acknowledge support from the Icelandic Research Fund 2410430-052. FP was supported by the faculty of Veterinary Medicine and Animal Science, Swedish University of Agricultural Sciences.

Authors’ contributions
FP conceived methodology, conducted official data analysis and drafted the initial manuscript. PVD acquired and edited data. CP and PVD conceptualized the study and together with MJ supervised the analysis. All authors contributed to writing and approved the final version.

Acknowledgements
Not applicable.

References

1. Klemetsen A, Amundsen P-A, Dempson JB, Jonsson B, Jonsson N, O’Connell MF, et al. Atlantic salmon Salmo salar L., brown trout Salmo trutta L. and Arctic charr Salvelinus alpinus (L.): a review of aspects of their life histories. Ecology of Freshwater Fish. 2003;12:1–59. https://doi.org/10.1034/j.1600-0633.2003.00010.x
2. Gjedrem T, Robinson N. Advances by Selective Breeding for Aquatic Species: A Review. Agricultural Sciences. Scientific Research Publishing; 2014;5:1152–8. https://doi.org/10.4236/as.2014.512125
3. Kumar P, Babita M, Kailasam M, Muralidhar M, Hussain T, Behera A, et al. Effect of Changing Environmental Factors on Reproductive Cycle and Endocrinology of Fishes. In: Sinha A, Kumar S, Kumari K, editors. Outlook of Climate Change and Fish Nutrition [Internet]. Singapore: Springer Nature; 2022 [cited 2024 Jul 29]. p. 377–96. https://doi.org/10.1007/978-981-19-5500-6_25
4. Gillet C. Egg production in an Arctic charr (Salvelinus alpinus L.) brood stock: effects of temperature on the timing of spawning and the quality of eggs. Aquat Living Resour. EDP Sciences; 1991;4:109–16. https://doi.org/10.1051/alr:1991010
5. Jeuthe H, Schmitz M, Brännäs E. Evaluation of gamete quality indicators for Arctic charr Salvelinus alpinus. Aquaculture. 2019;504:446–53. https://doi.org/10.1016/j.aquaculture.2019.02.024
6. Pappas F, Johnsson M, Debes PV, Palaiokostas C. Genetic parameters and sex-specific architecture of observed and latent fertility phenotypes in a closed breeding nucleus of an Arctic salmonid [Internet]. bioRxiv; 2025 [cited 2025 Oct 20]. p. 2025.10.16.682826. https://doi.org/10.1101/2025.10.16.682826
7. Montesinos-López OA, Montesinos-López A, Pérez-Rodríguez P, Barrón-López JA, Martini JWR, Fajardo-Flores SB, et al. A review of deep learning applications for genomic selection. BMC Genomics. 2021;22:19. https://doi.org/10.1186/s12864-020-07319-x
8. Palaiokostas C. Breeding evaluations in aquaculture using neural networks. Aquaculture Reports. 2024;39:102468. https://doi.org/10.1016/j.aqrep.2024.102468
9. Gianola D, Okut H, Weigel KA, Rosa GJ. Predicting complex quantitative traits with Bayesian neural networks: a case study with Jersey cows and wheat. BMC Genet. 2011;12:87. https://doi.org/10.1186/1471-2156-12-87
10. Pérez-Enciso M, Zingaretti LM. A Guide on Deep Learning for Complex Trait Genomic Prediction. Genes. Multidisciplinary Digital Publishing Institute; 2019;10:553. https://doi.org/10.3390/genes10070553
11. Ehret A, Hochstuhl D, Gianola D, Thaller G. Application of neural networks with back-propagation to genome-enabled prediction of complex traits in Holstein-Friesian and German Fleckvieh cattle. Genetics Selection Evolution. 2015;47:22. https://doi.org/10.1186/s12711-015-0097-5
12. Inamori M, Kimura T, Mori M, Tarumoto Y, Hattori T, Hayano M, et al. Machine learning for genomic and pedigree prediction in sugarcane. The Plant Genome. 2024;17:e20486. https://doi.org/10.1002/tpg2.20486
13. Debes PV, Lobligeois SBC, Svavarsson E. Genetic and Environmental (Co)variation of Egg Size, Fecundity, and Growth Traits in Arctic Charr. Evol Appl. 2025;18:e70135. https://doi.org/10.1111/eva.70135
14. R Core Team. R: A Language and Environment for Statistical Computing [Internet]. Vienna, Austria: R Foundation for Statistical Computing; 2023. https://www.R-project.org/
15. Amadeu RR, Garcia AAF, Munoz PR, Ferrão LFV. AGHmatrix: genetic relationship matrices in R. Bioinformatics. 2023;39:btad445. https://doi.org/10.1093/bioinformatics/btad445
16. Stan Development Team. Stan User’s Guide [Internet]. 2023 [cited 2025 Sep 16]. https://mc-stan.org/docs/stan-users-guide/index.html
17. de Villemereuil P, Schielzeth H, Nakagawa S, Morrissey M. General Methods for Evolutionary Quantitative Genetic Inference from Generalized Mixed Models. Genetics. 2016;204:1281–94. https://doi.org/10.1534/genetics.115.186536
18. Covington P, Adams J, Sargin E. Deep Neural Networks for YouTube Recommendations. Proceedings of the 10th ACM Conference on Recommender Systems [Internet]. Boston Massachusetts USA: ACM; 2016 [cited 2025 Oct 15]. p. 191–8. https://doi.org/10.1145/2959100.2959190
19. Kingma DP, Ba J. Adam: A Method for Stochastic Optimization [Internet]. arXiv; 2017 [cited 2025 Oct 20]. https://doi.org/10.48550/arXiv.1412.6980
20. Wickham H. ggplot2: Elegant Graphics for Data Analysis [Internet]. Springer-Verlag New York; 2016. https://ggplot2.tidyverse.org
21. Gansner ER, North SC. An open graph visualization system and its applications to software engineering. Software: Practice and Experience. 2000;30:1203–33. https://doi.org/10.1002/1097-024X(200009)30:113.0.CO;2-N
22. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library [Internet]. arXiv; 2019 [cited 2025 Sep 30]. https://doi.org/10.48550/arXiv.1912.01703
23. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python.
24. Hunter JD. Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering. 2007;9:90–5. https://doi.org/10.1109/MCSE.2007.55
25. Pappas F, Johnsson M, Andersson G, Debes PV, Palaiokostas C. Sperm DNA methylation landscape and its links to male fertility in a non-model teleost using EM-seq. Heredity. Nature Publishing Group; 2025;134:293–305. https://doi.org/10.1038/s41437-025-00756-y
26. Kassambara A. “ggplot2” Based Publication Ready Plots [Internet]. 2025 [cited 2025 Sep 30]. https://cran.r-project.org/web/packages/ggpubr/refman/ggpubr.html

Supplementary Files

FigureS1.pdf, TableS1.pdf, TableS2.pdf
Author information

Fotis Pappas (corresponding author; ORCID: https://orcid.org/0000-0003-3696-5069), Swedish University of Agricultural Sciences (Sveriges lantbruksuniversitet)
Paul Vincent Debes, Holarskoli
Martin Johnsson, Swedish University of Agricultural Sciences (Sveriges lantbruksuniversitet)
Christos Palaiokostas, Swedish University of Agricultural Sciences (Sveriges lantbruksuniversitet)
Figure legends

Figure 1. A) Fertilization rate contours as a product of male (Fm) and female (Ff) fertility probabilities. B) Two-tower MLP architecture with unknown dropout rates for the first two hidden layers, to be found through grid search. Each tower has different weights and outputs a liability score that is then transformed via a probit link to the range [0, 1]. Fertilization probability is produced by multiplying the respective probabilities, computed by exponentiating the sum of their natural logarithms for numerical stability. Created with ggplot2 v4.0.0 [20], graphviz v12.2.1 [21] and BioRender.com.

Figure 2. Two-tower model performance and predictions: A) ROC plot (held-out test set) of the classifier based on the two-tower architecture, B) cumulative accuracy and F1-score (held-out test set) by ranked fractions of data, with baseline metrics corresponding to the training data, C) violin plot of predicted reproductive potentials of sires, dams and corresponding pairs by true label in the held-out test set, D) boxplot of sex-specific predictions (on whole dataset) by cohort, acting as a proxy assessment of genetic trends. Created with matplotlib v3.10.5 [24] and BioRender.com.

Figure 3. Estimated Breeding Values (EBVs) vs. proofs produced from the two-tower MLP model using the final trained model and the full data as input. Color-coding is used for datapoints and regression lines corresponding to males and females. Pearson’s correlation coefficients and corresponding p-values are also printed with the same color-coding convention. Plot was created with ggplot2 v4.0.0 [20] and ggpubr v0.6.1 [26].
10:12:02","extension":"pdf","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":95197,"visible":true,"origin":"","legend":"","description":"","filename":"TableS2.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8078842/v1/394e27a8699e1244213ff2ab.pdf"}],"financialInterests":"","formattedTitle":"A Non-Linear Game for Two: Genetic Parameters and Prediction of Fertilization Success using Bayesian and Machine Learning Frameworks","fulltext":[{"header":"Background","content":"\u003cp\u003eReproductive success is a fundamental prerequisite of sustainable and profitable animal production. In salmonid fish farming, the relative economic value of each breeder is particularly high due to the high reproductive potential of these species [1,2]. However, external fertilization is an environmentally sensitive process [3] and thus requires substantial investment in resources and labour. Combined with the typically narrow spawning windows especially for females [4], fertilization failure becomes extremely costly, leading not only to wasted resources but also to potential reductions in population size and genetic diversity. Ensuring consistent and reliable fertility is therefore critical to the long-term viability of breeding programs.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eTraditionally, reproductive capacity in breeders has been assessed using proxy traits such as gamete counts and quality measures (e.g., egg size, sperm motility) [5]. While informative, the collection of such phenotypes is often impractical, given the associated costs and labor intensity. An interesting alternative is to predict fertility through biological predictors. 
More specifically, estimating the genetic merit for fertility in candidate breeders without own phenotype could simultaneously improve fertilization rates in the short term and drive long-term genetic gains by shifting the population mean through more effective selection enabled by minimizing fertility-related loss of families.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eInterestingly, familial fertilization rates are governed by intrinsic biological factors contributed by (at least) two individuals acting in a non-additive manner. A probabilistic, multiplicative framework that accounts for the non-linear effects of sires and dams on familial fertilization rates may therefore provide the most realistic description of mating outcomes [6]. Among potential analytical approaches, machine learning offers a suitable means to efficiently model such complex interactions [7].\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eIn this study, we analyze more than 3,000 mating events from the Icelandic Arctic charr breeding program (H\u0026oacute;lar University, Iceland) focusing on the genetic variance components of latent male and female fertility, as well as the applicability of machine learning architectures for predicting fertilization success and failure based on genetic merit.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eOur work introduces two key analytical novelties. First, we propose the use of relationships-to-founders as input features, offering an alternative to previously used data sources such as genotypic information, the full additive genetic relationship matrix (A) or its low-dimensional representations through eigenvector analysis [7\u0026ndash;12]. 
Second, we developed a two-tower multilayer perceptron (MLP) architecture with probability multiplication fusion, specifically designed to predict fertilization outcomes.\u0026nbsp;\u003c/p\u003e"},{"header":"Methods","content":"\u003ch3\u003eBackground information of the studied population \u0026ndash; Collected data\u003c/h3\u003e\n\u003cp\u003eFertilization success data were recorded over 17 years (2008\u0026ndash;2024) from the Icelandic Arctic charr breeding nucleus in H\u0026oacute;lar, northern Iceland [13]. Records included a total of 3,087 artificial mating events between 1,863 sires and 2,937 dams (males often contributed to two families whereas dams usually contributed to one). The number of viable fertilized eggs was recorded with drastic right-censoring: at most 400 eyed eggs were counted per full-sib family. For our analysis, the fertilization outcome was therefore treated as binary, with a family reaching the upper bound of 400 eyed eggs considered a success. This definition corresponds to a success rate of ~88% across all years.\u0026nbsp;\u003c/p\u003e\n\u003ch3\u003eEstimation of genetic parameters for latent fertility traits\u003c/h3\u003e\n\u003cp\u003eA sex-specific, bivariate model was constructed for the binary fertilization success trait discussed above using a Bayesian hierarchical framework [\u003cspan class=\"CitationRef\"\u003e6\u003c/span\u003e]. Under a Generalized Linear Mixed Model (GLMM) using probit links, fertilization success requires both a female and a male latent liability to exceed zero. Each sex-limited, latent phenotype had its own intercept and year random effects, while the additive genetic effects were modelled on the liability scale using the additive genetic relationship matrix built with R/AGHmatrix [\u003cspan class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e15\u003c/span\u003e]. 
Additionally, a permanent-environmental effect was considered for sires, while both sexes shared a correlated residual term. For each record, the success probability is defined as the product of the standard normal cumulative distribution function (CDF, denoted as \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\Phi\\)\u003c/span\u003e\u003c/span\u003e) evaluated at the female and male liabilities:\u003c/p\u003e\n\u003cdiv id=\"Equa\" class=\"Equation\"\u003e\n \u003cdiv class=\"mathdisplay\" id=\"FileID_Equa\" name=\"EquationSource\"\u003e$$p=Ff\\bullet Fm=\\Phi\\left(\\eta_{female}\\right)\\bullet\\Phi\\left(\\eta_{male}\\right)$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\eta_{female}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\eta_{male}\\)\u003c/span\u003e\u003c/span\u003e are the latent fertility liabilities for dams and sires, respectively (Fig. 1A visualizes the equivalent fertilization rate). Weakly informative priors were specified (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/pappasfotios/AC_Iceland_fertility/blob/main/Stan/FertLiabilities.stan\u003c/span\u003e\u003c/span\u003e), and the model, scripted in Stan via R/rstan [\u003cspan class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e16\u003c/span\u003e], was run in 3 chains for 10,000 iterations (4,000 warmup). Convergence was assessed using \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\widehat{R}\\)\u003c/span\u003e\u003c/span\u003e, effective sample size, and visual inspection of trace plots. 
Liability-scale additive variances, heritabilities [\u003cspan class=\"CitationRef\"\u003e17\u003c/span\u003e], repeatability for male fertility and genetic correlation between male and female fertility were derived along with per-animal Estimated Breeding Values (EBVs).\u003c/p\u003e\n\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\n \u003ch2\u003eTwo-tower MLP model\u003c/h2\u003e\n \u003cp\u003eThe same underlying multiplicative system, yielding fertilization success probabilities from female and male contributions was considered for our ML model. The proposed two-tower structure is inspired by widely adopted models in retrieval and recommendation systems first popularized by Google\u0026reg; for YouTube\u0026reg; recommendations [\u003cspan class=\"CitationRef\"\u003e18\u003c/span\u003e]. The portion of the additive genetic relationship matrix referring to relationships to founders of the pedigree was retrieved and the breeder-corresponding vectors were used as input to train sire- and dam-specific multi-layer perceptrons/towers consisting of three hidden layers each, with rectified linear unit (ReLU) activations (architecture described in \u003cstrong\u003eFig.\u0026nbsp;1B\u003c/strong\u003e). Training was carried out using the Adam optimizer [\u003cspan class=\"CitationRef\"\u003e19\u003c/span\u003e] and early stopping to control training time and overfitting.\u003c/p\u003e\n \u003cp\u003eObservations were partitioned using stratification based on spawning year to prevent parent leakage (i.e. we avoid having the same breeder present in both training and validation/testing sets), combined with target stratification to maintain class balance. An initial 80% \u0026minus;\u0026thinsp;20% split was performed to separate a dataset for 4-fold cross-validation (80%, used for dropout tuning) and a final test-set partition (20%). 
Dropout was applied to the first two hidden layers, with rates between 0.0 and 0.5 evaluated via grid search in the cross-validation phase of our analysis, meaning that between 0% and 50% of the neurons in these hidden layers were randomly disabled during training as a means of regularization. Finally, the model was trained on all observations of the 80% partition and evaluated on the held-out 20% test set. The neural networks were implemented in PyTorch [\u003cspan class=\"CitationRef\"\u003e22\u003c/span\u003e], while cross-validation and evaluation metrics (ROC AUC, accuracy, Precision-Recall AUC and F1-score) were computed using the scikit-learn library in Python [\u003cspan class=\"CitationRef\"\u003e23\u003c/span\u003e].\u003c/p\u003e\n\u003c/div\u003e"},{"header":"Results \u0026 Discussion","content":"\u003ch2\u003eGenetic parameters of latent fertility factors\u003c/h2\u003e\n\u003cp\u003eHeritability estimates were 0.36 ([0.20, 0.56] 95% CI) for female fertility and 0.15 ([0.00, 0.43] 95% CI) for male fertility (trace-plots in \u003cstrong\u003eFigure S1\u0026nbsp;\u003c/strong\u003eand diagnostics in\u003cstrong\u003e\u0026nbsp;Table S1\u003c/strong\u003e) on the respective latent scales. The posterior mean of the genetic correlation between the two traits was 0.14, but it was highly uncertain ([-0.64, 0.78] 95% CI). The posterior mean of repeatability for male reproductive potential was 0.38 ([0.14, 0.62] 95% CI). Besides being more heritable, female liabilities also had a higher point estimate of phenotypic variance (posterior mean = 2.80, [1.74, 4.96] 95% CI) than male liabilities (posterior mean = 2.25, [1.20, 4.56] 95% CI), while the opposite was true for intercept posteriors (Table S1). Despite the levels of uncertainty expressed by posterior variances, the compiled results indicate that dam fertility is more variable and significantly affects fertility outcomes in this population. 
Furthermore, the heritability posteriors ranged at similar levels, but with an opposing sex-specific pattern compared to a recent study in the Swedish National Breeding Program, where male fertility heritability was higher than that of females, although the trait definitions differed considerably [6]. At the same time the posteriors in the current study appear narrower, yielding more confident parameter estimates, probably reflecting the larger sample size.\u0026nbsp;\u003c/p\u003e\n\u003ch2\u003eTuning and performance of two-tower MLP\u003c/h2\u003e\n\u003cp\u003eGrid-search (\u003cstrong\u003eTable S1\u003c/strong\u003e) indicated dropout rates of 0.0 and 0.5 as the best-performing values for the first and second hidden layers, respectively. This configuration yielded an average ROC AUC of 0.645 across the four cross-validation folds, indicating that stronger regularization was beneficial in the second hidden layer. Overall, training with this architecture achieved an ROC AUC of 0.654 (\u003cstrong\u003eFigure 2A\u003c/strong\u003e) and a PR-AUC of 0.945 on the final held-out test set. From a practical breeding management standpoint, when selecting the top 40% predicted outcomes, the expected accuracy was the highest (\u003cstrong\u003eFigure 2B\u003c/strong\u003e). Given the low (sire) to moderate (dam) heritability of the traits, these predictive values are encouraging, even if the trait definition was not optimal. Importantly, the relative gain compared to random selection is valuable in a breeding context, where false positives can result in inefficient on-farm selection and misallocation of resources.\u003c/p\u003e\n\u003cp\u003eUnfortunately, the exact signals and interactions offering predictive ability to the model remain cryptic since the real fertility status of individual breeders is naturally masked. However, the tower outputs seem to generally be linked to true combined labels (\u003cstrong\u003eFigure 2C\u003c/strong\u003e). 
Variation in liability output almost collapses to constants for more recent generations (\u003cstrong\u003eFigure 2D\u003c/strong\u003e), probably due to limited downstream information.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eAn additional strength of the two-tower design is its flexibility. Different data types and dimensionalities can be assigned to the male and female towers. For instance, female inputs could consist of SNP genotypes while male inputs might comprise sperm CpG methylation markers [25], enabling the model to integrate heterogeneous molecular signals. This makes the architecture adaptable and multi-purpose across species and data modalities. Furthermore, mate compatibility could also be modelled through more complex architectures, resulting in deeper refinement of the biological background. Finally, if combined with interpretability techniques, the framework could also be employed for feature selection and biomarker discovery, helping to identify molecular signatures of fertility and infertility.\u0026nbsp;\u003c/p\u003e\n\u003ch2\u003eRelationship of proofs from hierarchical Bayesian framework and two-tower MLPs\u003c/h2\u003e\n\u003cp\u003eTo assess ranking similarities between our two analytical approaches, the relationship between Estimated Breeding Values (EBVs) and the liabilities produced by our ML model was investigated. The two seem to be associated, as expected (\u003cstrong\u003eFigure 3\u003c/strong\u003e). More specifically, Pearson correlations of 0.42 and 0.44 were estimated for females and males, respectively, both with low \u003cem\u003ep\u003c/em\u003e-values. Although the true genetic merit for fertility remains latent and cannot be directly assessed, obtaining correlated proofs from two conceptually independent approaches supports consistency. This suggests that both models are capturing effects of heritable components despite their distinct assumptions and data. 
\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eThis study suggests that on-farm fertilization success can be modelled as a non-linear combination of latent fertility probabilities in a multiplicative fashion. Variance components were estimated under a Bayesian hierarchical framework suggesting moderate heritability for female fertility and a low-moderate estimate for male fertility in Icelandic Arctic charr. Additionally, we propose a flexible and easily implementable alternative based on a two-tower MLP architecture. Each tower models sex-specific liabilities, which are then integrated to predict cross outcomes (ROC AUC\u0026thinsp;=\u0026thinsp;0.654). The framework is easily extendable to utilize multiple data modalities, including omics and diverse sources of metadata. From a practical standpoint, such models could support decision-making in animal production systems by providing accurate predictions of reproductive success amongst pairs and by ranking breeding candidates based on tower outputs. In this proof-of-concept study, we evaluated the predictive ability of our ML system using only pedigree information as input. Future extensions could include genotypic information, compatibility effects between breeders and genotype-by-environment interactions (G\u0026times;E) that are expected to be important for such traits.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cspan\u003eThe study utilized retrospective data collected during normal operation of the Icelandic Arctic charr breeding program (https://www.holaraquatic.is/breeding-program.html). 
No experimental procedures took place\u003c/span\u003e\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAvailability of data and materials\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eData and code available at: https://github.com/pappasfotios/AC_Iceland_fertility\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no competing interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors acknowledge support from the Icelandic Research Fund 2410430-052. FP was supported by the faculty of Veterinary Medicine and Animal Science, Swedish University of Agricultural Sciences.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthors\u0026rsquo; contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFP conceived methodology, conducted official data analysis and drafted the initial manuscript. PVD acquired and edited data. CP and PVD conceptualized the study and together with MJ supervised the analysis. All authors contributed to writing and approved the final version.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eKlemetsen A, Amundsen P-A, Dempson JB, Jonsson B, Jonsson N, O\u0026rsquo;Connell MF, et al. Atlantic salmon \u003cem\u003eSalmo salar\u003c/em\u003e L., brown trout \u003cem\u003eSalmo trutta\u003c/em\u003e L. and Arctic charr \u003cem\u003eSalvelinus alpinus\u003c/em\u003e (L.): a review of aspects of their life histories. Ecology of Freshwater Fish. 
2003;12:1\u0026ndash;59. https://doi.org/10.1034/j.1600-0633.2003.00010.x\u003c/li\u003e\n\u003cli\u003eGjedrem T, Robinson N. Advances by Selective Breeding for Aquatic Species: A Review. Agricultural Sciences. Scientific Research Publishing; 2014;5:1152\u0026ndash;8. https://doi.org/10.4236/as.2014.512125\u003c/li\u003e\n\u003cli\u003eKumar P, Babita M, Kailasam M, Muralidhar M, Hussain T, Behera A, et al. Effect of Changing Environmental Factors on Reproductive Cycle and Endocrinology of Fishes. In: Sinha A, Kumar S, Kumari K, editors. Outlook of Climate Change and Fish Nutrition [Internet]. Singapore: Springer Nature; 2022 [cited 2024 Jul 29]. p. 377\u0026ndash;96. https://doi.org/10.1007/978-981-19-5500-6_25\u003c/li\u003e\n\u003cli\u003eGillet C. Egg production in an Arctic charr (Salvelinus alpinus L.) brood stock: effects of temperature on the timing of spawning and the quality of eggs. Aquat Living Resour. EDP Sciences; 1991;4:109\u0026ndash;16. https://doi.org/10.1051/alr:1991010\u003c/li\u003e\n\u003cli\u003eJeuthe H, Schmitz M, Br\u0026auml;nn\u0026auml;s E. Evaluation of gamete quality indicators for Arctic charr \u003cem\u003eSalvelinus alpinus\u003c/em\u003e. Aquaculture. 2019;504:446\u0026ndash;53. https://doi.org/10.1016/j.aquaculture.2019.02.024\u003c/li\u003e\n\u003cli\u003ePappas F, Johnsson M, Debes PV, Palaiokostas C. Genetic parameters and sex-specific architecture of observed and latent fertility phenotypes in a closed breeding nucleus of an Arctic salmonid [Internet]. bioRxiv; 2025 [cited 2025 Oct 20]. p. 2025.10.16.682826. https://doi.org/10.1101/2025.10.16.682826\u003c/li\u003e\n\u003cli\u003eMontesinos-L\u0026oacute;pez OA, Montesinos-L\u0026oacute;pez A, P\u0026eacute;rez-Rodr\u0026iacute;guez P, Barr\u0026oacute;n-L\u0026oacute;pez JA, Martini JWR, Fajardo-Flores SB, et al. A review of deep learning applications for genomic selection. BMC Genomics. 2021;22:19. 
https://doi.org/10.1186/s12864-020-07319-x\u003c/li\u003e\n\u003cli\u003ePalaiokostas C. Breeding evaluations in aquaculture using neural networks. Aquaculture Reports. 2024;39:102468. https://doi.org/10.1016/j.aqrep.2024.102468\u003c/li\u003e\n\u003cli\u003eGianola D, Okut H, Weigel KA, Rosa GJ. Predicting complex quantitative traits with Bayesian neural networks: a case study with Jersey cows and wheat. BMC Genet. 2011;12:87. https://doi.org/10.1186/1471-2156-12-87\u003c/li\u003e\n\u003cli\u003eP\u0026eacute;rez-Enciso M, Zingaretti LM. A Guide on Deep Learning for Complex Trait Genomic Prediction. Genes. Multidisciplinary Digital Publishing Institute; 2019;10:553. https://doi.org/10.3390/genes10070553\u003c/li\u003e\n\u003cli\u003eEhret A, Hochstuhl D, Gianola D, Thaller G. Application of neural networks with back-propagation to genome-enabled prediction of complex traits in Holstein-Friesian and German Fleckvieh cattle. Genetics Selection Evolution. 2015;47:22. https://doi.org/10.1186/s12711-015-0097-5\u003c/li\u003e\n\u003cli\u003eInamori M, Kimura T, Mori M, Tarumoto Y, Hattori T, Hayano M, et al. Machine learning for genomic and pedigree prediction in sugarcane. The Plant Genome. 2024;17:e20486. https://doi.org/10.1002/tpg2.20486\u003c/li\u003e\n\u003cli\u003eDebes PV, Lobligeois SBC, Svavarsson E. Genetic and Environmental (Co)variation of Egg Size, Fecundity, and Growth Traits in Arctic Charr. Evol Appl. 2025;18:e70135. https://doi.org/10.1111/eva.70135\u003c/li\u003e\n\u003cli\u003eR Core Team. R: A Language and Environment for Statistical Computing [Internet]. Vienna, Austria: R Foundation for Statistical Computing; 2023. https://www.R-project.org/\u003c/li\u003e\n\u003cli\u003eAmadeu RR, Garcia AAF, Munoz PR, Ferr\u0026atilde;o LFV. AGHmatrix: genetic relationship matrices in R. Bioinformatics. 2023;39:btad445. https://doi.org/10.1093/bioinformatics/btad445\u003c/li\u003e\n\u003cli\u003eStan Development Team. Stan User\u0026rsquo;s Guide [Internet]. 
2023 [cited 2025 Sep 16]. https://mc-stan.org/docs/stan-users-guide/index.html. Accessed 16 Sep 2025\u003c/li\u003e\n\u003cli\u003ede Villemereuil P, Schielzeth H, Nakagawa S, Morrissey M. General Methods for Evolutionary Quantitative Genetic Inference from Generalized Mixed Models. Genetics. 2016;204:1281\u0026ndash;94. https://doi.org/10.1534/genetics.115.186536\u003c/li\u003e\n\u003cli\u003eCovington P, Adams J, Sargin E. Deep Neural Networks for YouTube Recommendations. Proceedings of the 10th ACM Conference on Recommender Systems [Internet]. Boston Massachusetts USA: ACM; 2016 [cited 2025 Oct 15]. p. 191\u0026ndash;8. https://doi.org/10.1145/2959100.2959190\u003c/li\u003e\n\u003cli\u003eKingma DP, Ba J. Adam: A Method for Stochastic Optimization [Internet]. arXiv; 2017 [cited 2025 Oct 20]. https://doi.org/10.48550/arXiv.1412.6980\u003c/li\u003e\n\u003cli\u003eWickham H. ggplot2: Elegant Graphics for Data Analysis [Internet]. Springer-Verlag New York; 2016. https://ggplot2.tidyverse.org\u003c/li\u003e\n\u003cli\u003eGansner ER, North SC. An open graph visualization system and its applications to software engineering. Software: Practice and Experience. 2000;30:1203\u0026ndash;33. https://doi.org/10.1002/1097-024X(200009)30:11\u0026lt;1203::AID-SPE338\u0026gt;3.0.CO;2-N\u003c/li\u003e\n\u003cli\u003ePaszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library [Internet]. arXiv; 2019 [cited 2025 Sep 30]. https://doi.org/10.48550/arXiv.1912.01703\u003c/li\u003e\n\u003cli\u003ePedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 2011;12:2825\u0026ndash;30.\u003c/li\u003e\n\u003cli\u003eHunter JD. Matplotlib: A 2D Graphics Environment. Computing in Science \u0026amp; Engineering. 2007;9:90\u0026ndash;5. https://doi.org/10.1109/MCSE.2007.55\u003c/li\u003e\n\u003cli\u003ePappas F, Johnsson M, Andersson G, Debes PV, Palaiokostas C. 
Sperm DNA methylation landscape and its links to male fertility in a non-model teleost using EM-seq. Heredity. Nature Publishing Group; 2025;134:293\u0026ndash;305. https://doi.org/10.1038/s41437-025-00756-y\u003c/li\u003e\n\u003cli\u003eKassambara A. \u0026ldquo;ggplot2\u0026rdquo; Based Publication Ready Plots [Internet]. 2025 [cited 2025 Sep 30]. https://cran.r-project.org/web/packages/ggpubr/refman/ggpubr.html. Accessed 30 Sep 2025\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":true,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-8078842/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8078842/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eFertility is an important but often cryptic intrinsic characteristic of domesticated animals. Predicting reproductive potential is of great importance for the industry but assessment through indirect proxies is laborious and often impractical. Among other biological factors, genetic effects are expected to play a crucial role in shaping male and female fertility. In cases where heritable components are strong, polygenic merit could be a valuable tool for decision-making in breeding schemes. Here we estimate sex-specific variance components affecting fertilization success by analysing outcomes of over 3,000 controlled mating events in an Arctic charr breeding nucleus from Iceland. Furthermore, a machine learning framework using relationships-to-founders vectors as input and a two-tower neural network architecture is proposed and tested for prediction of fertilization success. 
Both approaches seem to capture meaningful biological signals and offer alternative tools for ranking, selecting or even allocating matings between breeding candidates.\u003c/p\u003e","manuscriptTitle":"A Non-Linear Game for Two: Genetic Parameters and Prediction of Fertilization Success using Bayesian and Machine Learning Frameworks","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-11-20 08:18:05","doi":"10.21203/rs.3.rs-8078842/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"625d262d-de53-40aa-8073-f81591965fbe","owner":[],"postedDate":"November 20th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2026-05-21T13:45:35+00:00","versionOfRecord":[],"versionCreatedAt":"2025-11-20 08:18:05","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8078842","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8078842","identity":"rs-8078842","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
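The multiplicative fusion described in the Methods (each sex contributes a latent liability, each liability passes through a probit link, and the two probabilities are multiplied by exponentiating the sum of their logarithms) can be illustrated with a short standard-library sketch. This is not the authors' PyTorch implementation: the function names and the `tiny` floor are hypothetical choices made for illustration only, and the liabilities are passed in directly rather than produced by trained towers.

```python
import math


def probit(eta: float) -> float:
    """Standard normal CDF, Phi(eta), computed from the error function."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def fertilization_probability(eta_female: float, eta_male: float) -> float:
    """p = Phi(eta_female) * Phi(eta_male), evaluated as exp(log + log)."""
    tiny = 1e-12  # hypothetical floor (an assumption) so log() stays finite
    log_ff = math.log(max(probit(eta_female), tiny))
    log_fm = math.log(max(probit(eta_male), tiny))
    return math.exp(log_ff + log_fm)


# Two liabilities of zero give Phi(0) * Phi(0) = 0.5 * 0.5.
p_neutral = fertilization_probability(0.0, 0.0)

# A highly fertile dam cannot compensate for a poorly fertile sire:
# the product never exceeds the smaller of the two probabilities.
p_limited = fertilization_probability(3.0, -1.0)
```

Because the two probabilities are multiplied, the less fertile partner bounds the predicted outcome, which is the non-additive behaviour the paper's title refers to.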