Allelic variation at a single locus distinguishes spring and winter faba beans | Research Square window.SnipcartSettings = { analytics: { enabled: false } }; (function() { var accessVector = localStorage.getItem('access_vector') || ''; window.dataLayer = window.dataLayer || []; if (accessVector) { window.dataLayer.push({ user: { profile: { profileInfo: { snid: accessVector } } } }); } })(); (function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src='https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);})(window,document,'script','dataLayer','GTM-K279D39R'); Browse Preprints In Review Journals COVID-19 Preprints AJE Video Bytes Research Tools Research Promotion AJE Professional Editing AJE Rubriq About Preprint Platform In Review Editorial Policies Our Team Advisory Board Help Center Sign In Submit a Preprint Cite Share Download PDF Article Allelic variation at a single locus distinguishes spring and winter faba beans Murukarthick Jayakodi, Hailin Zhang, Alex Windhorst, Elesandro Bornhofen, and 17 more This is a preprint; it has not been peer reviewed by a journal. https://doi.org/ 10.21203/rs.3.rs-5356723/v1 This work is licensed under a CC BY 4.0 License Status: Published Journal Publication published 10 Mar, 2026 Read the published version in Nature Genetics → Version 1 posted You are reading this latest preprint version Abstract Winter faba beans exhibit significant yield advantages over spring cultivars and hold promise for enhancing local protein production and agricultural sustainability. However, the threat of winter kill limits wider cultivation and the genetics of faba bean winter hardiness remain unresolved. Here, we develop a highly improved faba bean reference genome and combine this with resequencing and phenotyping of winter and spring faba bean accessions to identify genetic determinants of winter hardiness. Genome-wide association analysis of frost tolerance traits identifies a major winter hardiness locus where the most strongly associated variant explains the vast majority of phenotypic variation and accurately differentiates between winter and spring types. Furthermore, we identify additional signals within the winter faba bean gene pool that pave the way for further improvement of winter hardiness. Our work provides improved genomic resources and resolves the genetics of a key agronomic trait in a global protein crop to facilitate future breeding efforts. Biological sciences/Genetics/Genomics/Genome assembly algorithms Biological sciences/Genetics/Genetic association study/Genome-wide association studies Biological sciences/Genetics/Gene expression Figures Figure 1 Figure 2 Figure 3 Figure 4 Main Faba bean ( Vicia faba , 2n = 2x = 12) is renowned for its high protein content 1 (~ 29%), superior nitrogen-fixing ability 2 and sustainability credentials 3 . Faba bean production has grown steadily during the past ten years in Europe and other parts of the world (FAO 2022) and both spring and winter types are grown. Notably, winter faba beans have significant yield advantages (up to 47%) over spring cultivars 4 and are mostly grown in North-West Europe and the UK. Against the backdrop of increasing temperature and drought, winter faba bean is recognized as a promising crop to simultaneously boost local protein production and agricultural sustainability via crop rotation and as a cover crop. However, winter faba bean cultivation remains limited due to insufficient winter hardiness, including winter survival and tolerance against frost during winter and early spring, so-called “late frost”, of the available winter faba bean cultivars. Despite attempts to identify quantitative trait loci (QTL) controlling winter hardiness 5 – 10 , the genetic basis for winter hardiness remains unclear. Faba bean seeds contain vicine and convicine (VC), pyrimidine glucosides that cause hemolytic anemia (also known as favism) in individuals deficient in glucose-6-phosphate dehydrogenase. This is a major restricting factor in faba bean production and consumption 11 , targeted primarily in faba bean breeding. Substantial breeding and research efforts have cloned the major VC1 gene 12 , which controls the rate of VC accumulation in seed, and several low VC faba bean spring-type cultivars. Nevertheless, the development of low VC winter cultivars is yet to be realized in faba bean 13 . The release of the first faba bean reference genome has facilitated genetic studies 14 and the genomic characterization of faba bean germplasm 15 . However, the genomic loci that have been under selection due to breeding remain largely unknown. Many of the agronomic traits are still selected phenotypically by breeders. Therefore, charting genomic regions shaped by breeding efforts, termed “breeding signatures”, could potentially discover genes underlying important agronomic and nutritional quality traits and accelerate the selection process by trait-specific molecular markers. Compared to the spring gene pool, winter types have received less attention by breeders, which is partially due to a lack of diversity in the winter gene pool. Therefore, identifying genomic regions selected for in the elite spring gene pool during intensive breeding may provide targets also for winter faba bean improvement, especially for quality traits. To improve faba bean winter hardiness and breeding in general, we present an improved genome assembly and gene annotation of the reference cultivar (cv.) Hedin/2. In addition, resequencing of diverse elite breeding lines and winter genotypes uncovers the genes related to winter hardiness and genomic regions that were selected during breeding. The findings from our study and the greatly improved new genome sequence, provide genomic insights into faba bean cold acclimation and genetic improvement. Results Improved faba bean reference sequence and gene annotation We generated a Bionano optical map for the reference faba bean cv. ‘Hedin/2’ and obtained a contig N50 of 41 Mb ( Supplementary table 1 ) . Hybrid scaffolding, which combined the Bionano map and contigs from the first version of the ‘Hedin/2 v.1’ reference genome generated from PacBio HiFi reads, resulted in 317 scaffolds with a greatly improved N50 value of 100 Mb ( Table 1 ). We then used chromosome conformation capture sequencing (Hi-C) data to order and orient the hybrid scaffolds into six large chromosomal pseudomolecules with a total size of 11.7 Gb (hereafter ‘Hedin/2 v.2’) ( Extended Data Fig. 1 ). Over 97% of the scaffolds were assigned to precise chromosomal locations ( Table 1, Supplementary table 2 ), compared to 94% in Hedin/2 v.1. There were fewer sequence gaps in the second version (335) than in the first version (5,195) ( Supplementary table 2 ) . The size of unanchored sequences in ‘Hedin/2 v.2’ was 295 Mb, compared to 648 Mb in the first version. Further assembly evaluation by Merqury 16 revealed comparable and yet improved accuracy relating to structural and base-level accuracy as well as higher gene space completeness in the second version ( Table 1 ). While overall chromosomal collinearity between both versions was high, many inverted sequence orientations were corrected in Hedin/2 v.2’ ( Extended Data Fig. 2 ). In addition, the search for telomeric sequence motif (“TTTAGGG”) in ‘Hedin/2 v.2’ revealed the presence of telomeres at most chromosome ends ( Extended Data Fig. 3 ) . An improved de novo gene annotation was generated by integrating newly generated PacBio Iso-Seq datasets from 11 tissues ( Supplementary table 3 ), previous RNA-Seq datasets 11 and related protein evidence ( Supplementary table 4 ) resulting in a total of 35,107 protein-coding genes, including 963 genes not identified in the Hedin/2 v.1 sequence. In total, 55,283 transcripts were identified from the coding genes. This is higher than for Hedin/2 v.1, which had 37,065 transcripts, indicating notable improvement in the annotation of alternative splicing ( Supplementary table 5 ). Further, manual curation resolved the fragmented and merged gene models using full-length lso-Seq datasets ( Extended Data Fig. 4 ). The long intron genes were well annotated in ‘Hedin/2 v.2’ (max. intron size of 245 kb) compared to version 1 with a maximum intron size of 145 kb. Seed storage protein (SSP) genes are crucial in grain legumes and are often tandemly clustered making them difficult to annotate with short RNA-Seq reads. Comparative analysis of SSP genes between the two annotation versions ( Supplementary table 6 ) revealed that a cluster of five genes belonging to the subgroup “legumin B4” was newly annotated and anchored in ‘Hedin/2 v.2’ chromosomes ( Extended Data Fig. 5a ), indicating the advantages of including the optical mapping and Iso-Seq data both in assembly and annotation, respectively. Overall, our improved reference genome sequence and gene annotation offer an accurate physical guide for the orientation of contigs in future faba bean genome assemblies, especially in pangenome studies, and may facilitate the precise characterization of agronomically and economically important genes. Repeat composition and centromere landscape Using a combination of structure- and similarity-based approaches, 93% of the genome was annotated as repetitive sequences (Fig. 1 a, Supplementary table 7 ), which is 5% higher than the previous assembly. Class I mobile elements were by far the dominant group of repeats, making up 83% of the genome. This was mainly due to the amplification of the Ogre lineage of Ty/3 gypsy LTR-retroelements which alone constituted about 71% of the faba bean genome. While the most abundant Ty1/copia lineage was SIRE with 5.9% of the genome the other major repeat type was satellite DNA (satDNA), making up 8.5% of the genome ( Supplementary table 7 ). In contrast to the satellite DNA that forms large arrays constituting heterochromatic bands on metaphase chromosomes, mobile elements were found evenly dispersed along the whole chromosomes, except for the regions occupied by satDNA, consistent with version 1 (Fig. 1 a). In comparison to version 1, there is a higher proportion of Ogres (71% vs. 44.4% of the genome) in ‘Hedin/2 v.2’. While the global outlook on repeat landscape is similar between ‘Hedin/2 v.2’ and the previous version, we attribute some discrepancies in the proportion of repeat compositions between the versions to improved assembly quality and/or the result of different annotation tools. We mapped the positions of functional centromeres by Chromatin Immunoprecipitation Sequencing (ChIP-seq) of CENH3, a centromere-specific variant of histone H3 (Fig. 1 a) and determined the sizes of centromere, defined as the lengths of CENH3-enriched regions. Sizes ranged from 6.18 Mb (cen1) to 20.6 Mb (cen6) ( Supplementary table 8) . This makes them the largest monocentric centromeres described for plants to date. Centromere size was not proportional to chromosome length, with the largest chromosome (chromosome 1) having the smallest centromere. In most centromeres, the coverage with CENH3 was contiguous, with relatively even CENH3 ChIP-seq enrichment profiles (e.g. cen1). In cen5, the CENH3-associated domain was disrupted by a region of about 230 kb that contained satellite FabTR-53 and lacked CENH3. However, the main exception was the largest cen6, where the enrichment profile consisted of three distinct peaks separated by regions of lower CENH3 enrichment ( Supplementary data Fig. 1 –6). This organization and the association of CENH3 peaks with satDNA arrays resemble the structure of meta-polycentric chromosomes occurring in the related genera Pisum and Lathyrus 17 , 18 . Centromeres were enriched in satellite DNA (23% vs. 8.5% in the whole genome). However, the distribution of centromeric satellites was not even among the chromosomes. While cen1 was almost entirely composed of a mosaic of four satellite repeats, satDNA comprised only a minor part of the remaining centromeres, with cen3 lacking satDNA almost entirely. Arrays of 14 different satellite repeats were found to be associated with centromeric chromatin, most of them specific to a single chromosome ( Supplementary table 9 ). Genome-wide open chromatin regions Using assay for transposase-accessible chromatin using sequencing (ATAC-seq), we identified 121,443 and 93,501 accessible regions representing 39.6 Mb and 32.4 Mb in the adult leaf and seedling tissues, respectively ( Supplementary table 10 ). Of these accessible regions, 56,374 were shared between the two tissues. Accessible chromatin was made up of a much higher fraction of genes and gene-proximal sequences (~ 5%,0-3kb away) than distal (< 0.4%, 3-30kb away) and intergenic ( 30kb from gene) regions, though the number of accessible regions in intergenic regions was the highest of the four categories overall. Accessible regions that overlap genes tended to be larger in other areas of the genome ( Supplementary table 10 ). The distribution of the distances between accessible regions and genes showed a trimodal distribution ( Extended Data Fig. 5b ). While the largest peak corresponded to the genome-wide background, it was unclear whether the secondary peaks represented accessible sites nearby unannotated genes or a discrete class of accessible regions that tend to occur further away from genes due to biological phenomena such as transposable element insertion or regulatory mechanisms. As noted previously in other species 19 , chromatin accessibility is elevated near the annotated transcriptional start and end sites (Fig. 1 b). Genetic diversity and population structure of spring and winter faba bean Using over 10-fold resequencing of 406 accessions including 209 elite spring faba bean breeding lines and 197 winter inbred lines ( Supplementary table 11 ), we discovered 98,797,214 single nucleotide polymorphisms (SNPs) and 1,751,392 indels ( Supplementary table 12 ). These variants were uniformly distributed along the chromosomes except over satellite regions ( Extended Data Fig. 6 ). Principal component analysis (PCA) and phylogenetic analysis clearly showed distinct clusters of winter and spring types (Fig. 2 a, 2 b). This relationship was further strongly supported by ADMIXTURE 20 analysis providing two distinct population clusters at K = 2, spring and winter (Fig. 2 c). Calculation of genetic diversity using low-admixed individuals showed higher nucleotide diversity for winter ( π = 10.01× 10 − 3 ) than elite spring genotypes ( π = 8.79 × 10 − 3 ). Additionally, population genetic differentiation ( F ST ) between spring and winter populations was relatively low ( F ST = 0.064), indicating that only a small number of loci might be responsible for the winter growth habit. Analysis of linkage disequilibrium (LD) decay indicated an LD decay distance of 682 kb for elite spring and 462 kb for winter genotypes (Fig. 2 d). The relatively low diversity and slower rate of LD decay in elite spring lines suggest a reduction in genetic diversity due to selective breeding. Selective sweeps for improvement To identify the putative genomic regions selected for during spring faba bean breeding, we identified selective sweeps between the elite spring and winter population using the cross-population composite likelihood ratio test (XP-CLR) 21 . A total of 47 distinct sweep regions were identified representing 11.03 Mb (0.9%) of the Hedin/2 v.2’ genome sequence. In total, 228 genes overlapped with these candidate regions ( Supplementary table 13 ). Gene ontology (GO) and pathway enrichment analysis of these genes indicated they are involved in biological processes particularly related to nodulation and cold acclimation and pathways related to oxidative phosphorylation, lipid metabolisms ( Supplementary table 14 ), which are known to play a role in response to cold stress 22 . Notably, a prominent sweep was seen on chromosome 1, likely representing the VC1 locus 12 ( Extended Data Fig. 7a ). To verify that region, we grouped the elite spring lines into low VC (n = 90) and normal VC (n = 119) genotypes based on passport information and performed an additional pairwise comparison using XP-CLR analysis. Strikingly, the result confirmed that the sweep on chromosome 1 was associated with VC1 locus 12 ( Extended Data Fig. 7b ), illustrating breeding for low VC content may have swept this locus in spring breeding lines. In addition, this sweep locus concurred with a previously fine mapped VC region 13 ( Extended Data Fig. 7c ). Within this haplotype block, we identified four copies of the VC1 gene in ‘Hedin/2 v.2’ whereas there are five copies in the previously published ‘Tiffany’ genome 14 ( Extended Data Fig. 7d) . A haplotype analysis with SNPs showed more missing data for certain haplotypes ( Extended Data Fig. 8a ), suggesting the occurrence of VC1 gene copy number variation in faba bean. A set of 31 indel markers significantly differentiated the low VC haplotypes from the wild types ( Extended Data Fig. 8b) , providing a new set of markers for selection in breeding ( Supplementary table 15 ). Genome-wide association scan for winter-hardiness We investigated the genetic basis of winter hardiness in 208 accessions comprising 180 spring types and 28 winter lines, also referred to as the “ProFaba panel”, using genome-wide association studies (GWAS) analysis of the traits winter survival rate and late frost survival rate ( Supplementary table 16, 17 ). We identified a total of 1,131,399 SNPs using a combination of genotyping-by-sequencing (GBS) 23 and single primer enrichment technology (SPET). Two loci on chromosomes 1 and 5 were significantly associated with both traits (Fig. 3 a ). Further, these two significant loci coincided with the peak regions of high genetic differentiation ( F ST ) between spring and winter types. Intriguingly, the chromosome 1 GWAS peak overlapped with the previously identified major QTL for frost tolerance 7 (Fig. 3 b), indicating a crucial and consistent locus related to winter hardiness. In the following, we refer to this locus as FROST RESISTANCE 1 (FR-1) . A significantly associated T/C SNP, located downstream of the Vfaba.Hedin2.R2.1g002127 gene, enabled the classification of accessions into two major haplotypes. Accessions harboring the CC haplotype demonstrated significantly higher winter hardiness scores compared to those with the TT haplotype ( Fig. 3 d, 3 e ). Deeper haplotype analysis of the chromosome 1 locus showed a clear separation of winter and spring genotypes (Fig. 3 f). We further identified a set of 187 SNPs that could separate the spring and winter types ( Extended Data Fig. 9, Supplementary table 18 ) and thus, may be used as diagnostic markers for winter faba bean breeding. This sweep region contained a total of 30 genes including a cluster of 14 C-repeat binding factor (CBF)/dehydration-responsive element binding factor1 (DREB1 ) whose role in cold acclimation has been demonstrated in diverse plant species 24 – 26 . To identify potential candidate genes, we conducted cold treatment transcriptome assays sampling at 18°C, low temperature of 4°C and 0°C and freezing temperatures of -7°C using a winter hardy genotype (Wonjam1-ho from Korea) and a frost-susceptible spring genotype (PI 271634 from India) ( Supplementary table 19, 20 ). The expression data showed that the three CBF/DREB1 genes ( Vfaba.Hedin2.R2.1g002127 , Vfaba.Hedin2.R2.1g002152 and Vfaba.Hedin2.R2.1g002153 ) at the FR-1 locus were induced when the temperature was low and nonfreezing (4°C), a phenomenon known as cold acclimation 24 , in winter genotype compared to the spring accession (Fig. 3 c). This suggests that these CBF/DREB1 genes may trigger the cold responsive genes, also known as the CBF regulon 27 , that may increase the tolerance to freezing in winter hardy faba bean. Similarly, among 7 genes in chromosome 5 locus, a chalcone synthase ( CHS ) gene (Vfaba.Hedin2.R2.5g030100 ) showed increased expression when temperature was below 4°C. CHS is a key enzyme in the flavonoid pathway and its role in cold acclimation has been reported in Arabidopsis 28 and other crops 29 – 31 . These results indicate that the winter hardiness in faba bean is a complex process involving CBF regulon and cold induced flavonoids, like anthocyanins, as additional factors. Additionally, using our 183 winter accessions, we performed GWAS for visually assessed traits such as loss of leaf (and stem) color and turgor (Fig. 4 a) obtained from a recent study 10 . Two independent GWAS models consistently pinpointed two loci: one on chromosome 5 (PVE: 19.83%) and the other on chromosome 3 (PVE: 7.92%). Accessions with the GG haplotype exhibited significantly higher symptom scores compared to carriers of the CC haplotype on chromosome 5 (Fig. 4 d, 4 e, 4 f). Based on local LD blocks, 13 genes were identified within a 5.57 Mb interval on chromosome 5, while 4 genes were in a 125.29 kb region on chromosome 3 (Fig. 4 b). Expression analysis of these genes (Fig. 4 c) showed a strong expression pattern for Vfaba.Hedin2.R2.5g032310 gene on chromosome 5, which is FLK (flowering locus K homology domain) that was reported to repress Flowering Locus C (FLC) in Arabidopsis under cold temperature 32 . We observed a significant downregulation of the gene Vfaba.Hedin2.R2.3g019128 (U11/U12 small nuclear ribonucleoprotein 48 kDa protein) on chromosome 3. However, its functional relevance to environmental stresses is yet to be revealed. To further validate the two identified loci, we conducted a prediction study using two genomic best linear unbiased prediction (gBLUP) models. We aimed to assess the additional predictive power gained by incorporating the main effects of the two loci as fixed regressions, along with a random polygenic effect (model M2), compared to a genomic model that only includes a random polygenic effect (model M1) (Fig. 4 g). Using independent datasets for cross-validation, we found that model M2 consistently outperformed model M1 by around 30% across all traits. This indicates that the loci identified through GWAS analyses are tagging important genomic regions related to winter hardiness, that have not been discovered before 10 . Moreover, the strength of the correlations suggests that further improvements in freezing tolerance in winter-type faba bean can be achieved through molecular breeding. Discussion Faba bean is a high-yielding protein crop well-suited for improving soil fertility and increasing the production of plant protein that may reduce the use of meat proteins. A significant number of assembly gaps collapsed tandemly duplicated regions and repetitive elements 14 required improvements to contiguity and correctness for the faba bean reference sequence. In this study, we generated an optical map as a complementary dataset to overcome existing limitations. We increased the sequence contiguity to 33-fold, in turn, the number of sequences from thousands to a few hundred for pseudomolecule construction as compared to Hedin/2 v.1 14 . The new ‘Hedin/2 v.2’ reference sequence significantly improved the representation of complex repeats, protein-coding genes regulatory and centromeres landscape, delivering an upgraded foundational genomic tool for fine-scale genetic studies and precision breeding. Rising temperatures pose a severe threat to spring-sown faba bean whose growth and yield can be adversely affected by temperatures above 30°C 33 . While efforts are being made to identify the genetic basis of heat-tolerance 34 , faba bean winter types offer distinct advantages over spring types, including yield superiority, efficiency in using soil moisture and escape from drought and some pest attacks 4 . While strict prolonged cold exposure (known as vernalization) may not be required for winter types of faba bean, insufficient winter hardiness creates a risk of winter kill. Our comprehensive analysis using multiple datasets detected potential genomic loci for improving winter hardiness in faba bean. Remarkably, the chromosome 1 locus coincided with frost tolerance QTL identified by independent previous studies 7 , 35 , 36 . Further, this locus was reported to be syntenic with the freezing tolerance QTL identified on chromosome 6 of M. truncatula 37 and P. sativum 37 , respectively ( Supplementary data Fig. 7 ). Syntenic conservation of this locus among important grain legumes indicates the prospect of translational approaches to bestow winter hardiness in related grain legumes such as pea via genetic manipulations. Our new genomic resources and findings will enable further studies on the genetic basis of cold acclimation in related temperate grain legumes and breeding of winter faba bean cultivars with improved frost tolerance. Methods Optical genome map and hybrid scaffolding A total of 1.5 million G1 nuclei of faba bean cv. Hedin/2 (highly homozygous) was purified using flow cytometry. Sorted nuclei were embedded in agarose miniplugs and treated by proteinase K as described in Šimková et al. (2003) 38 . 490 ng of high molecular weight (HMW) DNA released from the miniplugs were directly labeled at DLE-1 recognition sites (CTTAAG motive) and stained following a standard Bionano Prep Direct Label and Stain (DLS) protocol (Bionano Genomics, San Diego, USA). Labeled and stained HMW DNA was analysed on Saphyr platform of Bionano Genomics using a single Saphyr chip G1.2. In total, 1,842.6 Gb of single molecule data > 150 kb, corresponding to 142x coverage of the V. faba genome, was collected and used to generate de novo optical genome map (OGM, Supplementary table 1 ) by Bionano Solve software (version Solve3.6.1_11162020) with “optArguments_nonhaplotype_noES_noCut_DEL1_saphyr.xml” used. The contigs sequence from version 1 and the OGM assembly were used as input files for automated hybrid scaffolding (HS) pipeline integrated with the Bionano Solve v3.6.1_11162020). The HS was run with default parameters and “Resolve conflict” option for conflict resolution. Pseudomolecule construction and validation These scaffolds were then used to construct the pseudomolecule with the TRITEX pipeline 39 . Finally, the order and orientation of scaffolds in each chromosome were manually inspected and corrected with evidence from Omni-C published in our previous assembly 14 . Besides, we used the tool Merqury v1.3 16 and LTR_retriever v2.9.0 40,41 to evaluate the genome completeness and consensus accuracy. The gene space completeness was evaluated using BUSCO v3.0.2b 42 analysis with Embryophyta database 9. Repeat annotation Tandem repeats and satellites were annotated using TideCluster v1.0 43,44 ( https://github.com/kavonrtep/TideCluster ), a wrapper for TideHunter 43 . Satellite repeats with a monomer size ranging from 40 to 3 kb and a minimum array length of 5 kb were annotated using the default TideCluster settings. Satellites with a monomer size between 10 to 39 bp and a minimum array length of 5 kb were identified using TideCluster with parameters -T "-p 10 -P 39 -c 5 -e 0.25" -m 5000. LTR retrotransposons (LTR-RT) were annotated using DANTE v0.1.8 45 ( https://github.com/kavonrtep/dante ) and the DANTE_LTR v0.2.3.2 45 pipeline ( https://github.com/kavonrtep/dante_ltr ) on the RepeatExplorer Galaxy server 46 . The sequences of the identified LTR-RT elements were used to create a custom library of LTR-RT elements using the "dante_ltr_to_library" script from the DANTE_LTR repository. A custom library of Class II transposable elements was obtained using RepeatExplorer clustering protocol 1 47 on unassembled Illumina paired-end reads. Contigs corresponding to Class II retrotransposons with a minimum read depth of 5 reads and a minimum length of 100 bp were obtained using tools on the RepeatExplorer Galaxy server. Consensus sequences of rRNA gene arrays including intergenic spacer sequences were fully reconstructed from the RepeatExplorer contigs. A custom library of LINE elements was created by extracting regions with LINE protein coding domains identified by DANTE, along with the upstream and downstream 4kb flanking regions. The extracted genomic sequences were split into 100 nt fragments and analyzed by RepeatExplorer clustering. Contigs corresponding to LINE elements with a read depth of at least 3 reads and a minimum length of 150 nt were converted into a custom library. All custom libraries were concatenated and used as a library for RepeatMasker v4.1.5 48 search. The RepeatMasker search was performed on the RepeatExplorer Galaxy server with options "-xsmall -no_is -e ncbi". All regions annotated as mobile elements with RepeatMasker based on custom library search which overlapped with satellite repeats annotated by TideCluster were removed from the annotation using bedtools 49 with command “bedtools subtract”. The resulting GFF3 was then merged with the DANTE annotation using a custom R script ( https://github.com/kavonrtep/granges_tools/merge_repeat_annotations.R ). The classification of mobile elements in the annotation files corresponds to the classification system used in the REXdb database 50 . For the final repeat-masking process, all of the above repeat annotation GFF3 files were consolidated. We merged the annotated regions into a single BED file using the bedtools v2.31.1 merge tool 49 . Analysis of centromeres Centromere positions in the assembly were revealed using ChIP-seq with CENH3 antibody CenH3-2_VF (raised against the peptide “CQT PRH ARE TQE KKK RRN KPG” 51 ) as described previously 11 . Duplicate experiments, including independent chromatin preparations, were performed and the resulting immunoprecipitated fragments along with the corresponding control samples (Input; digested chromatin not subjected to immunoprecipitation) were sequenced on the Illumina platform (Admera Health, NJ, USA) in paired-end, 150 bp mode. The resulting reads were quality-filtered and trimmed using Trimmomatic v0.39 52 (minimum allowed length = 100 nt), yielding 151–201 million reads per sample, which were mapped onto the assembly using Bowtie 2 v2.4.2 53 , with options -p 64 -U. Subsequent analysis was performed on full output from Bowtie2 program and on output where all multi-mapped reads were filtered out. Filtering of multi-mapped reads was performed using Sambamba v0.8.1 54 with options “-F [XS] = = null and not unmapped and not duplicate”. Regions with statistically significant ChIP/Input enrichment ratio were identified by comparing ChIP and Input mapped reads using the epic2 program 55 , with the parameter “--bin-size 200”. Alternative identification of enrichment was performed using MACS v3.0.0a6 56 , with options “--broad --broad-cutoff 0.1”. The ChIP/Input ratio was calculated for plotting purposes using bamCompare v3.5.1 from the deepTools package 57 . The program was run with the parameter “–binSize 200” to calculate the log2 ratio for the 200 nt window size. The resulting data were plotted using the rtracklayer package of R 58 . Iso-seq transcriptome assembly The raw PacBio Iso-Seq data were processed using the default parameters within the IsoSeq3 ( https://github.com/PacificBiosciences/pbbioconda ) analysis tool in SMRT Link. The processing steps encompassed various stages such as Circular Consensus Sequences (CCS) generation, trimming, refining, clustering and polishing. Initially, reads were categorized as either full-length (FL) or non-full length (NFL) based on the presence of 5’ primer and 3’ primer or 3’ poly-A tail. Full-length non-chimeric (FLNC) reads were identified as those containing both 5’ and 3’ cDNA primers along with a poly-A tail. Reads lacking any of these defining tags were classified as non-full length (NFL) reads. Further processing involved trimming FLNC reads to eliminate barcoded and unbarcoded cDNA primers, along with unwanted primers, and aligning the reads in the 5 -> 3 orientations. Preceding the generation of clustered consensus sequences, FLNC reads underwent refinement steps to remove poly-A tails and artificial concatemers. Aligned reads were used to assess completeness of transcriptomes. The Iterative Clustering and Error Correction (ICE) algorithm were employed to produce clustered consensus sequences from the refined FLNC reads. Subsequently, a polishing arrow model, which included Quiver (QV) tracks, was utilized to refine multiple FLNC reads derived from identical isoforms, aiming to obtain non-redundant isoform sequences. The final FLNC transcript sequences were then classified into high-quality (HQ) transcripts, exhibiting ≥ 99% post-correction accuracy, and low-quality (LQ) transcripts, characterized by an accuracy of < 99%. Clustered transcripts were aligned to the reference genome Hedin2 v2 using minimap2 v2.24 59 ( Supplementary Table 3 ). The transcripts were combined and refined using cDNA_Cupcake ( https://github.com/Magdoll/cDNA_Cupcake ) after initial filtration, which excluded transcripts unsupported by a minimum of two full-length reads per sample. The combined transcripts were used to annotate the genome. Gene annotation The gene annotation was conducted using BRAKER v2.1.6 60 with the etpmode and long reads’ protocol. RNA-seq data from 26 samples were downloaded 14 and aligned to the ‘Hedin/2 v.2’ sequence using HISAT2 v2.2.1 with standard parameters (--phred64 --sensitive --no-discordant --no-mixed -I 1 -X 1000) 61 . The Iso-Seq full-length transcripts as described before were used as input for the long reads’ protocol. The BREAKER Long reads’ annotation pipeline was performed with four steps: ab initio prediction, homology-based prediction, RNA-seq data-based prediction and Iso-seq data-based prediction. The final gene models were evaluated using both RNA-seq transcriptome data and Iso-seq transcriptome data, followed by manual correction. The gene function annotation was achieved according to the predicted protein-coding genes blasted again five databases of eggNOG 5.0 62 , GO 63 , KEGG 64 , Non-Redundant database (NR: https://www.ncbi.nlm.nih.gov/refseq/about/nonredundantproteins/ ) , and SwissProt 65 . The GO and KEGG pathway enrichment analyses were performed for genes using the R package Clusterprofiler 66 and GOplot. We defined the adjusted p-value of < 0.1 and q-value < 0.3 (Benjamini–Hochberg method) as the cutoff criteria. ATAC-seq library preparation and analysis ATAC-seq libraries were prepared in duplicate. Flash frozen tissue (0.5g) was ground in liquid nitrogen, suspended in nuclei isolation buffer (NIB, 10 mM MES-KOH (pH 5.4), 10 mM NaCl, 10 mM KCl, 0.1 mM spermine, 1mM spermidine, 1mM DTT and 1% Triton X-100 and protease inhibitors (Source), filtered twice through 30 micron filters into ice-cold NIB + 1.8M sucrose, washed twice following centrifugation at 1000g for 15 min and resuspended in NIB. Libraries were prepared from serial dilutions in 50ul reactions using the (Nextera DNA Library Prep Kit, Illumina, San Diego, California, USA) with 2.5 µl TDE1, a 30-minute incubation at 37C and amplified according to Buenrostro et al 67 . Libraries were sequenced using paired 125 based reads on an Illumina HiSeq 2500 system (Illumina, San Diego, California, USA). ATAC-seq data analysis followed the Encode paired-end ATAC-seq pipeline specifications (Hitz 2023, https://docs.google.com/document/d/1f0Cm4vRyDQDu0bMehHD7P7KOMxTOP-HiNoIvL1VcBt8/edit ) with the following modifications. Reads mapping to chloroplast sequences were also removed. No blacklist was used. Read alignment with Bowtie2 used a maximum insert size of 1000 bp (-X 1000) and up to ten alignments were reported (-k 10). Only the unique alignments (primary alignments with MAPQ > 30) were used for subsequent analysis. Heat maps and metaplots were generated using Deeptools 57 . SNP calling The resequencing reads were mapped to the ‘Hedin/2 v.2’ reference genome using minimap2 v2.24 59 using the default parameters. SAMtools v1.16.1 68 software was used to convert mapping results into the BAM format, and Novosort v3.06.05 ( http://www.novocraft.com ) was used to sort the bam file. Finally, BCFtools (v.1.8) 69 was used to call SNPs and short insertions and deletions (indels) with the parameters (-q 20 -Q 20 --excl-flags 3332). Finally, indels and SNPs with none bi-allelic, > 10% missing calls, > 10% heterozygosity and MAF < 0.001 were removed. SNPs with MAF < 0.05 were further removed for phylogenetic tree structure, GWAS, LD decay, PCA and population structure analyses. SNP density and genetic diversity across each chromosome were counted with a 10 kb sliding window using Perl script ( https://github.com/helen5588/IPK_faba_codes/blob/main/s2_snp_calling/get_snp_stat.pl ). Population genetics analysis We used the whole-genome SNPs to construct the ML phylogenetic tree with 100 bootstrap using Phylip v3.696 70 . The tool iTOL ( http://itol.embl.de ) was used to color the phylogenetic tree. The Admixture v1.3 71 was used to infer the population structure with K values ranging from 2 to 10 and ran the cross-validation error (CV) procedure. To identify the pattern of linkage disequilibrium among different faba bean accessions, the pairwise squared correlation coefficients (r 2 ) were calculated using PopLDdecay v3.30 72 . Rscript v3.5.1 73 was applied to plot the average r 2 values between all pairs of genotypes. Selective sweep analysis To identify differentiated loci between sub-populations, we performed a sliding window (10kb) analysis based on pairwise estimates of genetic differentiation (F ST ) 74 , 75 and genetic diversity (π-ratio) within each population. The π and FST were calculated respectively with Perl script ( https://github.com/helen5588/IPK_faba_codes/blob/main/s5_sweep ) 76 . The window size was 10 kb with a step size of 2 kb. To detect sweep signals of selection in the faba bean collection, we employed the cross-population composite likelihood ratio test (XP-CLR) using XP-CLR v.1.0, as described by Chen et al 21 . The XP-CLR analysis was performed with 50 kb sliding window and 10 kb step between spring vs winter and normal VC type vs low VC type in faba bean population separately. Phenotyping Of the 406 accessions, 210 faba bean samples (207 spring-type and 3 winter-type) originated from breeding lines from Norddeutsche Pflanzenzucht Hans-Georg Lembke KG (NPZ). The remaining 196 (194 winter-type and 2 spring-type) faba bean inbred lines (F > 9) were derived via SSD from the so-called Göttingen Winter Bean Population (GWBP). Independently, 183 winter faba bean inbred lines were evaluated for tolerance to freezing during winter (winter-frost; WF 8 , 10 ) and early spring (late-frost; LF 9 , 10 ) under controlled temperature conditions in n = 10 and n = 5 replicated climate chamber experiments. Visual frost stress symptoms were scored as loss of leaf (and stem) turgor and color on hardened or de-hardened (WF/LF) juvenile plants during three successive frost stress treatments at -16/-13°C, -18°C/-15° C, and − 19/-17°C. Scores from 1–9 (no-full symptoms) were summed to total loss of turgor (LossT), color (LossC), and grand sum (LossTC). Trait repeatability was high (0.77 ≤ h 2 ≤ 0.92) and ANOVA revealed significant genotypic variance among lines. A subset of the ProFaba panel (208 out of 239 accessions) was tested for winter hardiness at experimental station Reinshof, University of Göttingen, Lower Saxony (51.48 °N, 9.93 °E), Germany in season 2021/2022. The ProFaba panel and two check cultivars Augusta (winter-type, NPZ) and Tiffany (spring-type, NPZ) were tested in double rows in an unreplicated, randomized complete block design (RCBD). Within the block, the 180 spring type and 28 winter type faba bean accessions of the panel were each arranged as RCBD to prevent neighboring effects later in season. Checks were replicated 21 times each to estimate repeatability (h² >0.80). Winter survival and late frost survival (number of surviving plants) were assessed after winter in February and following the late frost in April. Winter survival rate and late frost survival rate were calculated in relation to emerged plants. GWAS analysis GEMMA v0.98.5 77 , and Bayesian-information and Linkage-disequilibrium Iteratively Nested Keyway (BLINK) method, BLINK 78 were used for the association analysis. The first three PCA values (eigenvectors), which were derived from whole-genome SNPs, were used as covariates to correct for population stratification. Most highly significant SNPs associated with freezing tolerance were compared with positions of Hedin/2 protein-coding genes. Orthologues of the gene overlapping variant were identified using MCScanX 79 . We employed LDBlockShow v1.40 80 to generate and visualize the LD pattern of haplotype block. Genomic-assisted prediction We evaluated the ability of a genomic-enabled model to predict frost stress tolerance as loss of turgor, loss of color and loss of turgor and color in our 183 winter-type faba bean lines using WF 8 , 10 and LF 9 , 10 . The WF dataset was also used for a GWAS and is referred to here as the discovery set. Predictive ability was assessed by repeated random cross-validation, where 60% of the lines were assigned to the training set and 40% to the testing set. In each of the 1000 iterations, two GBLUP models were fitted to the data of the discovery set, and breeding values for the testing set were correlated with observed data from the LF dataset. The first model (M1) takes the form \(\:\mathsf{y}=1{\mu\:}+\mathsf{Z}\mathsf{u}+\mathsf{e}\) , where 𝘆 is the response vector, µ is the intercept with associated vector 1 of 1s, 𝘂 is the vector of breeding values with associated incidence matrix Z , and 𝗲 is the vector of residual effect. Vectors 𝘂 and 𝗲 are assumed \(\:\mathsf{u}\sim\text{N}(0,\mathsf{\:}\mathsf{G}{{\sigma\:}}_{\text{u}}^{2})\) , and \(\:\mathsf{e}\sim\text{N}(0,\mathsf{\:}\mathbf{I}{{\sigma\:}}_{\text{e}}^{2})\) , respectively, where \(\:\mathsf{G}\) is the genomic relationship matrix 81 , and I is an identity matrix. The second model (M2) incorporates the genotypes of two QTLs, detected by GWAS analysis on the discovery set, as fixed effects. The structure of this model is analogous to M1, except that the term \(\:1{\mu\:}\) is replaced by \(\:\mathsf{X}\varvec{\beta\:}\) , where \(\:\mathsf{X}\) represents the incidence matrix for the fixed effects \(\:\varvec{\beta\:}\) . We performed the cross-validation task using the “mixed.solve” function of the R package rrBLUP v4.6.3 82 . Frost treatments and RNA-Seq Wonjam1-ho was developed from PI 469181, which is a new variety with superior adaptability to the winter environmental conditions in South Korea. This variety was generated by irradiation with 100 Gy gamma-rays using a 60 Co gamma irradiator (150 TBq capacity; ACEL, Ottawa, ON, Canada). The faba bean accessions were obtained from the USDA-Agricultural Research Service (ARS) in Pullman, WA, USA. Briefly, 1000 M 1 seeds were planted in the experimental field at the Korea Atomic Energy Research Institute (KAERI; 35.5699°N 126.9772°E, Jeongeup, Jeollabuk, Republic of Korea) in 2015, and 2000 M 2 seeds were pooled for next generation. Subsequently, a total of 2000 M 2 plants were grown and harvested individually in 2017. M 3:5 seeds were then propagated from single plants, and M 5:7 plants were generated by line selection during 2020–2023. Frost tolerance was tested in the experimental filed of KAERI with faba bean mutant population consisting of 291 mutant lines during 2022–2023. Finally, we selected the 230-5 mutant variety with high frost tolerance and named wonjam1-ho. To investigate the transcriptome response for frost tolerance between Wonjam1-ho (tolerant) and PI 271634 from India (susceptible), both were placed in 15 cm plastic pots with commercial soil (Hungnong, Pyeongtaek, Republic of Korea) and were grown in a growth chamber (Vision Science, Republic of Korea) set at 18 ℃ for 20 days. Next, the temperature in the growth chamber was gradually decreased at a rate of 1°C/h until it reached 4°C, where it was held for 3 days (12-h day/12-h night). The temperature was further lowered by 1°C/h to − 7°C, maintained for 12-h night, and then raised in reverse order back to 18°C for recovery. Leaves were collected after treatments (at 18°C, 4°C, 0°C, and − 7°C), immediately frozen in liquid nitrogen, and used for RNA extraction. The expression data for these samples were used to quantify gene expression using RSEM v 1.3.3 83 . Declarations Data availability The ‘Hedin/2 v.2’ genome is deposited at European Nucleotide Archive-ENA under accession PRJEB77688. The gene annotation files are stored in e!DAL 84 (http://dx.doi.org/10.5447/ipk/2024/10). Resequencing reads are available in ENA under accession PRJEB76803. ChIP-seq data are accessible under SRA archive run accessions ERR9966691-ERR9966694. Iso-Seq data and RNA-seq data are available in ENA under accession PRJEB81208 and PRJEB81208. ATAC-seq raw reads are stored in ENA under accession PRJNA912311. ProFaba GBS reads are stored in ENA under accession PRJEB75354. Code availability Scripts used in this study have been made available at GitHub (https://github.com/helen5588/IPK_faba_codes.git). Acknowledgments We thank J. Bauernfeind, T. Münch and H. Miehe for IT administration; A. Arend, A. Fiebig, M. Maruschewski for help with data submission; K.Trnka for plant sampling. This work was funded by the Leibniz Association to M.J. (REPLACE, J118/2021), the European Union’s Horizon 2020 Programme for Research & Innovation in the ERA-NET Cofund SusCrop project ProFaba (grant no. 771134) to S.U.A. & W.L. part of the Joint Programming Initiative on Agriculture, Food Security, and Climate Change (FACCE-JPI), the Green Development and Demonstration Programme (GUDP) grant 34009-20-1761 for the IMFABA project and the National Research Council Canada’s Sustainable Protein Production (SPP) program. J.M. and P.N. were supported by the Czech Science Foundation grant no. 24-10036S. Computational resources and data storage facilities were in part provided by the ELIXIR-CZ Research Infrastructure Project (LM2023055). Z.T. and H.S. were supported by the project TowArds Next GENeration Crops, reg. no. CZ.02.01.01/00/22_008/0004581 of the ERDF Programme Johannes Amos Comenius. Cooperative Research Program for Agriculture Science and Technology Development (Project No. PJ017134), Rural Development Administration and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1A2C3007885) Republic of Korea to T.J.Y. S.J.K. is supported by the research program of the Korea Atomic Energy Research Institute (Project No. 523420-24). Author Contributions M.J. and S.U.A. designed the study. M.J., A.H. coordinated sequencing. M.J., H.S., supervised sequence assembly. S.U.A., W.L., supervised phenotyping experiments. T.J.Y., S.J.K., supervised cold-treated transcriptome assay. Optical genome mapping and hybrid scaffolding: Z.T., H.S. Pseudomolecule construction and gene annotation: H.Z., M.M., M.J. Repeat and centromere annotation: P.N., J.M. ATAC sequencing and analysis: D.C., Y.C., D.J.F.K. Phenotyping, seed supply and multiplication: A.W., G.W., O.S. Cold treatment, RNA-Seq analysis: A.W., J.M.K., H.Z. Genomic prediction and association analyses: E.B., H.Z. Data management and submission: H.Z., D.J.F.K., J.M. Writing: M.J., S.U.A, H.S., J.M., D.J.F.K, H.Z., E.B., S.J.K. Coordination: S.U.A., M.J. All authors read and commented on the manuscript. Competing interests OS, GW are employees of NPZ Innovation (NPZi) GmbH. All other authors declare no competing interests. References Khazaei, H. & Vandenberg, A. Seed Mineral Composition and Protein Content of Faba Beans (Vicia faba L.) with Contrasting Tannin Contents. Agronomy 10, 511 (2020). Herridge, D.F., Peoples, M.B. & Boddey, R.M. Global inputs of biological nitrogen fixation in agricultural systems. Plant and Soil 311, 1–18 (2008). Zander, P. et al. Grain legume decline and potential recovery in European agriculture: a review. Agronomy for Sustainable Development 36, 26 (2016). Link, W., Balko, C. & Stoddard, F.L. Winter hardiness in faba bean: Physiology and breeding. Field Crops Research 115, 287–296 (2010). Sallam, A., Arbaoui, M., El-Esawi, M., Abshire, N. & Martsch, R. Identification and verification of QTL associated with frost tolerance using linkage mapping and GWAS in winter faba bean. Frontiers in Plant Science 7, 1098 (2016). Landry, E.J. & Hu, J. Increasing pre-acclimation temperature reduces the freezing tolerance of winter-type faba bean (Vicia faba L.). Journal of Agronomy and Crop Science 205, 46–53 (2019). Carrillo-Perdomo, E. et al. A QTL approach in faba bean highlights the conservation of genetic control of frost tolerance among legume species. Frontiers in Plant Science 13, 970865 (2022). Ali, M.B.M. et al. Association analyses to genetically improve drought and freezing tolerance of faba bean ( Vicia faba L. ). Crop Science 56, 1036–1048 (2016). Windhorst, A., Skovbjerg, C.K., Andersen, S.U. & Link, W. Improving overwintering in times of climate change - a GWAS for late-frost tolerance of winter faba bean. in 73rd Conference of the Vereinigung der Pflanzenzüchter und Saatgutkaufleute Österreichs 29–36 (Raumberg-Gumpenstein, Irdning, Austria, 2023). Windhorst, A. et al. Genome-wide association analyses identify QTL for tolerance to freezing during winter and early spring as a basis for in-depth genetic analysis and implementation in winter faba bean ( Vicia faba L .) breeding. bioRxiv (2024). Crépon, K. et al. Nutritional value of faba bean (Vicia faba L.) seeds for feed and food. Field Crops Research 115, 329–339 (2010). Björnsdotter, E. et al. VC1 catalyses a key step in the biosynthesis of vicine in faba bean. Nature Plants 7, 923–931 (2021). Tacke, R., Ecke, W., Höfer, M., Sass, O. & Link, W. Fine-mapping of the major locus for vicine and convicine in faba bean (Vicia faba) and marker-assisted breeding of a novel, low vicine and convicine winter faba bean population. Plant Breeding 141, 644–657 (2022). Jayakodi, M. et al. The giant diploid faba genome unlocks variation in a global protein crop. Nature 615, 652–659 (2023). Skovbjerg, C.K. et al. Genetic analysis of global faba bean diversity, agronomic traits and selection signatures. Theoretical and Applied Genetics 136, 114 (2023). Rhie, A., Walenz, B.P., Koren, S. & Phillippy, A.M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biology 21, 245 (2020). Neumann, P. et al. Epigenetic histone marks of extended meta-polycentric centromeres of Lathyrus and Pisum chromosomes. Frontiers in Plant Science 7, 234 (2016). Macas, J. et al. Assembly of the 81.6 Mb centromere of pea chromosome 6 elucidates the structure and evolution of metapolycentric chromosomes. PLOS Genetics 19, e1010633 (2023). Lu, Z. et al. The prevalence, evolution and chromatin signatures of plant regulatory elements. Nature Plants 5, 1250–1259 (2019). Alexander, D.H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Research 19, 1655–1664 (2009). Chen, H., Patterson, N. & Reich, D. Population differentiation as a test for selective sweeps. Genome Res 20, 393–402 (2010). Liang, Y., Huang, Y., Liu, C., Chen, K. & Li, M. Functions and interaction of plant lipid signalling under abiotic stresses. Plant Biology 25, 361–378 (2023). Zhang, H. et al. Optimization of Genotyping-by-Sequencing (GBS) for Germplasm Fingerprinting and Trait Mapping in Faba Bean. Legume Science 6, e254 (2024). Vágújfalvi, A., Galiba, G., Cattivelli, L. & Dubcovsky, J. The cold-regulated transcriptional activator Cbf3 is linked to the frost-tolerance locus Fr-A2 on wheat chromosome 5A. Molecular Genetics and Genomics 269, 60–67 (2003). Sandve, S.R. et al. Molecular mechanisms underlying frost tolerance in perennial grasses adapted to cold climates. Plant Science 180, 69–77 (2011). Thomashow, M.F. Molecular Basis of Plant Cold Acclimation: Insights Gained from Studying the CBF Cold Response Pathway. Plant Physiology 154, 571–577 (2010). Fowler, S. & Thomashow, M.F. Arabidopsis transcriptome profiling indicates that multiple regulatory pathways are activated during cold acclimation in addition to the CBF cold response pathway. The Plant Cell 14, 1675–1690 (2002). Schulz, E., Tohge, T., Zuther, E., Fernie, A.R. & Hincha, D.K. Flavonoids are determinants of freezing tolerance and cold acclimation in Arabidopsis thaliana. Scientific Reports 6, 34027 (2016). Ahmad, S., Ali, S., Shah, A.Z., Khan, A. & Faria, S. Chalcone synthase (CHS) family genes regulate the growth and response of cucumber (Cucumis sativus L.) to Botrytis cinerea and abiotic stresses. Plant Stress 8, 100159 (2023). Jayaraman, K. et al. Stress-inducible expression of chalcone isomerase2 gene improves accumulation of flavonoids and imparts enhanced abiotic stress tolerance to rice. Environmental and Experimental Botany 190, 104582 (2021). Wang, Y. et al. The regulation of adaptation to cold and drought stresses in Poa crymophila Keng revealed by integrative transcriptomics and metabolomics analysis. Frontiers in Plant Science 12, 631117 (2021). Mockler, T.C. et al. Regulation of flowering time in Arabidopsis by K homology domain proteins. Proceedings of the National Academy of Sciences 101, 12759–12764 (2004). Lavania, D. et al. Genetic approaches for breeding heat stress tolerance in faba bean (Vicia faba L.). Acta Physiologiae Plantarum 37, 1737 (2014). Maalouf, F. et al. Genetic Dissection of Heat Stress Tolerance in Faba Bean (Vicia faba L.) Using GWAS. Plants 11, 1108 (2022). Sallam, A. & Martsch, R. Association mapping for frost tolerance using multi-parent advanced generation inter-cross (MAGIC) population in faba bean (Vicia faba L.). Genetica 143, 501–514 (2015). Bornhofen, E. et al. Genetics of faba bean yield and yield stability. bioRxiv (2024). Tayeh, N. et al. A tandem array of CBF/DREB1 genes is located in a major freezing tolerance QTL region on Medicago truncatula chromosome 6. BMC Genomics 14, 814 (2013). Šimková, H., Číhalíková, J., Vrána, J., Lysák, M.A. & Doležel, J. Preparation of HMW DNA from Plant Nuclei and Chromosomes Isolated from Root Tips. Biologia Plantarum 46, 369–373 (2003). Monat, C. et al. TRITEX: chromosome-scale sequence assembly of Triticeae genomes with open-source tools. Genome Biology 20, 284 (2019). Ou, S. & Jiang, N. LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiology 176, 1410–1422 (2017). Ou, S., Chen, J. & Jiang, N. Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Research 46, e126-e126 (2018). Simão, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V. & Zdobnov, E.M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015). Gao, Y., Liu, B., Wang, Y. & Xing, Y. TideHunter: efficient and sensitive tandem repeat detection from noisy long-reads using seed-and-chain. Bioinformatics 35, i200-i207 (2019). Novák, P. et al. TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads. Nucleic Acids Research 45, e111-e111 (2017). Novák, P., Hoštáková, N., Neumann, P. & Macas, J. DANTE and DANTE_LTR: lineage-centric annotation pipelines for long terminal repeat retrotransposons in plant genomes. NAR Genomics and Bioinformatics 6, lqae113 (2024). Novák, P., Neumann, P., Pech, J., Steinhaisl, J. & Macas, J. RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. Bioinformatics 29, 792–793 (2013). Novák, P., Neumann, P. & Macas, J. Global analysis of repetitive DNA from unassembled sequence reads using RepeatExplorer2. Nature Protocols 15, 3745–3776 (2020). Smit, A.F.A., Hubley, R. & Green, P. RepeatMasker Open-4.0. https://www.repeatmasker.org/ . (2013–2015). Quinlan, A.R. & Hall, I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010). Neumann, P., Novák, P., Hoštáková, N. & Macas, J. Systematic survey of plant LTR-retrotransposons elucidates phylogenetic relationships of their polyprotein domains and provides a reference for element classification. Mobile DNA 10, 1 (2019). Neumann, P. et al. Centromeres Off the Hook: Massive Changes in Centromere Size and Structure Following Duplication of CenH3 Gene in Fabeae Species. Molecular Biology and Evolution 32, 1862–1879 (2015). Bolger, A.M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014). Langmead, B. & Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357–359 (2012). Tarasov, A., Vilella, A.J., Cuppen, E., Nijman, I.J. & Prins, P. Sambamba: fast processing of NGS alignment formats. Bioinformatics 31, 2032–2034 (2015). Stovner, E.B. & Sætrom, P. epic2 efficiently finds diffuse domains in ChIP-seq data. Bioinformatics 35, 4392–4393 (2019). Zhang, Y. et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biology 9, R137 (2008). Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Research 44, W160-W165 (2016). Lawrence, M., Gentleman, R. & Carey, V. rtracklayer: an R package for interfacing with genome browsers. Bioinformatics 25, 1841–1842 (2009). Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018). Brůna, T., Hoff, K.J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP + and AUGUSTUS supported by a protein database. NAR genomics and bioinformatics 3, lqaa108 (2021). Kim, D., Paggi, J.M., Park, C., Bennett, C. & Salzberg, S.L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology 37, 907–915 (2019). Huerta-Cepas, J. et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic acids research 47, D309-D314 (2019). Ashburner, M. et al. Gene Ontology: tool for the unification of biology. Nature Genetics 25, 25–29 (2000). Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28, 27–30 (2000). Boutet, E., Lieberherr, D., Tognolli, M., Schneider, M. & Bairoch, A. UniProtKB/Swiss-Prot. Methods Mol Biol 406, 89–112 (2007). Wu, T. et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The innovation 2, 100141 (2021). Buenrostro, J.D., Wu, B., Chang, H.Y. & Greenleaf, W.J. ATAC-seq: A Method for Assaying Chromatin Accessibility Genome-Wide. Current Protocols in Molecular Biology 109, 21.29.1-21.29.9 (2015). Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009). Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011). Felsenstein, J. PHYLIP (phylogeny inference package), version 3.5 c , (Produced and distributed by author/Department of Genetics, University of Washington, 1993). Alexander, D.H. & Lange, K. Enhancements to the ADMIXTURE algorithm for individual ancestry estimation. BMC Bioinformatics 12, 246 (2011). Zhang, C., Dong, S.S., Xu, J.Y., He, W.M. & Yang, T.L. PopLDdecay: a fast and effective tool for linkage disequilibrium decay analysis based on variant call format files. Bioinformatics 35, 1786–1788 (2019). Team, R.C. R: A language and environment for statistical computing. (R Foundation for Statistical Computing, Vienna, Austria, 2013). Lewontin, R.C. & Krakauer, J. Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics 74, 175–195 (1973). Akey, J.M., Zhang, G., Zhang, K., Jin, L. & Shriver, M.D. Interrogating a high-density SNP map for signatures of natural selection. Genome research 12, 1805–1814 (2002). Zeng, X. et al. Origin and evolution of qingke barley in Tibet. Nature Communications 9, 5433 (2018). Zhou, X. & Stephens, M. Genome-wide efficient mixed-model analysis for association studies. Nature Genetics 44, 821–824 (2012). Huang, M., Liu, X., Zhou, Y., Summers, R.M. & Zhang, Z. BLINK: a package for the next level of genome-wide association studies with both individuals and markers in the millions. GigaScience 8, giy154 (2018). Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic acids research 40, e49-e49 (2012). Dong, S.S. et al. LDBlockShow: a fast and convenient tool for visualizing linkage disequilibrium and haplotype blocks based on variant call format files. Briefings in Bioinformatics 22, bbaa227 (2020). VanRaden, P.M. Efficient methods to compute genomic predictions. Journal of Dairy Science 91, 4414–4423 (2008). Endelman, J.B. Ridge regression and other kernels for genomic selection with R package rrBLUP. The Plant Genome 4, 250–255 (2011). Li, B. & Dewey, C.N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011). Arend, D. et al. e!DAL - a framework to store, share and publish research data. BMC Bioinformatics 15, 214 (2014). Tables Table 1 is available in the Supplementary Files section. Extended Data Extended Data Figures are not available with this version. Extended Data Fig. 1: Contact matrix generated from the Hi-C data analysis showing sequence interactions in chromosomes. The color bar (red to white) indicates the logarithm of the contact density. Extended Data Fig. 2: Alignments of faba bean genome v.1 versus v.2 chromosomes . Orange lines indicate fixed orientation and placement errors in version 2. Extended Data Fig. 3: Chromosome locations of telomeric repeats. The motif of the telomeric region is TTAGGG. Extended Data Fig. 4: Comparison between the gene models predicted in the v.1 and v.2 genome assemblies. (a) Example of a merged gene, predicted as one gene in v.2 and two genes in v.1; (b) Example of a gene in v.2 that is incompletely predicted in v.1; (c) Example of split genes, predicted as one gene in v.1 and correctly identified as two genes in v.2. (d) Example of a gene predicted in v.2 but not predicted in v.1; The quality of the predicted genes was supported by experimental evidence, including Iso-seq and Illumina transcriptome sequences as well as protein evidence including pigeon pea, soybean, lentil, barrel clover, pea. Extended Data Fig. 5: Representation of tandem genes and chromatin accessible regions (a) Diagrammatic depiction of the genomic area spanning VfLeB4 cluster genes in the Hedin/2 v.1 and v.2 genome assemblies. VfLeB4 gene families were not assembled at the chromosome level in v.1 assembly, while the region including VfLeB4 family was fully assembled in the v.2 assembly. (b) Distribution of distance (bp) of accessible regions. Extended Data Fig. 6: Summary of the genomic variations identified across 406 accessions. SNP density across all chromosomes of the faba bean genome is scaled to a 1 Mb size. Extended Data Fig 7: Genome-wide distribution of spring and winter selective sweeps. ( a , b ) Genome-wide likelihood (XP-CLR) values for selection contrasting the spring and winter populations (a) and Low vicine and convicine (VC) and Normal VC populations within the spring populations (b), with chromosome number indicated along the x axis. c) Candidate regions for VC identified by selective sweep analysis and linkage disequilibrium (r 2 ) around a 1.44Mb region identified on chromosome 1 confirmed by the previous Fine-mapping study. d) Genomic alignment between Hedin/2 and Tiffany in VC genes region. Extended Data Fig 8: Sweep signals on chromosome 1. a. Selected regions on chromosome 1 based on sweep analysis. b. 31 indel markers designed for selecting low VC accessions. Extended Data Fig. 9: Selected regions on chromosome 3 (a) and chromosome 5 (b) based on Fst analysis . Extended Data Fig 10: 187 SNP markers designed for selecting frost-tolerant accessions. ( a) LD square of 187 SNP markers . (b) Heatmap plot of 187 markers in 406 faba bean accessions, where blue indicates winter haplotype and orange indicates spring haplotype. (c) Heatmap plot of 7 markers in the ProFaba panel with blue indicating winter haplotype and orange indicating spring haplotype. (d, e) Heatmap plot of 7 markers sorted by winter survival rate (d)and late frost survival rate (e) (Low->High) in the ProFaba panel with blue indicating winter haplotype and orange indicating spring haplotype. Additional Declarations There is NO Competing Interest. Supplementary Files maintable.pdf Table 1 Supplementarytables.xlsx suppl.datafigNEW.pdf Cite Share Download PDF Status: Published Journal Publication published 10 Mar, 2026 Read the published version in Nature Genetics → Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-5356723","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":377343138,"identity":"0ac6ffd2-1099-45bb-b6d9-4f07b19ff9aa","order_by":0,"name":"Murukarthick Jayakodi","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA/0lEQVRIiWNgGAWjYNACHgiWSKhAiDE2EKflDJDBRpQWqDYJxjYitOjOyH0mwSBjl9jfc/jgjYfz7Owa5JufbuZhsJHdcAC7FrMb6WYSDDzJiTPOtiVbJG5LTm5gYzO7zcOQZoxbSxobUAtzYsN5HjOJxG3MyUCHgbQcTiSgpT5xPljLnHqgFvZvQC3/CWkBmnm2B6il4bAdAxsPyJYDuLWcecZskcBz3HjjmWPJFgnHjiewseWU3ZxjkGw8E5eW42mMNz72VMvOO5N88OaPmmp7fubj2268qbCT7cOhBQwSexgcG2DsNjBlgEc5GPxgsIcx7fGpGwWjYBSMgpEJAD74W3cgh+AvAAAAAElFTkSuQmCC","orcid":"https://orcid.org/0000-0003-2951-0541","institution":"Texas A\u0026M AgriLife Research","correspondingAuthor":true,"prefix":"","firstName":"Murukarthick","middleName":"","lastName":"Jayakodi","suffix":""},{"id":377343139,"identity":"ed4a30f9-01bc-4190-a11d-3f5f8b6090b5","order_by":1,"name":"Hailin Zhang","email":"","orcid":"https://orcid.org/0000-0001-8757-9905","institution":"Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben","correspondingAuthor":false,"prefix":"","firstName":"Hailin","middleName":"","lastName":"Zhang","suffix":""},{"id":377343140,"identity":"1d2b57cf-4f8d-434f-9117-ad914bcb6d90","order_by":2,"name":"Alex Windhorst","email":"","orcid":"","institution":"Division of Plant Breeding Methodology, Georg-August-University Goettingen","correspondingAuthor":false,"prefix":"","firstName":"Alex","middleName":"","lastName":"Windhorst","suffix":""},{"id":377343141,"identity":"276b7840-fcf4-4039-89fb-86cb18cf3e87","order_by":3,"name":"Elesandro Bornhofen","email":"","orcid":"https://orcid.org/0000-0002-7447-6226","institution":"Center for Quantitative Genetics and Genomics, Aarhus University","correspondingAuthor":false,"prefix":"","firstName":"Elesandro","middleName":"","lastName":"Bornhofen","suffix":""},{"id":377343142,"identity":"5987210b-f33e-4a5c-89c1-b4068a50bcda","order_by":4,"name":"Zuzana Tulpova","email":"","orcid":"https://orcid.org/0000-0002-4403-9367","institution":"Institute of Experimental Botany of the Czech Academy of Sciences","correspondingAuthor":false,"prefix":"","firstName":"Zuzana","middleName":"","lastName":"Tulpova","suffix":""},{"id":377343143,"identity":"2609907e-6fac-4f7e-9176-e33a0a87e1e1","order_by":5,"name":"Petr Novak","email":"","orcid":"https://orcid.org/0000-0002-5068-9681","institution":"Biology Centre of the Czech Academy of Sciences","correspondingAuthor":false,"prefix":"","firstName":"Petr","middleName":"","lastName":"Novak","suffix":""},{"id":377343144,"identity":"41a7165c-919a-45b7-b0bc-9e0df503d4ec","order_by":6,"name":"Jiri Macas","email":"","orcid":"https://orcid.org/0000-0003-0829-1570","institution":"Biology Centre, Czech Academy of Sciences","correspondingAuthor":false,"prefix":"","firstName":"Jiri","middleName":"","lastName":"Macas","suffix":""},{"id":377343145,"identity":"f98794ef-1180-4468-8a5d-585a07af315a","order_by":7,"name":"Hana Simkova","email":"","orcid":"https://orcid.org/0000-0003-4159-7619","institution":"Centre of the Region Haná for Biotechnological and Agricultural Research","correspondingAuthor":false,"prefix":"","firstName":"Hana","middleName":"","lastName":"Simkova","suffix":""},{"id":377343146,"identity":"a860eee1-7ef2-406a-86d7-e4c9046f4ce0","order_by":8,"name":"Marcin Nadzieja","email":"","orcid":"","institution":"Aarhus University","correspondingAuthor":false,"prefix":"","firstName":"Marcin","middleName":"","lastName":"Nadzieja","suffix":""},{"id":377343147,"identity":"827257c7-689a-41fd-b444-d79656f66a66","order_by":9,"name":"Jung Min Kim","email":"","orcid":"","institution":"Advanced Radiation Technology Institute, Korea Atomic Energy Research Institute","correspondingAuthor":false,"prefix":"","firstName":"Jung","middleName":"Min","lastName":"Kim","suffix":""},{"id":377343148,"identity":"f5b9964a-cea2-4a6d-8230-5773bc397d59","order_by":10,"name":"Dustin Cram","email":"","orcid":"","institution":"National Research Council Canada","correspondingAuthor":false,"prefix":"","firstName":"Dustin","middleName":"","lastName":"Cram","suffix":""},{"id":377343149,"identity":"14d86883-5d3b-4345-a492-8fe90f4de034","order_by":11,"name":"Yongguo Cao","email":"","orcid":"","institution":"National Research Council Canada","correspondingAuthor":false,"prefix":"","firstName":"Yongguo","middleName":"","lastName":"Cao","suffix":""},{"id":377343150,"identity":"8da26b11-12ec-4163-8ea5-ffb93078ad4b","order_by":12,"name":"David Konkin","email":"","orcid":"","institution":"National Research Council Canada","correspondingAuthor":false,"prefix":"","firstName":"David","middleName":"","lastName":"Konkin","suffix":""},{"id":377343151,"identity":"a986add3-297e-4d8c-a91b-f12128b1de6d","order_by":13,"name":"Olaf Sass","email":"","orcid":"","institution":"Norddeutsche Pflanzenzucht Hans-Georg Lembke KG","correspondingAuthor":false,"prefix":"","firstName":"Olaf","middleName":"","lastName":"Sass","suffix":""},{"id":377343152,"identity":"8af95f17-131d-449c-8640-7d0b5ab0defb","order_by":14,"name":"Gregor Welna","email":"","orcid":"","institution":"Norddeutsche Pflanzenzucht Hans-Georg Lembke KG","correspondingAuthor":false,"prefix":"","firstName":"Gregor","middleName":"","lastName":"Welna","suffix":""},{"id":377343153,"identity":"1c6f0bd7-7f1e-4026-9c24-0374ac4a996e","order_by":15,"name":"Axel Himmelbach","email":"","orcid":"","institution":"Leibniz Institute of Plant Genetics and Crop Plant Research","correspondingAuthor":false,"prefix":"","firstName":"Axel","middleName":"","lastName":"Himmelbach","suffix":""},{"id":377343154,"identity":"aa8823b8-d6d3-4204-a607-95560615d094","order_by":16,"name":"Martin Mascher","email":"","orcid":"https://orcid.org/0000-0001-6373-6013","institution":"Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben","correspondingAuthor":false,"prefix":"","firstName":"Martin","middleName":"","lastName":"Mascher","suffix":""},{"id":377343155,"identity":"29a48435-1c2e-47ea-b689-5b07e62aa729","order_by":17,"name":"Wolfgang Link","email":"","orcid":"","institution":"Division of Plant Breeding Methodology, Georg-August-University Goettingen","correspondingAuthor":false,"prefix":"","firstName":"Wolfgang","middleName":"","lastName":"Link","suffix":""},{"id":377343156,"identity":"3b58cd1d-b970-456a-9d5c-596d51c7c0ba","order_by":18,"name":"Soon-Jae Kwon","email":"","orcid":"https://orcid.org/0000-0002-9560-6827","institution":"Advanced Radiation Technology Institute, Korea Atomic Energy Research Institute","correspondingAuthor":false,"prefix":"","firstName":"Soon-Jae","middleName":"","lastName":"Kwon","suffix":""},{"id":377343157,"identity":"21bc6c80-5164-4878-b09b-ff11f1e741b5","order_by":19,"name":"Stig Andersen","email":"","orcid":"https://orcid.org/0000-0002-1096-1468","institution":"Aarhus University","correspondingAuthor":false,"prefix":"","firstName":"Stig","middleName":"","lastName":"Andersen","suffix":""},{"id":377343158,"identity":"98e0794d-bd63-4289-9245-afe0e0490ee6","order_by":20,"name":"Tae-Jin Yang","email":"","orcid":"https://orcid.org/0000-0002-9676-8801","institution":"Seoul National University","correspondingAuthor":false,"prefix":"","firstName":"Tae-Jin","middleName":"","lastName":"Yang","suffix":""}],"badges":[],"createdAt":"2024-10-29 19:40:10","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-5356723/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-5356723/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1038/s41588-026-02524-y","type":"published","date":"2026-03-10T04:00:00+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":69016258,"identity":"47b7fc80-4baa-4312-a60d-0a307eb02b38","added_by":"auto","created_at":"2024-11-14 14:53:32","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":1566704,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eGenomic feature distribution across \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eVicia faba\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e chromosomes. (a) \u003c/strong\u003eThe Circos plot illustrates the distribution of various genomic features across the six chromosome-scale scaffolds of the \u003cem\u003eVicia faba\u003c/em\u003e genome. The tracks, arranged from outside to inside, represent the following: (1) Density of transposable elements. (2) Number of full-length LTR-retrotransposons, identified by the DANTE_LTR pipeline. (3) Position of centromeres, determined by ChIP-seq using the CENH3 antibody, shown as the ChIP/Input ratio. (4) Density of tandem repeats. (5) Number of protein-coding genes. (6) Density of regions with similarity to 45S rDNA and 5S rDNA. All densities and feature counts are averaged over 5 Mbp windows, except for ChIP/Input ratios which are averaged over 2 Mbp windows. \u003cstrong\u003e(b)\u003c/strong\u003e Metagene plot and heatmap visualizations of accessibility around genes in adult leaf (left) and seedling (right) tissues of faba bean.\u003c/p\u003e","description":"","filename":"Figure1.png","url":"https://assets-eu.researchsquare.com/files/rs-5356723/v1/a0e5e2e3a6a399dfa8484e17.png"},{"id":69015770,"identity":"6477beeb-60d4-420c-871e-871abf4e1110","added_by":"auto","created_at":"2024-11-14 14:45:32","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":971771,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eThe population structure of spring and winter populations in faba bean.\u003c/strong\u003e \u003cstrong\u003e(a)\u003c/strong\u003e Principal component analysis of 406 faba bean accessions including spring and winter groups. \u003cstrong\u003e(b)\u003c/strong\u003e Maximum likelihood phylogenetic tree of faba bean accessions inferred from whole-genome SNPs with 100 nonparametric bootstraps. \u003cstrong\u003e(c)\u003c/strong\u003e Population structure of the 406 faba bean accessions. \u003cstrong\u003e(d) \u003c/strong\u003eGenome-wide linkage disequilibrium (LD) decay across the two faba bean populations.\u003c/p\u003e","description":"","filename":"Figure2.png","url":"https://assets-eu.researchsquare.com/files/rs-5356723/v1/c1e1b382a4f9f8d4b4952e0e.png"},{"id":69016904,"identity":"90aaa293-f60d-4f13-b2df-593ae09545f1","added_by":"auto","created_at":"2024-11-14 15:01:33","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":1135080,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eCandidate selection regions for winter survival and late frost tolerance identified by GWAS and population-differentiation statistics (FST).\u003c/strong\u003e \u003cstrong\u003e(a)\u003c/strong\u003e The Genome-wide association statistics for late frost survival (top plot) and winter survival (middle plot). The y axis corresponds to late frost survival scores and winter survival scores (-log\u003csub\u003e10\u003c/sub\u003e (P-values)), while the x axis corresponds to genomic coordinates (Hedin/2). The bottom plot shows a genome-wide view in 10-kb/2-kb sliding windows of differentiation (FST) associated with differentiation within the spring and winter faba bean gene pools. \u003cstrong\u003e(b)\u003c/strong\u003e A close-up overlap view of the significant locus and heatmap around an 8.48Mb region identified on chromosome 1 with the previous QTL mapping. The heatmap plot shows SNPs (blue represents alternative allele, orange represents reference allele, green represents heterozygosity and red represents missing) in this region.\u003cstrong\u003e (c) \u003c/strong\u003eHeatmap of expression for 37 genes across all samples.\u003cstrong\u003e (d, e) \u003c/strong\u003eBoxplots of the significant locus on chromosome 1 based on late frost survival rate (d) and winter survival rate (e).\u003cstrong\u003e (f) \u003c/strong\u003ePhylogenetic tree illustrating differentiation between spring and winter populations using the selected 187 SNP markers.\u003c/p\u003e","description":"","filename":"Figure3.png","url":"https://assets-eu.researchsquare.com/files/rs-5356723/v1/893c801792fb979b0f48309f.png"},{"id":69015772,"identity":"96e57c1f-84fc-4b82-9969-200f59f9d03b","added_by":"auto","created_at":"2024-11-14 14:45:32","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":1501805,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eGWAS for winter frost traits in the winter faba bean population. (a) \u003c/strong\u003eManhattan plots showing loss of turgor (top plot), loss of color (middle plot) and the combination of the loss of turgor and color traits (bottom) using BLINK model and GEMMA MLM model.\u003cstrong\u003e (b) \u003c/strong\u003eLinkage disequilibrium (r\u003csup\u003e2\u003c/sup\u003e) plots highlighting the significant loci identified on chromosome 3 and chromosome 5. \u003cstrong\u003e(c) \u003c/strong\u003eHeatmap of expression for 17 genes across all samples.\u003cstrong\u003e (d, e, f) \u003c/strong\u003eBoxplot of the significant locus on chromosome 5 based on loss of turgor (d), loss of color (e), and the combination of the loss of turgor and color traits (f). (\u003cstrong\u003eg) \u003c/strong\u003ePredictive ability of a linear model fitting random genome-wide markers (M1) is compared with an alternative model (M2), where significant markers reported in panel (a) are explicitly modeled as fixed effects.\u003c/p\u003e","description":"","filename":"Figure4.png","url":"https://assets-eu.researchsquare.com/files/rs-5356723/v1/375fedd786fae0a1f6958225.png"},{"id":104382728,"identity":"5b19b693-babb-41a4-8f1f-e607b7b57452","added_by":"auto","created_at":"2026-03-11 07:57:47","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":6539879,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5356723/v1/63ff8fd3-cd68-4c91-8f62-737e73485d66.pdf"},{"id":69015769,"identity":"36a9614b-d5ec-4e26-b423-5ad90f507bc7","added_by":"auto","created_at":"2024-11-14 14:45:32","extension":"pdf","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":24493,"visible":true,"origin":"","legend":"Table 1","description":"","filename":"maintable.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5356723/v1/65ddf102d149febda966f072.pdf"},{"id":69015775,"identity":"da1ea2be-d95c-4c6a-81d2-29e67ea69915","added_by":"auto","created_at":"2024-11-14 14:45:33","extension":"xlsx","order_by":7,"title":"","display":"","copyAsset":false,"role":"supplement","size":5404954,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementarytables.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-5356723/v1/04c69d80cc9851312c60cfdc.xlsx"},{"id":69015774,"identity":"4a82e3eb-a6e7-4884-bdf5-7bbd90ea86fc","added_by":"auto","created_at":"2024-11-14 14:45:33","extension":"pdf","order_by":8,"title":"","display":"","copyAsset":false,"role":"supplement","size":6783042,"visible":true,"origin":"","legend":"","description":"","filename":"suppl.datafigNEW.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5356723/v1/2e3b0159dfe488edbae94149.pdf"}],"financialInterests":"There is \u003cb\u003eNO\u003c/b\u003e Competing Interest.","formattedTitle":"Allelic variation at a single locus distinguishes spring and winter faba beans","fulltext":[{"header":"Main","content":"\u003cp\u003eFaba bean (\u003cem\u003eVicia faba\u003c/em\u003e, 2n\u0026thinsp;=\u0026thinsp;2x\u0026thinsp;=\u0026thinsp;12) is renowned for its high protein content\u003csup\u003e\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u003c/sup\u003e (~\u0026thinsp;29%), superior nitrogen-fixing ability\u003csup\u003e\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/sup\u003e and sustainability credentials\u003csup\u003e\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u003c/sup\u003e. Faba bean production has grown steadily during the past ten years in Europe and other parts of the world (FAO 2022) and both spring and winter types are grown. Notably, winter faba beans have significant yield advantages (up to 47%) over spring cultivars\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e and are mostly grown in North-West Europe and the UK. Against the backdrop of increasing temperature and drought, winter faba bean is recognized as a promising crop to simultaneously boost local protein production and agricultural sustainability via crop rotation and as a cover crop. However, winter faba bean cultivation remains limited due to insufficient winter hardiness, including winter survival and tolerance against frost during winter and early spring, so-called \u0026ldquo;late frost\u0026rdquo;, of the available winter faba bean cultivars. Despite attempts to identify quantitative trait loci (QTL) controlling winter hardiness\u003csup\u003e\u003cspan additionalcitationids=\"CR6 CR7 CR8 CR9\" citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u003c/sup\u003e, the genetic basis for winter hardiness remains unclear.\u003c/p\u003e \u003cp\u003eFaba bean seeds contain vicine and convicine (VC), pyrimidine glucosides that cause hemolytic anemia (also known as favism) in individuals deficient in glucose-6-phosphate dehydrogenase. This is a major restricting factor in faba bean production and consumption\u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e, targeted primarily in faba bean breeding. Substantial breeding and research efforts have cloned the major \u003cem\u003eVC1\u003c/em\u003e gene\u003csup\u003e\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e, which controls the rate of VC accumulation in seed, and several low VC faba bean spring-type cultivars. Nevertheless, the development of low VC winter cultivars is yet to be realized in faba bean\u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e. The release of the first faba bean reference genome has facilitated genetic studies\u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e and the genomic characterization of faba bean germplasm\u003csup\u003e\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u003c/sup\u003e. However, the genomic loci that have been under selection due to breeding remain largely unknown. Many of the agronomic traits are still selected phenotypically by breeders. Therefore, charting genomic regions shaped by breeding efforts, termed \u0026ldquo;breeding signatures\u0026rdquo;, could potentially discover genes underlying important agronomic and nutritional quality traits and accelerate the selection process by trait-specific molecular markers. Compared to the spring gene pool, winter types have received less attention by breeders, which is partially due to a lack of diversity in the winter gene pool. Therefore, identifying genomic regions selected for in the elite spring gene pool during intensive breeding may provide targets also for winter faba bean improvement, especially for quality traits.\u003c/p\u003e \u003cp\u003eTo improve faba bean winter hardiness and breeding in general, we present an improved genome assembly and gene annotation of the reference cultivar (cv.) Hedin/2. In addition, resequencing of diverse elite breeding lines and winter genotypes uncovers the genes related to winter hardiness and genomic regions that were selected during breeding. The findings from our study and the greatly improved new genome sequence, provide genomic insights into faba bean cold acclimation and genetic improvement.\u003c/p\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eImproved faba bean reference sequence and gene annotation\u003c/h2\u003e \u003cp\u003eWe generated a Bionano optical map for the reference faba bean cv. \u0026lsquo;Hedin/2\u0026rsquo; and obtained a contig N50 of 41 Mb (\u003cb\u003eSupplementary table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003e1\u003c/span\u003e)\u003c/b\u003e. Hybrid scaffolding, which combined the Bionano map and contigs from the first version of the \u0026lsquo;Hedin/2 v.1\u0026rsquo; reference genome generated from PacBio HiFi reads, resulted in 317 scaffolds with a greatly improved N50 value of 100 Mb (\u003cb\u003eTable\u0026nbsp;1\u003c/b\u003e). We then used chromosome conformation capture sequencing (Hi-C) data to order and orient the hybrid scaffolds into six large chromosomal pseudomolecules with a total size of 11.7 Gb (hereafter \u0026lsquo;Hedin/2 v.2\u0026rsquo;) (\u003cb\u003eExtended Data\u003c/b\u003e Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Over 97% of the scaffolds were assigned to precise chromosomal locations (\u003cb\u003eTable\u0026nbsp;1, Supplementary table \u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003e2\u003c/span\u003e\u003c/b\u003e), compared to 94% in Hedin/2 v.1. There were fewer sequence gaps in the second version (335) than in the first version (5,195) (\u003cb\u003eSupplementary table \u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003e2\u003c/span\u003e)\u003c/b\u003e. The size of unanchored sequences in \u0026lsquo;Hedin/2 v.2\u0026rsquo; was 295 Mb, compared to 648 Mb in the first version. Further assembly evaluation by Merqury\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u003c/sup\u003e revealed comparable and yet improved accuracy relating to structural and base-level accuracy as well as higher gene space completeness in the second version (\u003cb\u003eTable\u0026nbsp;1\u003c/b\u003e). While overall chromosomal collinearity between both versions was high, many inverted sequence orientations were corrected in Hedin/2 v.2\u0026rsquo; (\u003cb\u003eExtended Data\u003c/b\u003e Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). In addition, the search for telomeric sequence motif (\u0026ldquo;TTTAGGG\u0026rdquo;) in \u0026lsquo;Hedin/2 v.2\u0026rsquo; revealed the presence of telomeres at most chromosome ends (\u003cb\u003eExtended Data\u003c/b\u003e Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e\u003cb\u003e)\u003c/b\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eAn improved de novo gene annotation was generated by integrating newly generated PacBio Iso-Seq datasets from 11 tissues (\u003cb\u003eSupplementary table 3\u003c/b\u003e), previous RNA-Seq datasets\u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e and related protein evidence (\u003cb\u003eSupplementary table 4\u003c/b\u003e) resulting in a total of 35,107 protein-coding genes, including 963 genes not identified in the Hedin/2 v.1 sequence. In total, 55,283 transcripts were identified from the coding genes. This is higher than for Hedin/2 v.1, which had 37,065 transcripts, indicating notable improvement in the annotation of alternative splicing (\u003cb\u003eSupplementary table 5\u003c/b\u003e). Further, manual curation resolved the fragmented and merged gene models using full-length lso-Seq datasets (\u003cb\u003eExtended Data\u003c/b\u003e Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e). The long intron genes were well annotated in \u0026lsquo;Hedin/2 v.2\u0026rsquo; (max. intron size of 245 kb) compared to version 1 with a maximum intron size of 145 kb. Seed storage protein (SSP) genes are crucial in grain legumes and are often tandemly clustered making them difficult to annotate with short RNA-Seq reads. Comparative analysis of SSP genes between the two annotation versions (\u003cb\u003eSupplementary table 6\u003c/b\u003e) revealed that a cluster of five genes belonging to the subgroup \u0026ldquo;legumin B4\u0026rdquo; was newly annotated and anchored in \u0026lsquo;Hedin/2 v.2\u0026rsquo; chromosomes (\u003cb\u003eExtended Data Fig.\u0026nbsp;5a\u003c/b\u003e), indicating the advantages of including the optical mapping and Iso-Seq data both in assembly and annotation, respectively. Overall, our improved reference genome sequence and gene annotation offer an accurate physical guide for the orientation of contigs in future faba bean genome assemblies, especially in pangenome studies, and may facilitate the precise characterization of agronomically and economically important genes.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eRepeat composition and centromere landscape\u003c/h3\u003e\n\u003cp\u003eUsing a combination of structure- and similarity-based approaches, 93% of the genome was annotated as repetitive sequences (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003ea, \u003cb\u003eSupplementary table 7\u003c/b\u003e), which is 5% higher than the previous assembly. Class I mobile elements were by far the dominant group of repeats, making up 83% of the genome. This was mainly due to the amplification of the Ogre lineage of Ty/3 gypsy LTR-retroelements which alone constituted about 71% of the faba bean genome. While the most abundant Ty1/copia lineage was SIRE with 5.9% of the genome the other major repeat type was satellite DNA (satDNA), making up 8.5% of the genome (\u003cb\u003eSupplementary table 7\u003c/b\u003e). In contrast to the satellite DNA that forms large arrays constituting heterochromatic bands on metaphase chromosomes, mobile elements were found evenly dispersed along the whole chromosomes, except for the regions occupied by satDNA, consistent with version 1 (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003ea). In comparison to version 1, there is a higher proportion of Ogres (71% vs. 44.4% of the genome) in \u0026lsquo;Hedin/2 v.2\u0026rsquo;. While the global outlook on repeat landscape is similar between \u0026lsquo;Hedin/2 v.2\u0026rsquo; and the previous version, we attribute some discrepancies in the proportion of repeat compositions between the versions to improved assembly quality and/or the result of different annotation tools.\u003c/p\u003e \u003cp\u003eWe mapped the positions of functional centromeres by Chromatin Immunoprecipitation Sequencing (ChIP-seq) of CENH3, a centromere-specific variant of histone H3 (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003ea) and determined the sizes of centromere, defined as the lengths of CENH3-enriched regions. Sizes ranged from 6.18 Mb (cen1) to 20.6 Mb (cen6) (\u003cb\u003eSupplementary table 8)\u003c/b\u003e. This makes them the largest monocentric centromeres described for plants to date. Centromere size was not proportional to chromosome length, with the largest chromosome (chromosome 1) having the smallest centromere. In most centromeres, the coverage with CENH3 was contiguous, with relatively even CENH3 ChIP-seq enrichment profiles (e.g. cen1). In cen5, the CENH3-associated domain was disrupted by a region of about 230 kb that contained satellite FabTR-53 and lacked CENH3. However, the main exception was the largest cen6, where the enrichment profile consisted of three distinct peaks separated by regions of lower CENH3 enrichment (\u003cb\u003eSupplementary data\u003c/b\u003e Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e\u0026ndash;6). This organization and the association of CENH3 peaks with satDNA arrays resemble the structure of meta-polycentric chromosomes occurring in the related genera \u003cem\u003ePisum\u003c/em\u003e and \u003cem\u003eLathyrus\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e,\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e. Centromeres were enriched in satellite DNA (23% vs. 8.5% in the whole genome). However, the distribution of centromeric satellites was not even among the chromosomes. While cen1 was almost entirely composed of a mosaic of four satellite repeats, satDNA comprised only a minor part of the remaining centromeres, with cen3 lacking satDNA almost entirely. Arrays of 14 different satellite repeats were found to be associated with centromeric chromatin, most of them specific to a single chromosome (\u003cb\u003eSupplementary table 9\u003c/b\u003e).\u003c/p\u003e\n\u003ch3\u003eGenome-wide open chromatin regions\u003c/h3\u003e\n\u003cp\u003eUsing assay for transposase-accessible chromatin using sequencing (ATAC-seq), we identified 121,443 and 93,501 accessible regions representing 39.6 Mb and 32.4 Mb in the adult leaf and seedling tissues, respectively (\u003cb\u003eSupplementary table 10\u003c/b\u003e). Of these accessible regions, 56,374 were shared between the two tissues. Accessible chromatin was made up of a much higher fraction of genes and gene-proximal sequences (~\u0026thinsp;5%,0-3kb away) than distal (\u0026lt;\u0026thinsp;0.4%, 3-30kb away) and intergenic (\u0026lt;\u0026thinsp;0.2%, \u0026gt;\u0026thinsp;30kb from gene) regions, though the number of accessible regions in intergenic regions was the highest of the four categories overall. Accessible regions that overlap genes tended to be larger in other areas of the genome (\u003cb\u003eSupplementary table 10\u003c/b\u003e). The distribution of the distances between accessible regions and genes showed a trimodal distribution (\u003cb\u003eExtended Data Fig.\u0026nbsp;5b\u003c/b\u003e). While the largest peak corresponded to the genome-wide background, it was unclear whether the secondary peaks represented accessible sites nearby unannotated genes or a discrete class of accessible regions that tend to occur further away from genes due to biological phenomena such as transposable element insertion or regulatory mechanisms. As noted previously in other species\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e, chromatin accessibility is elevated near the annotated transcriptional start and end sites (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eb).\u003c/p\u003e\n\u003ch3\u003eGenetic diversity and population structure of spring and winter faba bean\u003c/h3\u003e\n\u003cp\u003eUsing over 10-fold resequencing of 406 accessions including 209 elite spring faba bean breeding lines and 197 winter inbred lines (\u003cb\u003eSupplementary table 11\u003c/b\u003e), we discovered 98,797,214 single nucleotide polymorphisms (SNPs) and 1,751,392 indels (\u003cb\u003eSupplementary table 12\u003c/b\u003e). These variants were uniformly distributed along the chromosomes except over satellite regions (\u003cb\u003eExtended Data Fig.\u0026nbsp;6\u003c/b\u003e). Principal component analysis (PCA) and phylogenetic analysis clearly showed distinct clusters of winter and spring types (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ea, \u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eb). This relationship was further strongly supported by ADMIXTURE\u003csup\u003e\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u003c/sup\u003e analysis providing two distinct population clusters at \u003cem\u003eK\u003c/em\u003e\u0026thinsp;=\u0026thinsp;2, spring and winter (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ec). Calculation of genetic diversity using low-admixed individuals showed higher nucleotide diversity for winter (\u003cem\u003eπ\u003c/em\u003e\u0026thinsp;=\u0026thinsp;10.01\u0026times; 10\u003csup\u003e\u0026minus;\u0026thinsp;3\u003c/sup\u003e) than elite spring genotypes (\u003cem\u003eπ\u003c/em\u003e\u0026thinsp;=\u0026thinsp;8.79 \u0026times; 10\u003csup\u003e\u0026minus;\u0026thinsp;3\u003c/sup\u003e). Additionally, population genetic differentiation (\u003cem\u003eF\u003c/em\u003e\u003csub\u003eST\u003c/sub\u003e) between spring and winter populations was relatively low (\u003cem\u003eF\u003c/em\u003e\u003csub\u003eST\u003c/sub\u003e = 0.064), indicating that only a small number of loci might be responsible for the winter growth habit. Analysis of linkage disequilibrium (LD) decay indicated an LD decay distance of 682 kb for elite spring and 462 kb for winter genotypes (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ed). The relatively low diversity and slower rate of LD decay in elite spring lines suggest a reduction in genetic diversity due to selective breeding.\u003c/p\u003e\n\u003ch3\u003eSelective sweeps for improvement\u003c/h3\u003e\n\u003cp\u003eTo identify the putative genomic regions selected for during spring faba bean breeding, we identified selective sweeps between the elite spring and winter population using the cross-population composite likelihood ratio test (XP-CLR)\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e. A total of 47 distinct sweep regions were identified representing 11.03 Mb (0.9%) of the Hedin/2 v.2\u0026rsquo; genome sequence. In total, 228 genes overlapped with these candidate regions (\u003cb\u003eSupplementary table 13\u003c/b\u003e). Gene ontology (GO) and pathway enrichment analysis of these genes indicated they are involved in biological processes particularly related to nodulation and cold acclimation and pathways related to oxidative phosphorylation, lipid metabolisms (\u003cb\u003eSupplementary table 14\u003c/b\u003e), which are known to play a role in response to cold stress\u003csup\u003e\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e. Notably, a prominent sweep was seen on chromosome 1, likely representing the \u003cem\u003eVC1\u003c/em\u003e locus\u003csup\u003e\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e (\u003cb\u003eExtended Data Fig.\u0026nbsp;7a\u003c/b\u003e). To verify that region, we grouped the elite spring lines into low VC (n\u0026thinsp;=\u0026thinsp;90) and normal VC (n\u0026thinsp;=\u0026thinsp;119) genotypes based on passport information and performed an additional pairwise comparison using XP-CLR analysis. Strikingly, the result confirmed that the sweep on chromosome 1 was associated with \u003cem\u003eVC1\u003c/em\u003e locus\u003csup\u003e\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e (\u003cb\u003eExtended Data Fig.\u0026nbsp;7b\u003c/b\u003e), illustrating breeding for low VC content may have swept this locus in spring breeding lines. In addition, this sweep locus concurred with a previously fine mapped VC region\u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e (\u003cb\u003eExtended Data Fig.\u0026nbsp;7c\u003c/b\u003e). Within this haplotype block, we identified four copies of the \u003cem\u003eVC1\u003c/em\u003e gene in \u0026lsquo;Hedin/2 v.2\u0026rsquo; whereas there are five copies in the previously published \u0026lsquo;Tiffany\u0026rsquo; genome\u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e (\u003cb\u003eExtended Data Fig.\u0026nbsp;7d)\u003c/b\u003e. A haplotype analysis with SNPs showed more missing data for certain haplotypes (\u003cb\u003eExtended Data Fig.\u0026nbsp;8a\u003c/b\u003e), suggesting the occurrence of \u003cem\u003eVC1\u003c/em\u003e gene copy number variation in faba bean. A set of 31 indel markers significantly differentiated the low VC haplotypes from the wild types (\u003cb\u003eExtended Data Fig.\u0026nbsp;8b)\u003c/b\u003e, providing a new set of markers for selection in breeding (\u003cb\u003eSupplementary table 15\u003c/b\u003e).\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eGenome-wide association scan for winter-hardiness\u003c/h2\u003e \u003cp\u003eWe investigated the genetic basis of winter hardiness in 208 accessions comprising 180 spring types and 28 winter lines, also referred to as the \u0026ldquo;ProFaba panel\u0026rdquo;, using genome-wide association studies (GWAS) analysis of the traits winter survival rate and late frost survival rate (\u003cb\u003eSupplementary table 16, 17\u003c/b\u003e). We identified a total of 1,131,399 SNPs using a combination of genotyping-by-sequencing (GBS)\u003csup\u003e\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e\u003c/sup\u003e and single primer enrichment technology (SPET). Two loci on chromosomes 1 and 5 were significantly associated with both traits (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ea\u003cb\u003e).\u003c/b\u003e Further, these two significant loci coincided with the peak regions of high genetic differentiation (\u003cem\u003eF\u003c/em\u003e\u003csub\u003eST\u003c/sub\u003e) between spring and winter types. Intriguingly, the chromosome 1 GWAS peak overlapped with the previously identified major QTL for frost tolerance\u003csup\u003e\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u003c/sup\u003e (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eb), indicating a crucial and consistent locus related to winter hardiness. In the following, we refer to this locus as \u003cem\u003eFROST RESISTANCE 1 (FR-1)\u003c/em\u003e. A significantly associated T/C SNP, located downstream of the \u003cem\u003eVfaba.Hedin2.R2.1g002127\u003c/em\u003e gene, enabled the classification of accessions into two major haplotypes. Accessions harboring the CC haplotype demonstrated significantly higher winter hardiness scores compared to those with the TT haplotype \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ed, \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ee\u003cb\u003e).\u003c/b\u003e Deeper haplotype analysis of the chromosome 1 locus showed a clear separation of winter and spring genotypes (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ef). We further identified a set of 187 SNPs that could separate the spring and winter types (\u003cb\u003eExtended Data Fig.\u0026nbsp;9, Supplementary table 18\u003c/b\u003e) and thus, may be used as diagnostic markers for winter faba bean breeding. This sweep region contained a total of 30 genes including a cluster of 14 \u003cem\u003eC-repeat binding factor (CBF)/dehydration-responsive element binding factor1 (DREB1\u003c/em\u003e) whose role in cold acclimation has been demonstrated in diverse plant species\u003csup\u003e\u003cspan additionalcitationids=\"CR25\" citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e. To identify potential candidate genes, we conducted cold treatment transcriptome assays sampling at 18\u0026deg;C, low temperature of 4\u0026deg;C and 0\u0026deg;C and freezing temperatures of -7\u0026deg;C using a winter hardy genotype (Wonjam1-ho from Korea) and a frost-susceptible spring genotype (PI 271634 from India) (\u003cb\u003eSupplementary table 19, 20\u003c/b\u003e). The expression data showed that the three \u003cem\u003eCBF/DREB1\u003c/em\u003e genes (\u003cem\u003eVfaba.Hedin2.R2.1g002127\u003c/em\u003e, \u003cem\u003eVfaba.Hedin2.R2.1g002152\u003c/em\u003e and \u003cem\u003eVfaba.Hedin2.R2.1g002153\u003c/em\u003e) at the \u003cem\u003eFR-1\u003c/em\u003e locus were induced when the temperature was low and nonfreezing (4\u0026deg;C), a phenomenon known as cold acclimation\u003csup\u003e\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e\u003c/sup\u003e, in winter genotype compared to the spring accession (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ec). This suggests that these \u003cem\u003eCBF/DREB1\u003c/em\u003e genes may trigger the cold responsive genes, also known as the CBF regulon\u003csup\u003e\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e\u003c/sup\u003e, that may increase the tolerance to freezing in winter hardy faba bean. Similarly, among 7 genes in chromosome 5 locus, a chalcone synthase (\u003cem\u003eCHS\u003c/em\u003e) gene \u003cem\u003e(Vfaba.Hedin2.R2.5g030100\u003c/em\u003e) showed increased expression when temperature was below 4\u0026deg;C. CHS is a key enzyme in the flavonoid pathway and its role in cold acclimation has been reported in Arabidopsis\u003csup\u003e\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e and other crops\u003csup\u003e\u003cspan additionalcitationids=\"CR30\" citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e\u003c/sup\u003e. These results indicate that the winter hardiness in faba bean is a complex process involving CBF regulon and cold induced flavonoids, like anthocyanins, as additional factors.\u003c/p\u003e \u003cp\u003eAdditionally, using our 183 winter accessions, we performed GWAS for visually assessed traits such as loss of leaf (and stem) color and turgor (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ea) obtained from a recent study\u003csup\u003e\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u003c/sup\u003e. Two independent GWAS models consistently pinpointed two loci: one on chromosome 5 (PVE: 19.83%) and the other on chromosome 3 (PVE: 7.92%). Accessions with the GG haplotype exhibited significantly higher symptom scores compared to carriers of the CC haplotype on chromosome 5 (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ed, \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ee, \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ef). Based on local LD blocks, 13 genes were identified within a 5.57 Mb interval on chromosome 5, while 4 genes were in a 125.29 kb region on chromosome 3 (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eb). Expression analysis of these genes (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ec) showed a strong expression pattern for \u003cem\u003eVfaba.Hedin2.R2.5g032310\u003c/em\u003e gene on chromosome 5, which is \u003cem\u003eFLK\u003c/em\u003e (flowering locus K homology domain) that was reported to repress \u003cem\u003eFlowering Locus C (FLC)\u003c/em\u003e in Arabidopsis under cold temperature\u003csup\u003e\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e\u003c/sup\u003e. We observed a significant downregulation of the gene \u003cem\u003eVfaba.Hedin2.R2.3g019128\u003c/em\u003e (U11/U12 small nuclear ribonucleoprotein 48 kDa protein) on chromosome 3. However, its functional relevance to environmental stresses is yet to be revealed. To further validate the two identified loci, we conducted a prediction study using two genomic best linear unbiased prediction (gBLUP) models. We aimed to assess the additional predictive power gained by incorporating the main effects of the two loci as fixed regressions, along with a random polygenic effect (model M2), compared to a genomic model that only includes a random polygenic effect (model M1) (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eg). Using independent datasets for cross-validation, we found that model M2 consistently outperformed model M1 by around 30% across all traits. This indicates that the loci identified through GWAS analyses are tagging important genomic regions related to winter hardiness, that have not been discovered before\u003csup\u003e\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u003c/sup\u003e. Moreover, the strength of the correlations suggests that further improvements in freezing tolerance in winter-type faba bean can be achieved through molecular breeding.\u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eFaba bean is a high-yielding protein crop well-suited for improving soil fertility and increasing the production of plant protein that may reduce the use of meat proteins. A significant number of assembly gaps collapsed tandemly duplicated regions and repetitive elements\u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e required improvements to contiguity and correctness for the faba bean reference sequence. In this study, we generated an optical map as a complementary dataset to overcome existing limitations. We increased the sequence contiguity to 33-fold, in turn, the number of sequences from thousands to a few hundred for pseudomolecule construction as compared to Hedin/2 v.1\u003csup\u003e14\u003c/sup\u003e. The new \u0026lsquo;Hedin/2 v.2\u0026rsquo; reference sequence significantly improved the representation of complex repeats, protein-coding genes regulatory and centromeres landscape, delivering an upgraded foundational genomic tool for fine-scale genetic studies and precision breeding.\u003c/p\u003e \u003cp\u003eRising temperatures pose a severe threat to spring-sown faba bean whose growth and yield can be adversely affected by temperatures above 30\u0026deg;C\u003csup\u003e\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e\u003c/sup\u003e. While efforts are being made to identify the genetic basis of heat-tolerance\u003csup\u003e\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e\u003c/sup\u003e, faba bean winter types offer distinct advantages over spring types, including yield superiority, efficiency in using soil moisture and escape from drought and some pest attacks\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e. While strict prolonged cold exposure (known as vernalization) may not be required for winter types of faba bean, insufficient winter hardiness creates a risk of winter kill. Our comprehensive analysis using multiple datasets detected potential genomic loci for improving winter hardiness in faba bean. Remarkably, the chromosome 1 locus coincided with frost tolerance QTL identified by independent previous studies\u003csup\u003e\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e,\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e,\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e\u003c/sup\u003e. Further, this locus was reported to be syntenic with the freezing tolerance QTL identified on chromosome 6 of \u003cem\u003eM. truncatula\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e\u003c/sup\u003e and \u003cem\u003eP. sativum\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e\u003c/sup\u003e, respectively (\u003cb\u003eSupplementary data Fig.\u0026nbsp;7\u003c/b\u003e). Syntenic conservation of this locus among important grain legumes indicates the prospect of translational approaches to bestow winter hardiness in related grain legumes such as pea via genetic manipulations. Our new genomic resources and findings will enable further studies on the genetic basis of cold acclimation in related temperate grain legumes and breeding of winter faba bean cultivars with improved frost tolerance.\u003c/p\u003e"},{"header":"Methods","content":"\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eOptical genome map and hybrid scaffolding\u003c/h2\u003e \u003cp\u003eA total of 1.5\u0026nbsp;million G1 nuclei of faba bean cv. Hedin/2 (highly homozygous) was purified using flow cytometry. Sorted nuclei were embedded in agarose miniplugs and treated by proteinase K as described in Šimkov\u0026aacute; \u003cem\u003eet al.\u003c/em\u003e (2003)\u003csup\u003e\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e\u003c/sup\u003e. 490 ng of high molecular weight (HMW) DNA released from the miniplugs were directly labeled at DLE-1 recognition sites (CTTAAG motive) and stained following a standard Bionano Prep Direct Label and Stain (DLS) protocol (Bionano Genomics, San Diego, USA). Labeled and stained HMW DNA was analysed on Saphyr platform of Bionano Genomics using a single Saphyr chip G1.2. In total, 1,842.6 Gb of single molecule data\u0026thinsp;\u0026gt;\u0026thinsp;150 kb, corresponding to 142x coverage of the \u003cem\u003eV. faba\u003c/em\u003e genome, was collected and used to generate \u003cem\u003ede novo\u003c/em\u003e optical genome map (OGM, \u003cb\u003eSupplementary table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003e1\u003c/span\u003e\u003c/b\u003e) by Bionano Solve software (version Solve3.6.1_11162020) with \u0026ldquo;optArguments_nonhaplotype_noES_noCut_DEL1_saphyr.xml\u0026rdquo; used. The contigs sequence from version 1 and the OGM assembly were used as input files for automated hybrid scaffolding (HS) pipeline integrated with the Bionano Solve v3.6.1_11162020). The HS was run with default parameters and \u0026ldquo;Resolve conflict\u0026rdquo; option for conflict resolution.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003ePseudomolecule construction and validation\u003c/h2\u003e \u003cp\u003eThese scaffolds were then used to construct the pseudomolecule with the TRITEX pipeline\u003csup\u003e\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e\u003c/sup\u003e. Finally, the order and orientation of scaffolds in each chromosome were manually inspected and corrected with evidence from Omni-C published in our previous assembly\u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e. Besides, we used the tool Merqury v1.3\u003csup\u003e16\u003c/sup\u003e and LTR_retriever v2.9.0\u003csup\u003e40,41\u003c/sup\u003e to evaluate the genome completeness and consensus accuracy. The gene space completeness was evaluated using BUSCO v3.0.2b\u003csup\u003e\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e\u003c/sup\u003e analysis with Embryophyta database 9.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eRepeat annotation\u003c/h2\u003e \u003cp\u003eTandem repeats and satellites were annotated using TideCluster v1.0\u003csup\u003e43,44\u003c/sup\u003e (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/kavonrtep/TideCluster\u003c/span\u003e\u003cspan address=\"https://github.com/kavonrtep/TideCluster\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e), a wrapper for TideHunter\u003csup\u003e\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e\u003c/sup\u003e. Satellite repeats with a monomer size ranging from 40 to 3 kb and a minimum array length of 5 kb were annotated using the default TideCluster settings. Satellites with a monomer size between 10 to 39 bp and a minimum array length of 5 kb were identified using TideCluster with parameters -T \"-p 10 -P 39 -c 5 -e 0.25\" -m 5000. LTR retrotransposons (LTR-RT) were annotated using DANTE v0.1.8\u003csup\u003e45\u003c/sup\u003e (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/kavonrtep/dante\u003c/span\u003e\u003cspan address=\"https://github.com/kavonrtep/dante\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) and the DANTE_LTR v0.2.3.2\u003csup\u003e45\u003c/sup\u003e pipeline (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/kavonrtep/dante_ltr\u003c/span\u003e\u003cspan address=\"https://github.com/kavonrtep/dante_ltr\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) on the RepeatExplorer Galaxy server\u003csup\u003e\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e\u003c/sup\u003e. The sequences of the identified LTR-RT elements were used to create a custom library of LTR-RT elements using the \"dante_ltr_to_library\" script from the DANTE_LTR repository. A custom library of Class II transposable elements was obtained using RepeatExplorer clustering protocol 1\u003csup\u003e47\u003c/sup\u003e on unassembled Illumina paired-end reads. Contigs corresponding to Class II retrotransposons with a minimum read depth of 5 reads and a minimum length of 100 bp were obtained using tools on the RepeatExplorer Galaxy server. Consensus sequences of rRNA gene arrays including intergenic spacer sequences were fully reconstructed from the RepeatExplorer contigs. A custom library of LINE elements was created by extracting regions with LINE protein coding domains identified by DANTE, along with the upstream and downstream 4kb flanking regions. The extracted genomic sequences were split into 100 nt fragments and analyzed by RepeatExplorer clustering. Contigs corresponding to LINE elements with a read depth of at least 3 reads and a minimum length of 150 nt were converted into a custom library. All custom libraries were concatenated and used as a library for RepeatMasker v4.1.5\u003csup\u003e48\u003c/sup\u003e search. The RepeatMasker search was performed on the RepeatExplorer Galaxy server with options \"-xsmall -no_is -e ncbi\". All regions annotated as mobile elements with RepeatMasker based on custom library search which overlapped with satellite repeats annotated by TideCluster were removed from the annotation using bedtools\u003csup\u003e\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e\u003c/sup\u003e with command \u0026ldquo;bedtools subtract\u0026rdquo;. The resulting GFF3 was then merged with the DANTE annotation using a custom R script ( \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/kavonrtep/granges_tools/merge_repeat_annotations.R\u003c/span\u003e\u003cspan address=\"https://github.com/kavonrtep/granges_tools/merge_repeat_annotations.R\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e).\u003c/span\u003e The classification of mobile elements in the annotation files corresponds to the classification system used in the REXdb database\u003csup\u003e\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e\u003c/sup\u003e. For the final repeat-masking process, all of the above repeat annotation GFF3 files were consolidated. We merged the annotated regions into a single BED file using the bedtools v2.31.1 merge tool\u003csup\u003e\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003eAnalysis of centromeres\u003c/h2\u003e \u003cp\u003eCentromere positions in the assembly were revealed using ChIP-seq with CENH3 antibody CenH3-2_VF (raised against the peptide \u0026ldquo;CQT PRH ARE TQE KKK RRN KPG\u0026rdquo;\u003csup\u003e\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e\u003c/sup\u003e) as described previously\u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e. Duplicate experiments, including independent chromatin preparations, were performed and the resulting immunoprecipitated fragments along with the corresponding control samples (Input; digested chromatin not subjected to immunoprecipitation) were sequenced on the Illumina platform (Admera Health, NJ, USA) in paired-end, 150 bp mode. The resulting reads were quality-filtered and trimmed using Trimmomatic v0.39\u003csup\u003e52\u003c/sup\u003e (minimum allowed length\u0026thinsp;=\u0026thinsp;100 nt), yielding 151\u0026ndash;201\u0026nbsp;million reads per sample, which were mapped onto the assembly using Bowtie 2 v2.4.2\u003csup\u003e53\u003c/sup\u003e, with options -p 64 -U. Subsequent analysis was performed on full output from Bowtie2 program and on output where all multi-mapped reads were filtered out. Filtering of multi-mapped reads was performed using Sambamba v0.8.1\u003csup\u003e54\u003c/sup\u003e with options \u0026ldquo;-F [XS]\u0026thinsp;=\u0026thinsp;=\u0026thinsp;null and not unmapped and not duplicate\u0026rdquo;. Regions with statistically significant ChIP/Input enrichment ratio were identified by comparing ChIP and Input mapped reads using the epic2 program\u003csup\u003e\u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e\u003c/sup\u003e, with the parameter \u0026ldquo;--bin-size 200\u0026rdquo;. Alternative identification of enrichment was performed using MACS v3.0.0a6\u003csup\u003e56\u003c/sup\u003e, with options \u0026ldquo;--broad --broad-cutoff 0.1\u0026rdquo;. The ChIP/Input ratio was calculated for plotting purposes using bamCompare v3.5.1 from the deepTools package\u003csup\u003e\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e\u003c/sup\u003e. The program was run with the parameter \u0026ldquo;\u0026ndash;binSize 200\u0026rdquo; to calculate the log2 ratio for the 200 nt window size. The resulting data were plotted using the rtracklayer package of R\u003csup\u003e\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eIso-seq transcriptome assembly\u003c/h2\u003e \u003cp\u003eThe raw PacBio Iso-Seq data were processed using the default parameters within the IsoSeq3 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/PacificBiosciences/pbbioconda\u003c/span\u003e\u003cspan address=\"https://github.com/PacificBiosciences/pbbioconda\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) analysis tool in SMRT Link. The processing steps encompassed various stages such as Circular Consensus Sequences (CCS) generation, trimming, refining, clustering and polishing. Initially, reads were categorized as either full-length (FL) or non-full length (NFL) based on the presence of 5\u0026rsquo; primer and 3\u0026rsquo; primer or 3\u0026rsquo; poly-A tail. Full-length non-chimeric (FLNC) reads were identified as those containing both 5\u0026rsquo; and 3\u0026rsquo; cDNA primers along with a poly-A tail. Reads lacking any of these defining tags were classified as non-full length (NFL) reads. Further processing involved trimming FLNC reads to eliminate barcoded and unbarcoded cDNA primers, along with unwanted primers, and aligning the reads in the 5 -\u0026gt; 3 orientations. Preceding the generation of clustered consensus sequences, FLNC reads underwent refinement steps to remove poly-A tails and artificial concatemers. Aligned reads were used to assess completeness of transcriptomes. The Iterative Clustering and Error Correction (ICE) algorithm were employed to produce clustered consensus sequences from the refined FLNC reads. Subsequently, a polishing arrow model, which included Quiver (QV) tracks, was utilized to refine multiple FLNC reads derived from identical isoforms, aiming to obtain non-redundant isoform sequences. The final FLNC transcript sequences were then classified into high-quality (HQ) transcripts, exhibiting\u0026thinsp;\u0026ge;\u0026thinsp;99% post-correction accuracy, and low-quality (LQ) transcripts, characterized by an accuracy of \u0026lt;\u0026thinsp;99%. Clustered transcripts were aligned to the reference genome Hedin2 v2 using minimap2 v2.24\u003csup\u003e59\u003c/sup\u003e (\u003cb\u003eSupplementary Table\u0026nbsp;3\u003c/b\u003e). The transcripts were combined and refined using cDNA_Cupcake (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/Magdoll/cDNA_Cupcake\u003c/span\u003e\u003cspan address=\"https://github.com/Magdoll/cDNA_Cupcake\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) after initial filtration, which excluded transcripts unsupported by a minimum of two full-length reads per sample. The combined transcripts were used to annotate the genome.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003eGene annotation\u003c/h2\u003e \u003cp\u003eThe gene annotation was conducted using BRAKER v2.1.6\u003csup\u003e60\u003c/sup\u003e with the etpmode and long reads\u0026rsquo; protocol. RNA-seq data from 26 samples were downloaded\u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e and aligned to the \u0026lsquo;Hedin/2 v.2\u0026rsquo; sequence using HISAT2 v2.2.1 with standard parameters (--phred64 --sensitive --no-discordant --no-mixed -I 1 -X 1000)\u003csup\u003e\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e\u003c/sup\u003e. The Iso-Seq full-length transcripts as described before were used as input for the long reads\u0026rsquo; protocol. The BREAKER Long reads\u0026rsquo; annotation pipeline was performed with four steps: \u003cem\u003eab initio\u003c/em\u003e prediction, homology-based prediction, RNA-seq data-based prediction and Iso-seq data-based prediction. The final gene models were evaluated using both RNA-seq transcriptome data and Iso-seq transcriptome data, followed by manual correction. The gene function annotation was achieved according to the predicted protein-coding genes blasted again five databases of eggNOG 5.0\u003csup\u003e62\u003c/sup\u003e, GO\u003csup\u003e\u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e\u003c/sup\u003e, KEGG\u003csup\u003e\u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e64\u003c/span\u003e\u003c/sup\u003e, Non-Redundant database (NR: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.ncbi.nlm.nih.gov/refseq/about/nonredundantproteins/\u003c/span\u003e\u003cspan address=\"https://www.ncbi.nlm.nih.gov/refseq/about/nonredundantproteins/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e)\u003c/span\u003e, and SwissProt\u003csup\u003e\u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e65\u003c/span\u003e\u003c/sup\u003e. The GO and KEGG pathway enrichment analyses were performed for genes using the R package Clusterprofiler\u003csup\u003e\u003cspan citationid=\"CR66\" class=\"CitationRef\"\u003e66\u003c/span\u003e\u003c/sup\u003e and GOplot. We defined the adjusted p-value of \u0026lt;\u0026thinsp;0.1 and q-value\u0026thinsp;\u0026lt;\u0026thinsp;0.3 (Benjamini\u0026ndash;Hochberg method) as the cutoff criteria.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003eATAC-seq library preparation and analysis\u003c/h2\u003e \u003cp\u003eATAC-seq libraries were prepared in duplicate. Flash frozen tissue (0.5g) was ground in liquid nitrogen, suspended in nuclei isolation buffer (NIB, 10 mM MES-KOH (pH 5.4), 10 mM NaCl, 10 mM KCl, 0.1 mM spermine, 1mM spermidine, 1mM DTT and 1% Triton X-100 and protease inhibitors (Source), filtered twice through 30 micron filters into ice-cold NIB\u0026thinsp;+\u0026thinsp;1.8M sucrose, washed twice following centrifugation at 1000g for 15 min and resuspended in NIB. Libraries were prepared from serial dilutions in 50ul reactions using the (Nextera DNA Library Prep Kit, Illumina, San Diego, California, USA) with 2.5 \u0026micro;l TDE1, a 30-minute incubation at 37C and amplified according to Buenrostro et al\u003csup\u003e\u003cspan citationid=\"CR67\" class=\"CitationRef\"\u003e67\u003c/span\u003e\u003c/sup\u003e. Libraries were sequenced using paired 125 based reads on an Illumina HiSeq 2500 system (Illumina, San Diego, California, USA). ATAC-seq data analysis followed the Encode paired-end ATAC-seq pipeline specifications (Hitz 2023, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://docs.google.com/document/d/1f0Cm4vRyDQDu0bMehHD7P7KOMxTOP-HiNoIvL1VcBt8/edit\u003c/span\u003e\u003cspan address=\"https://docs.google.com/document/d/1f0Cm4vRyDQDu0bMehHD7P7KOMxTOP-HiNoIvL1VcBt8/edit\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e)\u003c/span\u003e with the following modifications. Reads mapping to chloroplast sequences were also removed. No blacklist was used. Read alignment with Bowtie2 used a maximum insert size of 1000 bp (-X 1000) and up to ten alignments were reported (-k 10). Only the unique alignments (primary alignments with MAPQ\u0026thinsp;\u0026gt;\u0026thinsp;30) were used for subsequent analysis. Heat maps and metaplots were generated using Deeptools\u003csup\u003e\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003eSNP calling\u003c/h2\u003e \u003cp\u003eThe resequencing reads were mapped to the \u0026lsquo;Hedin/2 v.2\u0026rsquo; reference genome using minimap2 v2.24\u003csup\u003e59\u003c/sup\u003e using the default parameters. SAMtools v1.16.1\u003csup\u003e68\u003c/sup\u003e software was used to convert mapping results into the BAM format, and Novosort v3.06.05 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://www.novocraft.com\u003c/span\u003e\u003cspan address=\"http://www.novocraft.com\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) was used to sort the bam file. Finally, BCFtools (v.1.8)\u003csup\u003e\u003cspan citationid=\"CR69\" class=\"CitationRef\"\u003e69\u003c/span\u003e\u003c/sup\u003e was used to call SNPs and short insertions and deletions (indels) with the parameters (-q 20 -Q 20 --excl-flags 3332). Finally, indels and SNPs with none bi-allelic, \u0026gt;\u0026thinsp;10% missing calls, \u0026gt;\u0026thinsp;10% heterozygosity and MAF\u0026thinsp;\u0026lt;\u0026thinsp;0.001 were removed. SNPs with MAF\u0026thinsp;\u0026lt;\u0026thinsp;0.05 were further removed for phylogenetic tree structure, GWAS, LD decay, PCA and population structure analyses. SNP density and genetic diversity across each chromosome were counted with a 10 kb sliding window using Perl script (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/helen5588/IPK_faba_codes/blob/main/s2_snp_calling/get_snp_stat.pl\u003c/span\u003e\u003cspan address=\"https://github.com/helen5588/IPK_faba_codes/blob/main/s2_snp_calling/get_snp_stat.pl\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003ePopulation genetics analysis\u003c/h2\u003e \u003cp\u003eWe used the whole-genome SNPs to construct the ML phylogenetic tree with 100 bootstrap using Phylip v3.696\u003csup\u003e70\u003c/sup\u003e. The tool iTOL (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://itol.embl.de\u003c/span\u003e\u003cspan address=\"http://itol.embl.de\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) was used to color the phylogenetic tree. The Admixture v1.3\u003csup\u003e71\u003c/sup\u003e was used to infer the population structure with K values ranging from 2 to 10 and ran the cross-validation error (CV) procedure. To identify the pattern of linkage disequilibrium among different faba bean accessions, the pairwise squared correlation coefficients (r\u003csup\u003e\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/sup\u003e) were calculated using PopLDdecay v3.30\u003csup\u003e72\u003c/sup\u003e. Rscript v3.5.1\u003csup\u003e73\u003c/sup\u003e was applied to plot the average r\u003csup\u003e\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/sup\u003e values between all pairs of genotypes.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec20\" class=\"Section2\"\u003e \u003ch2\u003eSelective sweep analysis\u003c/h2\u003e \u003cp\u003eTo identify differentiated loci between sub-populations, we performed a sliding window (10kb) analysis based on pairwise estimates of genetic differentiation (F\u003csub\u003eST\u003c/sub\u003e)\u003csup\u003e\u003cspan citationid=\"CR74\" class=\"CitationRef\"\u003e74\u003c/span\u003e,\u003cspan citationid=\"CR75\" class=\"CitationRef\"\u003e75\u003c/span\u003e\u003c/sup\u003e and genetic diversity (π-ratio) within each population. The π and FST were calculated respectively with Perl script (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/helen5588/IPK_faba_codes/blob/main/s5_sweep\u003c/span\u003e\u003cspan address=\"https://github.com/helen5588/IPK_faba_codes/blob/main/s5_sweep\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e)\u003csup\u003e76\u003c/sup\u003e. The window size was 10 kb with a step size of 2 kb. To detect sweep signals of selection in the faba bean collection, we employed the cross-population composite likelihood ratio test (XP-CLR) using XP-CLR v.1.0, as described by Chen et al\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e. The XP-CLR analysis was performed with 50 kb sliding window and 10 kb step between spring vs winter and normal VC type vs low VC type in faba bean population separately.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003ePhenotyping\u003c/h2\u003e \u003cp\u003eOf the 406 accessions, 210 faba bean samples (207 spring-type and 3 winter-type) originated from breeding lines from Norddeutsche Pflanzenzucht Hans-Georg Lembke KG (NPZ). The remaining 196 (194 winter-type and 2 spring-type) faba bean inbred lines (F\u0026thinsp;\u0026gt;\u0026thinsp;9) were derived via SSD from the so-called G\u0026ouml;ttingen Winter Bean Population (GWBP). Independently, 183 winter faba bean inbred lines were evaluated for tolerance to freezing during winter (winter-frost; WF\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e,\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u003c/sup\u003e) and early spring (late-frost; LF\u003csup\u003e\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e,\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u003c/sup\u003e) under controlled temperature conditions in n\u0026thinsp;=\u0026thinsp;10 and n\u0026thinsp;=\u0026thinsp;5 replicated climate chamber experiments. Visual frost stress symptoms were scored as loss of leaf (and stem) turgor and color on hardened or de-hardened (WF/LF) juvenile plants during three successive frost stress treatments at -16/-13\u0026deg;C, -18\u0026deg;C/-15\u0026deg; C, and \u0026minus;\u0026thinsp;19/-17\u0026deg;C. Scores from 1\u0026ndash;9 (no-full symptoms) were summed to total loss of turgor (LossT), color (LossC), and grand sum (LossTC). Trait repeatability was high (0.77\u0026thinsp;\u0026le;\u0026thinsp;h\u003csup\u003e2\u003c/sup\u003e\u0026thinsp;\u0026le;\u0026thinsp;0.92) and ANOVA revealed significant genotypic variance among lines.\u003c/p\u003e \u003cp\u003eA subset of the ProFaba panel (208 out of 239 accessions) was tested for winter hardiness at experimental station Reinshof, University of G\u0026ouml;ttingen, Lower Saxony (51.48 \u0026deg;N, 9.93 \u0026deg;E), Germany in season 2021/2022. The ProFaba panel and two check cultivars Augusta (winter-type, NPZ) and Tiffany (spring-type, NPZ) were tested in double rows in an unreplicated, randomized complete block design (RCBD). Within the block, the 180 spring type and 28 winter type faba bean accessions of the panel were each arranged as RCBD to prevent neighboring effects later in season. Checks were replicated 21 times each to estimate repeatability (h\u0026sup2; \u0026gt;0.80). Winter survival and late frost survival (number of surviving plants) were assessed after winter in February and following the late frost in April. Winter survival rate and late frost survival rate were calculated in relation to emerged plants.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec22\" class=\"Section2\"\u003e \u003ch2\u003eGWAS analysis\u003c/h2\u003e \u003cp\u003eGEMMA v0.98.5\u003csup\u003e77\u003c/sup\u003e, and Bayesian-information and Linkage-disequilibrium Iteratively Nested Keyway (BLINK) method, BLINK\u003csup\u003e\u003cspan citationid=\"CR78\" class=\"CitationRef\"\u003e78\u003c/span\u003e\u003c/sup\u003e were used for the association analysis. The first three PCA values (eigenvectors), which were derived from whole-genome SNPs, were used as covariates to correct for population stratification. Most highly significant SNPs associated with freezing tolerance were compared with positions of Hedin/2 protein-coding genes. Orthologues of the gene overlapping variant were identified using MCScanX\u003csup\u003e\u003cspan citationid=\"CR79\" class=\"CitationRef\"\u003e79\u003c/span\u003e\u003c/sup\u003e. We employed LDBlockShow v1.40\u003csup\u003e80\u003c/sup\u003e to generate and visualize the LD pattern of haplotype block.\u003c/p\u003e \u003cdiv id=\"Sec23\" class=\"Section3\"\u003e \u003ch2\u003eGenomic-assisted prediction\u003c/h2\u003e \u003cp\u003eWe evaluated the ability of a genomic-enabled model to predict frost stress tolerance as loss of turgor, loss of color and loss of turgor and color in our 183 winter-type faba bean lines using WF\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e,\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u003c/sup\u003e and LF\u003csup\u003e\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e,\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u003c/sup\u003e. The WF dataset was also used for a GWAS and is referred to here as the discovery set. Predictive ability was assessed by repeated random cross-validation, where 60% of the lines were assigned to the training set and 40% to the testing set. In each of the 1000 iterations, two GBLUP models were fitted to the data of the discovery set, and breeding values for the testing set were correlated with observed data from the LF dataset. The first model (M1) takes the form \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\mathsf{y}=1{\\mu\\:}+\\mathsf{Z}\\mathsf{u}+\\mathsf{e}\\)\u003c/span\u003e\u003c/span\u003e, where \u0026#120326; is the response vector, \u0026micro; is the intercept with associated vector \u003cb\u003e1\u003c/b\u003e of 1s, \u0026#120322; is the vector of breeding values with associated incidence matrix \u003cb\u003eZ\u003c/b\u003e, and \u0026#120306; is the vector of residual effect. Vectors \u0026#120322; and \u0026#120306; are assumed \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\mathsf{u}\\sim\\text{N}(0,\\mathsf{\\:}\\mathsf{G}{{\\sigma\\:}}_{\\text{u}}^{2})\\)\u003c/span\u003e\u003c/span\u003e, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\mathsf{e}\\sim\\text{N}(0,\\mathsf{\\:}\\mathbf{I}{{\\sigma\\:}}_{\\text{e}}^{2})\\)\u003c/span\u003e\u003c/span\u003e, respectively, where \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\mathsf{G}\\)\u003c/span\u003e\u003c/span\u003e is the genomic relationship matrix\u003csup\u003e\u003cspan citationid=\"CR81\" class=\"CitationRef\"\u003e81\u003c/span\u003e\u003c/sup\u003e, and \u003cb\u003eI\u003c/b\u003e is an identity matrix. The second model (M2) incorporates the genotypes of two QTLs, detected by GWAS analysis on the discovery set, as fixed effects. The structure of this model is analogous to M1, except that the term \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:1{\\mu\\:}\\)\u003c/span\u003e\u003c/span\u003e is replaced by \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\mathsf{X}\\varvec{\\beta\\:}\\)\u003c/span\u003e\u003c/span\u003e, where \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\mathsf{X}\\)\u003c/span\u003e\u003c/span\u003e represents the incidence matrix for the fixed effects \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\varvec{\\beta\\:}\\)\u003c/span\u003e\u003c/span\u003e. We performed the cross-validation task using the \u0026ldquo;mixed.solve\u0026rdquo; function of the R package rrBLUP v4.6.3\u003csup\u003e82\u003c/sup\u003e.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec24\" class=\"Section2\"\u003e \u003ch2\u003eFrost treatments and RNA-Seq\u003c/h2\u003e \u003cp\u003eWonjam1-ho was developed from PI 469181, which is a new variety with superior adaptability to the winter environmental conditions in South Korea. This variety was generated by irradiation with 100 Gy gamma-rays using a \u003csup\u003e\u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e\u003c/sup\u003eCo gamma irradiator (150 TBq capacity; ACEL, Ottawa, ON, Canada). The faba bean accessions were obtained from the USDA-Agricultural Research Service (ARS) in Pullman, WA, USA. Briefly, 1000 M\u003csub\u003e1\u003c/sub\u003e seeds were planted in the experimental field at the Korea Atomic Energy Research Institute (KAERI; 35.5699\u0026deg;N 126.9772\u0026deg;E, Jeongeup, Jeollabuk, Republic of Korea) in 2015, and 2000 M\u003csub\u003e2\u003c/sub\u003e seeds were pooled for next generation. Subsequently, a total of 2000 M\u003csub\u003e2\u003c/sub\u003e plants were grown and harvested individually in 2017. M\u003csub\u003e3:5\u003c/sub\u003e seeds were then propagated from single plants, and M\u003csub\u003e5:7\u003c/sub\u003e plants were generated by line selection during 2020\u0026ndash;2023. Frost tolerance was tested in the experimental filed of KAERI with faba bean mutant population consisting of 291 mutant lines during 2022\u0026ndash;2023. Finally, we selected the 230-5 mutant variety with high frost tolerance and named wonjam1-ho. To investigate the transcriptome response for frost tolerance between Wonjam1-ho (tolerant) and PI 271634 from India (susceptible), both were placed in 15 cm plastic pots with commercial soil (Hungnong, Pyeongtaek, Republic of Korea) and were grown in a growth chamber (Vision Science, Republic of Korea) set at 18 ℃ for 20 days. Next, the temperature in the growth chamber was gradually decreased at a rate of 1\u0026deg;C/h until it reached 4\u0026deg;C, where it was held for 3 days (12-h day/12-h night). The temperature was further lowered by 1\u0026deg;C/h to \u0026minus;\u0026thinsp;7\u0026deg;C, maintained for 12-h night, and then raised in reverse order back to 18\u0026deg;C for recovery. Leaves were collected after treatments (at 18\u0026deg;C, 4\u0026deg;C, 0\u0026deg;C, and \u0026minus;\u0026thinsp;7\u0026deg;C), immediately frozen in liquid nitrogen, and used for RNA extraction. The expression data for these samples were used to quantify gene expression using RSEM v 1.3.3\u003csup\u003e83\u003c/sup\u003e.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eData availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe \u0026lsquo;Hedin/2 v.2\u0026rsquo; genome is deposited at European Nucleotide Archive-ENA under accession PRJEB77688. The gene annotation files are stored in e!DAL\u003csup\u003e84\u003c/sup\u003e (http://dx.doi.org/10.5447/ipk/2024/10). Resequencing reads are available in ENA under accession PRJEB76803. ChIP-seq data are accessible under SRA archive run accessions ERR9966691-ERR9966694. Iso-Seq data and RNA-seq data are available in ENA under accession PRJEB81208 and PRJEB81208. ATAC-seq raw reads are stored in ENA under accession PRJNA912311. ProFaba GBS reads are stored in ENA under accession PRJEB75354.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCode availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eScripts used in this study have been made available at GitHub (https://github.com/helen5588/IPK_faba_codes.git).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgments\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe thank J. Bauernfeind, T. M\u0026uuml;nch and H. Miehe for IT administration; A. Arend, A. Fiebig, M. Maruschewski for help with data submission; K.Trnka for plant sampling.\u0026nbsp;This work was funded by the Leibniz Association to M.J. (REPLACE, J118/2021), the European Union\u0026rsquo;s Horizon 2020 Programme for Research \u0026amp; Innovation in the ERA-NET Cofund SusCrop project ProFaba (grant no. 771134) to S.U.A. \u0026amp; W.L. part of the Joint Programming Initiative on Agriculture, Food Security, and Climate Change (FACCE-JPI), the Green Development and Demonstration Programme (GUDP) grant 34009-20-1761 for the IMFABA project and the National Research Council Canada\u0026rsquo;s Sustainable Protein Production (SPP) program. J.M. and P.N. were supported by the Czech Science Foundation grant no. 24-10036S. Computational resources and data storage facilities were in part provided by the ELIXIR-CZ Research Infrastructure Project (LM2023055). Z.T. and H.S. were supported by the project TowArds Next GENeration Crops, reg. no. CZ.02.01.01/00/22_008/0004581 of the ERDF Programme Johannes Amos Comenius. Cooperative Research Program for Agriculture Science and Technology Development (Project No. PJ017134), Rural Development Administration and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1A2C3007885) Republic of Korea to T.J.Y. S.J.K. is supported by the research program of the Korea Atomic Energy Research Institute (Project No. 523420-24).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eM.J. and S.U.A. designed the study. M.J., A.H. coordinated sequencing. M.J., H.S., supervised sequence assembly. S.U.A., W.L., supervised phenotyping experiments. T.J.Y., S.J.K., supervised cold-treated transcriptome assay. Optical genome mapping and hybrid scaffolding: Z.T., H.S. Pseudomolecule construction and gene annotation: H.Z., M.M., M.J. Repeat and centromere annotation: P.N., J.M. ATAC sequencing and analysis: D.C., Y.C., D.J.F.K. Phenotyping, seed supply and multiplication: A.W., G.W., O.S. Cold treatment, RNA-Seq analysis: A.W., J.M.K., H.Z. Genomic prediction and association analyses: E.B., H.Z. Data management and submission: H.Z., D.J.F.K., J.M. Writing: M.J., S.U.A, H.S., J.M., D.J.F.K, H.Z., E.B., S.J.K. Coordination: S.U.A., M.J. All authors read and commented on the manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eOS, GW are employees of NPZ Innovation (NPZi) GmbH. All other authors declare no competing interests.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eKhazaei, H. \u0026amp; Vandenberg, A. Seed Mineral Composition and Protein Content of Faba Beans (Vicia faba L.) with Contrasting Tannin Contents. Agronomy 10, 511 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHerridge, D.F., Peoples, M.B. \u0026amp; Boddey, R.M. Global inputs of biological nitrogen fixation in agricultural systems. Plant and Soil 311, 1\u0026ndash;18 (2008).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZander, P. \u003cem\u003eet al.\u003c/em\u003e Grain legume decline and potential recovery in European agriculture: a review. Agronomy for Sustainable Development 36, 26 (2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLink, W., Balko, C. \u0026amp; Stoddard, F.L. Winter hardiness in faba bean: Physiology and breeding. Field Crops Research 115, 287\u0026ndash;296 (2010).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSallam, A., Arbaoui, M., El-Esawi, M., Abshire, N. \u0026amp; Martsch, R. Identification and verification of QTL associated with frost tolerance using linkage mapping and GWAS in winter faba bean. Frontiers in Plant Science 7, 1098 (2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLandry, E.J. \u0026amp; Hu, J. Increasing pre-acclimation temperature reduces the freezing tolerance of winter-type faba bean (Vicia faba L.). Journal of Agronomy and Crop Science 205, 46\u0026ndash;53 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCarrillo-Perdomo, E. \u003cem\u003eet al.\u003c/em\u003e A QTL approach in faba bean highlights the conservation of genetic control of frost tolerance among legume species. Frontiers in Plant Science 13, 970865 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAli, M.B.M. \u003cem\u003eet al.\u003c/em\u003e Association analyses to genetically improve drought and freezing tolerance of faba bean (\u003cem\u003eVicia faba L.\u003c/em\u003e). Crop Science 56, 1036\u0026ndash;1048 (2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWindhorst, A., Skovbjerg, C.K., Andersen, S.U. \u0026amp; Link, W. Improving overwintering in times of climate change - a GWAS for late-frost tolerance of winter faba bean. in \u003cem\u003e73rd Conference of the Vereinigung der Pflanzenz\u0026uuml;chter und Saatgutkaufleute \u0026Ouml;sterreichs\u003c/em\u003e 29\u0026ndash;36 (Raumberg-Gumpenstein, Irdning, Austria, 2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWindhorst, A. \u003cem\u003eet al.\u003c/em\u003e Genome-wide association analyses identify QTL for tolerance to freezing during winter and early spring as a basis for in-depth genetic analysis and implementation in winter faba bean (\u003cem\u003eVicia faba L\u003c/em\u003e.) breeding. \u003cem\u003ebioRxiv\u003c/em\u003e (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCr\u0026eacute;pon, K. \u003cem\u003eet al.\u003c/em\u003e Nutritional value of faba bean (Vicia faba L.) seeds for feed and food. Field Crops Research 115, 329\u0026ndash;339 (2010).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBj\u0026ouml;rnsdotter, E. \u003cem\u003eet al.\u003c/em\u003e VC1 catalyses a key step in the biosynthesis of vicine in faba bean. Nature Plants 7, 923\u0026ndash;931 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTacke, R., Ecke, W., H\u0026ouml;fer, M., Sass, O. \u0026amp; Link, W. Fine-mapping of the major locus for vicine and convicine in faba bean (Vicia faba) and marker-assisted breeding of a novel, low vicine and convicine winter faba bean population. Plant Breeding 141, 644\u0026ndash;657 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJayakodi, M. \u003cem\u003eet al.\u003c/em\u003e The giant diploid faba genome unlocks variation in a global protein crop. Nature 615, 652\u0026ndash;659 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSkovbjerg, C.K. \u003cem\u003eet al.\u003c/em\u003e Genetic analysis of global faba bean diversity, agronomic traits and selection signatures. Theoretical and Applied Genetics 136, 114 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRhie, A., Walenz, B.P., Koren, S. \u0026amp; Phillippy, A.M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biology 21, 245 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNeumann, P. \u003cem\u003eet al.\u003c/em\u003e Epigenetic histone marks of extended meta-polycentric centromeres of \u003cem\u003eLathyrus\u003c/em\u003e and \u003cem\u003ePisum\u003c/em\u003e chromosomes. Frontiers in Plant Science 7, 234 (2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMacas, J. \u003cem\u003eet al.\u003c/em\u003e Assembly of the 81.6 Mb centromere of pea chromosome 6 elucidates the structure and evolution of metapolycentric chromosomes. PLOS Genetics 19, e1010633 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLu, Z. \u003cem\u003eet al.\u003c/em\u003e The prevalence, evolution and chromatin signatures of plant regulatory elements. Nature Plants 5, 1250\u0026ndash;1259 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAlexander, D.H., Novembre, J. \u0026amp; Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Research 19, 1655\u0026ndash;1664 (2009).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen, H., Patterson, N. \u0026amp; Reich, D. Population differentiation as a test for selective sweeps. Genome Res 20, 393\u0026ndash;402 (2010).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiang, Y., Huang, Y., Liu, C., Chen, K. \u0026amp; Li, M. Functions and interaction of plant lipid signalling under abiotic stresses. Plant Biology 25, 361\u0026ndash;378 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang, H. \u003cem\u003eet al.\u003c/em\u003e Optimization of Genotyping-by-Sequencing (GBS) for Germplasm Fingerprinting and Trait Mapping in Faba Bean. Legume Science 6, e254 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eV\u0026aacute;g\u0026uacute;jfalvi, A., Galiba, G., Cattivelli, L. \u0026amp; Dubcovsky, J. The cold-regulated transcriptional activator Cbf3 is linked to the frost-tolerance locus Fr-A2 on wheat chromosome 5A. Molecular Genetics and Genomics 269, 60\u0026ndash;67 (2003).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSandve, S.R. \u003cem\u003eet al.\u003c/em\u003e Molecular mechanisms underlying frost tolerance in perennial grasses adapted to cold climates. Plant Science 180, 69\u0026ndash;77 (2011).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eThomashow, M.F. Molecular Basis of Plant Cold Acclimation: Insights Gained from Studying the CBF Cold Response Pathway. Plant Physiology 154, 571\u0026ndash;577 (2010).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFowler, S. \u0026amp; Thomashow, M.F. Arabidopsis transcriptome profiling indicates that multiple regulatory pathways are activated during cold acclimation in addition to the CBF cold response pathway. The Plant Cell 14, 1675\u0026ndash;1690 (2002).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSchulz, E., Tohge, T., Zuther, E., Fernie, A.R. \u0026amp; Hincha, D.K. Flavonoids are determinants of freezing tolerance and cold acclimation in Arabidopsis thaliana. Scientific Reports 6, 34027 (2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAhmad, S., Ali, S., Shah, A.Z., Khan, A. \u0026amp; Faria, S. Chalcone synthase (CHS) family genes regulate the growth and response of cucumber (Cucumis sativus L.) to Botrytis cinerea and abiotic stresses. Plant Stress 8, 100159 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJayaraman, K. \u003cem\u003eet al.\u003c/em\u003e Stress-inducible expression of chalcone isomerase2 gene improves accumulation of flavonoids and imparts enhanced abiotic stress tolerance to rice. Environmental and Experimental Botany 190, 104582 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang, Y. \u003cem\u003eet al.\u003c/em\u003e The regulation of adaptation to cold and drought stresses in Poa crymophila Keng revealed by integrative transcriptomics and metabolomics analysis. Frontiers in Plant Science 12, 631117 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMockler, T.C. \u003cem\u003eet al.\u003c/em\u003e Regulation of flowering time in Arabidopsis by K homology domain proteins. \u003cem\u003eProceedings of the National Academy of Sciences\u003c/em\u003e 101, 12759\u0026ndash;12764 (2004).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLavania, D. \u003cem\u003eet al.\u003c/em\u003e Genetic approaches for breeding heat stress tolerance in faba bean (Vicia faba L.). Acta Physiologiae Plantarum 37, 1737 (2014).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMaalouf, F. \u003cem\u003eet al.\u003c/em\u003e Genetic Dissection of Heat Stress Tolerance in Faba Bean (Vicia faba L.) Using GWAS. \u003cem\u003ePlants\u003c/em\u003e 11, 1108 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSallam, A. \u0026amp; Martsch, R. Association mapping for frost tolerance using multi-parent advanced generation inter-cross (MAGIC) population in faba bean (Vicia faba L.). Genetica 143, 501\u0026ndash;514 (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBornhofen, E. \u003cem\u003eet al.\u003c/em\u003e Genetics of faba bean yield and yield stability. bioRxiv (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTayeh, N. \u003cem\u003eet al.\u003c/em\u003e A tandem array of CBF/DREB1 genes is located in a major freezing tolerance QTL region on Medicago truncatula chromosome 6. BMC Genomics 14, 814 (2013).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eŠimkov\u0026aacute;, H., Č\u0026iacute;hal\u0026iacute;kov\u0026aacute;, J., Vr\u0026aacute;na, J., Lys\u0026aacute;k, M.A. \u0026amp; Doležel, J. Preparation of HMW DNA from Plant Nuclei and Chromosomes Isolated from Root Tips. Biologia Plantarum 46, 369\u0026ndash;373 (2003).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMonat, C. \u003cem\u003eet al.\u003c/em\u003e TRITEX: chromosome-scale sequence assembly of Triticeae genomes with open-source tools. Genome Biology 20, 284 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOu, S. \u0026amp; Jiang, N. LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiology 176, 1410\u0026ndash;1422 (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOu, S., Chen, J. \u0026amp; Jiang, N. Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Research 46, e126-e126 (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSim\u0026atilde;o, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V. \u0026amp; Zdobnov, E.M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210\u0026ndash;3212 (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGao, Y., Liu, B., Wang, Y. \u0026amp; Xing, Y. TideHunter: efficient and sensitive tandem repeat detection from noisy long-reads using seed-and-chain. Bioinformatics 35, i200-i207 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNov\u0026aacute;k, P. \u003cem\u003eet al.\u003c/em\u003e TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads. Nucleic Acids Research 45, e111-e111 (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNov\u0026aacute;k, P., Hošt\u0026aacute;kov\u0026aacute;, N., Neumann, P. \u0026amp; Macas, J. DANTE and DANTE_LTR: lineage-centric annotation pipelines for long terminal repeat retrotransposons in plant genomes. NAR Genomics and Bioinformatics 6, lqae113 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNov\u0026aacute;k, P., Neumann, P., Pech, J., Steinhaisl, J. \u0026amp; Macas, J. RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. Bioinformatics 29, 792\u0026ndash;793 (2013).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNov\u0026aacute;k, P., Neumann, P. \u0026amp; Macas, J. Global analysis of repetitive DNA from unassembled sequence reads using RepeatExplorer2. Nature Protocols 15, 3745\u0026ndash;3776 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSmit, A.F.A., Hubley, R. \u0026amp; Green, P. RepeatMasker Open-4.0. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.repeatmasker.org/\u003c/span\u003e\u003cspan address=\"https://www.repeatmasker.org/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. (2013\u0026ndash;2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eQuinlan, A.R. \u0026amp; Hall, I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841\u0026ndash;842 (2010).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNeumann, P., Nov\u0026aacute;k, P., Hošt\u0026aacute;kov\u0026aacute;, N. \u0026amp; Macas, J. Systematic survey of plant LTR-retrotransposons elucidates phylogenetic relationships of their polyprotein domains and provides a reference for element classification. Mobile DNA 10, 1 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNeumann, P. \u003cem\u003eet al.\u003c/em\u003e Centromeres Off the Hook: Massive Changes in Centromere Size and Structure Following Duplication of CenH3 Gene in Fabeae Species. Molecular Biology and Evolution 32, 1862\u0026ndash;1879 (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBolger, A.M., Lohse, M. \u0026amp; Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114\u0026ndash;2120 (2014).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLangmead, B. \u0026amp; Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357\u0026ndash;359 (2012).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTarasov, A., Vilella, A.J., Cuppen, E., Nijman, I.J. \u0026amp; Prins, P. Sambamba: fast processing of NGS alignment formats. Bioinformatics 31, 2032\u0026ndash;2034 (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStovner, E.B. \u0026amp; S\u0026aelig;trom, P. epic2 efficiently finds diffuse domains in ChIP-seq data. Bioinformatics 35, 4392\u0026ndash;4393 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang, Y. \u003cem\u003eet al.\u003c/em\u003e Model-based Analysis of ChIP-Seq (MACS). Genome Biology 9, R137 (2008).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRam\u0026iacute;rez, F. \u003cem\u003eet al.\u003c/em\u003e deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Research 44, W160-W165 (2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLawrence, M., Gentleman, R. \u0026amp; Carey, V. rtracklayer: an R package for interfacing with genome browsers. Bioinformatics 25, 1841\u0026ndash;1842 (2009).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094\u0026ndash;3100 (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBrůna, T., Hoff, K.J., Lomsadze, A., Stanke, M. \u0026amp; Borodovsky, M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP\u0026thinsp;+\u0026thinsp;and AUGUSTUS supported by a protein database. NAR genomics and bioinformatics 3, lqaa108 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKim, D., Paggi, J.M., Park, C., Bennett, C. \u0026amp; Salzberg, S.L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology 37, 907\u0026ndash;915 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHuerta-Cepas, J. \u003cem\u003eet al.\u003c/em\u003e eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic acids research 47, D309-D314 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAshburner, M. \u003cem\u003eet al.\u003c/em\u003e Gene Ontology: tool for the unification of biology. Nature Genetics 25, 25\u0026ndash;29 (2000).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKanehisa, M. \u0026amp; Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28, 27\u0026ndash;30 (2000).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBoutet, E., Lieberherr, D., Tognolli, M., Schneider, M. \u0026amp; Bairoch, A. UniProtKB/Swiss-Prot. Methods Mol Biol 406, 89\u0026ndash;112 (2007).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWu, T. \u003cem\u003eet al.\u003c/em\u003e clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The innovation 2, 100141 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBuenrostro, J.D., Wu, B., Chang, H.Y. \u0026amp; Greenleaf, W.J. ATAC-seq: A Method for Assaying Chromatin Accessibility Genome-Wide. \u003cem\u003eCurrent Protocols in Molecular Biology\u003c/em\u003e 109, 21.29.1-21.29.9 (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, H. \u003cem\u003eet al.\u003c/em\u003e The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078\u0026ndash;2079 (2009).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987\u0026ndash;2993 (2011).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFelsenstein, J. \u003cem\u003ePHYLIP (phylogeny inference package), version 3.5 c\u003c/em\u003e, (Produced and distributed by author/Department of Genetics, University of Washington, 1993).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAlexander, D.H. \u0026amp; Lange, K. Enhancements to the ADMIXTURE algorithm for individual ancestry estimation. BMC Bioinformatics 12, 246 (2011).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang, C., Dong, S.S., Xu, J.Y., He, W.M. \u0026amp; Yang, T.L. PopLDdecay: a fast and effective tool for linkage disequilibrium decay analysis based on variant call format files. Bioinformatics 35, 1786\u0026ndash;1788 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTeam, R.C. R: A language and environment for statistical computing. (R Foundation for Statistical Computing, Vienna, Austria, 2013).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLewontin, R.C. \u0026amp; Krakauer, J. Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics 74, 175\u0026ndash;195 (1973).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAkey, J.M., Zhang, G., Zhang, K., Jin, L. \u0026amp; Shriver, M.D. Interrogating a high-density SNP map for signatures of natural selection. Genome research 12, 1805\u0026ndash;1814 (2002).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZeng, X. \u003cem\u003eet al.\u003c/em\u003e Origin and evolution of qingke barley in Tibet. Nature Communications 9, 5433 (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhou, X. \u0026amp; Stephens, M. Genome-wide efficient mixed-model analysis for association studies. Nature Genetics 44, 821\u0026ndash;824 (2012).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHuang, M., Liu, X., Zhou, Y., Summers, R.M. \u0026amp; Zhang, Z. BLINK: a package for the next level of genome-wide association studies with both individuals and markers in the millions. \u003cem\u003eGigaScience\u003c/em\u003e 8, giy154 (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang, Y. \u003cem\u003eet al.\u003c/em\u003e MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic acids research 40, e49-e49 (2012).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDong, S.S. \u003cem\u003eet al.\u003c/em\u003e LDBlockShow: a fast and convenient tool for visualizing linkage disequilibrium and haplotype blocks based on variant call format files. Briefings in Bioinformatics 22, bbaa227 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVanRaden, P.M. Efficient methods to compute genomic predictions. Journal of Dairy Science 91, 4414\u0026ndash;4423 (2008).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEndelman, J.B. Ridge regression and other kernels for genomic selection with R package rrBLUP. The Plant Genome 4, 250\u0026ndash;255 (2011).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, B. \u0026amp; Dewey, C.N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eArend, D. \u003cem\u003eet al.\u003c/em\u003e e!DAL - a framework to store, share and publish research data. BMC Bioinformatics 15, 214 (2014).\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"},{"header":"Tables","content":"\u003cp\u003eTable 1 is available in the Supplementary Files section.\u003c/p\u003e"},{"header":"Extended Data","content":"\u003cp\u003eExtended Data Figures are not available with this version.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eExtended Data Fig. 1:\u003c/strong\u003e \u003cstrong\u003eContact matrix generated from the Hi-C data analysis showing sequence interactions in chromosomes.\u003c/strong\u003e The\u0026nbsp;color bar (red to white) indicates the logarithm of the contact density.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eExtended Data Fig. 2:\u003c/strong\u003e \u003cstrong\u003eAlignments of faba bean genome v.1 versus v.2 chromosomes\u003c/strong\u003e. Orange lines indicate fixed orientation and placement errors in version 2.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eExtended Data Fig. 3:\u003c/strong\u003e \u003cstrong\u003eChromosome locations of telomeric repeats.\u0026nbsp;\u003c/strong\u003eThe motif of the telomeric region is TTAGGG.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eExtended Data Fig. 4: Comparison between the gene models predicted in the v.1 and v.2 genome assemblies. (a)\u0026nbsp;\u003c/strong\u003eExample of a merged gene, predicted as one gene in v.2 and two genes in v.1; \u003cstrong\u003e(b)\u0026nbsp;\u003c/strong\u003eExample of a gene in v.2 that is incompletely predicted in v.1; \u003cstrong\u003e(c)\u003c/strong\u003e Example of split genes, predicted as one gene in v.1 and\u0026nbsp;correctly identified as\u0026nbsp;two genes in v.2. \u003cstrong\u003e(d)\u0026nbsp;\u003c/strong\u003eExample of a gene predicted in v.2 but not predicted in v.1; The quality of the predicted genes was supported by experimental evidence, including Iso-seq and Illumina transcriptome sequences as well as protein evidence including pigeon pea, soybean, lentil, barrel clover, pea.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eExtended Data Fig.\u0026nbsp;\u003c/strong\u003e\u003cstrong\u003e5:\u003c/strong\u003e\u003cstrong\u003e\u0026nbsp;Representation of tandem genes and chromatin accessible regions (a)\u003c/strong\u003e Diagrammatic depiction of the genomic area spanning VfLeB4 cluster genes in the Hedin/2 v.1 and v.2 genome assemblies. VfLeB4 gene families were not assembled at the chromosome level in v.1 assembly, while the region including VfLeB4 family was fully assembled in the v.2 assembly. \u003cstrong\u003e(b)\u003c/strong\u003e Distribution of distance (bp) of accessible regions.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eExtended Data Fig. 6: Summary of the genomic variations identified across 406 accessions. \u0026nbsp;\u003c/strong\u003eSNP density across all chromosomes of the faba bean genome\u0026nbsp;is\u0026nbsp;scaled to a 1 Mb size.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eExtended Data Fig 7: Genome-wide distribution of spring and winter selective sweeps.\u003c/strong\u003e \u003cstrong\u003e(\u003c/strong\u003e\u003cstrong\u003ea\u003c/strong\u003e\u003cstrong\u003e, b\u003c/strong\u003e\u003cstrong\u003e)\u0026nbsp;\u003c/strong\u003eGenome-wide likelihood (XP-CLR) values for selection contrasting the spring and winter populations \u003cstrong\u003e(a)\u003c/strong\u003e and Low vicine and convicine (VC) and Normal VC populations within the spring populations \u003cstrong\u003e(b),\u003c/strong\u003e with chromosome number indicated along the x axis. \u003cstrong\u003ec)\u0026nbsp;\u003c/strong\u003eCandidate regions for VC identified by selective sweep analysis\u0026nbsp;and linkage disequilibrium (r\u003csup\u003e2\u003c/sup\u003e) around a 1.44Mb region identified on chromosome 1\u0026nbsp;confirmed by the previous Fine-mapping study. \u003cstrong\u003ed)\u0026nbsp;\u003c/strong\u003eGenomic alignment between Hedin/2 and Tiffany in VC genes region.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eExtended Data Fig 8: Sweep signals on chromosome 1. a.\u0026nbsp;\u003c/strong\u003eSelected regions on chromosome 1 based on sweep analysis.\u003cstrong\u003e\u0026nbsp; b.\u0026nbsp;\u003c/strong\u003e31 indel markers designed for selecting low VC accessions.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eExtended Data Fig. 9: Selected regions on chromosome 3 (a)\u0026nbsp;\u003c/strong\u003eand chromosome 5\u003cstrong\u003e\u0026nbsp;(b)\u0026nbsp;\u003c/strong\u003ebased on Fst analysis\u003cstrong\u003e.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eExtended Data Fig 10: 187 SNP markers designed for selecting frost-tolerant accessions.\u003c/strong\u003e \u003cstrong\u003e(\u003c/strong\u003e\u003cstrong\u003ea)\u0026nbsp;\u003c/strong\u003eLD square of 187 SNP markers\u003cstrong\u003e. (b)\u0026nbsp;\u003c/strong\u003eHeatmap plot of 187 markers in 406 faba bean accessions, where blue indicates winter haplotype and orange indicates spring haplotype. \u003cstrong\u003e(c)\u0026nbsp;\u003c/strong\u003eHeatmap plot of 7 markers in the ProFaba panel with blue indicating winter haplotype and orange indicating spring haplotype. \u003cstrong\u003e(d, e)\u003c/strong\u003e Heatmap plot of 7 markers sorted by winter survival rate (d)and late frost survival rate (e) (Low-\u0026gt;High) in the ProFaba panel with blue indicating winter haplotype and orange indicating spring haplotype.\u0026nbsp;\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"nature-portfolio","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"","title":"Nature Portfolio","twitterHandle":"","acdcEnabled":false,"dfaEnabled":false,"editorialSystem":"ejp","reportingPortfolio":"","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-5356723/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-5356723/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eWinter faba beans exhibit significant yield advantages over spring cultivars and hold promise for enhancing local protein production and agricultural sustainability. However, the threat of winter kill limits wider cultivation and the genetics of faba bean winter hardiness remain unresolved. Here, we develop a highly improved faba bean reference genome and combine this with resequencing and phenotyping of winter and spring faba bean accessions to identify genetic determinants of winter hardiness. Genome-wide association analysis of frost tolerance traits identifies a major winter hardiness locus where the most strongly associated variant explains the vast majority of phenotypic variation and accurately differentiates between winter and spring types. Furthermore, we identify additional signals within the winter faba bean gene pool that pave the way for further improvement of winter hardiness. Our work provides improved genomic resources and resolves the genetics of a key agronomic trait in a global protein crop to facilitate future breeding efforts.\u003c/p\u003e","manuscriptTitle":"Allelic variation at a single locus distinguishes spring and winter faba beans","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-11-14 14:45:27","doi":"10.21203/rs.3.rs-5356723/v1","editorialEvents":[],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"nature-genetics","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"ng","sideBox":"Learn more about [Nature Genetics](http://www.nature.com/ng/)","snPcode":"","submissionUrl":"","title":"Nature Genetics","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"ejp","reportingPortfolio":"Nature Research","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"872fabc1-1953-485c-b9ed-459406c60e12","owner":[],"postedDate":"November 14th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[{"id":40174120,"name":"Biological sciences/Genetics/Genomics/Genome assembly algorithms"},{"id":40174121,"name":"Biological sciences/Genetics/Genetic association study/Genome-wide association studies"},{"id":40174122,"name":"Biological sciences/Genetics/Gene expression"}],"tags":[],"updatedAt":"2026-03-11T07:57:36+00:00","versionOfRecord":{"articleIdentity":"rs-5356723","link":"https://doi.org/10.1038/s41588-026-02524-y","journal":{"identity":"nature-genetics","isVorOnly":false,"title":"Nature Genetics"},"publishedOn":"2026-03-10 04:00:00","publishedOnDateReadable":"March 10th, 2026"},"versionCreatedAt":"2024-11-14 14:45:27","video":"","vorDoi":"10.1038/s41588-026-02524-y","vorDoiUrl":"https://doi.org/10.1038/s41588-026-02524-y","workflowStages":[]},"version":"v1","identity":"rs-5356723","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-5356723","identity":"rs-5356723","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
Text is read by the "Ask this paper" AI Q&A widget below.
Extraction quality varies by source — PMC NXML preserves structure
cleanly, OA-HTML may include some navigation residue, and OA-PDF can
have broken hyphenation. The publisher copy
(via DOI)
is the canonical version.