Unveiling centromeric retrotransposon dynamics through a near-complete rye genome assembly

doi:10.21203/rs.3.rs-7013114/v1


2025 · doi:10.21203/rs.3.rs-7013114/v1

preprint OA: closed



Nils Stein, Erwang Chen, Srijan Jhingan, Zihao Zhu, Jianyong Chen, and 11 more

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-7013114/v1

This work is licensed under a CC BY 4.0 License

Status: Under Review. Version 1.

Abstract

Rye (Secale cereale L.) is an important cereal crop known for its high yield potential and tolerance to biotic and abiotic stresses. However, its large, repeat-rich, and heterozygous genome has posed challenges for assembly compared to related species such as wheat and barley. Here, we present a high-quality, chromosome-scale genome assembly of the inbred line Lo7, generated using PacBio HiFi, Oxford Nanopore, Hi-C, and BioNano technologies using the TRITEX pipeline.
The resulting Lo7_V3 assembly spans 6.76 Gb with a contig N50 of 128 Mb, correcting previous misorientations and achieving complete centromere assemblies across all seven chromosomes. Repetitive clusters containing rye-specific satellite sequences (pSc200 and pSc250) were contiguously assembled. Their chromosomal positions were validated using FISH. Centromeric retrotransposon analysis highlighted RLG_Abia as a prominent element in rye, showing signs of recent activity and high abundance, unlike in wheat. Collectively, the new Lo7_V3 genome assembly provides a highly improved resource that will support future genomic research and crop improvement efforts in rye and related cereal species.

Biological sciences/Plant sciences/Plant genetics
Biological sciences/Genetics/Genome

Introduction

Rye is a diploid cereal crop within the tribe Triticeae, which poses unique challenges for genome assembly due to its large genome (∼7 Gb), high repeat content, and extensive structural heterogeneity 1 . The 1RS/1BL translocation, in which the short arm of rye chromosome 1R (1RS) replaces the short arm of wheat chromosome 1B, has been widely utilized in wheat breeding programs 2 . This translocation has contributed significantly to disease resistance, abiotic stress tolerance, and yield improvement. The introgression of rye chromatin, particularly from 1RS, has introduced beneficial alleles associated with resistance to rusts (Sr31, Lr26 and Yr9) 3 and powdery mildew (Pm8) 4 , making it one of the most successful examples of wide hybridization in modern wheat breeding. Despite its importance for improving stress tolerance and disease resistance in wheat, progress in rye genomics has been hindered by the lack of a high-quality reference genome. To date, high-quality genome assemblies for two rye inbred lines have been published: Lo7 and Weining 5 , 6 .
While both assemblies represented significant breakthroughs for rye research and breeding, they suffered from gaps and structural inaccuracy, low repeat resolution, and missing centromere representation. Based on the advances in sequencing technology, it was our aim to overcome these shortcomings, ultimately advancing rye genomics. Recent advancements in long-read sequencing technologies such as PacBio HiFi and Oxford Nanopore, combined with techniques like high-throughput chromatin conformation capture (Hi-C) or BioNano optical map scaffolding, have facilitated the completion of complex genomes in human and plants 7 , 8 . For instance, a complete T2T genome assembly for Arabidopsis thaliana revealed comprehensive insights into centromeric regions, structural variations, and evolutionary dynamics 8 , while recent T2T (or near-complete) genome assemblies of crops like rice 9 , maize 10 , soybean 11 , oat 12 , cotton 13 , 14 , sorghum 15 and wheat 16 , 17 have significantly expanded our understanding of genome architecture and function. These achievements highlight the power of T2T assemblies in addressing previously unresolved genomic challenges, such as repetitive regions and structural variants, across diverse species. Given the recent success of complete assembly in species like wheat, an improved rye genome assembly will improve the basis for studying rye genome organization and architecture. The rye genome is exceptionally rich in repetitive sequences, with approximately 90% of its genome composed of various types of repeats—a proportion notably higher than that found in related Triticeae species such as barley and wheat 6 , 18 . For example, the nucleolar organizer region (NOR) on chromosome 1R contains large arrays of 45S rDNA tandem repeats, serving as the site for ribosomal RNA gene clusters and contributing significantly to the structural complexity 19 . 
Transposable elements (TEs), particularly long terminal repeat retrotransposons (LTR-RTs), like Gypsy and Copia superfamilies, dominate the rye genome and play a central role in genome size expansion and structural variation 5 , 6 . In addition to dispersed TEs, satellite sequences are widely distributed throughout the genome and are often organized in large blocks, contributing to heterochromatin formation. Likewise, rye-specific tandem repeats, pSc200, pSc250, and pSc119.2, are predominantly localized in the subtelomeric regions of chromosomes 1R to 7R 20 , 21 . With this, the abundance and diversity of repetitive sequences in the rye genome not only reflect its complex evolutionary history but also present challenges for genome assembly. Previous studies have shown that centromeres in rye are enriched with centromere-specific retrotransposons and satellite repeats 22 , 23 . The centromeric regions of the rye genome are predominantly composed of TEs, with LTR retrotransposons being the major components. Among these, the Gypsy superfamily is particularly abundant and plays a central role in shaping the structure of centromeric chromatin. Although Copia elements are also present, they are less prevalent compared to Gypsy elements. The high density and accumulation of Gypsy -type retrotransposons in the centromeres suggest their importance in centromere function and evolution in rye. Another two previous studies presented the analysis of near-complete centromere sequences in Einkorn wheat 24 , 25 . These two studies found that Triticum monococcum centromeres are primarily composed of two retrotransposon families, RLG_Cereba and RLG_Quinta , both of which belong to the Gypsy superfamily of LTR retrotransposons. 
Notably, RLG_Quinta is a non-autonomous element, lacking essential coding sequences such as reverse transcriptase and integrase, and thus relies on the enzymatic machinery provided by its autonomous partner like RLG_Cereba for its propagation (i.e., RLG_Quinta is a “parasite” of RLG_Cereba ). In this study, we present a chromosome-scale, near-complete genome assembly for rye inbred variety Lo7, utilizing state-of-the-art sequencing and scaffolding techniques. Compared with previous versions, Lo7_V1 26 and Lo7_V2 5 , this assembly further improved the resolution of complex repetitive regions in subtelomeres and centromeres. We have significantly improved the Lo7 genome by reducing assembly gaps, correcting orientation mistakes, and enhancing genome completeness. Results High-quality genome assembly of rye genotype Lo7 To generate a high-quality reference assembly for rye Lo7, we employed the TRITEX pipeline 27 , integrating data generated by multiple advanced sequencing technologies: PacBio HiFi reads (38-fold haploid coverage), ONT long reads (> 25 kb, 9-fold coverage), and Hi-C reads (20-fold coverage) for chromosome-level scaffolding ( Supplementary Fig. 1a and Supplementary Table 1 ). We estimated the heterozygosity of Lo7 using k-mer analysis, determining it to be 0.06% ( Supplementary Fig. 1b ). The initial contigs were assembled by combining HiFi and ONT long reads using hifiasm 28 . These contigs were manually curated with Hi-C data to generate a draft assembly ( Supplementary Fig. 2 ), which was further evaluated and refined using a BioNano optical map ( Supplementary Fig. 3 and Supplementary Table 2 ). Finally, genome annotation was conducted using RNA-seq and Iso-seq data (Fig. 1 a). Additional features were systematically annotated, including chromosome lengths, repetitive sequences, centromere locations, and 5mCpG methylation (Fig. 1 b). The final assembly spanned 6.76 Gb, with a contig N50 of 128 Mb and a GC content of 50% (Table 1 ). 
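The two summary statistics quoted here, contig N50 and GC content, have simple definitions. The sketch below is illustrative only and is not the tooling used for Lo7_V3; the function names `n50` and `gc_content` are hypothetical. N50 is taken in its usual sense: the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the assembly size.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the running total to half the
        # assembly size defines the N50.
        if running >= half_total:
            return length
    raise ValueError("no contig lengths supplied")


def gc_content(sequence: str) -> float:
    """Return the GC fraction of a sequence, ignoring N and other ambiguity codes."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return gc / (gc + at)
```

Ignoring N bases in `gc_content` is a design choice of this sketch, so that gap padding does not dilute the estimate.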
The lengths of the seven chromosomes ranged from 776 Mb to 1.1 Gb; on average, each chromosome is 7.4% longer than in the earlier Lo7_V2 assembly (Supplementary Table 3). We annotated 75,850 protein-coding genes, including 43,754 high-confidence (HC) and 32,096 low-confidence (LC) genes (Table 1). LTR-RT annotation yielded an average LTR Assembly Index (LAI) of 23.47, indicating high assembly quality (Supplementary Table 4). We identified two enrichment regions for 5S and 45S rDNA sequences on chromosome 1R, coinciding with and thus confirming previous findings of the NOR on chromosome 1R (Supplementary Fig. 4a). Moreover, by mapping the telomere-specific short sequence motif (TTTAGGG), we identified 6 of the expected 14 telomeres (Supplementary Fig. 4a). Overall, the Lo7_V3 genome assembly shows high contiguity.

Table 1 Genome assembly comparisons between Lo7_V2 5 and Lo7_V3. Completeness of the two assemblies was evaluated based on HiFi long reads. HC, high-confidence. LC, low-confidence. x indicates sequencing depth.

                     Lo7_V2             Lo7_V3
Illumina (Gb)        947 (120x)         -
HiFi (Gb)            -                  266.7 (38x)
ONT (Gb)             -                  65.1 (> 25 kb, 9x)
N50 (Mb)             15.2 (Scaffold)    128.0 (Contig)
Anchored (Gb)        6.21               6.68
Unanchored (Mb)      528.4              79.6
Completeness (%)     97.3               97.7
Total (Gb)           6.74               6.76
All_genes            57,222             75,850
HC_genes             34,441             43,754
LC_genes             22,781             32,096
HC_chrUn             1,939              114
BUSCO_gene (%)       98.4               98.9
GC content (%)       48.8               50.0

Lo7_V3 outperforms Lo7_V2 in assembly contiguity, orientation correction, and gene annotation quality

Lo7_V2 was assembled using high-coverage short reads (120-fold coverage), while Lo7_V3 incorporated both long-read HiFi and ONT sequencing technologies. Although the overall genome length remained comparable, Lo7_V3 demonstrated substantial improvements in chromosomal anchoring, reducing unanchored sequence from 528.4 Mb to 79.6 Mb (Table 1). Assembly completeness was assessed through k-mer analysis of HiFi reads, showing a slight increase from 97.3% in Lo7_V2 to 97.7% in Lo7_V3 (Table 1).
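The idea behind a k-mer completeness score can be illustrated with a toy calculation: collect the k-mers that occur repeatedly in the reads (and are therefore unlikely to be sequencing errors) and ask what fraction of them is also present in the assembly. The sketch below is a simplified illustration of that idea and not Merqury's implementation; the function names, the default `k=21` and the `min_count` threshold are assumptions of this example.

```python
from collections import Counter
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = set("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(sequences: Iterable[str], k: int) -> Counter:
    """Count canonical k-mers, skipping any window that contains a non-ACGT base."""
    counts: Counter = Counter()
    for sequence in sequences:
        seq = sequence.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= _BASES:
                counts[canonical(kmer)] += 1
    return counts


def kmer_completeness(reads: Iterable[str], assembly: Iterable[str],
                      k: int = 21, min_count: int = 2) -> float:
    """Fraction of solid read k-mers (seen >= min_count times) found in the assembly."""
    read_counts = count_kmers(reads, k)
    solid = {kmer for kmer, n in read_counts.items() if n >= min_count}
    if not solid:
        raise ValueError("no read k-mers passed the min_count threshold")
    assembly_kmers = set(count_kmers(assembly, k))
    return len(solid & assembly_kmers) / len(solid)
```

Canonical k-mers are used so that a sequence and its reverse complement are counted as the same k-mer, since reads come from both strands.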
Gene annotation quality improved significantly, with more genes annotated and higher BUSCO scores for both HC and LC gene sets (Supplementary Table 5). Notably, the number of HC genes in unanchored regions decreased dramatically from 1,939 in Lo7_V2 to just 114 in Lo7_V3 (Table 1). Following improvements in gene annotation, the BUSCO completeness score rose from 98.4% to 98.9% (Supplementary Table 5). Assembly contiguity was strongly enhanced, with the number of remaining gaps decreasing from 570 to 106 (Supplementary Fig. 4b). Taken together, these results demonstrate that Lo7_V3 represents a substantial upgrade over Lo7_V2 in terms of contiguity, scaffolding accuracy, and annotation quality.

To evaluate orientation corrections, we generated collinearity plots comparing the scaffold-based Lo7_V2 and contig-based Lo7_V3 assemblies (Supplementary Fig. 5). Analysis revealed nine regions with orientation errors in Lo7_V2, all of which were resolved in Lo7_V3 (Supplementary Fig. 5). On chromosome 1R, Lo7_V2 contained two inversion-like orientation errors and three assembly gaps, particularly in the centromeric and pericentromeric regions (200–300 Mb) (Fig. 2a). These issues were corrected in Lo7_V3 within a single contig spanning the coordinate region between 260 Mb and 320 Mb of the respective chromosome sequence (Fig. 2b). Hi-C contact maps comparing both assemblies provided further validation of these structural improvements (Fig. 2c,d).

Short tandem repeats form subtelomeric clusters

Consistent with previous repeat annotations 6 , 89.4% of the genome was annotated as repetitive sequence, including transposable elements (TEs) and other repeat fragments, which underscores the highly repetitive nature of the rye genome (Supplementary Fig. 5 and Supplementary Table 6).
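Gap counts such as the 570 versus 106 compared in this section are usually obtained by locating runs of N in the pseudomolecule sequences. The sketch below assumes that convention (gaps encoded as stretches of N or n); the source does not state how gaps were counted, and the function name `find_gaps` is hypothetical.

```python
import re

_GAP = re.compile(r"[Nn]+")


def find_gaps(sequence: str, min_length: int = 1) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of N runs in a sequence."""
    return [
        match.span()
        for match in _GAP.finditer(sequence)
        if match.end() - match.start() >= min_length
    ]
```

Summing `len(find_gaps(seq))` over all pseudomolecules would give a per-assembly gap count under this assumption.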
To verify the accuracy of the assembly, we designed chromosome barcode painting probes from the assembly and validated them using fluorescence in situ hybridization (FISH) (Supplementary Fig. 7a). These probes, selected based on single-copy sequences near the subtelomeric regions and labeled with distinct fluorescent dyes (green and red), enabled precise differentiation of each chromosome (Supplementary Fig. 7b). By integrating HiFi and ONT data, Lo7_V3 successfully resolved complex regions containing rye satellite sequences, which reduced fragmentation within the unanchored regions compared to Lo7_V2 (Supplementary Fig. 8). We found that pSc119.2 (3.5 Mb, representing 0.05% of the genome size), pSc200 (226.9 Mb, 3.4%) and pSc250 (29.6 Mb, 0.44%) occupied the majority of these regions, particularly in the subtelomeric regions (Fig. 3a and Supplementary Fig. 9). To test the accuracy of the assembled repeat arrays, we also performed FISH experiments using mixed pSc200 and pSc250 probes as localization markers (Fig. 3b and Supplementary Fig. 7b). By correlating the repeat and chromosome painting signals, we were able to determine the specific distribution of pSc200 and pSc250 on rye chromosomes. The observed FISH signals aligned well with the distribution predicted from the assembly, confirming the consistency between the experimental results and the computational predictions (Fig. 3b). These results indicate that Lo7_V3 significantly improved the anchoring of satellite sequences to the subtelomeric regions.

Structural analysis of seven complete centromeres

Centromeres are notoriously difficult to assemble due to their highly repetitive nature 29 . Previous analyses of rye centromeres have primarily relied on two published genome assemblies, both of which lack fully assembled centromeres for all seven chromosomes 22 , 30 . In Lo7_V2, centromere-associated regions were fragmented and mapped to unanchored regions (Fig. 4a,b).
In contrast, based on CENH3 ChIP-seq data, the Lo7_V3 assembly contains all seven centromeres, each anchored to its respective chromosome (Fig. 4a, Supplementary Table 7 and Supplementary Fig. 10). The predicted functional centromere lengths for chromosomes 1R to 7R range from approximately 10 to 12 Mb (Supplementary Table 8). Gene content analysis identified eleven HC genes within five of the seven rye centromeres. The centromeres of chromosomes 4R and 6R lacked any evidence of functional genes (Supplementary Table 9). Notably, the chromosome 1R centromere contained a gene encoding a translation initiation factor eIF-2B subunit beta-like protein. To further validate the genes annotated in the centromeric regions, we integrated data from Assay for Transposase-Accessible Chromatin sequencing (ATAC-seq), 5mCpG methylation, HiFi and ONT read mapping, and TE annotation, and examined transcriptomic support for the annotation (Fig. 4b and Supplementary Fig. 11). These data revealed accessible chromatin regions with low DNA methylation at or close to the transcription sites. RNA-seq data further indicated that the eIF-2B subunit beta-like gene is transcriptionally active across the examined tissues, including spike, root, and aerial organ, supporting its correct annotation (Fig. 4b and Supplementary Table 9).

Retrotransposon dynamics in rye centromeres

We performed a comprehensive TE analysis on our genome assembly. Consistent with previous findings, LTR retrotransposons (64.79%), particularly those belonging to the Gypsy superfamily (37.04%), are the predominant TEs in the genome (Supplementary Table 6). Notably, Gypsy elements exhibit a strong centromeric enrichment, suggesting their significant role in the structural organization and evolution of centromeric regions (Supplementary Fig. 11).
In contrast, Copia elements (6.58%) were more broadly distributed across the chromosome arms (Supplementary Fig. 11). To reveal the evolution of centromere-specific retrotransposable elements in rye, we identified three families (RLG_Cereba, RLG_Quinta and RLG_Abia) within the seven complete centromeres of Lo7 (Fig. 5a). We found RLG_Cereba, an element family well known to be present in the centromeres of the tribe Triticeae 5 , 31–33 , and its non-autonomous partner family RLG_Quinta 25 . In addition, we identified a recently highly active family in rye, RLG_Abia, which was previously found in wheat centromeres, but at very low abundance 18 . Using a TE population analysis pipeline 34 , we identified 2,098, 1,298 and 1,602 full-length elements of RLG_Cereba, RLG_Quinta and RLG_Abia, respectively. The three TE families were highly enriched in centromeric and peri-centromeric regions (Fig. 5a). Insertion ages of all retrotransposon copies were estimated based on divergence of their LTRs. We found the youngest copies of these families consistently to be located in the same regions where we identified the functional centromeres using CENH3 ChIP-seq analysis (Fig. 5 and Supplementary Fig. 12). This indicates that the centromeric retrotransposons actively target the functional centromere, similar to previous findings in wheat 24 , 25 . As a consequence, the older copies are pushed away from the active centromere over time. The example of the centromere of chromosome 4R (Fig. 5b) shows this gradual, passive movement outwards of older insertions through insertions of new copies in the region of the functional centromere. Interestingly, we identified multiple shifts of the centromere on chromosome 7R.
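Dating insertions from LTR divergence rests on the fact that the two LTRs of an element are identical at the time of insertion and then accumulate substitutions independently. A minimal sketch of one commonly used formulation follows. It assumes a Jukes-Cantor correction and a substitution rate of 1.3e-8 per site per year; neither is stated in the source, whose exact procedure is the one described in reference 34, and all function names are hypothetical.

```python
import math


def p_distance(ltr5: str, ltr3: str) -> float:
    """Proportion of mismatches between two aligned LTRs, skipping gap/ambiguous columns."""
    if len(ltr5) != len(ltr3):
        raise ValueError("LTR sequences must be aligned to equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            mismatches += a != b
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return mismatches / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance K = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def insertion_age_years(ltr5: str, ltr3: str, rate: float = 1.3e-8) -> float:
    """Estimate insertion age as T = K / (2 * rate).

    `rate` (substitutions per site per year) is an assumed value for
    illustration, not one taken from the source.
    """
    return jukes_cantor(p_distance(ltr5, ltr3)) / (2 * rate)
```

The factor of 2 in the denominator reflects that both LTRs diverge from the same ancestral sequence, so the observed distance accumulates along two lineages.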
Based on TE insertion ages and sites we propose that these shifts are the result of a series of inversions: The first inversion, here named Inversion A, occurred ~ 0.6 million years ago (mya) and led to the establishment of a new centromere in the region from around 416 to 422.5 Mb (Fig. 5 c). The second one, Inversion B, took place around 0.25 mya, when a segment of the previous centromere was brought closer to the position of the current functional centromere. Lastly, an inversion of ~ 5 Mb (Inversion C), took place ~ 0.2 mya (Fig. 5 d). Finally, although less prominent, we also found breaks in the continuity of TE insertion sites and ages in other centromeres of Lo7, for example in chromosomes 2R, 3R and 6R ( Supplementary Fig. 12 ), suggesting that these also underwent similar centromere shifts in the past. However, the insertion site and age patterns are less clear and did not allow a reconstruction or dating of the events. A main difference to the centromeres of wheat is very high abundance of RLG_Abia elements. While in wheat, they were silent for a long time and were only found in traces, the RLG_Abia family seems currently highly active in rye. This indicates that centromeres, even between very closely related species, can take different evolutionary trajectories and that different types of TEs may change their activity levels over relatively short evolutionary time periods. Sequence diversity of wheat/rye 1RS introgressions is low compared to the rye genepool Recent wheat pan-genome studies have highlighted the crucial role of the rye 1BL/1RS translocation in modern wheat programs 2 . High-quality wheat genome assemblies have detailed syntenic analyses of 1RS, facilitating comparisons between rye and multiple wheat accessions carrying this translocation. 
To investigate 1RS genetic diversity between rye and wheat, we analyzed ~ 300 Mb of 1RS/1BS sequences from multiple assemblies, including rye accessions Weining and Lo7, as well as 1BL/1RS carrying wheat varieties Z8425B, HD6172, ZM16, ZM22, AMN, KF11, S4185, and Aikang58 2,35–37 . Syntenic analysis revealed limited 1RS diversity among wheat accessions ( Supplementary Fig. 13 ). Collinearity analysis of 1RS suggested this homogeneity may result from the widespread use of early translocation lines in wheat breeding. In contrast, the two rye accessions showed substantially greater diversity in this region ( Supplementary Table 10 ). Notably, resistance genes such as Pm8 and Yr9 located on 1RS have been successfully deployed in wheat breeding, providing strong disease resistance 4 , 38 , 39 . We further updated the annotation of resistance (R) genes containing NLR domains based on the whole-genome assembly of Lo7 (Lo7_V3), providing a clearer view of the genetic landscape of the 1RS chromosome arm ( Supplementary Fig. 14 ). These observations underscore the potential value of introducing more diverse rye 1RS into wheat and triticale breeding programs to enhance genetic variation. Discussion In this study, we present an improved and updated chromosome-scale reference genome assembly for the rye inbred line Lo7, representing a major milestone in crop genomics. This work makes rye more accessible to genomic research and a valuable resource for crop improvement. The Lo7 genome assembly is near-complete, however not T2T, reflecting a significant advancement, especially given the challenges of rye's high repeat content, structural variability, and large genome size. This achievement is especially relevant in the context of complex crop genomes, as evidenced by recent near-complete or T2T genome assemblies of wheat 16 , 17 . The integration of multiple sequencing technologies not only improves assembly quality but also paves the way for further genomic advances in rye 5 . 
Looking ahead, further advancements will focus on integrating more diverse genomic data sources, such as transcriptomics, to enhance genome annotations. Additionally, more comprehensive gene family analyses, including those related to flowering and vernalization, will provide valuable insights into crop evolution, particularly in the context of wheat and other cereal crops. Comparative analysis with Lo7_V2 underscores the critical role of high-fidelity long read data in revealing the true complexity of rye’s genomic architecture. In this study, about 14% of the genome was annotated as “repeat fragment”. We speculate that a substantial portion of these fragments may correspond to short tandem repeats (STRs), which were not fully reconstructed or classified by the repeat annotation tools due to their short length, high copy number, or structural variability. The accuracy of these regions was further validated through FISH, confirming the structural integrity of repetitive regions. Specifically, the subtelomeric regions of 1RL and 6RS remain incomplete, and contig gaps are still present across the genome. These gaps, caused by the presence of repetitive sequences, have hindered the full assembly of subtelomeres or the majority of telomeres. Additionally, current technologies such as optical genome mapping and Hi-C scaffolding remain insufficient for accurately resolving orientation errors in contigs located in subtelomeric regions, particularly those enriched with satellite sequences. These gaps could be addressed in the future by incorporating additional ultra-long ONT sequencing reads to improve genome continuity and completeness. In this context, the highly repetitive supernumerary B chromosome of rye also offers a unique system to explore the accumulation and activity of transposable elements beyond the standard A chromosomes 40 . We assembled the complete centromeric regions in rye and conducted an in-depth structural analysis. 
We identified centromere-located HC genes supported by expression data and estimated the size of active (CENH3-interacting) centromere regions. In particular, we performed evolutionary analyses within the Gypsy superfamily and identified centromere-enriched subfamilies. Among these, we discovered the highly active RLG_Abia family, highlighting a potentially rye-specific expansion of centromeric retrotransposons. Similar patterns have recently been reported in oat (Avena sativa) 41 and other grass genomes, supporting the notion that RLG_Cereba and RLG_Abia retrotransposons have been competing for the centromeric niche over millions of years of grass genome evolution. As a next step, we plan to conduct a rye pan-genome analysis by incorporating high-quality genome assemblies from diverse genotypes. This will enable us to investigate the evolutionary dynamics of centromeric transposable elements and their contribution to genomic diversity.

Although we compared multiple 1RS arms from rye accessions and 1RS/1BL translocation lines, we found that the genetic diversity of the introgressed rye segment utilized in wheat remains remarkably narrow. This shows that the 1RS chromosomal fragments historically introduced into wheat breeding were derived from a very limited genetic pool, leaving much of rye’s genetic potential underexplored in wheat improvement 42 . Looking forward, the development of novel translocation lines carrying diverse rye chromosome segments may hold great promise for broadening the genetic base of wheat and triticale breeding and for delivering new insights into resistance improvement.

Methods

Plant materials and growth conditions

Plants were grown under controlled conditions in the greenhouse with a 16-hour light/8-hour dark photoperiod at 25°C (day) and 18°C (night). Humidity was maintained at 60%, and plants were irrigated with a nutrient solution weekly. Samples for DNA extraction were collected from young leaf tissues (2–3 weeks).
HMW DNA extraction, library preparation, PacBio HiFi and Oxford Nanopore sequencing High molecular weight (HMW) DNA was extracted from a single Lo7 plant using the MACHEREY-NAGEL NucleoBond HMW DNA kit, following the manufacturer’s protocol. The extracted HMW DNA was quality-controlled for concentration using the Qubit 2.0® dsDNA HS assay (Invitrogen, Waltham, USA) and for fragment size distribution using the Agilent Femto Pulse System (Agilent Technologies, Santa Clara, USA). For HiFi sequencing, the HMW DNA was fragmented into ~ 20 kb fragments using a Megaruptor 3 device (Diagenode) at speed setting 30. HiFi SMRTbell libraries were then prepared following the Pacific Biosciences SMRTbell Express Template Prep Kit 3.0 protocol. The final libraries were size-selected within a narrow 17–20 kb range using the SageELF system with a 0.75% Agarose Gel Cassette (Sage Science), following the manufacturer’s guidelines. HiFi circular consensus sequencing (CCS) reads were generated using the PacBio Revio platform (Pacific Biosciences) in accordance with the standard operating procedure. For ONT reads, HMW DNA fragments were size-selected using the Short Read Eliminator (SRE) kit (Pacific Biosciences, Menlo Park, USA) with a 25 kb cut-off. Libraries were generated using the SQK-LSK114 Ligation Sequencing Kit V14 (Oxford Nanopore Technologies, Oxford, UK) following the manufacturer’s protocol with 2–3 µg size selected HMW DNA as input. Sequencing was done on R10.4.1 PromethION (FLO-PRO114M) flow cells (Oxford Nanopore Technologies, Oxford, UK) with a run-time of 72–96 hours and raw data in the .pod5 format was acquired using MinKNOW (versions 23.11 and 24.02). Basecalling was performed with Dorado (version 0.9.1) using the [email protected] super-accurate basecalling model with the “--min-qscore 20” parameter to yield basecalled reads in the .bam format. 
Reads were then converted to the .fastq.gz format using Samtools 43 and filtered for a minimum length of 25 kb using SeqKit 44 .

Hi-C sequencing

For Hi-C sequencing, we followed a protocol similar to that used in previous barley Hi-C studies, with slight modifications to optimize for rye 45 . Briefly, plant tissues were cross-linked using formaldehyde, and chromatin was fragmented by restriction enzyme (DpnII) digestion. The resulting DNA fragments were ligated with a biotinylated bridge linker, which allowed for the capture of ligation products that represent physically interacting chromatin regions. These biotinylated ligation products were then enriched using streptavidin beads and processed for paired-end sequencing. The sequencing data were used to generate a high-resolution contact map, revealing chromatin interactions and facilitating the construction of chromosome-scale scaffolds. Sequencing and Hi-C raw data processing were performed as described before 45 .

Genome assembly and evaluation

PacBio HiFi reads were assembled using hifiasm (v.0.19.9) 28 . Pseudomolecule construction was done with the TRITEX long-read assembly pipeline, https://tritexassembly.bitbucket.io/ 27 . Chimeric contigs and orientation errors were identified through manual inspection of Hi-C contact matrices. Genome completeness and consensus accuracy were evaluated using Merqury (v.1.3) 46 . Levels of duplication and heterozygosity were assessed with Merqury and GenomeScope 2.0 47 . The completeness of the gene models was evaluated using BUSCO (v.5.0.0) (Simão et al., 2015), which assesses the presence of conserved core genes. Finally, contigs containing rye chloroplast sequences (NC_021761.1) were removed from the unanchored regions, resulting in the final pseudomolecules.
Optical genome mapping

A total of 1.8 million cell nuclei, purified from root tips of rye Lo7 seedlings by flow cytometry, were embedded in agarose miniplugs and treated with proteinase K as described 48 . The resulting 548 ng of ultraHMW DNA were labelled at DLE-1 recognition sites (CTTAAG motif) and stained following the Bionano Prep Direct Label and Stain-G2 protocol (Bionano, San Diego, USA). The labelled molecules were analyzed on the Bionano Saphyr platform. A total of 1500 Gbp of single-molecule data from molecules longer than 150 kb, corresponding to 190x coverage of the rye Lo7 genome, was used to generate a de novo optical genome map (OGM) with Bionano Solve (v.3.6.1_11162020), applying “optArguments_nonhaplotype_noES_noCut_DLE1_saphyr.xml” parameters (complete statistics in Figure S3). To validate the sequence assembly of the Lo7 genome, we aligned the OGM to the sequence in Access (v.1.7.1, Bionano). Identified mismatches between the OGM and the pseudomolecules are provided in Supplementary Table 11.

5mCpG methylation

HiFi reads were aligned to the pseudomolecules using ccsmeth 49 with default parameters (align_hifi). Methylation calling was performed using pb-CpG-tools based on the resulting BAM files. All bigWig files were visualized in the Integrative Genomics Viewer (IGV) 50 .

Gene model prediction

We performed de novo structural gene prediction, confidence classification, and functional annotation, following the protocol described previously 51 . The strategy applied in this study differs only in the use of Helixer 52 , run with standard parameters, as an additional ab initio input source for Evidence Modeller (weight set to 10). Access to all RNA-seq and Iso-seq datasets is described in a previous publication 5 .
Transposable element annotation using EDTA

To annotate transposable elements (TEs) in the genome, we used the Extensive De Novo TE Annotator (EDTA, v2.2.2) pipeline 53, a comprehensive tool for identifying and classifying TEs, together with the curated TREP library (https://trep-db.uzh.ch/). First, we preprocessed the genome assembly by soft-masking low-complexity regions and simple repeats. Next, EDTA was run with default parameters to detect and classify TEs into different categories: long terminal repeat (LTR) retrotransposons, DNA transposons (TIR and Helitron), and other repeat elements. The proportion of the genome occupied by TEs was then calculated, and TEs were classified into superfamilies.

Analysis of centromeric TEs

We identified full-length elements following the pipeline described previously 34. First, we extracted ~100 LTRs per family, which we aligned using ClustalW with standard settings. If we identified distinct groups of LTR variants (likely representing TE subfamilies) in the alignment, these groups were re-aligned separately to produce consensus sequences for the individual subfamilies. These consensus sequences were used in blastn searches against the genome. An identified element was classified as full-length if a pair of LTRs was found in the same orientation within ±1000 bp of the length of the TE family consensus sequence and with no more than 5 bp missing at the LTR ends. Elements at the extremes of the size distribution were discarded to remove elements with large insertions or deletions. The resulting full-length elements were used for further analysis. Insertion ages were estimated as previously described 34. The full-length elements were aligned with the initial TE family consensus using the program Water (EMBOSS package obtained from ubuntu.com) with a gap extension penalty of 0.1 and a gap opening penalty of 50.
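The full-length criterion described above (paired LTRs in the same orientation, element length within ±1000 bp of the family consensus, and at most 5 bp missing at the LTR ends) can be written as a small predicate. This is a minimal sketch under stated assumptions, not the pipeline of ref. 34: the `LtrHit` record, its field names, and the reading of "no more than 5 bp missing" as applying to each end of each LTR alignment are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LtrHit:
    """One blastn hit of an LTR consensus on the genome (hypothetical record)."""
    chrom: str
    start: int    # 1-based genomic start (start <= end)
    end: int      # 1-based genomic end, inclusive
    strand: str   # "+" or "-"
    q_start: int  # 1-based alignment start on the LTR consensus
    q_end: int    # 1-based alignment end on the LTR consensus


def ltr_is_complete(hit: LtrHit, ltr_len: int, max_missing: int = 5) -> bool:
    """True if at most max_missing bp are unaligned at either end of the LTR."""
    return (hit.q_start - 1) <= max_missing and (ltr_len - hit.q_end) <= max_missing


def is_full_length(left: LtrHit, right: LtrHit, ltr_len: int, element_len: int,
                   tolerance: int = 1000, max_missing: int = 5) -> bool:
    """Apply the full-length criterion to a candidate pair of LTR hits."""
    # Both LTRs must lie on the same sequence and in the same orientation.
    if left.chrom != right.chrom or left.strand != right.strand:
        return False
    # The pair must be ordered along the chromosome and must not overlap.
    if left.end >= right.start:
        return False
    # Neither LTR may be truncated by more than max_missing bp at an end.
    if not (ltr_is_complete(left, ltr_len, max_missing)
            and ltr_is_complete(right, ltr_len, max_missing)):
        return False
    # The whole element must be close to the family consensus length.
    span = right.end - left.start + 1
    return abs(span - element_len) <= tolerance


if __name__ == "__main__":
    # Hypothetical coordinates for a family with 1000-bp LTRs and an 8000-bp consensus.
    five_prime = LtrHit("1R", 1000, 1999, "+", 1, 1000)
    three_prime = LtrHit("1R", 8001, 9000, "+", 3, 1000)
    print(is_full_length(five_prime, three_prime, ltr_len=1000, element_len=8000))
```

Keeping the tolerance and the end-truncation limit as parameters makes it easy to test how sensitive the number of retained copies is to these thresholds.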
The resulting pairwise comparisons were combined into a variant call format (vcf) file using an in-house script 34. We filtered for sequence variants with an allele frequency of more than 5%. Further analysis, data preparation and visualization were done in R.

Synteny analysis

We used plotsr 54 to visualize and quantify synteny between genome sequences. First, the genome sequences of interest, such as rye Lo7_V3 and related species, were aligned pairwise using minimap2 55. All variations were called using SyRI 56. The resulting .out files were then processed with plotsr 54 to generate synteny plots, allowing us to examine the collinearity and structural conservation between genomes. Syntenic blocks were identified, and the degree of synteny was quantified. The collinearity plot was generated in R.

Analysis of satellite sequences

To visualize the distribution of the tandem repeats pSc200 (GenBank: Z54189.1), pSc250 (GenBank: Z50040.1) and pSc119.2 (GenBank: KF719093.1) on the assembly, the seven pseudomolecules were fragmented into 100-bp sequences and aligned to pSc200, pSc250 and pSc119.2 using Bowtie2 (v2.5.0, default) 57. The aligned reads were saved in separate files, and pSc200/250/119.2-positive reads were subsequently mapped back to the assembly using Bowtie2. The resulting BAM file was converted into a bigWig file with bamCoverage from deepTools2 58 using a bin size of 10 kb. For 5S rDNA (GenBank: AY841027.1) and 45S rDNA (GenBank: KF482106.1), the analysis was performed using the same method as above.

Fluorescence in situ hybridization (FISH)

Predicted probe annealing patterns were obtained using homology searches with 30,982 FISH probe sequences as queries and each rye genome assembly as a subject in turn (Lo7_V2 and Lo7_V3).
The homology search and subsequent data handling and visualization were conducted directly from R using the BioDT package. The workflow is available at github.com/mtrw/Lo7v2_inSilicoFISH; to recreate the analysis presented here, the scripts should be run in the order 1_run_blast.R, 2_count_gBins.R and 3a_plot_bins.R. In brief, BLAST (v2.14.0) is called with preset parameters for short queries (argument ‘-task blastn-short’). The Lo7_V3 assembly was divided into 1 Mb bins at 500 kb intervals. The sum of all bitscores of all probe sequence alignments overlapping each bin is calculated as an approximate indicator of the relative amount of probe binding that might be expected at that genomic region. Plots are constructed using R’s base plotting functions.

For the preparation of 5–10 kb long chromosome segment-specific barcoding FISH probes, non-overlapping, single-copy target-specific oligonucleotides were selected and synthesized as myTAGs® Labeled Libraries (Daicel Arbor Biosciences, Ann Arbor, MI, USA). The pooled oligos were labelled with either Atto 594 (red) or Alexa 488 (green). A dropping method 59 was used to prepare mitotic metaphase spreads. FISH was performed as described earlier 60 with minor alterations: 20 µl hybridization mixture per slide contained 50% deionized formamide, 25% 20× SSC, 1 mM Tris–HCl pH 8.0, 1 µl (400 ng) Atto 488 or Atto 594 labelled oligo chromosome painting probes, 1 µl (25 ng) Cy5-labelled oligo probes pSc200 and pSc250 61, 10 µg/ml salmon sperm DNA, and 0.5 M EDTA. The hybridization mixture was denatured together with the chromosomal DNA on a hot plate at 80°C for 2 min. Hybridization was performed at 37°C for 20 h in a moist chamber. Subsequently, slides were washed in 2× SSC at room temperature for 1 min to remove the coverslips, then for 20 min at 58°C, and dehydrated in an ethanol series (70, 90 and 96%).
Finally, the slides were air dried and counterstained with 1 µg/ml 4′,6-diamidino-2-phenylindole (DAPI) in Vectashield (Vector Laboratories, http://vectorlabs.com/). Images were acquired with a BX61 epifluorescence microscope (Olympus, http://www.olympus.fi/medical/en/microscopy) using a cooled charge-coupled device (CCD) camera (Orca ER, Hamamatsu, www.hamamatsu.com). Pictures were processed and merged using Adobe Photoshop (Adobe Systems Incorporated, USA, http://www.adobe.com).

Identification of centromeric regions

Centromeric regions were identified using previously published CENH3 ChIP-seq data 22, which mark binding sites of the centromeric histone H3 variant (CENH3). The CENH3 ChIP-seq data were first aligned to the rye genome using Bowtie2 57. The resulting peaks were visualized on the genome using IGV 50, enabling a clear representation of centromeric regions across chromosomes. To further refine the centromeric region boundaries, we used deepTools2 58 to generate a coverage plot at 10 kb resolution, providing an overview of centromeric regions' accessibility and distribution. Peaks corresponding to CENH3 binding were identified using MACS2 (Model-based Analysis of ChIP-Seq) 62, which allows for the precise detection of enriched regions.

ATAC sequencing

ATAC-seq was performed using either fresh (native) or crosslinked leaf tissues from three-week-old seedlings. For the native sample, fresh leaves were finely chopped with a razor blade in nuclei isolation buffer (0.25 M sucrose, 10 mM Tris-HCl pH 8.0, 10 mM MgCl2, 1% Triton X-100, 5 mM β-mercaptoethanol) supplemented with 1× Halt™ Protease Inhibitor Cocktail (Thermo Scientific). The resulting slurry was filtered through a 50-µm cell strainer, and the nuclei were washed twice with the same buffer before being resuspended. For the native sample, an aliquot was analyzed by flow cytometry for quality control and quantification.
Based on the quantification, approximately 75,000 nuclei were aliquoted, and the nuclei pellet was collected by centrifugation. For the crosslinked sample, fresh leaves were fixed under vacuum in 1% formaldehyde (Sigma-Aldrich 252549) for 20 min. Fixation was stopped by incubating the sample in 0.125 M glycine for 5 min. The tissues were frozen before nuclei isolation, which followed the same protocol as for the native sample. After isolation, 75,000 nuclei were sorted by flow cytometry and incubated at 60°C for 5 min. For both crosslinked (ATAC-seq1) and native (ATAC-seq2) samples, the nuclei pellet was resuspended in a transposition reaction mix containing Tagment DNA Enzyme (TDE1, Illumina, 20034197) and incubated at 37°C for 30 min. Crosslinked samples (ATAC-seq1) were subsequently incubated at 65°C overnight in SDS buffer (50 mM Tris-HCl, pH 8.0, 1% SDS, 10 mM EDTA). Transposition products were purified using the MinElute PCR Purification Kit (QIAGEN, 28004). Libraries were then amplified with the NEBNext® High-Fidelity 2x PCR Master Mix (NEB, M0541) and further purified using VAHTS™ DNA Clean Beads (Vazyme, N411). The final libraries were sequenced in paired-end mode (2x 151 cycles) on the Illumina NovaSeq 6000 (Illumina Inc., San Diego, CA, USA). Sequencing reads were adapter-trimmed using fastp (v0.20.0) 63 and aligned to the reference genome with BWA-MEM (v0.7.17) 64. SAMtools (v1.16.1) 43 was used to filter out duplicate and multi-mapping reads (-q 30), followed by peak calling (-q 0.01) with MACS2 (v3.0.0) 62. For visualization, BAM files containing uniquely mapped reads were converted to bigWig format using deepTools bamCoverage (v3.5.1) 58.

Declarations

Funding

This study was supported by core funding of the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK). A.H. was supported by the German Research Foundation (DFG), project 559365585.

Conflict of interest

The authors declare they have no conflict of interest.
Acknowledgments

We thank Sylvia Swetik, Manuela Knauft, Jacqueline Pohl and Mary Ziems for technical support in plant material management and DNA sequencing. We thank Pascal Jaroschinsky for his assistance with ATAC sequencing, Jörg Fuchs for his help with flow cytometry, and Katrin Kumke for performing the FISH experiment. We thank Jens Bauernfeind and Danuta Schueler for sequencing data management. We gratefully acknowledge Anne Fiebig for data submission. We gratefully acknowledge Dr. Andres Gordillo of KWS for providing the Lo7 seeds.

Author Contributions

N.S. conceived the project. E.C. completed the genome assembly and performed most of the downstream computational analyses. E.C., Axel Himmelbach, S.J., and Z.Z. performed the sequencing experiments. J.F. provided technical support for genome assembly. C.M.W. and T.W. performed the TE analysis. M.T.R., J.C. and Andreas Houben contributed to the FISH and satellite sequence analyses. T.L. and M.S. contributed to gene annotation. Z.T. and H.Š. did the optical genome mapping. M.M., H.S., M.S., T.W., Andreas Houben, and N.S. supervised the project. E.C. drafted the manuscript with inputs from C.M.W., T.W. and Andreas Houben. All authors reviewed and revised the manuscript.

Data Availability

Genome assembly, sequencing data, ATAC, AGP file

All the sequence data collected in this study have been deposited at the European Nucleotide Archive (ENA) 65. Sequence assemblies and gene annotations have been submitted to ENA.

References

1. Martis MM et al (2013) Reticulate evolution of the rye genome. Plant Cell 25:3685–3698. 10.1105/tpc.113.114553
2. Jiao C et al (2025) Pan-genome bridges wheat structural variations with habitat and breeding. Nature 637:384–393. 10.1038/s41586-024-08277-0
3. Mago R et al (2005) High-resolution mapping and mutation analysis separate the rust resistance genes Sr31, Lr26 and Yr9 on the short arm of rye chromosome 1. Theor Appl Genet 112:41–50. 10.1007/s00122-005-0098-9
4. Hurni S et al (2013) Rye Pm8 and wheat Pm3 are orthologous genes and show evolutionary conservation of resistance function against powdery mildew. Plant J 76:957–969. 10.1111/tpj.12345
5. Rabanus-Wallace MT et al (2021) Chromosome-scale genome assembly provides insights into rye biology, evolution and agronomic potential. Nat Genet 53:564–573. 10.1038/s41588-021-00807-0
6. Li G et al (2021) A high-quality genome assembly highlights rye genomic characteristics and agronomically important genes. Nat Genet 53:574–584. 10.1038/s41588-021-00808-z
7. Liao WW et al (2023) A draft human pangenome reference. Nature 617:312–324. 10.1038/s41586-023-05896-x
8. Naish M et al (2021) The genetic and epigenetic landscape of the Arabidopsis centromeres. Science 374:eabi7489. 10.1126/science.abi7489
9. Shang L et al (2023) A complete assembly of the rice Nipponbare reference genome. Mol Plant. 10.1016/j.molp.2023.08.003
10. Chen J et al (2023) A complete telomere-to-telomere assembly of the maize genome. Nat Genet 55:1221–1231. 10.1038/s41588-023-01419-6
11. Zhang C et al (2023) The T2T genome assembly of soybean cultivar ZH13 and its epigenetic landscapes. Mol Plant 16:1715–1718. 10.1016/j.molp.2023.10.003
12. Li W et al (2025) A gap-free complete genome assembly of oat and OatOmics, a multi-omics database. Mol Plant 18:179–182. 10.1016/j.molp.2025.01.006
13. Yan H et al (2025) Post-polyploidization centromere evolution in cotton. Nat Genet. 10.1038/s41588-025-02115-3
14. Huang G et al (2024) A telomere-to-telomere cotton genome assembly reveals centromere evolution and a Mutator transposon-linked module regulating embryo development. Nat Genet. 10.1038/s41588-024-01877-6
15. Chen C et al (2025) A comprehensive omics resource and genetic tools for functional genomics research and genetic improvement of sorghum. Mol Plant. 10.1016/j.molp.2025.03.005
16. Tian Y et al (2025) The near-complete genome assembly of pickling cucumber and its mutation library illuminate cucumber functional genomics and genetic improvement. Mol Plant. 10.1016/j.molp.2025.03.001
17. Liu S et al (2025) A telomere-to-telomere genome assembly coupled with multi-omic data provides insights into the evolution of hexaploid bread wheat. Nat Genet. 10.1038/s41588-025-02137-x
18. Wicker T et al (2018) Impact of transposable elements on genome structure and evolution in bread wheat. Genome Biol 19:103. 10.1186/s13059-018-1479-0
19. Gustafson JP, Dera AR, Petrovic S (1988) Expression of modified rye ribosomal RNA genes in wheat. Proc Natl Acad Sci U S A 85:3943–3945. 10.1073/pnas.85.11.3943
20. Guo J et al (2019) Frequent variations in tandem repeats pSc200 and pSc119.2 cause rapid chromosome evolution of open-pollinated rye. Mol Breeding 39. 10.1007/s11032-019-1033-0
21. Alkhimova AG, Heslop-Harrison JS, Shchapova AI, Vershinin AV (1999) Rye chromosome variability in wheat-rye addition and substitution lines. Chromosome Res 7:205–212. 10.1023/a:1009299300018
22. Liu C et al (2024) Young retrotransposons and non-B DNA structures promote the establishment of dominant rye centromere in the 1RS.1BL fused centromere. New Phytol 241:607–622. 10.1111/nph.19359
23. Francki MG (2001) Identification of Bilby, a diverged centromeric Ty1-copia retrotransposon family from cereal rye (Secale cereale L.). Genome 44:266–274. 10.1139/g00-112
24. Ahmed HI et al (2023) Einkorn genomics sheds light on history of the oldest domesticated wheat. Nature. 10.1038/s41586-023-06389-7
25. Heuberger M et al (2024) Evolution of Einkorn wheat centromeres is driven by the mutualistic interplay of two LTR retrotransposons. Mob DNA 15:16. 10.1186/s13100-024-00326-9
26. Bauer E et al (2017) Towards a whole-genome sequence for rye (Secale cereale L.). Plant J 89:853–869. 10.1111/tpj.13436
27. Marone MP, Singh HC, Pozniak CJ, Mascher M (2022) A technical guide to TRITEX, a computational pipeline for chromosome-scale sequence assembly of plant genomes. Plant Methods 18:128. 10.1186/s13007-022-00964-1
28. Cheng H, Asri M, Lucas J, Koren S, Li H (2024) Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph. Nat Methods 21:967–970. 10.1038/s41592-024-02269-8
29. Bousios A, Kakutani T, Henderson IR (2025) Centrophilic Retrotransposons of Plant Genomes. Annu Rev Plant Biol. 10.1146/annurev-arplant-083123-082220
30. Liu C et al (2024) Unveiling the distinctive traits of functional rye centromeres: minisatellites, retrotransposons, and R-loop formation. Sci China Life Sci. 10.1007/s11427-023-2524-0
31. Ahmed HI et al (2023) Einkorn genomics sheds light on history of the oldest domesticated wheat. Nature 620:830–838. 10.1038/s41586-023-06389-7
32. Mascher M et al (2017) A chromosome conformation capture ordered sequence of the barley genome. Nature 544:427–433. 10.1038/nature22043
33. Wicker T et al (2017) The repetitive landscape of the 5100 Mbp barley genome. Mob DNA 8:22. 10.1186/s13100-017-0102-3
34. Wicker T et al (2022) Transposable Element Populations Shed Light on the Evolutionary History of Wheat and the Complex Co-Evolution of Autonomous and Non-Autonomous Retrotransposons. Adv Genet (Hoboken) 3:2100022. 10.1002/ggn2.202100022
35. Li G et al (2025) Genomic analysis of Zhou8425B, a key founder parent, reveals its genetic contributions to elite agronomic traits in wheat breeding. Plant Commun 6:101222. 10.1016/j.xplc.2024.101222
36. Wang Z et al (2025) Near-complete assembly and comprehensive annotation of the wheat Chinese Spring genome. Mol Plant. 10.1016/j.molp.2025.02.002
37. Jia J et al (2023) Genome resources for the elite bread wheat cultivar Aikang 58 and mining of elite homeologous haplotypes for accelerating wheat improvement. Mol Plant 16:1893–1910. 10.1016/j.molp.2023.10.015
38. Wang C et al (2025) The Yr9 gene encoding a CC-NBS-LRR protein in the 1RS-1BL translocation confers wheat stripe rust resistance. Sci China Life Sci. 10.1007/s11427-025-2929-6
39. Yu Y et al (2025) Wheat stripe rust resistance gene Yr9, derived from rye, is a CC-NBS-LRR gene in a highly conserved NLR cluster. Sci China Life Sci. 10.1007/s11427-024-2932-5
40. Chen J et al (2024) The genetic mechanism of B chromosome drive in rye illuminated by chromosome-scale assembly. Nat Commun 15:9686. 10.1038/s41467-024-53799-w
41. Wehrkamp CM, Heuberger M, Wicker T (2025) Comparative analysis of centromeres of oat (Avena sativa) and its tetraploid and diploid relatives reveals rapid evolution of centromere composition and architecture. bioRxiv 2025.04.08.647780. 10.1101/2025.04.08.647780
42. Schlegel R, Korzun V (2006) About the origin of 1RS.1BL wheat-rye chromosome translocations from Germany. Plant Breeding 116:537–540. 10.1111/j.1439-0523.1997.tb02186.x
43. Danecek P et al (2021) Twelve years of SAMtools and BCFtools. Gigascience 10. 10.1093/gigascience/giab008
44. Shen W, Sipos B, Zhao L (2024) SeqKit2: A Swiss army knife for sequence and alignment processing. Imeta 3:e191. 10.1002/imt2.191
45. Jayakodi M et al (2020) The barley pan-genome reveals the hidden legacy of mutation breeding. Nature. 10.1038/s41586-020-2947-8
46. Rhie A, Walenz BP, Koren S, Phillippy AM (2020) Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol 21:245. 10.1186/s13059-020-02134-9
47. Ranallo-Benavidez TR, Jaron KS, Schatz MC (2020) GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun 11:1432. 10.1038/s41467-020-14998-3
48. Simkova H, Tulpova Z, Capal P (2023) Flow Sorting-Assisted Optical Mapping. Methods Mol Biol 2672:465–483. 10.1007/978-1-0716-3226-0_28
49. Tse OYO et al (2021) Genome-wide detection of cytosine methylation by single molecule real-time sequencing. Proc Natl Acad Sci U S A 118. 10.1073/pnas.2019768118
50. Robinson JT et al (2011) Integrative genomics viewer. Nat Biotechnol 29:24–26. 10.1038/nbt.1754
51. Mascher M et al (2021) Long-read sequence assembly: a technical evaluation in barley. Plant Cell 33:1888–1906. 10.1093/plcell/koab077
52. Stiehler F et al (2021) Helixer: cross-species gene annotation of large eukaryotic genomes using deep learning. Bioinformatics 36:5291–5298. 10.1093/bioinformatics/btaa1044
53. Ou S et al (2019) Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol 20:275. 10.1186/s13059-019-1905-y
54. Goel M, Schneeberger K (2022) plotsr: visualizing structural similarities and rearrangements between multiple genomes. Bioinformatics 38:2922–2926. 10.1093/bioinformatics/btac196
55. Li H (2018) Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34:3094–3100. 10.1093/bioinformatics/bty191
56. Goel M, Sun H, Jiao WB, Schneeberger K (2019) SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol 20:277. 10.1186/s13059-019-1911-0
57. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357–359. 10.1038/nmeth.1923
58. Ramirez F et al (2016) deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res 44:W160–165. 10.1093/nar/gkw257
59. Aliyeva-Schnorr L, Ma L, Houben A (2015) Fast Air-dry Dropping Chromosome Preparation Method Suitable for FISH in Plants. J Vis Exp e53470. 10.3791/53470
60. Aliyeva-Schnorr L et al (2015) Cytogenetic mapping with centromeric bacterial artificial chromosomes contigs shows that this recombination-poor region comprises more than half of barley chromosome 3H. Plant J 84:385–394. 10.1111/tpj.13006
61. Fu S et al (2015) Oligonucleotide Probes for ND-FISH Analysis to Identify Rye and Wheat Chromosomes. Sci Rep 5:10552. 10.1038/srep10552
62. Zhang Y et al (2008) Model-based analysis of ChIP-Seq (MACS). Genome Biol 9:R137. 10.1186/gb-2008-9-9-r137
63. Chen S, Zhou Y, Chen Y, Gu J (2018) fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34:i884–i890. 10.1093/bioinformatics/bty560
64. Vasimuddin M, Misra S, Li H, Aluru S (2019) Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 314–324. 10.1109/IPDPS.2019.00041
65. Burgin J et al (2023) The European Nucleotide Archive in 2022. Nucleic Acids Res 51:D121–D125. 10.1093/nar/gkac1051

Additional Declarations

There is no competing interest.

Supplementary Files

Lo7SupplementaryTable250630.xlsx (Lo7_supplementary_table)
Lo7SupplementaryFigure250630.docx (Lo7_supplementary_figures_250630)
AcroBrwExnrreportingsummaryErwang250704ns.pdf (Reporting Summary)
Sciences","correspondingAuthor":false,"prefix":"","firstName":"Zuzana","middleName":"","lastName":"Tulpova","suffix":""},{"id":480291452,"identity":"bf2ca671-1c27-409f-b33b-2ddbd72e708d","order_by":14,"name":"Hana Simkova","email":"","orcid":"https://orcid.org/0000-0003-4159-7619","institution":"Centre of the Region Haná for Biotechnological and Agricultural Research","correspondingAuthor":false,"prefix":"","firstName":"Hana","middleName":"","lastName":"Simkova","suffix":""},{"id":480291453,"identity":"2b6f3697-2649-434a-bee0-dfe99e34051a","order_by":15,"name":"Mark Rabanus-Wallace","email":"","orcid":"https://orcid.org/0000-0002-4663-985X","institution":"The University of Melbourne","correspondingAuthor":false,"prefix":"","firstName":"Mark","middleName":"","lastName":"Rabanus-Wallace","suffix":""}],"badges":[],"createdAt":"2025-06-30 17:30:44","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-7013114/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7013114/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":86243764,"identity":"fbd6b13e-cd2b-4985-8b8f-0b2a34d17d40","added_by":"auto","created_at":"2025-07-08 11:09:54","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":910841,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eAssembly of the inbred line Lo7 genome. \u003c/strong\u003eCircular display of the key characteristics of the Lo7_V3 genome assembly (plotted in Circos, mkweb.bcgsc.ca/circos/). The circles, from the outside to the inside, correspond to the following: (I) Densities [0-43] of tandem repeats (pSc200, pSc250, and pSc119.2). (II) 5mCpG methylation [0 - 110], averaged in 10 Mb windows. (III) Gene density [0 - 70], the density was computed using non-overlapping sliding windows of 10 Mb across the pseudomolecules. (IV) Centromeric peaks were validated using CENH3 ChIP-seq data. 
The values [0 -0.4] represent the normalized signal intensity range where centromeric peaks were observed. (V– IX) Densities of transposable elements (TEs) in 10 Mb windows, including: (V) All TEs [0 - 153], (VI) LTR/\u003cem\u003eGypsy\u003c/em\u003e elements [0 - 77], (VII) LTR/\u003cem\u003eCopia\u003c/em\u003e elements [0 - 77], (VIII) TIR elements [0 - 79], (IX) Helitron elements [0 - 77]. Color intensity in tracks III-IX ranges from yellow (low values), through blue (intermediate), to purple (high values), reflecting feature abundance or signal strength.\u003c/p\u003e","description":"","filename":"image1.png","url":"https://assets-eu.researchsquare.com/files/rs-7013114/v1/d93bd7cb091b95eaa295e695.png"},{"id":86244035,"identity":"397144e3-989f-4eba-a9e7-a56bf32b139f","added_by":"auto","created_at":"2025-07-08 11:17:54","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":150652,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eOrientation error corrections within chromosome 1R. a\u003c/strong\u003eCollinearity plot between Lo7_V2 (\u003cem\u003ey\u003c/em\u003e-axis) and Lo7_V3 (\u003cem\u003ex\u003c/em\u003e-axis). Grey lines indicate scaffold/contig junctions within chromosome 1R. The blue square indicates the region from 200 to 400 Mb. \u003cstrong\u003eb\u003c/strong\u003eZoom-in (200 - 400 Mb) on 1R, showing two inversion-like orientation errors in Lo7_V2 in comparison within a single contig in Lo7_V3. \u003cstrong\u003ec-d\u003c/strong\u003e Hi-C contact map of Lo7_V2 \u003cstrong\u003e(c)\u003c/strong\u003eand Lo7_V3 \u003cstrong\u003e(d)\u003c/strong\u003e on 1R (200 - 400 Mb). 
The region with low interactions represents the centromeric region within Lo7_V3 1R.\u003c/p\u003e","description":"","filename":"image2.png","url":"https://assets-eu.researchsquare.com/files/rs-7013114/v1/caa7491de1066856a54c8e71.png"},{"id":86243769,"identity":"67c2d57f-c9d2-463c-942b-cc65e5d9b79e","added_by":"auto","created_at":"2025-07-08 11:09:54","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":547128,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eGenome assembly validations from \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003ein silico\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e computation to FISH. a \u003c/strong\u003eIdentification of rye tandem repeats in the subtelomeric regions of each mitotic metaphase chromosome, including pSc200 (226.9 Mb, representing 3.4% of the genome size) and pSc250 (29.6 Mb, 0.44%).\u003cstrong\u003e \u003c/strong\u003eFrequency values range from 0 to 90.\u003cstrong\u003eb \u003c/strong\u003eThe FISH experiment revealed the signals of painting probes and pSc200/250 on each chromosome, with DAPI used as to stain chromatin. Schematic chromosomes show the \u003cem\u003ein silico\u003c/em\u003epredicted probe positions as deduced in Supplementary Figure 7a. The painting probes were labeled in green or red, allowing clear differentiation of each chromosome. pSc200 and pSc250 signals highlighted the subtelomeric regions on every chromosome. 
The alignment between FISH signals and the \u003cem\u003ein silico\u003c/em\u003e predictions highlights the consistency between cytogenetic validation and computational genome modeling.\u003c/p\u003e","description":"","filename":"image3.png","url":"https://assets-eu.researchsquare.com/files/rs-7013114/v1/d8211fa692b4d65b70b741cd.png"},{"id":86243766,"identity":"d2f3da72-32ed-4b88-ab6e-8806f1e0a4a7","added_by":"auto","created_at":"2025-07-08 11:09:54","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":125942,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eIdentification of seven complete centromeres and in-depth structural analysis of a centromeric gene on chromosome 1R. a-b\u003c/strong\u003e Mapping of CENH3 ChIP-seq data\u003csup\u003e22\u003c/sup\u003e to the Lo7_V2 and Lo7_V3 sequence assemblies. Compared to the earlier Lo7_V2 assembly, the updated Lo7_V3 assembly (\u003cstrong\u003ea\u003c/strong\u003e) showed that CENH3 ChIP-seq signals were almost exclusively localized to the seven defined centromeric regions. The majority of CENH3 ChIP-seq reads were mapped to unanchored parts in Lo7_V2, indicating a highly fragmented centromeric landscape lacking chromosomal context (\u003cstrong\u003eb\u003c/strong\u003e). \u003cstrong\u003ec\u003c/strong\u003e Detailed structural analysis of a gene encoding the eIF-2B subunit beta-like protein, located within the centromeric region of chromosome 1R. The analysis integrated multiple datasets, including CENH3 ChIP-seq (I), HiFi read mapping (II), ONT read mapping (III), 5mCpG methylation (IV), ATAC-seq1/2 (V-VI), and RNA-seq from three tissues (spike, root, and aerial organs; VII-IX), along with high-confidence gene annotation (X). ACR, accessible chromatin region. 
All datasets were visualized using IGV.\u003c/p\u003e","description":"","filename":"image4.png","url":"https://assets-eu.researchsquare.com/files/rs-7013114/v1/01a5fdc4ffb6d62341260b16.png"},{"id":86245059,"identity":"39badc56-938c-4963-9338-71a0eaf82adf","added_by":"auto","created_at":"2025-07-08 11:25:54","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":137515,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eThe retrotransposon families \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eRLG_Abia\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e (orange), \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eRLG_Cereba\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e (dark purple) and \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eRLG_Quinta\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e (light purple) reveal inversions in the centromere of chromosome 7R.\u003c/strong\u003e \u003cstrong\u003ea\u003c/strong\u003e Positions of identified full-length copies across the genome. \u003cstrong\u003eb\u003c/strong\u003e Distribution of the estimated ages (insertion times) of copies from these families across the pericentromeric region of chromosome 4R. Copies with an estimated age of 3 million years or younger are shown. The marks on the gray track display the assembly gaps found in the respective region per chromosome (top). The bottom track shows the corresponding CENH3 ChIP-seq data. \u003cstrong\u003ec\u003c/strong\u003e CENH3 ChIP-seq data of the pericentromeric region of chromosome 7R (top). Distribution of the estimated insertion ages of full-length copies across the displayed region. 
The dotted lines indicate the proposed breakpoints of inversions which are labeled with A-C corresponding to the inversions depicted in the model presented in \u003cstrong\u003ed\u003c/strong\u003e.\u003c/p\u003e","description":"","filename":"image5.png","url":"https://assets-eu.researchsquare.com/files/rs-7013114/v1/427c678dcdbac05add65e443.png"},{"id":86245479,"identity":"b5186e2e-d645-4671-a2d9-7decaa8957b1","added_by":"auto","created_at":"2025-07-08 11:33:56","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":3354393,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7013114/v1/3b1c5ea5-7473-44a6-b9e9-080eb9d34790.pdf"},{"id":86243763,"identity":"e4e31cf0-cf70-4d27-9767-896f4cd70021","added_by":"auto","created_at":"2025-07-08 11:09:54","extension":"xlsx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":25163,"visible":true,"origin":"","legend":"Lo7_supplementary_table","description":"","filename":"Lo7SupplementaryTable250630.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-7013114/v1/9f6b6c30a7128c80bdeb13d8.xlsx"},{"id":86243776,"identity":"1f042abc-bd60-49e8-88d7-0784783f07c3","added_by":"auto","created_at":"2025-07-08 11:09:54","extension":"docx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":4637728,"visible":true,"origin":"","legend":"Lo7_supplementary_figures_250630","description":"","filename":"Lo7SupplementaryFigure250630.docx","url":"https://assets-eu.researchsquare.com/files/rs-7013114/v1/16cc442d29167356630c7159.docx"},{"id":86243768,"identity":"4e9f1a46-83da-423d-8bf4-fd92794559a5","added_by":"auto","created_at":"2025-07-08 11:09:54","extension":"pdf","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":1667752,"visible":true,"origin":"","legend":"Reporting 
Summary","description":"","filename":"AcroBrwExnrreportingsummaryErwang250704ns.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7013114/v1/4b5b8585c8b464b09a80c866.pdf"}],"financialInterests":"There is \u003cb\u003eNO\u003c/b\u003e Competing Interest.","formattedTitle":"Unveiling centromeric retrotransposon dynamics through a near-complete rye genome assembly","fulltext":[{"header":"Introduction","content":"\u003cp\u003eRye is a diploid cereal crop within the tribe \u003cem\u003eTriticeae\u003c/em\u003e, which poses unique challenges for genome assembly due to its large genome (\u0026sim;7 Gb), high repeat contents, and extensive structural heterogeneity\u003csup\u003e\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u003c/sup\u003e. The 1RS/1BL translocation, in which the short arm of rye chromosome 1R (1RS) replaces the short arm of wheat chromosome 1B, has been widely utilized in wheat breeding programs\u003csup\u003e\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/sup\u003e. This translocation has contributed significantly to disease resistance, abiotic stress tolerance, and yield improvement. The introgression of rye chromatin, particularly from 1RS, has introduced beneficial alleles associated with resistance to rusts (\u003cem\u003eSr31\u003c/em\u003e, \u003cem\u003eLr26\u003c/em\u003e and \u003cem\u003eYr9\u003c/em\u003e)\u003csup\u003e\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u003c/sup\u003e, and powdery mildew (\u003cem\u003ePm8\u003c/em\u003e)\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e, making it one of the most successful examples of wide hybridization in modern wheat breeding. Despite its importance for improving stress tolerance and disease resistance in wheat, progress in rye genomics has been hindered by the lack of a high-quality reference genome. 
To date, high-quality genome assemblies for two rye inbred lines have been published: Lo7 and Weining\u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e,\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e. While both assemblies represented significant breakthroughs for rye research and breeding, they suffered from gaps and structural inaccuracy, low repeat resolution, and missing centromere representation. Based on the advances in sequencing technology, it was our aim to overcome these shortcomings, ultimately advancing rye genomics.\u003c/p\u003e \u003cp\u003eRecent advancements in long-read sequencing technologies such as PacBio HiFi and Oxford Nanopore, combined with techniques like high-throughput chromatin conformation capture (Hi-C) or BioNano optical map scaffolding, have facilitated the completion of complex genomes in human and plants\u003csup\u003e\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e,\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e. 
For instance, a complete T2T genome assembly for \u003cem\u003eArabidopsis thaliana\u003c/em\u003e revealed comprehensive insights into centromeric regions, structural variations, and evolutionary dynamics\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e, while recent T2T (or near-complete) genome assemblies of crops like rice\u003csup\u003e\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u003c/sup\u003e, maize\u003csup\u003e\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u003c/sup\u003e, soybean\u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e, oat\u003csup\u003e\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e, cotton\u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e,\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e, sorghum\u003csup\u003e\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u003c/sup\u003e and wheat\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e,\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u003c/sup\u003e have significantly expanded our understanding of genome architecture and function. These achievements highlight the power of T2T assemblies in addressing previously unresolved genomic challenges, such as repetitive regions and structural variants, across diverse species. 
Given the recent success of complete assembly in species like wheat, an improved rye genome assembly will improve the basis for studying rye genome organization and architecture.\u003c/p\u003e \u003cp\u003eThe rye genome is exceptionally rich in repetitive sequences, with approximately 90% of its genome composed of various types of repeats\u0026mdash;a proportion notably higher than that found in related \u003cem\u003eTriticeae\u003c/em\u003e species such as barley and wheat\u003csup\u003e\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e,\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e. For example, the nucleolar organizer region (NOR) on chromosome 1R contains large arrays of 45S rDNA tandem repeats, serving as the site for ribosomal RNA gene clusters and contributing significantly to the structural complexity\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e. Transposable elements (TEs), particularly long terminal repeat retrotransposons (LTR-RTs), like \u003cem\u003eGypsy\u003c/em\u003e and \u003cem\u003eCopia\u003c/em\u003e superfamilies, dominate the rye genome and play a central role in genome size expansion and structural variation\u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e,\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e. In addition to dispersed TEs, satellite sequences are widely distributed throughout the genome and are often organized in large blocks, contributing to heterochromatin formation. Likewise, rye-specific tandem repeats, pSc200, pSc250, and pSc119.2, are predominantly localized in the subtelomeric regions of chromosomes 1R to 7R\u003csup\u003e\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e,\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e. 
Thus, the abundance and diversity of repetitive sequences in the rye genome not only reflect its complex evolutionary history but also present challenges for genome assembly.\u003c/p\u003e \u003cp\u003ePrevious studies have shown that centromeres in rye are enriched with centromere-specific retrotransposons and satellite repeats\u003csup\u003e\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e,\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e\u003c/sup\u003e. The centromeric regions of the rye genome are predominantly composed of TEs, with LTR retrotransposons being the major components. Among these, the \u003cem\u003eGypsy\u003c/em\u003e superfamily is particularly abundant and plays a central role in shaping the structure of centromeric chromatin. Although \u003cem\u003eCopia\u003c/em\u003e elements are also present, they are less prevalent than \u003cem\u003eGypsy\u003c/em\u003e elements. The high density and accumulation of \u003cem\u003eGypsy\u003c/em\u003e-type retrotransposons in the centromeres suggest their importance in centromere function and evolution in rye. Two further studies analyzed near-complete centromere sequences in Einkorn wheat\u003csup\u003e\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e,\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u003c/sup\u003e. Both studies found that \u003cem\u003eTriticum monococcum\u003c/em\u003e centromeres are primarily composed of two retrotransposon families, \u003cem\u003eRLG_Cereba\u003c/em\u003e and \u003cem\u003eRLG_Quinta\u003c/em\u003e, both of which belong to the \u003cem\u003eGypsy\u003c/em\u003e superfamily of LTR retrotransposons. 
Notably, \u003cem\u003eRLG_Quinta\u003c/em\u003e is a non-autonomous element, lacking essential coding sequences such as reverse transcriptase and integrase, and thus relies on the enzymatic machinery provided by its autonomous partner like \u003cem\u003eRLG_Cereba\u003c/em\u003e for its propagation (i.e., \u003cem\u003eRLG_Quinta\u003c/em\u003e is a \u0026ldquo;parasite\u0026rdquo; of \u003cem\u003eRLG_Cereba\u003c/em\u003e).\u003c/p\u003e \u003cp\u003eIn this study, we present a chromosome-scale, near-complete genome assembly for rye inbred variety Lo7, utilizing state-of-the-art sequencing and scaffolding techniques. Compared with previous versions, Lo7_V1\u003csup\u003e26\u003c/sup\u003e and Lo7_V2\u003csup\u003e5\u003c/sup\u003e, this assembly further improved the resolution of complex repetitive regions in subtelomeres and centromeres. We have significantly improved the Lo7 genome by reducing assembly gaps, correcting orientation mistakes, and enhancing genome completeness.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003e \u003cb\u003eHigh-quality genome assembly of rye genotype Lo7\u003c/b\u003e \u003c/p\u003e \u003cp\u003eTo generate a high-quality reference assembly for rye Lo7, we employed the TRITEX pipeline\u003csup\u003e\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e\u003c/sup\u003e, integrating data generated by multiple advanced sequencing technologies: PacBio HiFi reads (38-fold haploid coverage), ONT long reads (\u0026gt;\u0026thinsp;25 kb, 9-fold coverage), and Hi-C reads (20-fold coverage) for chromosome-level scaffolding (\u003cb\u003eSupplementary Fig.\u0026nbsp;1a and Supplementary Table\u0026nbsp;1\u003c/b\u003e). We estimated the heterozygosity of Lo7 using k-mer analysis, determining it to be 0.06% (\u003cb\u003eSupplementary Fig.\u0026nbsp;1b\u003c/b\u003e). 
The initial contigs were assembled by combining HiFi and ONT long reads using hifiasm\u003csup\u003e\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e. These contigs were manually curated with Hi-C data to generate a draft assembly (\u003cb\u003eSupplementary Fig.\u0026nbsp;2\u003c/b\u003e), which was further evaluated and refined using a BioNano optical map (\u003cb\u003eSupplementary Fig.\u0026nbsp;3 and Supplementary Table\u0026nbsp;2\u003c/b\u003e). Finally, genome annotation was conducted using RNA-seq and Iso-seq data (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003ea). Additional features were systematically annotated, including chromosome lengths, repetitive sequences, centromere locations, and 5mCpG methylation (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eb).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe final assembly spanned 6.76 Gb, with a contig N50 of 128 Mb and a GC content of 50% (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). The lengths of the seven chromosomes ranged from 776 Mb to 1.1 Gb, which is 7.4% longer on average compared to each chromosome length in the earlier Lo7_V2 assembly (\u003cb\u003eSupplementary Table\u0026nbsp;3\u003c/b\u003e). We annotated 75,850 protein-coding genes, including 43,754 high-confidence (HC) and 32,096 low-confidence (LC) genes (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). LTR-RT annotation yielded an average LTR Assembly Index (LAI) of 23.47, indicating high assembly quality (\u003cb\u003eSupplementary Table\u0026nbsp;4\u003c/b\u003e). We identified two enrichment regions for 5S and 45S rDNA sequences on chromosome 1R, coinciding and thus confirming previous findings of the NOR on chromosome 1R (\u003cb\u003eSupplementary Fig.\u0026nbsp;4a\u003c/b\u003e). 
Moreover, by mapping telomere-specific short sequence motifs (TTTAGGG), we identified 6 of the expected 14 telomeres (\u003cb\u003eSupplementary Fig.\u0026nbsp;4a\u003c/b\u003e). Overall, we provided a Lo7_V3 genome assembly with high contiguity.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003e\u003cb\u003eGenome assembly comparisons between Lo7_V2\u003c/b\u003e\u003csup\u003e\u003cb\u003e5\u003c/b\u003e\u003c/sup\u003e \u003cb\u003eand Lo7_V3.\u003c/b\u003e Completeness of two assemblies were evaluated based on HiFi long reads. HC, high-confidence. LC, low-confidence. x indicates sequencing depth.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLo7_V2\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eLo7_V3\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eIllumina (Gb)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e947 (120x)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHiFi (Gb)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c2\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e266.7 (38x)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eONT (Gb)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e65.1 (\u0026gt;\u0026thinsp;25 kb, 9x)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eN50 (Mb)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e15.2 (Scaffold)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e128.0 (Contig)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAnchored (Gb)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e6.21\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e6.68\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUnanchored (Mb)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e528.4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e79.6\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCompleteness (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e97.3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e97.7\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTotal (Gb)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003e6.74\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e6.76\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAll_genes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e57,222\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e75,850\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHC_genes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e34,441\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e43,754\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLC_genes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e22,781\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e32,096\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHC_chrUn\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1,939\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e114\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBUSCO_gene (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e98.4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e98.9\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGC content (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e48.8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c3\"\u003e \u003cp\u003e50.0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eLo7_V3 outperforms Lo7_V2 in assembly contiguity, orientation correction, and gene annotation quality\u003c/b\u003e \u003c/p\u003e \u003cp\u003eLo7_V2 was assembled using high-coverage short reads (120-fold coverage), while Lo7_V3 incorporated both long-read HiFi and ONT sequencing technologies. Although the overall genome length remained comparable, Lo7_V3 demonstrated substantial improvements in chromosomal anchoring, reducing the unanchored sequence from 528.4 Mb to 79.6 Mb. Assembly completeness was assessed through k-mer analysis of HiFi reads, showing a slight increase from 97.3% in Lo7_V2 to 97.7% in Lo7_V3 (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Gene annotation quality improved significantly, with more genes annotated and higher BUSCO scores for both HC and LC gene sets (\u003cb\u003eSupplementary Table\u0026nbsp;5\u003c/b\u003e). Notably, the number of HC genes in unanchored regions decreased dramatically from 1,939 in Lo7_V2 to just 114 in Lo7_V3 (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Following improvements in gene annotation, the BUSCO completeness score rose from 98.4% to 98.9% (\u003cb\u003eSupplementary Table\u0026nbsp;5\u003c/b\u003e). Assembly contiguity was strongly enhanced, with the number of remaining gaps decreasing from 570 to 106 (\u003cb\u003eSupplementary Fig.\u0026nbsp;4b\u003c/b\u003e). 
Taken together, these results demonstrate that Lo7_V3 represents a substantial upgrade over Lo7_V2 in terms of contiguity, scaffolding accuracy, and annotation quality.\u003c/p\u003e \u003cp\u003eTo evaluate orientation corrections, we generated collinearity plots comparing the scaffold-based Lo7_V2 and contig-based Lo7_V3 assemblies (\u003cb\u003eSupplementary Fig.\u0026nbsp;5\u003c/b\u003e). Analysis revealed nine regions with orientation errors in Lo7_V2, all of which were resolved in Lo7_V3 (\u003cb\u003eSupplementary Fig.\u0026nbsp;5\u003c/b\u003e). On chromosome 1R, Lo7_V2 contained two inversion-like orientation errors and three assembly gaps, particularly in the centromeric and pericentromeric regions (200\u0026ndash;300 Mb) (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ea). These issues were corrected in Lo7_V3 within a single contig spanning the coordinate region between 260 Mb and 320 Mb of the respective chromosome sequence (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eb). Hi-C contact maps comparing both assemblies further provided validation of these structural improvements (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ec,d).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eShort tandem repeats form subtelomeric clusters\u003c/b\u003e \u003c/p\u003e \u003cp\u003eConsistent with previous findings in repeats annotation\u003csup\u003e\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e, we revealed that 89.4% of the genome was annotated as repetitive sequences, including transposable elements (TEs) and other repeat fragments, which underscored the highly repetitive nature of the rye genome (\u003cb\u003eSupplementary Fig.\u0026nbsp;5 and Supplementary Table\u0026nbsp;6\u003c/b\u003e). 
To verify the accuracy of the assembly, we designed chromosome barcode painting probes from the assembly and validated them using fluorescence in situ hybridization (FISH) (\u003cb\u003eSupplementary Fig.\u0026nbsp;7a\u003c/b\u003e). These probes, selected based on single-copy sequences near the subtelomeric regions and labeled with distinct fluorescent dyes (green and red), enabled precise differentiation of each chromosome (\u003cb\u003eSupplementary Fig.\u0026nbsp;7b\u003c/b\u003e).\u003c/p\u003e \u003cp\u003eBy integrating HiFi and ONT data, Lo7_V3 successfully resolved complex regions containing rye satellite sequences, which reduced fragmentation within the unanchored regions compared to Lo7_V2 (\u003cb\u003eSupplementary Fig.\u0026nbsp;8\u003c/b\u003e). We found that pSc119.2 (3.5 Mb, representing 0.05% of the genome size), pSc200 (226.9 Mb, 3.4%) and pSc250 (29.6 Mb, 0.44%) occupied the majority of these regions, particularly in the subtelomeric regions (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ea \u003cb\u003eand Supplementary Fig.\u0026nbsp;9\u003c/b\u003e). To test whether these repeats were placed correctly, we also performed FISH experiments using a mix of pSc200 and pSc250 probes as localization markers (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eb \u003cb\u003eand Supplementary Fig.\u0026nbsp;7b\u003c/b\u003e). By correlating the repeat and chromosome painting signals, we were able to determine the specific distribution of pSc200 and pSc250 on rye chromosomes. The observed FISH signals aligned well with the distribution predicted from the assembly, further confirming the consistency between the experimental results and the computational predictions (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eb). 
These results indicate that Lo7_V3 significantly improved the anchoring of satellite sequences to the subtelomeric regions.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eStructural analysis of seven complete centromeres\u003c/b\u003e \u003c/p\u003e \u003cp\u003eCentromeres are notoriously difficult to assemble due to their highly repetitive nature\u003csup\u003e\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e\u003c/sup\u003e. Previous analyses of rye centromeres have primarily relied on two published genome assemblies, both of which lack fully assembled centromeres for all seven chromosomes\u003csup\u003e\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e,\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e\u003c/sup\u003e. In Lo7_V2, centromere-associated regions were fragmented and mapped to unanchored regions (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ea,b). In contrast, based on the CENH3 ChIP-seq data, the Lo7_V3 assembly successfully assembled all seven centromeres, each accurately anchored to its respective chromosome (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ea, \u003cb\u003eSupplementary Table\u0026nbsp;7 and Supplementary Fig.\u0026nbsp;10\u003c/b\u003e). The predicted functional centromere lengths for chromosomes 1R to 7R were in the range of approximately 10\u0026ndash;12 Mb (\u003cb\u003eSupplementary Table\u0026nbsp;8\u003c/b\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eGene content analysis identified eleven HC genes within five of the seven rye centromeres. The centromeres of chromosomes 4R and 6R lacked any evidence of functional genes (\u003cb\u003eSupplementary Table\u0026nbsp;9\u003c/b\u003e). 
Notably, the chromosome 1R centromere contained a gene encoding translation initiation factor eIF-2B subunit beta-like protein. To further validate the accuracy of genes annotated in the centromeric regions, we also integrated data from Transposase-Accessible Chromatin sequencing (ATAC-seq), 5mCpG methylation, HiFi and ONT read mapping, and TE annotation, and examined transcriptomic support for the annotation (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eb \u003cb\u003eand Supplementary Fig.\u0026nbsp;11\u003c/b\u003e). These results revealed that the transcription site of this gene lies within or close to an accessible chromatin region with low DNA methylation. RNA-seq data further indicated that the \u003cem\u003eeIF-2B subunit beta-like\u003c/em\u003e gene is transcriptionally active across the examined tissues, including spike, root, and aerial organs, supporting its correct annotation (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eb \u003cb\u003eand Supplementary Table\u0026nbsp;9\u003c/b\u003e).\u003c/p\u003e \u003cp\u003e \u003cb\u003eRetrotransposon dynamics in rye centromeres\u003c/b\u003e \u003c/p\u003e \u003cp\u003eWe performed a comprehensive TE analysis on our genome assembly. Consistent with previous findings, LTR retrotransposons (64.79%), particularly those belonging to the \u003cem\u003eGypsy\u003c/em\u003e superfamily (37.04%), are the predominant TEs in the genome (\u003cb\u003eSupplementary Table\u0026nbsp;6\u003c/b\u003e). Notably, \u003cem\u003eGypsy\u003c/em\u003e elements exhibit a strong centromeric enrichment, suggesting their significant role in the structural organization and evolution of centromeric regions (\u003cb\u003eSupplementary Fig.\u0026nbsp;11\u003c/b\u003e). 
In contrast, \u003cem\u003eCopia\u003c/em\u003e elements (6.58%) were more broadly distributed across the chromosome arms (\u003cb\u003eSupplementary Fig.\u0026nbsp;11\u003c/b\u003e). To reveal the evolution of centromere-specific retrotransposable elements in rye, we identified three families (\u003cem\u003eRLG\u003c/em\u003e_\u003cem\u003eCereba\u003c/em\u003e, \u003cem\u003eRLG_Quinta\u003c/em\u003e and \u003cem\u003eRLG\u003c/em\u003e_\u003cem\u003eAbia\u003c/em\u003e) within the seven complete centromeres of Lo7 (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ea). We found \u003cem\u003eRLG_Cereba\u003c/em\u003e, an element family well known to be present in the centromeres of the tribe \u003cem\u003eTriticeae\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e,\u003cspan additionalcitationids=\"CR32\" citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e\u003c/sup\u003e, and its non-autonomous partner family \u003cem\u003eRLG\u003c/em\u003e_\u003cem\u003eQuinta\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u003c/sup\u003e. In addition, we identified \u003cem\u003eRLG_Abia\u003c/em\u003e, a family that has recently been highly active in rye and was previously found in wheat centromeres, but at very low abundance\u003csup\u003e\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e. Using a TE population analysis pipeline\u003csup\u003e\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e\u003c/sup\u003e, we identified 2,098, 1,298 and 1,602 full-length elements of \u003cem\u003eRLG_Cereba\u003c/em\u003e, \u003cem\u003eRLG_Quinta\u003c/em\u003e and \u003cem\u003eRLG_Abia\u003c/em\u003e, respectively. 
The three TE families were highly enriched in centromeric and peri-centromeric regions (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ea).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eInsertion ages of all retrotransposon copies were estimated based on divergence of their LTRs. We consistently found the youngest copies of these families in the same regions where we identified the functional centromeres using CENH3 ChIP-seq analysis (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e \u003cb\u003eand Supplementary Fig.\u0026nbsp;12\u003c/b\u003e). This indicates that the centromeric retrotransposons actively target the functional centromere, similar to previous findings in wheat\u003csup\u003e\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e,\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u003c/sup\u003e. As a consequence, the older copies are pushed away from the active centromere over time. The centromere of chromosome 4R (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eb) illustrates this gradual, passive outward movement of older insertions, driven by insertions of new copies in the region of the functional centromere.\u003c/p\u003e \u003cp\u003eInterestingly, we identified multiple shifts of the centromere on chromosome 7R. Based on TE insertion ages and sites, we propose that these shifts are the result of a series of inversions. The first inversion, here named Inversion A, occurred\u0026thinsp;~\u0026thinsp;0.6\u0026nbsp;million years ago (mya) and led to the establishment of a new centromere in the region from around 416 to 422.5 Mb (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ec). 
The second one, Inversion B, took place around 0.25 mya, when a segment of the previous centromere was brought closer to the position of the current functional centromere. Lastly, an inversion of ~\u0026thinsp;5 Mb (Inversion C) took place\u0026thinsp;~\u0026thinsp;0.2 mya (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ed).\u003c/p\u003e \u003cp\u003eFinally, although less prominent, we also found breaks in the continuity of TE insertion sites and ages in other centromeres of Lo7, for example in chromosomes 2R, 3R and 6R (\u003cb\u003eSupplementary Fig.\u0026nbsp;12\u003c/b\u003e), suggesting that these also underwent similar centromere shifts in the past. However, the insertion site and age patterns are less clear and did not allow a reconstruction or dating of the events. A main difference from the centromeres of wheat is the very high abundance of \u003cem\u003eRLG_Abia\u003c/em\u003e elements. While in wheat they were silent for a long time and were only found in traces, the \u003cem\u003eRLG_Abia\u003c/em\u003e family currently seems highly active in rye. This indicates that centromeres, even between very closely related species, can take different evolutionary trajectories and that different types of TEs may change their activity levels over relatively short evolutionary time periods.\u003c/p\u003e \u003cp\u003e \u003cb\u003eSequence diversity of wheat/rye 1RS introgressions is low compared to the rye genepool\u003c/b\u003e \u003c/p\u003e \u003cp\u003eRecent wheat pan-genome studies have highlighted the crucial role of the rye 1BL/1RS translocation in modern wheat programs\u003csup\u003e\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/sup\u003e. High-quality wheat genome assemblies have enabled detailed syntenic analyses of 1RS, facilitating comparisons between rye and multiple wheat accessions carrying this translocation. 
To investigate 1RS genetic diversity between rye and wheat, we analyzed\u0026thinsp;~\u0026thinsp;300 Mb of 1RS/1BS sequences from multiple assemblies, including rye accessions Weining and Lo7, as well as the 1BL/1RS-carrying wheat varieties Z8425B, HD6172, ZM16, ZM22, AMN, KF11, S4185, and Aikang58\u003csup\u003e2,35\u0026ndash;37\u003c/sup\u003e. Syntenic analysis revealed limited 1RS diversity among wheat accessions (\u003cb\u003eSupplementary Fig.\u0026nbsp;13\u003c/b\u003e). Collinearity analysis of 1RS suggested this homogeneity may result from the widespread use of early translocation lines in wheat breeding. In contrast, the two rye accessions showed substantially greater diversity in this region (\u003cb\u003eSupplementary Table\u0026nbsp;10\u003c/b\u003e). Notably, resistance genes such as \u003cem\u003ePm8\u003c/em\u003e and \u003cem\u003eYr9\u003c/em\u003e located on 1RS have been successfully deployed in wheat breeding, providing strong disease resistance\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e,\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e,\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e\u003c/sup\u003e. We further updated the annotation of resistance (R) genes containing NLR domains based on the whole-genome assembly of Lo7 (Lo7_V3), providing a clearer view of the genetic landscape of the 1RS chromosome arm (\u003cb\u003eSupplementary Fig.\u0026nbsp;14\u003c/b\u003e). These observations underscore the potential value of introducing more diverse rye 1RS into wheat and triticale breeding programs to enhance genetic variation.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eIn this study, we present an improved and updated chromosome-scale reference genome assembly for the rye inbred line Lo7, representing a major milestone in crop genomics. 
This work makes rye more accessible to genomic research and a valuable resource for crop improvement. The Lo7 genome assembly is near-complete, although not telomere-to-telomere (T2T), and reflects a significant advancement, especially given the challenges of rye's high repeat content, structural variability, and large genome size. This achievement is especially relevant in the context of complex crop genomes, as evidenced by recent near-complete or T2T genome assemblies of wheat\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e,\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u003c/sup\u003e. The integration of multiple sequencing technologies not only improves assembly quality but also paves the way for further genomic advances in rye\u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e. Looking ahead, further advancements will focus on integrating more diverse genomic data sources, such as transcriptomics, to enhance genome annotations. Additionally, more comprehensive gene family analyses, including those related to flowering and vernalization, will provide valuable insights into crop evolution, particularly in the context of wheat and other cereal crops.\u003c/p\u003e \u003cp\u003eComparative analysis with Lo7_V2 underscores the critical role of high-fidelity long read data in revealing the true complexity of rye’s genomic architecture. In this study, about 14% of the genome was annotated as “repeat fragment”. We speculate that a substantial portion of these fragments may correspond to short tandem repeats (STRs), which were not fully reconstructed or classified by the repeat annotation tools due to their short length, high copy number, or structural variability. The accuracy of these regions was further validated through FISH, confirming the structural integrity of repetitive regions. 
Nevertheless, the subtelomeric regions of 1RL and 6RS remain incomplete, and contig gaps are still present across the genome. These gaps, caused by the presence of repetitive sequences, have hindered the full assembly of subtelomeres or the majority of telomeres. Additionally, current technologies such as optical genome mapping and Hi-C scaffolding remain insufficient for accurately resolving orientation errors in contigs located in subtelomeric regions, particularly those enriched with satellite sequences. These gaps could be addressed in the future by incorporating additional ultra-long ONT sequencing reads to improve genome continuity and completeness. In this context, the highly repetitive supernumerary B chromosome of rye also offers a unique system to explore the accumulation and activity of transposable elements beyond the standard A chromosomes\u003csup\u003e\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eWe assembled the complete centromeric regions in rye and conducted an in-depth structural analysis. We identified centromere-located HC genes supported by expression data and estimated the size of active (CENH3-interacting) centromere regions. In particular, we performed evolutionary analyses within the \u003cem\u003eGypsy\u003c/em\u003e superfamily and identified centromere-enriched subfamilies. Among these, we discovered the highly active \u003cem\u003eRLG_Abia\u003c/em\u003e family, highlighting a potentially rye-specific expansion of centromeric retrotransposons. 
Similar patterns have recently been reported in oat (\u003cem\u003eAvena sativa\u003c/em\u003e)\u003csup\u003e\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e\u003c/sup\u003e and other grass genomes, supporting the notion that \u003cem\u003eRLG_Cereba\u003c/em\u003e and \u003cem\u003eRLG_Abia\u003c/em\u003e retrotransposons have been competing for the centromeric niche over millions of years of grass genome evolution. As a next step, we plan to conduct a rye pan-genome analysis by incorporating high-quality genome assemblies from diverse genotypes. This will enable us to investigate the evolutionary dynamics of centromeric transposable elements and their contribution to genomic diversity.\u003c/p\u003e \u003cp\u003eAlthough we compared 1RS arms from multiple rye accessions and 1BL/1RS translocation lines, we found that the genetic diversity of the introgressed rye segment utilized in wheat remains remarkably narrow. This indicates that the 1RS chromosomal fragments historically introduced into wheat breeding were derived from a very limited genetic pool, leaving much of rye’s genetic potential underexplored in wheat improvement\u003csup\u003e\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e\u003c/sup\u003e. Looking forward, the development of novel translocation lines carrying diverse rye chromosome segments may hold great promise for broadening the genetic base of wheat and triticale breeding and for delivering new insights into resistance improvement.\u003c/p\u003e "},{"header":"Methods","content":"\u003cp\u003e\u003cstrong\u003ePlant materials and growth conditions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003ePlants were grown under controlled conditions in a greenhouse with a 16-hour light/8-hour dark photoperiod at 25\u0026deg;C (day) and 18\u0026deg;C (night). Humidity was maintained at 60%, and plants were irrigated with a nutrient solution weekly. 
Samples for DNA extraction were collected from young leaf tissues (2\u0026ndash;3 weeks).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eHMW DNA extraction, library preparation, PacBio HiFi and Oxford Nanopore sequencing\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eHigh molecular weight (HMW) DNA was extracted from a single Lo7 plant using the MACHEREY-NAGEL NucleoBond HMW DNA kit, following the manufacturer\u0026rsquo;s protocol. The extracted HMW DNA was quality-controlled for concentration using the Qubit 2.0\u0026reg; dsDNA HS assay (Invitrogen, Waltham, USA) and for fragment size distribution using the Agilent Femto Pulse System (Agilent Technologies, Santa Clara, USA).\u003c/p\u003e\n\u003cp\u003eFor HiFi sequencing, the HMW DNA was fragmented into ~\u0026thinsp;20 kb fragments using a Megaruptor 3 device (Diagenode) at speed setting 30. HiFi SMRTbell libraries were then prepared following the Pacific Biosciences SMRTbell Express Template Prep Kit 3.0 protocol. The final libraries were size-selected within a narrow 17\u0026ndash;20 kb range using the SageELF system with a 0.75% Agarose Gel Cassette (Sage Science), following the manufacturer\u0026rsquo;s guidelines. HiFi circular consensus sequencing (CCS) reads were generated using the PacBio Revio platform (Pacific Biosciences) in accordance with the standard operating procedure.\u003c/p\u003e\n\u003cp\u003eFor ONT sequencing, HMW DNA fragments were size-selected using the Short Read Eliminator (SRE) kit (Pacific Biosciences, Menlo Park, USA) with a 25 kb cut-off. Libraries were generated using the SQK-LSK114 Ligation Sequencing Kit V14 (Oxford Nanopore Technologies, Oxford, UK) following the manufacturer\u0026rsquo;s protocol with 2\u0026ndash;3 \u0026micro;g of size-selected HMW DNA as input. 
Sequencing was done on R10.4.1 PromethION (FLO-PRO114M) flow cells (Oxford Nanopore Technologies, Oxford, UK) with a run-time of 72\u0026ndash;96 hours and raw data in the .pod5 format was acquired using MinKNOW (versions 23.11 and 24.02). Basecalling was performed with Dorado (version 0.9.1) using the [email protected] super-accurate basecalling model with the \u0026ldquo;--min-qscore 20\u0026rdquo; parameter to yield basecalled reads in the .bam format. Reads were then converted to the .fastq.gz format using Samtools\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e43\u003c/span\u003e\u003c/sup\u003e and filtered for a minimum length of 25 kb using SeqKit\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e44\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eHi-C sequencing\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFor Hi-C sequencing, we followed a protocol similar to that used in previous barley Hi-C studies, with slight modifications to optimize for rye\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e45\u003c/span\u003e\u003c/sup\u003e. Briefly, plant tissues were cross-linked using formaldehyde, and chromatin was fragmented by restriction enzyme (DpnII) digestion. The resulting DNA fragments were ligated with a biotinylated bridge linker, which allowed for the capture of ligation products that represent physically interacting chromatin regions. These biotinylated ligation products were then enriched through streptavidin beads and processed for paired-end sequencing. The sequencing data were used to generate a high-resolution contact map, revealing chromatin interactions and facilitating the construction of chromosome-scale scaffolds. 
Sequencing and Hi-C raw data processing were performed as described previously\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e45\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eGenome assembly and evaluation\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003ePacBio HiFi reads were assembled using hifiasm (v.0.19.9)\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e. Pseudomolecule construction was done with the TRITEX long-read assembly pipeline, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://tritexassembly.bitbucket.io/\u003c/span\u003e\u003c/span\u003e\u003csup\u003e27\u003c/sup\u003e. Chimeric contigs and orientation errors were identified through manual inspection of Hi-C contact matrices. Genome completeness and consensus accuracy were evaluated using Merqury (v.1.3)\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e46\u003c/span\u003e\u003c/sup\u003e. Levels of duplication and heterozygosity were assessed with Merqury and GenomeScope 2.0\u003csup\u003e47\u003c/sup\u003e. The completeness of the gene models was evaluated using BUSCO (v.5.0.0) (Sim\u0026atilde;o et al., 2015), which assesses the presence of conserved core genes in the genome. Finally, contigs containing rye chloroplast sequences (NC_021761.1) were removed from the unanchored regions, resulting in the final pseudomolecules.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eOptical genome mapping\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eA total of 1.8 million cell nuclei, purified from root tips of rye Lo7 seedlings by flow cytometry, were embedded in agarose miniplugs and treated with proteinase K as described\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e48\u003c/span\u003e\u003c/sup\u003e. 
The resulting 548 ng of ultra-HMW DNA was labelled at DLE-1 recognition sites (CTTAAG motif) and stained following the Bionano Prep Direct Label and Stain-G2 protocol (Bionano, San Diego, USA). The labelled molecules were analyzed on the Bionano Saphyr platform. A total of 1500 Gbp of single-molecule data from molecules longer than 150 kb, corresponding to 190x coverage of the rye Lo7 genome, was used to generate a de novo optical genome map (OGM) with Bionano Solve (v.3.6.1_11162020), applying \u0026ldquo;optArguments_nonhaplotype_noES_noCut_DLE1_saphyr.xml\u0026rdquo; parameters (complete statistics in Figure S3). To validate the sequence assembly of the Lo7 genome, we aligned the OGM to the sequence in Access (v.1.7.1, Bionano). Identified mismatches between the OGM and the pseudomolecules are provided in Supplementary Table\u0026nbsp;11.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e5mCpG methylation\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eHiFi reads were aligned to the pseudomolecules using ccsmeth\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e49\u003c/span\u003e\u003c/sup\u003e with default parameters (align_hifi). Methylation calling was performed using pb-CpG-tools based on the resulting BAM files. All bigWig files were visualized in the Integrative Genomics Viewer (IGV)\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e50\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eGene model prediction\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe performed de novo structural gene prediction, confidence classification, and functional annotation, following the protocol described in a previous publication\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e51\u003c/span\u003e\u003c/sup\u003e. 
The strategy applied in this study differs only in the use of Helixer\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e52\u003c/span\u003e\u003c/sup\u003e, run with standard parameters, as an additional ab initio input source for EVidenceModeler (weight set to 10). Access to all RNA-seq and Iso-seq datasets is described in the previous publication\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTransposable element annotation using EDTA\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo annotate transposable elements (TEs) in the genome, we used the Extensive De Novo TE Annotator (EDTA, v2.2.2), a comprehensive pipeline for identifying and classifying TEs, together with the curated TREP library (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://trep-db.uzh.ch/\u003c/span\u003e\u003c/span\u003e)\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e53\u003c/span\u003e\u003c/sup\u003e. First, we preprocessed the genome assembly by soft-masking low-complexity regions and simple repeats. Next, EDTA was run with default parameters to detect and classify TEs into different categories: long terminal repeat (LTR) retrotransposons, DNA transposons (TIR and Helitron), and other repeat elements. The proportion of the genome occupied by TEs was then calculated, and TEs were classified into superfamilies.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAnalysis of centromeric TEs\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe identified full-length elements by following the pipeline described previously\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e34\u003c/span\u003e\u003c/sup\u003e. First, we extracted\u0026thinsp;~\u0026thinsp;100 LTRs per family, which we aligned using ClustalW with standard settings. 
If we identified distinct groups of LTR variants (likely representing TE subfamilies) in the alignment, these groups were separately re-aligned to produce consensus sequences for individual subfamilies. These consensus sequences were used in blastn searches against the genome. An identified element was classified as full-length if a pair of LTRs was found in the same orientation within a range of \u0026plusmn;\u0026thinsp;1000 bp of the length of the TE family consensus sequence and with no more than 5 bp missing at the LTR ends. Elements on the extrema of the size distribution were discarded to remove elements with large insertions or deletions. The resulting full-length elements were used for further analysis. The estimation of insertion ages was performed as previously described\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e34\u003c/span\u003e\u003c/sup\u003e. The full-length elements were aligned with the initial TE family consensus using the program Water (EMBOSS package obtained from ubuntu.com) with a gap extension penalty of 0.1 and a gap opening penalty of 50. The resulting pairwise comparisons were combined into a variant call format (vcf) file using an in-house script\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e34\u003c/span\u003e\u003c/sup\u003e. We filtered for sequence variants with an allele frequency of more than 5%. Next, we used R for further analysis, data preparation and visualization.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSynteny analysis\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe used plotsr\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e54\u003c/span\u003e\u003c/sup\u003e to visualize and quantify synteny between genome sequences. First, we performed pairwise sequence alignment using minimap2\u003csup\u003e55\u003c/sup\u003e to align the genomic sequences of interest, such as rye Lo7_V3 and other related species. 
All variations were called using SyRI\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e56\u003c/span\u003e\u003c/sup\u003e. The resulting .out files were then processed with plotsr\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e54\u003c/span\u003e\u003c/sup\u003e to generate syntenic plots, allowing us to examine the collinearity and structural conservation between genomes. The syntenic blocks were identified, and the degree of synteny was quantified. Collinearity plots were generated using R.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAnalysis of satellite sequences\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo visualize the distribution of tandem repeats pSc200 (GenBank: Z54189.1), pSc250 (GenBank: Z50040.1) and pSC119.2 (GenBank: KF719093.1) on the assembly, the seven pseudomolecules were fragmented into 100-bp sequences and aligned to pSc200, pSc250 and pSC119.2 using Bowtie2 (v2.5.0, default)\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e57\u003c/span\u003e\u003c/sup\u003e. The aligned reads were saved in separate files, and pSc200/250/119.2-positive reads were subsequently mapped back to the assembly using Bowtie2. The resulting BAM file was converted into a bigWig file with bamCoverage from deepTools2\u003csup\u003e58\u003c/sup\u003e using a bin size of 10 kb. For 5S rDNA (GenBank: AY841027.1) and 45S rDNA (GenBank: KF482106.1), the analysis was performed using the same method.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFluorescence in situ hybridization (FISH)\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003ePredicted probe annealing patterns were obtained using homology searches with 30,982 FISH probe sequences as queries and each rye genome assembly as a subject in turn (Lo7_V2 and Lo7_V3). 
The homology search and subsequent data handling and visualization were conducted directly from R using the BioDT package. The workflow is available at github.com/mtrw/Lo7v2_inSilicoFISH; to recreate the analysis presented here, the scripts should be run in the order 1_run_blast.R, 2_count_gBins.R and 3a_plot_bins.R. In brief, BLAST (v2.14.0) is called with preset parameters for short queries (argument \u0026lsquo;-task blastn-short\u0026rsquo;). The Lo7_V3 assembly was divided into 1 Mb bins at 500 kb intervals. The sum of all bitscores of all probe sequence alignments overlapping each bin is calculated as an approximate indicator of the relative amount of probe binding that might be expected at that genomic region. Plots are constructed using R\u0026rsquo;s base plotting functions.\u003c/p\u003e\n\u003cp\u003eFor the preparation of 5\u0026ndash;10 kb long chromosome segment-specific barcoding FISH probes, non-overlapping, single-copy target-specific oligonucleotides were selected and synthesized as myTAGs\u0026reg; Labeled Libraries (Daicel Arbor Bioscience, Ann Arbor, MI, USA). The pooled oligos were labelled with either Atto 594 (red) or Alexa 488 (green). A dropping method\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e59\u003c/span\u003e\u003c/sup\u003e was used to prepare mitotic metaphase spreads. FISH was performed as described earlier\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e60\u003c/span\u003e\u003c/sup\u003e with minor alterations: 20 \u0026micro;l hybridization mixture per slide contained 50% deionized formamide, 25% 20\u0026times; SSC, 1 mM Tris\u0026ndash;HCl pH 8.0, 1 \u0026micro;l (400 ng) Atto 488 or Atto 594 labelled oligo chromosome painting probes, 1 \u0026micro;l (25 ng) Cy5-labelled oligo probes pSc200 and pSc250\u003csup\u003e61\u003c/sup\u003e, 10 \u0026micro;g/ml salmon sperm DNA, and 0.5 M EDTA. The hybridization mixture was denatured together with the chromosomal DNA on a hot plate at 80\u0026deg;C for 2 min. 
Hybridization at 37\u0026deg;C was performed for 20 h in a moist chamber. Subsequently, slides were washed in 2\u0026times; SSC at room temperature for 1 min to remove coverslips, then for 20 min at 58\u0026deg;C, and dehydrated in an ethanol series (70, 90 and 96%). Finally, the slides were air dried and counterstained with 1 \u0026micro;g/ml 4\u0026prime;,6-diamidino-2-phenylindole (DAPI) in Vectashield (Vector Laboratories, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://vectorlabs.com/\u003c/span\u003e\u003c/span\u003e). Images were acquired with an epifluorescence microscope BX61 (Olympus, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://www.olympus.fi/medical/en/microscopy\u003c/span\u003e\u003c/span\u003e) using a cooled charge coupled device (CCD) camera (Orca ER, Hamamatsu, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ewww.hamamatsu.com\u003c/span\u003e\u003c/span\u003e). Pictures were processed and merged using Adobe Photoshop (Adobe Systems Incorporated, USA, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://www.adobe.com\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eIdentification of centromeric regions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eCentromeric regions were identified by integrating previously published CENH3 ChIP-seq data\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e, which specifically target centromeric histone H3 variant (CENH3) binding sites. The CENH3 ChIP-seq data were first aligned to the rye genome using Bowtie2\u003csup\u003e57\u003c/sup\u003e. The resulting peaks were visualized on the genome using IGV\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e50\u003c/span\u003e\u003c/sup\u003e, enabling a clear representation of centromeric regions across chromosomes. 
To further refine the centromeric region boundaries, we used deepTools2\u003csup\u003e58\u003c/sup\u003e to generate a 10 kb resolution coverage plot, providing an overview of centromeric regions' accessibility and distribution. Peaks corresponding to CENH3 binding were identified using MACS2 (Model-based Analysis of ChIP-Seq)\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e62\u003c/span\u003e\u003c/sup\u003e, which allows for the precise detection of enriched regions.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eATAC sequencing\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eATAC-seq was performed using either fresh (native) or crosslinked leaf tissues from three-week-old seedlings. For the native sample, fresh leaves were finely chopped with a razor blade in nuclei isolation buffer (0.25 M sucrose, 10 mM Tris-HCl pH 8.0, 10 mM MgCl\u003csub\u003e2\u003c/sub\u003e, 1% Triton X-100, 5 mM \u0026beta;-mercaptoethanol) supplemented with 1x Halt\u0026trade; Protease Inhibitor Cocktail (Thermo Scientific). The resulting slurry was filtered through a 50-\u0026micro;m cell strainer, and the nuclei were washed twice with the same buffer before being resuspended. For the native sample, an aliquot was analyzed by flow cytometry for quality control and quantification. Based on the quantification, approximately 75,000 nuclei were aliquoted, and the nuclei pellet was collected by centrifugation. For the crosslinked sample, fresh leaves were fixed under vacuum in 1% formaldehyde (Sigma-Aldrich 252549) for 20 min. Fixation was stopped by incubating the sample in 0.125 M glycine for 5 min. The tissues were frozen before nuclei isolation, following the same protocol as the native sample. After isolation, 75,000 nuclei were sorted by flow cytometry and incubated at 60\u0026deg;C for 5 min. 
For both crosslinked (ATAC-seq1) and native (ATAC-seq2) samples, the nuclei pellet was resuspended in a transposition reaction mix containing Tagment DNA Enzyme (TDE1, Illumina, 20034197) and incubated at 37\u0026deg;C for 30 min. Crosslinked samples (ATAC-seq1) were subsequently incubated at 65\u0026deg;C overnight in SDS buffer (50 mM Tris-HCl, pH 8.0, 1% SDS, 10 mM EDTA). Transposition products were purified using the MinElute PCR Purification Kit (QIAGEN, 28004). Libraries were then amplified with the NEBNext\u0026reg; High-Fidelity 2x PCR Master Mix (NEB, M0541), and further purified using VAHTS\u0026trade; DNA Clean Beads (Vazyme, N411). The final libraries were sequenced in paired-end mode (2x 151 cycles) on the Illumina NovaSeq 6000 (Illumina Inc., San Diego, CA, USA). Sequencing reads were adapter-trimmed using fastp (v0.20.0)\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e63\u003c/span\u003e\u003c/sup\u003e and aligned to the reference genome with BWA-MEM (v0.7.17)\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e64\u003c/span\u003e\u003c/sup\u003e. SAMtools (v1.16.1)\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e43\u003c/span\u003e\u003c/sup\u003e was used to filter out duplicate and multi-mapping reads (-q 30), followed by peak calling (-q 0.01) with MACS2 (v3.0.0)\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e62\u003c/span\u003e\u003c/sup\u003e. For visualization, BAM files containing uniquely mapped reads were converted to bigWig format using deepTools bamCoverage (v3.5.1)\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e58\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study was supported by core funding of the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK). A.H. 
was supported by the German Research Foundation (DFG), project 559365585.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConflict of interest\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare they have no conflict of interest.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgments\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe thank Sylvia Swetik, Manuela Knauft, Jacqueline Pohl and Mary Ziems for technical support in plant material management and DNA sequencing. We thank Pascal Jaroschinsky for his assistance with ATAC sequencing, Jörg Fuchs for his help with flow cytometry, and Katrin Kumke for performing the FISH experiment. We thank Jens Bauernfeind and Danuta Schueler for sequencing data management. We gratefully acknowledge Anne Fiebig for data submission. We gratefully acknowledge Dr. Andres Gordillo of KWS for providing the Lo7 seeds.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eN.S. conceived the project. E.C. completed the genome assembly and performed most of the downstream computational analyses. E.C., Axel Himmelbach, S.J., and Z.Z. performed the sequencing experiments. J.F. provided technical support for the genome assembly. C.M.W. and T.W. performed the TE analysis. M.T.R., J.C. and Andreas Houben contributed to the FISH and satellite sequence analyses. T.L. and M.S. contributed to gene annotation. Z.T. and H.Š. performed the optical genome mapping. M.M., H.S., M.S., T.W., Andreas Houben, and N.S. supervised the project. E.C. drafted the manuscript with input from C.M.W., T.W. and Andreas Houben. 
All authors reviewed and revised the manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData Availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eGenome assembly, sequencing data, ATAC,\u0026nbsp;AGP file\u003c/p\u003e\n\u003cp\u003eAll the sequence data collected in this study have been deposited at the European Nucleotide Archive\u003csup\u003e65\u003c/sup\u003e (ENA). Sequence assemblies and gene annotations have been submitted to ENA.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eMartis MM et al (2013) Reticulate evolution of the rye genome. Plant Cell 25:3685\u0026ndash;3698. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1105/tpc.113.114553\u003c/span\u003e\u003cspan address=\"10.1105/tpc.113.114553\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJiao C et al (2025) Pan-genome bridges wheat structural variations with habitat and breeding. Nature 637:384\u0026ndash;393. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41586-024-08277-0\u003c/span\u003e\u003cspan address=\"10.1038/s41586-024-08277-0\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMago R et al (2005) High-resolution mapping and mutation analysis separate the rust resistance genes Sr31, Lr26 and Yr9 on the short arm of rye chromosome 1. Theor Appl Genet 112:41\u0026ndash;50. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s00122-005-0098-9\u003c/span\u003e\u003cspan address=\"10.1007/s00122-005-0098-9\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHurni S et al (2013) Rye Pm8 and wheat Pm3 are orthologous genes and show evolutionary conservation of resistance function against powdery mildew. Plant J 76:957\u0026ndash;969. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1111/tpj.12345\u003c/span\u003e\u003cspan address=\"10.1111/tpj.12345\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRabanus-Wallace MT et al (2021) Chromosome-scale genome assembly provides insights into rye biology, evolution and agronomic potential. Nat Genet 53:564\u0026ndash;573. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41588-021-00807-0\u003c/span\u003e\u003cspan address=\"10.1038/s41588-021-00807-0\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi G et al (2021) A high-quality genome assembly highlights rye genomic characteristics and agronomically important genes. Nat Genet 53:574\u0026ndash;584. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41588-021-00808-z\u003c/span\u003e\u003cspan address=\"10.1038/s41588-021-00808-z\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiao WW et al (2023) A draft human pangenome reference. Nature 617:312\u0026ndash;324. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41586-023-05896-x\u003c/span\u003e\u003cspan address=\"10.1038/s41586-023-05896-x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNaish M et al (2021) The genetic and epigenetic landscape of the Arabidopsis centromeres. Science 374:eabi7489. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1126/science.abi7489\u003c/span\u003e\u003cspan address=\"10.1126/science.abi7489\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShang L et al (2023) A complete assembly of the rice Nipponbare reference genome. Mol Plant. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.molp.2023.08.003\u003c/span\u003e\u003cspan address=\"10.1016/j.molp.2023.08.003\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen J et al (2023) A complete telomere-to-telomere assembly of the maize genome. Nat Genet 55:1221\u0026ndash;1231. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41588-023-01419-6\u003c/span\u003e\u003cspan address=\"10.1038/s41588-023-01419-6\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang C et al (2023) The T2T genome assembly of soybean cultivar ZH13 and its epigenetic landscapes. Mol Plant 16:1715\u0026ndash;1718. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.molp.2023.10.003\u003c/span\u003e\u003cspan address=\"10.1016/j.molp.2023.10.003\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi W et al (2025) A gap-free complete genome assembly of oat and OatOmics, a multi-omics database. Mol Plant 18:179\u0026ndash;182. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.molp.2025.01.006\u003c/span\u003e\u003cspan address=\"10.1016/j.molp.2025.01.006\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYan H et al (2025) Post-polyploidization centromere evolution in cotton. Nat Genet. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41588-025-02115-3\u003c/span\u003e\u003cspan address=\"10.1038/s41588-025-02115-3\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHuang G et al (2024) A telomere-to-telomere cotton genome assembly reveals centromere evolution and a Mutator transposon-linked module regulating embryo development. Nat Genet. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41588-024-01877-6\u003c/span\u003e\u003cspan address=\"10.1038/s41588-024-01877-6\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen C et al (2025) A comprehensive omics resource and genetic tools for functional genomics research and genetic improvement of sorghum. Mol Plant. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.molp.2025.03.005\u003c/span\u003e\u003cspan address=\"10.1016/j.molp.2025.03.005\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTian Y et al (2025) The near-complete genome assembly of pickling cucumber and its mutation library illuminate cucumber functional genomics and genetic improvement. Mol Plant. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.molp.2025.03.001\u003c/span\u003e\u003cspan address=\"10.1016/j.molp.2025.03.001\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu S et al (2025) A telomere-to-telomere genome assembly coupled with multi-omic data provides insights into the evolution of hexaploid bread wheat. Nat Genet. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41588-025-02137-x\u003c/span\u003e\u003cspan address=\"10.1038/s41588-025-02137-x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWicker T et al (2018) Impact of transposable elements on genome structure and evolution in bread wheat. Genome Biol 19:103. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s13059-018-1479-0\u003c/span\u003e\u003cspan address=\"10.1186/s13059-018-1479-0\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGustafson JP, Dera AR, Petrovic S (1988) Expression of modified rye ribosomal RNA genes in wheat. Proc Natl Acad Sci U S A 85:3943\u0026ndash;3945. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1073/pnas.85.11.3943\u003c/span\u003e\u003cspan address=\"10.1073/pnas.85.11.3943\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGuo J et al (2019) Frequent variations in tandem repeats pSc200 and pSc119.2 cause rapid chromosome evolution of open-pollinated rye. Mol Breeding 39. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s11032-019-1033-0\u003c/span\u003e\u003cspan address=\"10.1007/s11032-019-1033-0\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAlkhimova AG, Heslop-Harrison JS, Shchapova AI, Vershinin AV (1999) Rye chromosome variability in wheat-rye addition and substitution lines. Chromosome Res 7:205\u0026ndash;212. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1023/a:1009299300018\u003c/span\u003e\u003cspan address=\"10.1023/a:1009299300018\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu C et al (2024) Young retrotransposons and non-B DNA structures promote the establishment of dominant rye centromere in the 1RS.1BL fused centromere. New Phytol 241:607\u0026ndash;622. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1111/nph.19359\u003c/span\u003e\u003cspan address=\"10.1111/nph.19359\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFrancki MG (2001) Identification of Bilby, a diverged centromeric Ty1-copia retrotransposon family from cereal rye (Secale cereale L). Genome 44:266\u0026ndash;274. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1139/g00-112\u003c/span\u003e\u003cspan address=\"10.1139/g00-112\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAhmed HI et al (2023) Einkorn genomics sheds light on history of the oldest domesticated wheat. Nature. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41586-023-06389-7\u003c/span\u003e\u003cspan address=\"10.1038/s41586-023-06389-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHeuberger M et al (2024) Evolution of Einkorn wheat centromeres is driven by the mutualistic interplay of two LTR retrotransposons. Mob DNA 15:16. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s13100-024-00326-9\u003c/span\u003e\u003cspan address=\"10.1186/s13100-024-00326-9\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBauer E et al (2017) Towards a whole-genome sequence for rye (Secale cereale L). Plant J 89:853\u0026ndash;869. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1111/tpj.13436\u003c/span\u003e\u003cspan address=\"10.1111/tpj.13436\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMarone MP, Singh HC, Pozniak CJ, Mascher M (2022) A technical guide to TRITEX, a computational pipeline for chromosome-scale sequence assembly of plant genomes. Plant Methods 18:128. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s13007-022-00964-1\u003c/span\u003e\u003cspan address=\"10.1186/s13007-022-00964-1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCheng H, Asri M, Lucas J, Koren S, Li H (2024) Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph. Nat Methods 21:967\u0026ndash;970. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41592-024-02269-8\u003c/span\u003e\u003cspan address=\"10.1038/s41592-024-02269-8\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBousios A, Kakutani T, Henderson IR (2025) Centrophilic Retrotransposons of Plant Genomes. Annu Rev Plant Biol. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1146/annurev-arplant-083123-082220\u003c/span\u003e\u003cspan address=\"10.1146/annurev-arplant-083123-082220\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu C et al (2024) Unveiling the distinctive traits of functional rye centromeres: minisatellites, retrotransposons, and R-loop formation. Sci China Life Sci. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s11427-023-2524-0\u003c/span\u003e\u003cspan address=\"10.1007/s11427-023-2524-0\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAhmed HI et al (2023) Einkorn genomics sheds light on history of the oldest domesticated wheat. Nature 620:830\u0026ndash;838. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41586-023-06389-7\u003c/span\u003e\u003cspan address=\"10.1038/s41586-023-06389-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMascher M et al (2017) A chromosome conformation capture ordered sequence of the barley genome. Nature 544:427\u0026ndash;433. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/nature22043\u003c/span\u003e\u003cspan address=\"10.1038/nature22043\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWicker T et al (2017) The repetitive landscape of the 5100 Mbp barley genome. Mob DNA 8:22. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s13100-017-0102-3\u003c/span\u003e\u003cspan address=\"10.1186/s13100-017-0102-3\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWicker T et al (2022) Transposable Element Populations Shed Light on the Evolutionary History of Wheat and the Complex Co-Evolution of Autonomous and Non-Autonomous Retrotransposons. Adv Genet (Hoboken) 3:2100022. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1002/ggn2.202100022\u003c/span\u003e\u003cspan address=\"10.1002/ggn2.202100022\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi G et al (2025) Genomic analysis of Zhou8425B, a key founder parent, reveals its genetic contributions to elite agronomic traits in wheat breeding. Plant Commun 6:101222. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.xplc.2024.101222\u003c/span\u003e\u003cspan address=\"10.1016/j.xplc.2024.101222\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang Z et al (2025) Near-complete assembly and comprehensive annotation of the wheat Chinese Spring genome. Mol Plant. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.molp.2025.02.002\u003c/span\u003e\u003cspan address=\"10.1016/j.molp.2025.02.002\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJia J et al (2023) Genome resources for the elite bread wheat cultivar Aikang 58 and mining of elite homeologous haplotypes for accelerating wheat improvement. Mol Plant 16:1893\u0026ndash;1910. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.molp.2023.10.015\u003c/span\u003e\u003cspan address=\"10.1016/j.molp.2023.10.015\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang C et al (2025) The Yr9 gene encoding a CC-NBS-LRR protein in the 1RS-1BL translocation confers wheat stripe rust resistance. Sci China Life Sci. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s11427-025-2929-6\u003c/span\u003e\u003cspan address=\"10.1007/s11427-025-2929-6\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYu Y et al (2025) Wheat stripe rust resistance gene Yr9, derived from rye, is a CC-NBS-LRR gene in a highly conserved NLR cluster. Sci China Life Sci. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s11427-024-2932-5\u003c/span\u003e\u003cspan address=\"10.1007/s11427-024-2932-5\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen J et al (2024) The genetic mechanism of B chromosome drive in rye illuminated by chromosome-scale assembly. Nat Commun 15:9686. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41467-024-53799-w\u003c/span\u003e\u003cspan address=\"10.1038/s41467-024-53799-w\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWehrkamp CM, Heuberger M, Wicker T (2025) Comparative analysis of centromeres of oat (\u003cem\u003eAvena sativa\u003c/em\u003e) and its tetraploid and diploid relatives reveals rapid evolution of centromere composition and architecture. \u003cem\u003ebioRxiv\u003c/em\u003e, 2025.2004.2008.647780. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1101/2025.04.08.647780\u003c/span\u003e\u003cspan address=\"10.1101/2025.04.08.647780\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSchlegel R, Korzun V (2006) About the origin of 1RS.1BL wheat-rye chromosome translocations from Germany. Plant Breeding 116:537\u0026ndash;540. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1111/j.1439-0523.1997.tb02186.x\u003c/span\u003e\u003cspan address=\"10.1111/j.1439-0523.1997.tb02186.x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDanecek P et al (2021) Twelve years of SAMtools and BCFtools. \u003cem\u003eGigascience\u003c/em\u003e 10. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/gigascience/giab008\u003c/span\u003e\u003cspan address=\"10.1093/gigascience/giab008\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShen W, Sipos B, Zhao L (2024) SeqKit2: A Swiss army knife for sequence and alignment processing. Imeta 3:e191. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1002/imt2.191\u003c/span\u003e\u003cspan address=\"10.1002/imt2.191\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJayakodi M et al (2020) The barley pan-genome reveals the hidden legacy of mutation breeding. Nature. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41586-020-2947-8\u003c/span\u003e\u003cspan address=\"10.1038/s41586-020-2947-8\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRhie A, Walenz BP, Koren S, Phillippy AM (2020) Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol 21:245. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s13059-020-02134-9\u003c/span\u003e\u003cspan address=\"10.1186/s13059-020-02134-9\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRanallo-Benavidez TR, Jaron KS, Schatz MC (2020) GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun 11:1432. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41467-020-14998-3\u003c/span\u003e\u003cspan address=\"10.1038/s41467-020-14998-3\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSimkova H, Tulpova Z, Capal P (2023) Flow Sorting-Assisted Optical Mapping. Methods Mol Biol 2672:465\u0026ndash;483. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/978-1-0716-3226-0_28\u003c/span\u003e\u003cspan address=\"10.1007/978-1-0716-3226-0_28\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTse OYO et al (2021) Genome-wide detection of cytosine methylation by single molecule real-time sequencing. Proc Natl Acad Sci U S A 118. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1073/pnas.2019768118\u003c/span\u003e\u003cspan address=\"10.1073/pnas.2019768118\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRobinson JT et al (2011) Integrative genomics viewer. Nat Biotechnol 29:24\u0026ndash;26. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/nbt.1754\u003c/span\u003e\u003cspan address=\"10.1038/nbt.1754\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMascher M et al (2021) Long-read sequence assembly: a technical evaluation in barley. Plant Cell 33:1888\u0026ndash;1906. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/plcell/koab077\u003c/span\u003e\u003cspan address=\"10.1093/plcell/koab077\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStiehler F et al (2021) Helixer: cross-species gene annotation of large eukaryotic genomes using deep learning. Bioinformatics 36:5291\u0026ndash;5298. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/bioinformatics/btaa1044\u003c/span\u003e\u003cspan address=\"10.1093/bioinformatics/btaa1044\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOu S et al (2019) Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol 20:275. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s13059-019-1905-y\u003c/span\u003e\u003cspan address=\"10.1186/s13059-019-1905-y\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGoel M, Schneeberger K (2022) plotsr: visualizing structural similarities and rearrangements between multiple genomes. Bioinformatics 38:2922\u0026ndash;2926. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/bioinformatics/btac196\u003c/span\u003e\u003cspan address=\"10.1093/bioinformatics/btac196\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi H (2018) Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34:3094\u0026ndash;3100. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/bioinformatics/bty191\u003c/span\u003e\u003cspan address=\"10.1093/bioinformatics/bty191\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGoel M, Sun H, Jiao WB, Schneeberger K (2019) SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol 20:277. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s13059-019-1911-0\u003c/span\u003e\u003cspan address=\"10.1186/s13059-019-1911-0\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLangmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357\u0026ndash;359. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/nmeth.1923\u003c/span\u003e\u003cspan address=\"10.1038/nmeth.1923\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRamirez F et al (2016) deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res 44:W160\u0026ndash;165. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/nar/gkw257\u003c/span\u003e\u003cspan address=\"10.1093/nar/gkw257\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAliyeva-Schnorr L, Ma L, Houben AA (2015) Fast Air-dry Dropping Chromosome Preparation Method Suitable for FISH in Plants. J Vis Exp e53470. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3791/53470\u003c/span\u003e\u003cspan address=\"10.3791/53470\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAliyeva-Schnorr L et al (2015) Cytogenetic mapping with centromeric bacterial artificial chromosomes contigs shows that this recombination-poor region comprises more than half of barley chromosome 3H. Plant J 84:385\u0026ndash;394. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1111/tpj.13006\u003c/span\u003e\u003cspan address=\"10.1111/tpj.13006\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFu S et al (2015) Oligonucleotide Probes for ND-FISH Analysis to Identify Rye and Wheat Chromosomes. Sci Rep 5:10552. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/srep10552\u003c/span\u003e\u003cspan address=\"10.1038/srep10552\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang Y et al (2008) Model-based analysis of ChIP-Seq (MACS). \u003cem\u003eGenome Biol\u003c/em\u003e 9, R137. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/gb-2008-9-9-r137\u003c/span\u003e\u003cspan address=\"10.1186/gb-2008-9-9-r137\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen S, Zhou Y, Chen Y, Gu J (2018) fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34:i884\u0026ndash;i890. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/bioinformatics/bty560\u003c/span\u003e\u003cspan address=\"10.1093/bioinformatics/bty560\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVasimuddin M, Misra S, Li H, Aluru S (2019) Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. \u003cem\u003e2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\u003c/em\u003e, 314\u0026ndash;324. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/IPDPS.2019.00041\u003c/span\u003e\u003cspan address=\"10.1109/IPDPS.2019.00041\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBurgin J et al (2023) The European Nucleotide Archive in 2022. Nucleic Acids Res 51:D121\u0026ndash;D125. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/nar/gkac1051\u003c/span\u003e\u003cspan address=\"10.1093/nar/gkac1051\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"nature-portfolio","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"","title":"Nature 
Portfolio","twitterHandle":"","acdcEnabled":false,"dfaEnabled":false,"editorialSystem":"ejp","reportingPortfolio":"","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-7013114/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7013114/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eRye (\u003cem\u003eSecale cereale\u003c/em\u003e L.) is an important cereal crop known for its high yield potential and tolerance to biotic and abiotic stresses. However, its large, repeat-rich, and heterozygous genome has posed challenges for assembly compared to related species such as wheat and barley. Here, we present a high-quality, chromosome-scale genome assembly of the inbred line Lo7, generated using PacBio HiFi, Oxford Nanopore, Hi-C, and BioNano technologies using the TRITEX pipeline. The resulting Lo7_V3 assembly spans 6.76 Gb with a contig N50 of 128 Mb, correcting previous misorientations and achieving complete centromere assemblies across all seven chromosomes. Repetitive clusters containing rye-specific satellite sequences (pSc200 and pSc250) were contiguously assembled. Their chromosomal positions were validated using FISH. Centromeric retrotransposon analysis highlighted \u003cem\u003eRLG_Abia\u003c/em\u003e as a prominent element in rye, showing signs of recent activity and high abundance, unlike in wheat. 
Collectively, the new Lo7_V3 genome assembly provides a highly improved resource that will support future genomic research and crop improvement efforts in rye and related cereal species.\u003c/p\u003e","manuscriptTitle":"Unveiling centromeric retrotransposon dynamics through a near-complete rye genome assembly","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-07-08 11:09:49","doi":"10.21203/rs.3.rs-7013114/v1","editorialEvents":[],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"nature-communications","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"NCOMMS","sideBox":"Learn more about [Nature Communications](http://www.nature.com/ncomms/)","snPcode":"","submissionUrl":"https://mts-ncomms.nature.com/","title":"Nature Communications","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"ejp","reportingPortfolio":"Nature Communications","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"7549c1c6-e5d3-41bd-bda3-7a56c0331cb7","owner":[],"postedDate":"July 8th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[{"id":51006725,"name":"Biological sciences/Plant sciences/Plant genetics"},{"id":51006726,"name":"Biological sciences/Genetics/Genome"}],"tags":[],"updatedAt":"2026-05-08T14:14:18+00:00","versionOfRecord":[],"versionCreatedAt":"2025-07-08 11:09:49","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-7013114","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7013114","identity":"rs-7013114","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}


