Tandem LTR-retrotransposon structures are common and highly polymorphic in plant genomes | Research Square

Research Article
Noemia Morales-Díaz, Svitlana Sushko, Lucia Campos-Domínguez, and 4 more
This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-5356060/v1 This work is licensed under a CC BY 4.0 License. Status: Published. Journal Publication published 12 Mar, 2025 in Mobile DNA. Version 1 posted 9

Abstract
Background
LTR-retrotransposons (LTR-RT) are a major component of plant genomes and are a major driver of genome evolution. Most LTR-RT copies in plant genomes are defective elements, found as truncated copies, nested insertions or being part of more complex structures.
With the availability of highly contiguous plant genome assemblies based on long-read sequences, the detailed characterization of these complex structures and the evaluation of their importance for plant genome evolution have become feasible.

Results
The detailed analysis of two rice loci containing complex LTR-RT structures showed that they consist of tandem arrays of LTR-RT copies sharing internal LTRs. Our analysis shows that the tandems are the result of a single insertion and not of the recombination of two independent LTR-RT elements. Our results suggest that gypsy elements may be more prone to form these structures. We show that these structures are highly polymorphic in rice and therefore have the potential to generate genetic and phenotypic variability. We developed a computational pipeline, IDENTAM, that scans genome sequences and identifies tandem LTR-RT candidates, and detected 307 tandems in a pangenome built from the genomes of 75 accessions of cultivated and wild rice, showing that tandem LTR-RT structures are frequent in the rice genome and are highly polymorphic in the species. Running IDENTAM on the Arabidopsis, almond and cotton genomes showed that LTR-RT tandems are frequent in plant genomes of different size, complexity and ploidy levels. The complexity of differentiating intra-element variations at the nucleotide level among haplotypes is very high, and we found that graph-based pangenomic methodologies are appropriate to resolve these structures.

Conclusions
Our results show that LTR-RTs can form tandem arrays of elements. These structures are relatively abundant and highly polymorphic in rice and are widespread in the plant kingdom. Future studies will help to understand how these structures originate and whether the variability that they generate has a functional impact.

Background
Transposable Elements (TEs) are a major component of eukaryote genomes.
In plants, TEs frequently account for most of the genome content, as for example in maize, where TE-related sequences account for 85% of the genome content [1]. TEs contribute to genome evolution in many ways, fulfilling structural roles and generating genome variability that can translate into phenotypic novelty [2]. In plants, LTR retrotransposons (LTR-RTs), together with MITEs, are the most prevalent types of TEs [3]. As an example, in maize LTR-RTs account for as much as 90% of the total TE content [1]. LTR-RT insertions can inactivate genes or change the expression of genes located nearby, and can be at the origin of new phenotypic variability [3, 4]. In fact, different LTR-RT insertions have been selected during the domestication, local adaptation and breeding of plant crops [5]. LTR-RTs move through a replicative process, which increases their copy number as they transpose. Their amplification can be at the origin of rapid increases in genome size, as has been shown in O. australiensis, where the doubling of the genome size in just three million years can be explained by the amplification of three families of LTR-RTs [6]. However, LTR-RT sequences can also be eliminated from genomes, thus reversing the tendency towards genome size expansion [7, 8]. The main mechanism for this is illegitimate recombination, either at the LTRs, giving rise to the so-called solo-LTRs, or involving any other repeated sequence, resulting in truncated LTR-RT copies [9–12]. In fact, most LTR-RT-related sequences in genomes are deletion derivatives of LTR-RTs and are no longer able to transpose autonomously [13]. Moreover, LTR-RTs can also give rise to complex LTR-RT-related structures through their nested insertion in other LTR-RTs. For example, arrays of nested LTR-RTs of different families are common in the genomes of maize and barley [11, 14].
This can be explained by the lack of phenotypic consequences of the insertion in these gene-free regions, and therefore the lower selective pressure against these insertions, or by a targeted insertion of certain LTR-RT families. For example, the latter could be the case of the two main LTR-RT families of Physcomitrium patens, RLG1 and RLC5, which are generally found forming heterochromatic islands composed mainly of LTR-RT elements of a single family in the chromosome arms and the centromere, respectively [15–17]. In addition, a particular type of defective LTR-RT, called Terminal-repeat Retrotransposons in Miniature (TRIMs), has been shown to form tandem repeats of elements sharing an internal LTR [18, 19]. These structures can be the result of illegitimate recombination or could be generated during the retrotransposition process [18, 19]. Tandem repeats of LTR-RT sequences sharing internal LTRs have also been found in the centromeres of two species of kangaroos [20], and it has been proposed that they could originate from illegitimate recombination when a double-strand break (DSB) at an LTR is repaired using the other LTR of the element from the sister chromatid or the homologous chromosome [21]. Moreover, it has been reported in yeast that LTR-RTs could also generate these tandem structures through an integrase-independent mechanism of integration into preexisting elements [22, 23]. Previous data on Drosophila suggest that tandem TE insertions may be relatively frequent in eukaryote genomes [24]. Although the available plant reference genomes probably contain TE tandem repeats, these have not been systematically analyzed or reported, because it is difficult to rule out an artefactual origin for some of these structures when the reference genomes are based mainly on short-read data.
However, as the number of Telomere-to-Telomere and high-quality assemblies based on long-read data increases, it becomes feasible to analyze the structure and the prevalence of tandem-repeat LTR-RT insertions in plant genomes. Here we show that plant genomes frequently have tandem arrays of LTR-RTs sharing the internal LTRs and that these structures are highly variable, thus increasing the potential of LTR-RTs for generating phenotypic variability.

Methods
Detecting LTR-RT Tandems and intact LTR-RTs
To detect LTR-RT tandems in different plant genomes, we developed a bioinformatics pipeline, IDENTAM (https://github.com/NMoralesD/IDENTAM), which requires a reference genome and a defined LTR-RT consensus library with internal and LTR regions provided as separate sequences. This pipeline runs RepeatMasker with the input library, retains the hits that cover more than 70% of the consensus length and employs two approaches to detect tandems: one based on identifying two nearby internal regions from LTR-RTs, and another based on detecting three close LTRs. Multiple filters (flexible parameters set by the user) are applied to limit false positives, and TEsorter [25] is then applied to classify the elements into LTR-RT_TR, which are potential LTR-RT tandems with recognized coding domains and associated with known LTR-RT lineages, or LTR-RT-related elements, which are tandemly arranged elements without recognized coding domains. An expanded description of this pipeline is shown in Additional Fig. 1. Input LTR-RT libraries for IDENTAM were built with the EDTA_raw.pl script [26], except for rice, for which a previously published TE library was used (ref). Default IDENTAM parameters (Additional Fig. 1) were employed for detecting LTR-RT tandems in all species.

Rice pangenome construction
We followed two different strategies to build a pangenome of rice.
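As a rough illustration of the two detection strategies described for IDENTAM in the preceding section (two nearby internal regions, or three close LTRs), the following Python sketch works on a simplified list of RepeatMasker-like hits. It is not the IDENTAM implementation: the `Hit` record, the `max_gap` distance and the function names are assumptions made for this example, and only the 70% consensus-coverage threshold comes from the text.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Hit:
    """Simplified annotation hit (hypothetical record, not IDENTAM's format)."""
    chrom: str
    start: int      # 0-based, inclusive
    end: int        # exclusive
    family: str
    part: str       # "LTR" or "INT" (internal region)
    cons_len: int   # length of the matching consensus sequence


def _report(chain, out):
    """Append the chain to `out` if it has >= 2 internal regions or >= 3 LTRs."""
    n_ltr = sum(h.part == "LTR" for h in chain)
    n_int = sum(h.part == "INT" for h in chain)
    if n_int >= 2 or n_ltr >= 3:
        out.append((chain[0].chrom, chain[0].family,
                    chain[0].start, chain[-1].end, n_ltr, n_int))


def find_tandem_candidates(hits, min_cov=0.70, max_gap=500):
    """Return candidate tandem loci as (chrom, family, start, end, n_ltr, n_int).

    max_gap is an assumed value for how far apart "nearby" hits may be.
    """
    # Keep only hits covering more than min_cov of their consensus.
    kept = [h for h in hits if (h.end - h.start) / h.cons_len > min_cov]
    kept.sort(key=lambda h: (h.chrom, h.family, h.start))
    candidates = []
    for _, group in groupby(kept, key=lambda h: (h.chrom, h.family)):
        chain = []
        for hit in group:
            # Start a new chain when the next hit is too far from the last one.
            if chain and hit.start - chain[-1].end > max_gap:
                _report(chain, candidates)
                chain = []
            chain.append(hit)
        _report(chain, candidates)
    return candidates
```

In this sketch a locus with the layout LTR, internal, LTR, internal, LTR is reported once, while a single complete element (two LTRs and one internal region) is not. The real pipeline applies further user-set filters and a TEsorter classification step that are not modelled here.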
We first created a pangenome using long-read-based genome assemblies of 75 rice varieties, representing the diversity of the species and including also 7 assemblies of the wild rice relatives O. rufipogon (3), O. barthii (3) and O. glaberrima (1) (Additional Table 1). We anchored the pangenome to the Nipponbare IRGSP-1.0 genome (Additional Table 1). Every assembly was aligned to IRGSP-1.0 using minimap2 [27], and SVIM-asm [28] was used for structural variant detection. The vcf files generated by SVIM-asm (https://github.com/lanasushko/Rice_pangenome_TEs) for the 75 genomes were merged with bcftools merge [29] and Truvari [30]. The pangenome graph was built using vgtools [31]. To identify the LTR-RT tandems corresponding to transposon insertion polymorphisms (TIPs), we ran the IDENTAM pipeline on the insertion and deletion sequences detected by SVIM-asm [28]. A second pangenome was obtained using Minigraph-Cactus [32] for variant detection in a reduced set of 20 accessions (Additional Table 2). The pipeline was run on every chromosome independently, and vg deconstruct [33] (vg version 1.58, "Cartari") was used for variant calling, with the -L parameter set to 0.9 to cluster nearly identical allele traversals. Then, large deletions (> 1 Mb) were removed using vcfbub [34] with the option -r 1000000. Bcftools norm with the -m- option [29] allowed us to split multiallelic sites into biallelic records. Only SVs larger than 50 bp were considered for further TE analyses. The output files were merged using bcftools concat, as all the files had the same columns in the same order. The pangenome graph was built again using vgtools [31].
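The size filter applied after splitting multiallelic sites can be illustrated with a short Python sketch. It assumes biallelic VCF records with explicit REF and ALT sequences, as would be expected after `bcftools norm -m-`. The function names are hypothetical, and only the 50 bp threshold comes from the text.

```python
def sv_length(ref, alt):
    """Length difference between ALT and REF (positive for insertions)."""
    return len(alt) - len(ref)


def keep_large_svs(vcf_lines, min_len=50):
    """Yield header lines and biallelic records whose size change exceeds min_len.

    Records with several ALT alleles or symbolic alleles (e.g. <DEL>) are
    skipped, since this sketch only handles explicit sequences.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alt = fields[3], fields[4]   # VCF columns 4 (REF) and 5 (ALT)
        if "," in alt or alt.startswith("<"):
            continue
        if abs(sv_length(ref, alt)) > min_len:
            yield line
```

With this threshold a 29 bp deletion is discarded, which matches the observation in the Results that such a variant is not represented among the SVs.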

Results
An LTR-RT insertion with a tandem structure in the rice genome
As a first step to characterize a rice non-reference LTR-RT insertion with potential phenotypic impact, as suggested by the result of a previously performed Transposon-Insertion Polymorphism GWAS (TIP-GWAS) [35], we analyzed the available long-read-based genome assemblies of different rice accessions. This analysis confirmed the presence of the LTR-RT insertion in the assembly of the NH218 rice accession [36] but showed that this insertion is complex. Indeed, the insertion consists of a tandem array of two LTR-RT elements sharing an internal LTR (Fig. 1). An analysis of the long reads used to produce the assembly of NH218 showed that the LTR-RT tandem region was covered by 4 long reads that spanned the entire LTR-RT tandem and at least 1 kb of upstream and 5 kb of downstream flanking sequence. This confirmed that the tandem LTR-RT structure is not an assembly artifact and that this structure exists in the genome of NH218 rice. We designated this insertion as Tandem LTR-RT Insertion 1 (TLI1). A comparison of the sequence with that of the Nipponbare rice reference genome (IRGSP-1.0) [37], which does not contain the insertion, shows that the insertion is accompanied by a duplication of 5 nt, which is the canonical length for the target site duplication (TSD) generated by LTR-RTs upon insertion [38]. The sequences flanking the tandem LTR-RT insertion show a high degree of sequence identity (99% over 2 kb upstream and 92% over 2 kb downstream), which rules out the tandem being the result of the recombination of two nearby independent insertions, as this would eliminate the intervening sequence. The high identity of the LTR-RT internal regions (94%) and of the LTRs (85–89%), as well as the absence of additional TSDs, also rules out the possibility of nested insertions of different LTR-RTs.
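The TSD criterion used above (a short direct repeat immediately flanking the inserted element) can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the function name and coordinate convention are hypothetical, the element is given as a half-open interval that excludes the TSDs, and only exact repeats are accepted.

```python
def find_tsd(seq, elem_start, elem_end, tsd_len=5):
    """Return the target site duplication flanking seq[elem_start:elem_end].

    The tsd_len bases just before the element must equal the tsd_len bases
    just after it. Returns the repeat in upper case, or None if there is no
    exact repeat or the flanks are too short.
    """
    if elem_start < tsd_len or elem_end + tsd_len > len(seq):
        return None
    left = seq[elem_start - tsd_len:elem_start].upper()
    right = seq[elem_end:elem_end + tsd_len].upper()
    return left if left == right else None
```

Calling the function with `tsd_len=5` corresponds to the canonical LTR-RT TSD length mentioned in the text. Other element types can be tested by changing `tsd_len`.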
Therefore, all the data suggest that the tandem LTR-RT structure is linked to a single retrotransposition-mediated insertion.

LTR-RT tandem loci can be highly polymorphic
The identification of a tandem LTR-RT structure in the rice genome prompted us to look more closely at other loci that appeared as complex in previous analyses. In particular, we analyzed a complex structure present in chromosome 2 of Nipponbare rice. A detailed analysis of this locus showed that it contains a tandem LTR-RT structure, with two internal regions flanked by three LTRs, inserted within a MULE transposon (Fig. 2). The MULE element is flanked by a 10 nt repeat, which fits the canonical size for MULE TSDs generated upon transposition [39], and the tandem LTR-RT is flanked by a direct repeat of 5 nt, typical for TSDs of LTR-RT insertions [38]. This suggests that the insertions are the result of two independent transposition events. As for the previous tandem LTR-RT structure analyzed, the identity of the two internal regions (99%) and the three LTRs (99%) is very high. We designated this insertion as TLI2. An analysis of 27 additional long-read-based genome assemblies of cultivated and wild rice and related species [36, 40, 41] showed that this locus is present in at least 6 different haplotypes in these genomes (Fig. 2 and Additional Table 3). The insertion of the Mu-related element seems relatively ancient, as it is found in one of the two wild rice O. rufipogon assemblies analyzed, although it is not present in the two assemblies of O. barthii or in the O. glaberrima assembly analyzed, which all have the empty site. Interestingly, five out of the nine rice accessions belonging to the indica subspecies have the empty site (Hap 1), and all the remaining indica accessions except one (4), as well as the two aromatic japonicas analyzed, have a deletion compatible with the Mu-like excision (Hap 6).
We have not found any cultivated rice accession with a simple Mu-like insertion at this location, which may suggest selection for the empty site or the excision of the element. In contrast, all the japonica accessions (12), as well as one indica accession (LARHA MUGAD, LM), contain the Mu-like insertion with a nested insertion of an LTR-RT-related sequence. A phylogenetic analysis of the regions flanking the insertion site (20 kb upstream and 35 kb downstream; Additional Fig. 2) shows that the sequences of this indica accession (LM) are more similar to those of the japonica accessions than to those of the other indica accessions, which suggests that this region may have been introgressed from japonica into the LM indica accession. Therefore, our results are compatible with the LTR-RT-related insertion happening after the split of indica and japonica and even after the split of the aromatic/circum-basmati group. We have not found any sign of excision of the Mu-like element in japonica accessions, which could suggest that the nested insertion of the LTR-RT may have stabilized the Mu-like insertion. The insertion of the LTR-RT-related sequence consists of a tandem of two LTR-RTs sharing the internal LTR (Hap 3), a single LTR-RT insertion (Hap 4) or a solo-LTR (Hap 5). The existence of haplotypes with single or tandem LTR-RTs and solo-LTR insertions for the same locus suggests that these structures are highly dynamic. Tandem LTR-RT insertions could be inserted as such, and single insertions (as for solo-LTRs) could be the result of an illegitimate recombination event at the LTRs using the sister chromatid or the homologous chromosome. Unfortunately, the phylogenetic analysis of the sequences flanking the insertions (Additional Fig. 2) did not allow us to establish the sequence of events and discriminate between the two different mechanisms for the tandem LTR-RT formation.

Tandems of LTR-RTs from different families are widespread in rice
To analyze how common tandem LTR-RT structures are in the rice genome, we systematically searched for these structures in the Nipponbare rice genome. To this end, we built a pipeline, which we named IDENTAM (see Methods and Additional Fig. 1), that searches for the presence of highly similar repeats of LTR-RT internal sequences interleaved with LTRs, or alternatively highly similar LTRs interleaved with LTR-RT internal sequences. We searched the Nipponbare rice reference genome [37] and identified 74 potential tandem LTR-RT structures, of which 66 were clearly related to LTR-RT sequences. A manual inspection of these 66 LTR-RT-related insertions showed that 28 had a clear LTR-RT tandem structure (i.e. alternating internal LTR-RT sequences and LTRs, starting and finishing with an LTR), whereas the rest were potentially degenerated LTR-RT tandems, with a more complex array of LTR-RT sequences, or potential nested elements. These more complex structures were not analyzed further and were filtered out from our selection. All the selected 28 tandem LTR-RT sequences contain regions encoding conserved retrotransposon protein domains, and 10 are flanked by perfect TSD sequences of 5 nt (Additional Table 4). An analysis of the 28 LTR-RTs (Fig. 3) shows that most of the tandem LTR-RT insertions (82%) are related to the gypsy LTR-RT superfamily. This percentage is slightly higher than the percentage of intact gypsy LTR-RT elements in the Nipponbare genome (72%), which could indicate that there is a slight bias in the type of elements that generate tandem LTR-RT structures. However, no significant difference was observed between the two groups (p-value = 0.2932, Fisher test). A more detailed analysis shows that 82% of the tandem LTR-RT structures related to the gypsy superfamily belong to the Tekay lineage (Fig. 3), whereas Tekay elements account for only 28% of the gypsy elements annotated in the rice Nipponbare genome. Although the total number of the analyzed structures is low, a one-tail Fisher’s test revealed an enrichment of the Tekay lineage in the LTR-RT tandem group (p-value = 9.81e-08), which suggests that some LTR-RT lineages may be more prone to form tandem LTR-RT structures. Alternatively, the bias found could be the consequence of the particular distribution of Tekay elements, which tend to concentrate in pericentromeric regions (Additional Fig. 2). However, an analysis of the distribution of the relative distance to the centromere of the tandem LTR-RTs suggests that this may not be an important factor explaining the possible preference of Tekay elements to form tandem LTR-RT structures (Additional Fig. 3), as no specific bias is observed towards a shorter distance to the centromere. The analysis of the Nipponbare IRGSP-1.0 genome suggests that tandem LTR-RT structures are frequent in rice. To further analyze how frequent these structures are within rice and related species, we constructed a pangenome using long-read-based genome assemblies of 75 O. sativa varieties, representing the diversity of the species and including also 7 assemblies of the wild rice relatives O. rufipogon (3), O. barthii (3) and O. glaberrima (1) [36, 41] (see Methods). We found 175,555 SVs in the pangenome, which were annotated for the presence of LTR-RTs, and we searched for sequences potentially corresponding to tandem LTR-RT structures using IDENTAM. We identified 241 additional tandem LTR-RT structures that are not present in the assembled genome of Nipponbare rice. On the other hand, we found that 41 out of the 66 tandem LTR-RT structures found in Nipponbare are absent from at least one of the 75 assemblies included in the pangenome (data not shown). These results confirm that tandem LTR-RT structures are frequent and highly polymorphic in rice.
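For readers unfamiliar with the one-tail Fisher's exact test used for the lineage enrichment above, the following Python sketch computes it from a 2×2 table with the standard library only. How the authors built their contingency tables is not stated in the text, so the table layout and the function name are assumptions, and the sketch is not meant to reproduce the reported p-values.

```python
from math import comb


def fisher_one_tail(a, b, c, d):
    """One-tail (enrichment) Fisher's exact test for the 2x2 table

                    in lineage   not in lineage
        tandems         a              b
        background      c              d

    Returns P(X >= a) under the hypergeometric null, where X is the number
    of lineage members among the a + b tandem elements.
    """
    total = a + b + c + d
    lineage = a + c          # all lineage members
    drawn = a + b            # all tandem elements
    denom = comb(total, drawn)
    upper = min(lineage, drawn)
    return sum(
        comb(lineage, x) * comb(total - lineage, drawn - x)
        for x in range(a, upper + 1)
    ) / denom
```

A small p-value indicates that the lineage is more frequent among tandems than expected from its share of the background set. For a two-tail test, as used for the superfamily comparison, a library routine such as SciPy's `fisher_exact` would normally be preferred.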
A comparison of the types of LTR-RT elements forming the potential 307 tandem LTR-RT structures found in the pangenome (241 new non-overlapping insertions plus the 66 previously detected in Nipponbare) (Fig. 4) with the LTR-RT annotation of the pangenome (LTR-RTs annotated in the Nipponbare reference genome [26] plus the LTR-RTs present in the SVs) shows that gypsy LTR-RTs are overrepresented in tandem LTR-RT structures according to a two-tail Fisher’s test (p-value = 9.81e-08; 88% of the tandems, while these elements account for 74% of the total LTR-RTs). Among gypsy elements, the Tekay lineage also seems to be significantly enriched in the tandem group according to a one-tail Fisher's test (p-value = 2.2e-16; Tekay elements make up 66% of the LTR-RT tandems while accounting for 35% of the total LTR-RTs), in line with what was found when analyzing the genome of Nipponbare rice only (see Fig. 3).

Using the cactus-minigraph pangenome for characterizing LTR-RT polymorphic structures
The pangenome approach described above allowed us to identify many tandem LTR-RT insertions present in rice and related species. However, this approach proved to be of limited use for the correct characterization of the different alleles these structures can produce. Indeed, the analysis of the TLI2 locus, which can be present in up to six different haplotypes, showed that this locus was not satisfactorily resolved in the pangenome. The different haplotypes were collapsed, as the different structural variants occur at the same position and have extensive sequence identity. Consequently, only two haplotypes were defined at this position: the LTR-RT tandem inserted within the MULE element present in the reference genome, and a deletion corresponding to the absence of both the MULE and the nested tandem LTR-RT structure (Additional Fig. 4). The accessions presenting other haplotypes were resolved as having one of these two, with the single LTR-RT insertions nested in the MULE (Hap 4, Fig. 2) being resolved as in the reference (which contains a tandem LTR-RT insertion, Hap 3), and the accessions presenting the insertion of the MULE alone (Hap 2) being resolved as deletions of the MULE and the nested LTR-RT structures (Hap 6). This prompted us to use minigraph-cactus, which does not collapse duplications during the pangenome construction [32]. This pipeline allowed us to further resolve multiallelic, complex SVs. Figure 5 shows the minigraph-cactus pangenome graph with the complex allelic variants defined in Fig. 2, which could not be defined with the previous approach. As the Bandage visualization shows, all haplotypes previously defined are easily characterized using this approach except for the 29 bp deletion, because variants smaller than 50 bp were not considered SVs in the pipeline (see Methods section). An analysis of the 28 loci characterized here as containing tandem LTR-RT insertions using the cactus-minigraph pangenome showed that 61% of the tandem LTR-RT loci are fixed, while the rest are polymorphic, often giving rise to multiple haplotypes (up to 7 different haplotypes in a single locus), which highlights the high genomic diversity LTR-RT tandems can generate.

LTR-RT tandems are common in plant species
To evaluate how common the presence of tandem LTR-RT structures is in plant genomes, we ran the IDENTAM pipeline on the assembled genomes of three other plant species: Arabidopsis thaliana (TAIR 10) [42], Prunus dulcis (almond) [43] and the upland cotton Gossypium hirsutum [44], which span a wide range of genome sizes and LTR-RT content and have different levels of ploidy. We found tandem LTR-RT structures in all of them, with a lower number in the genomes with a lower content of LTR-RTs (11 tandem LTR-RT structures in A. thaliana and P. dulcis) and a higher number in bigger genomes (e.g. 86 in cotton).
With respect to the type of LTR-RT forming tandems, the analysis of these genomes shows that in most of them gypsy LTR-RTs seem more prone to form tandem LTR-RT structures (Fig. 6). Indeed, tandem LTR-RTs are significantly enriched in gypsy elements in Arabidopsis (Fisher’s test p-value = 0.009041), as found in rice (Fig. 4), whereas in cotton and almond there is no significant enrichment for either of the two main LTR-RT superfamilies, gypsy and copia (Fig. 6). Our analysis also shows that in most genomes there is a bias towards specific gypsy lineages forming LTR-RT tandems (Fig. 6), but the specific lineage enriched depends on the genome analyzed. Athila elements are highly enriched in the LTR-RT tandem group in A. thaliana (p-value = 0.02573), P. dulcis (F0 phase, p-value = 0.001404) and cotton (p-value = 5.205e-16), whereas in rice the tandem LTR-RT structures are enriched in Tekay elements (p-value = 2.2e-16; Fig. 4). The analysis of the rice pangenome suggested that these tandem LTR-RT structures are highly polymorphic within a species. Interestingly, the analysis of the phased genome of almond showed that among the 11 tandem LTR-RT structures identified in the F1 phase, one was not present in the F0 phase, which, on the other hand, has one additional tandem LTR-RT structure. This stresses the high variability of these structures (data not shown).

Discussion
TEs are widespread in eukaryote genomes and their mobilization and amplification are thought to have an important impact on genome structure and gene regulation. In plants, TEs are known to be a major driving force of genome evolution [45] and they can account for most of the genome space. In addition to autonomous elements, genomes contain defective elements, which become increasingly difficult to identify as they accumulate mutations [46], and are overrepresented as compared with active TEs.
The repetitive nature of TEs has made their identification and study challenging, in particular in genome assemblies based mainly on short-read sequences. With long-read-based reference genomes and pangenomes of different plant species becoming available, it is now possible to annotate and study TEs in much more detail. Here we show that LTR-RTs can form tandem arrays of alternating LTRs and LTR-RT internal regions, which are flanked by TSDs. Our results show that these structures are relatively abundant in rice and are also present in other genomes, of both monocot and dicot plants, with different genome size and ploidy levels. This suggests that tandem LTR-RT insertions are widespread in plant genomes, as they also seem to be in other higher eukaryotes such as Drosophila [24]. Our results suggest that gypsy elements tend to form more LTR-RT tandem structures than copia LTR-RTs, and some biases towards certain gypsy lineages also seem to exist, although different lineages seem prone to form these structures in different genomes. This general trend of gypsy elements could be the result of their longer average LTR size, which may more easily promote illegitimate recombination, or of the frequent association of gypsy elements with heterochromatin and pericentromeric regions. However, we have not been able to detect any significant correlation between tandem LTR-RT formation and any of these features. The pangenome-based analysis of the variability linked to these structures showed that they are highly dynamic, with 41 of the 66 (62%) LTR-RT tandems found in Nipponbare being absent in at least one of the other Oryza genomes analyzed. Moreover, when present, LTR-RT tandems can generate many different haplotypes with a variable number of the tandemly repeated unit. This significantly expands the potential of LTR-RTs to generate genome variability within a species, which can translate into phenotypic diversity.
However, analyzing LTR-RT tandems at a population scale is complex and requires the use of completely assembled genomes and novel pangenome graph pipelines to properly resolve their genetic variability. Tandem LTR-RT structures like the ones described here have been found in the centromeres of two different species of kangaroos, and it has been proposed that they could arise by illegitimate recombination between the two LTRs of the element located on sister chromatids or homologous chromosomes, which could also give rise to solo-LTRs [20]. Indeed, the same mechanism was proposed to explain tandem arrays of TRIMs in different species, although it was also proposed that these structures could result from the insertion of tandem structures produced during retrotransposition [18, 19]. Interestingly, it has recently been shown that the retrotransposition process involves the formation of circular LTR-RT DNA containing a single LTR that can be used for transcribing LTR-RT mRNA to initiate a new round of replication [47]. Under this scenario, the presence of a weak transcriptional terminator, such as the one described for the tobacco Tnt1 LTR-RT [48], could allow the production of tandem LTR-RT transcripts, leading to the transposition of tandem LTR-RT structures. The analysis of the SNPs surrounding the insertion site in the 6 different haplotypes of the chromosome 2 locus did not allow us to establish the complete sequence of events leading to the diversity of structures present, or to determine whether the tandem LTR-RT structure is the result of a complex insertion or of an illegitimate recombination event. At this point, both mechanisms seem possible and not necessarily mutually exclusive. More research will be needed to clarify this point.

Conclusions
Tandem LTR-RT structures are widespread in plant genomes and can give rise to multiple haplotypes.
The frequent and highly polymorphic nature of tandem LTR-RTs expands the potential of LTR-RTs to generate genome variability with potential phenotypic consequences.

Declarations
Ethics approval and consent to participate
Not applicable

Consent for publication
Not applicable

Availability of data and materials
All data generated or analysed during this study are included in this published article and its supplementary information files.

Competing interests
The authors declare that they have no competing interests.

Funding
The work done at CRAG was funded by grant PID2022-143167NB-I00 funded by MICIU/AEI/10.13039/501100011033 and by “ERDF/EU” and grant CEX2019-000902-S funded by MICIU/AEI/10.13039/501100011033 to JMC. NMD is funded by Grant PRE2020-095111 funded by MCIU/AEI/10.13039/501100011033 and by “ESF Investing in your future”. LCD is funded by the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 945043. RC was partially funded by a Juan de la Cierva contract, grant IJC2020-045949-I funded by MICIU/AEI/10.13039/501100011033 and by “European Union NextGenerationEU/PRTR”, and is now a Ramón y Cajal contract holder, RYC2022-037459-I funded by MICIU/AEI/https://doi.org/10.13039/501100011033 and by FSE+. AAG was supported by the LOEWE Start Professorship from the Hessian Ministry for Science and the Arts. VK was supported by GRK 2843 from the German Research Foundation (DFG).

Authors' contributions
NMD developed IDENTAM and performed most of the experiments. LCD performed the analysis of tandem LTR-RTs in cotton; SS obtained the rice pangenome based on SVIM-asm, whereas NMD obtained the cactus-minigraph pangenome in the laboratory of AAG with the help of VK. RC and JC conceived and directed the project and wrote the manuscript with the help of all other authors.

Acknowledgements
We are grateful to all the members of CRAG’s lab for useful discussions.
References
1. Stitzer MC, Anderson SN, Springer NM, Ross-Ibarra J. The Genomic Ecosystem of Transposable Elements in Maize. PLoS Genet. 2021;17:e1009768.
2. Bennetzen JL, Wang H. The contributions of transposable elements to the structure, function, and evolution of plant genomes. Annu Rev Plant Biol [Internet]. 2014;65:505–30. Available from: http://www.ncbi.nlm.nih.gov/pubmed/24579996
3. Pulido M, Casacuberta JM. Transposable element evolution in plant genome ecosystems. Curr Opin Plant Biol. 2023;75:102418.
4. Lisch D. How important are transposons for plant evolution? Nat Rev Genet [Internet]. 2013;14:49–61. Available from: http://www.ncbi.nlm.nih.gov/pubmed/23247435
5. Andersson L, Purugganan M. Molecular genetic variation of animals and plants under domestication. Proc Natl Acad Sci U S A. 2022;119:e2122150119.
6. Piegu B, Guyot R, Picault N, Roulin A, Saniyal A, Kim H, et al. Doubling genome size without polyploidization: Dynamics of retrotransposition-driven genomic expansions in Oryza australiensis, a wild relative of rice. Genome Res. 2006;16:1262–9.
7. Bennetzen JL, Kellogg EA. Do Plants Have a One-Way Ticket to Genomic Obesity? Plant Cell [Internet]. 1997;9:1509–14. Available from: http://www.plantcell.org/content/9/9/1509.short
8. Munasinghe M, Read A, Stitzer M, Song B, Menard C, Ma K, et al. Combined analysis of transposable elements and structural variation in maize genomes reveals genome contraction outpaces expansion. PLoS Genet. 2023;19:e1011086.
9. Devos KM, Brown JKM, Bennetzen JL. Genome size reduction through illegitimate recombination counteracts genome expansion in Arabidopsis. Genome Res. 2002;12:1075–9.
10. Tian Z, Rizzon C, Du J, Zhu L, Bennetzen JL, Jackson SA, et al. Do genetic recombination and gene density shape the pattern of DNA elimination in rice long terminal repeat retrotransposons? Genome Res. 2009;19:2221–30.
11. Shirasu K, Schulman A, Lahaye T, Schulze-Lefert P. A contiguous 66-kb barley DNA sequence provides evidence for reversible genome expansion. Genome Res. 2000;10:908–15.
12. Vitte C, Panaud O. Formation of solo-LTRs through unequal homologous recombination counterbalances amplifications of LTR retrotransposons in rice Oryza sativa L. Mol Biol Evol. 2003;20:528–40.
13. Ma J, Devos KM, Bennetzen JL. Analyses of LTR-retrotransposon structures reveal recent and rapid genomic DNA loss in rice. Genome Res. 2004;14:860–9.
14. SanMiguel P, Gaut BS, Tikhonov A, Nakajima Y, Bennetzen JL. The paleontology of intergene retrotransposons of maize. Nat Genet. 1998;20:43–5.
15. Lang D, Ullrich KK, Murat F, Fuchs J, Jenkins J, Haas FB, et al. The Physcomitrella patens chromosome-scale assembly reveals moss genome structure and evolution. Plant J. 2018;93:515–33.
16. Vendrell-Mir P, López-Obando M, Nogué F, Casacuberta JM. Different Families of Retrotransposons and DNA Transposons Are Actively Transcribed and May Have Transposed Recently in Physcomitrium (Physcomitrella) patens. Front Plant Sci. 2020;11:1274.
17. Bi G, Zhao S, Yao J, Wang H, Zhao M, Sun Y, et al. Near telomere-to-telomere genome of the model plant Physcomitrium patens. Nat Plants. 2024;10:327–43.
18. Kalendar R, Raskina O, Belyayev A, Schulman AH. Long Tandem Arrays of Cassandra Retroelements and Their Role in Genome Dynamics in Plants. Int J Mol Sci. 2020;21:2931.
19. Wang Q, Huang J, Li Y, Dooner H. The unusual dRemp retrotransposon is abundant, highly mutagenic, and mobilized only in the second pollen mitosis of some maize lines. Proc Natl Acad Sci U S A. 2020;117:18091–8.
20. Koga A, Nishihara H, Tanabe H, Tanaka R, Kayano R, Matsumoto S, et al. Kangaroo endogenous retrovirus (KERV) forms megasatellite DNA with a simple repetition pattern in which the provirus structure is retained. Virology. 2023;586:56–66.
21. Hayashi S, Honda Y, Kanesaki E, Koga A. Marsupial satellite DNA as faithful reflections of long-terminal repeat retroelement structure. Genome. 2022;65:469–78.
22. Ke N, Voytas D. High frequency cDNA recombination of the Saccharomyces retrotransposon Ty5: The LTR mediates formation of tandem elements. Genetics. 1997;147:545–56.
23. Li F, Lee M, Esnault C, Wendover K, Guo Y, Atkins P, et al. Identification of an integrase-independent pathway of retrotransposition. Sci Adv. 2022;8:eabm9390.
24. McGurk M, Barbash D. Double insertion of transposable elements provides a substrate for the evolution of satellite DNA. Genome Res. 2018;28:714–25.
25. Zhang R-G, Li G-Y, Wang X-L, Dainat J, Wang Z-X, Ou S, et al. TEsorter: An accurate and fast method to classify LTR-retrotransposons in plant genomes. Hortic Res [Internet]. 2022;9:uhac017. Available from: https://doi.org/10.1093/hr/uhac017
26. Ou S, Su W, Liao Y, Chougule K, Agda JRA, Hellinga AJ, et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 2019;20:275.
27. Li H. Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–100.
28. Heller D, Vingron M. SVIM-asm: structural variant detection from haploid and diploid genome assemblies. Bioinformatics [Internet]. 2021;36:5519–21. Available from: https://doi.org/10.1093/bioinformatics/btaa1034
29. Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, et al. Twelve years of SAMtools and BCFtools. Gigascience [Internet]. 2021;10:giab008. Available from: https://doi.org/10.1093/gigascience/giab008
30. English AC, Menon VK, Gibbs RA, Metcalf GA, Sedlazeck FJ. Truvari: refined structural variant comparison preserves allelic diversity. Genome Biol [Internet]. 2022;23:271. Available from: https://doi.org/10.1186/s13059-022-02840-6
31. Garrison E, Sirén J, Novak AM, Hickey G, Eizenga JM, Dawson ET, et al. Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat Biotechnol [Internet]. 2018;36:875–9. Available from: https://doi.org/10.1038/nbt.4227
32. Hickey G, Monlong J, Ebler J, Novak A, Eizenga J, Gao Y, et al. Pangenome graph construction from genome alignments with Minigraph-Cactus. Nat Biotechnol. 2024;42:663–73.
33. Liao W-W, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, et al. A draft human pangenome reference. Nature [Internet]. 2023;617:312–24. Available from: https://doi.org/10.1038/s41586-023-05896-x
34. Garrison E. vcfbub: popping bubbles in vg deconstruct VCFs. Zenodo. 2022.
35. Castanera R, Vendrell-Mir P, Bardil A, Carpentier MC, Panaud O, Casacuberta JM. The amplification dynamics of MITEs and their impact on rice trait variability. Plant J. 2021;107:118–35.
36. Shang L, Li X, He H, Yuan Q, Song Y, Wei Z, et al. A super pan-genomic landscape of rice. Cell Res. 2022;32:878–96.
37. Kawahara Y, de la Bastide M, Hamilton JP, Kanamori H, Mccombie WR, Ouyang S, et al. Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data. Rice. 2013;6:3–10.
38. Neumann P, Novák P, Hoštáková N, Macas J. Systematic survey of plant LTR-retrotransposons elucidates phylogenetic relationships of their polyprotein domains and provides a reference for element classification. Mob DNA. 2019;10:1.
39. Ferguson A, Jiang N. Mutator-like elements with multiple long terminal inverted repeats in plants. Comp Funct Genomics. 2012;2012:695827.
40. Zhang F, Xue H, Dong X, Li M, Zheng X, Li Z, et al. Long-read sequencing of 111 rice genomes reveals significantly larger pan-genomes. Genome Res. 2022;32:853–63.
41. Qin P, Lu H, Du H, Wang H, Chen W, Chen Z, et al. Pan-genome analysis of 33 genetically diverse rice accessions reveals hidden genomic variations. Cell. 2021;184:3542–58.
42. Lamesch P, Berardini TZ, Li D, Swarbreck D, Wilks C, Sasidharan R, et al. The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic Acids Res [Internet]. 2012;40:D1202–10. Available from: https://doi.org/10.1093/nar/gkr1090
43. Castanera R, de Tomás C, Ruggieri V, Vicient C, Eduardo I, Aranzana M, et al. A phased genome of the highly heterozygous ‘Texas’ almond uncovers patterns of allele-specific expression linked to heterozygous structural variants. Hortic Res. 2024;11:uhae106.
44. Yang Z, Ge X, Yang Z, Qin W, Sun G, Wang Z, et al. Extensive intraspecific gene order and gene structural variations in upland cotton cultivars. Nat Commun [Internet]. 2019;10:2989. Available from: https://doi.org/10.1038/s41467-019-10820-x
45. Wendel JF, Jackson SA, Meyers BC, Wing RA. Evolution of plant genome architecture. Genome Biol [Internet]. 2016;17:37. Available from: http://dx.doi.org/10.1186/s13059-016-0908-1
46. Maumus F, Quesneville H. Deep investigation of Arabidopsis thaliana junk DNA reveals a continuum between repetitive elements and genomic dark matter. PLoS One [Internet]. 2014;9:e94101. Available from: http://dx.doi.org/10.1371%2Fjournal.pone.0094101
47. Yang F, Su W, Chung OW, Tracy L, Wang L, Ramsden DA, et al. Retrotransposons hijack alt-EJ for DNA replication and eccDNA biogenesis. Nature. 2023;620:218–25.
48. Hernández-Pinzón I, De Jesús E, Santiago N, Casacuberta JM. The frequent transcriptional readthrough of the tobacco Tnt1 retrotransposon and its possible implications for the control of resistance genes. J Mol Evol. 2009;68:269–78.
Additional Declarations No competing interests reported.
Supplementary Files AdditionalFiguresandTableslegends.docx AdditionalTable.zip Editorial decision: Revision requested 17 Dec, 2024 Reviews received at journal 16 Dec, 2024 Reviews received at journal 21 Nov, 2024 Reviewers agreed at journal 18 Nov, 2024 Reviewers agreed at journal 15 Nov, 2024 Reviewers invited by journal 07 Nov, 2024 Editor assigned by journal 06 Nov, 2024 Submission checks completed at journal 04 Nov, 2024 First submitted to journal 29 Oct, 2024
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-5356060","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":378505188,"identity":"110bf56a-90f5-4b2c-b93d-85563bd38147","order_by":0,"name":"Noemia Morales-Díaz","email":"","orcid":"","institution":"Center for Research in Agricultural Genomics","correspondingAuthor":false,"prefix":"","firstName":"Noemia","middleName":"","lastName":"Morales-Díaz","suffix":""},{"id":378505189,"identity":"bf9ef432-bd5e-4132-9902-87a340f40fb0","order_by":1,"name":"Svitlana Sushko","email":"","orcid":"","institution":"Center for Research in Agricultural Genomics","correspondingAuthor":false,"prefix":"","firstName":"Svitlana","middleName":"","lastName":"Sushko","suffix":""},{"id":378505190,"identity":"1a770430-eaf5-4488-9da8-83013b31b2a9","order_by":2,"name":"Lucia Campos-Domínguez","email":"","orcid":"","institution":"Center for Research in Agricultural Genomics","correspondingAuthor":false,"prefix":"","firstName":"Lucia","middleName":"","lastName":"Campos-Domínguez","suffix":""},{"id":378505191,"identity":"06ba70e9-d1d8-41b7-9828-a21c5c77309d","order_by":3,"name":"Venkataramana Kopalli","email":"","orcid":"","institution":"University of Giessen","correspondingAuthor":false,"prefix":"","firstName":"Venkataramana","middleName":"","lastName":"Kopalli","suffix":""},{"id":378505192,"identity":"60b12edc-875b-49c2-8ebc-7fa042410bb7","order_by":4,"name":"Agnieszka Golicz","email":"","orcid":"","institution":"University of 
Giessen","correspondingAuthor":false,"prefix":"","firstName":"Agnieszka","middleName":"","lastName":"Golicz","suffix":""},{"id":378505193,"identity":"66b35844-aadc-4495-a57d-ad84ae31717c","order_by":5,"name":"Raul Castanera","email":"","orcid":"","institution":"Center for Research in Agricultural Genomics","correspondingAuthor":false,"prefix":"","firstName":"Raul","middleName":"","lastName":"Castanera","suffix":""},{"id":378505194,"identity":"04861628-0034-4420-8a22-e773feaa319c","order_by":6,"name":"Josep Casacuberta","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABR0lEQVRIiWNgGAWjYBACxgYkDjOYZG/AKolbiwQDzwH8WlAARItEAgFV7e0PHxcw2MnLtx9+Jl1Qca+Of+bjZx9+/LLJ4xc73PjhB4ONPbrDes4YG89gSDbccCbNTHrGmWIJidtpxjN7+9KKJWcnNkv2MKQlojmPcUYOmzQPwwHGDQxABm9bggTD7RxmBt6ew4kbbie2MfAwHEZ3J+OM9Oe/gVrs5/e/AWr5lyAhf/MMM+Pfnv9gLYx/GP5jOGxGghkzUEtiww2QLQ0JEgY3eJiZeX4cAGsBSaGHG8gv0jMMkpM33HhmbD3jWILkxjNpxsyyDcmJM4F+kZYxSEb3iyEwxD4XVNjZzu9Pfni7oCaBX+744ceMb/7YJfZLpz/8+KbCDt1hhg2g6DBAD3vGNhgLQ4pBngGWTlDBHyxio2AUjIJRMFIBABXxcdV7hnRGAAAAAElFTkSuQmCC","orcid":"","institution":"Center for Research in Agricultural Genomics","correspondingAuthor":true,"prefix":"","firstName":"Josep","middleName":"","lastName":"Casacuberta","suffix":""}],"badges":[],"createdAt":"2024-10-29 16:53:11","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-5356060/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-5356060/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1186/s13100-025-00347-y","type":"published","date":"2025-03-12T15:58:51+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":70509355,"identity":"1f7f5189-a837-4603-95c6-0e9feab8c4ed","added_by":"auto","created_at":"2024-12-04 00:13:43","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":38511,"visible":true,"origin":"","legend":"\u003cp\u003eSchematic representation of the 
TLI1 \u003cem\u003elocus\u003c/em\u003e in the Nipponbare and NH218 genomes. The LTRs are flanked by 4bp target site duplication (TSDs). The 2Kbp sequences flanking the tandem LTR-RT region are shown in grey\u003cstrong\u003e.\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-5356060/v1/f69b557166cf3f52b733c257.png"},{"id":70509354,"identity":"b1d4a4f8-65ae-44e2-abd0-a3fe2047ab63","added_by":"auto","created_at":"2024-12-04 00:13:43","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":61991,"visible":true,"origin":"","legend":"\u003cp\u003eSchematic representation of the six different haplotypes observed for the TLI2 locus (left) and their relative presence in different domesticated and wild rice genome accessions (right). Black boxes indicate the presence of the haplotype in a particular species or population group, and the size of the box is proportional to the number of accessions presenting the haplotype. The grey box indicates that the haplotype Hap4 found in Indica group is the result of an introgression event from Japonica.\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-5356060/v1/dc4f21271c0b868be54cb9a5.png"},{"id":70509358,"identity":"af971e7a-e5d0-42c7-af05-086c81105250","added_by":"auto","created_at":"2024-12-04 00:13:43","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":178408,"visible":true,"origin":"","legend":"\u003cp\u003eIntact LTR-RT and LTR-RT Tandems in the rice Nipponbare genome. a) Frequency of copia and gypsy elements as LTR-RT Tandems filtered by structure in Nipponbare. b) Frequency of copia and gypsy transposons as intact elements in Nipponbare. c) Frequency of the different gypsy lineages detected as LTR-RT Tandems filtered by structure in Nipponbare. 
d) Frequency of the different gypsy lineages detected as intact elements in Nipponbare. The asterisk indicates statistical enrichment (p \u0026lt; 0.05) based on Fisher’s test.\u003c/p\u003e","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-5356060/v1/987998c9d022658e552faf8f.png"},{"id":70509360,"identity":"b5939818-6498-4da0-b47e-b064d4df3578","added_by":"auto","created_at":"2024-12-04 00:13:43","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":180098,"visible":true,"origin":"","legend":"\u003cp\u003eIntact LTR-RT and LTR-RT Tandems in the rice pangenome. a) Frequency of gypsy elements identified as LTR-RT Tandem insertions. b) Frequency of gypsy transposons identified as intact elements. c) Frequency of the different gypsy lineages detected as LTR-RT Tandem insertions. d) Frequency of the different gypsy lineages detected as intact elements. The asterisk indicates statistical enrichment (p \u0026lt; 0.05) based on Fisher’s test.\u003c/p\u003e","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-5356060/v1/30286edec1eeaaa4b6efc7ea.png"},{"id":70509357,"identity":"0c129582-dac7-4328-8a11-68e5d6506358","added_by":"auto","created_at":"2024-12-04 00:13:43","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":135659,"visible":true,"origin":"","legend":"\u003cp\u003eBandage visualization of the TLI2 locus in the cactus-minigraph pangenome graph. a) Scheme of the different haplotypes identified in the graph. b) Visual representation of the different haplotypes using Bandage. 
The accessions used for the creation of the graph are described in Additional Table 2.\u003c/p\u003e","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-5356060/v1/f7570ac517bb6b86e60df682.png"},{"id":70509361,"identity":"571626b2-2e33-4062-bf66-427db85f7b7f","added_by":"auto","created_at":"2024-12-04 00:13:44","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":504562,"visible":true,"origin":"","legend":"\u003cp\u003eIntact LTR-RT and LTR-RT Tandems in the \u003cem\u003eArabidopsis thaliana\u003c/em\u003e (a-d), \u003cem\u003ePrunus persica\u003c/em\u003e cv. Texas(e-h) and \u003cem\u003eGossypium hirsutum \u003c/em\u003eTM1 (i-l) genomes. a, e, i) Frequency of gypsy and copia elements identified as LTR-RT Tandem insertions in the genomes. b, f, j) Frequency of gypsy and copia elements identified as intact LTR-RTs elements in the genomes. c, g, k) Frequency of the different gypsy lineages detected as LTR-RT Tandem insertions in the genomes. d, h, l) Frequency of the different gypsy lineages detected as intact elements in the genomes. 
The asterisk indicates statistical enrichment (p \u0026lt; 0.05) based on Fisher’s test.\u003c/p\u003e","description":"","filename":"floatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-5356060/v1/3605f5ad264e2a8bc6275d62.png"},{"id":78689101,"identity":"115c6e54-ea51-48cf-aa09-967400cc87d2","added_by":"auto","created_at":"2025-03-17 16:11:18","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1606591,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5356060/v1/2847f36a-b2af-4b8d-8d78-ace73e6a5212.pdf"},{"id":70509356,"identity":"d7bc3737-c217-4677-9030-abe7c82fcd94","added_by":"auto","created_at":"2024-12-04 00:13:43","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":826181,"visible":true,"origin":"","legend":"","description":"","filename":"AdditionalFiguresandTableslegends.docx","url":"https://assets-eu.researchsquare.com/files/rs-5356060/v1/b5139b0127ab02fd88ccd6a1.docx"},{"id":70509359,"identity":"9d2a80c6-aba3-42a4-8680-2e37c8ba6635","added_by":"auto","created_at":"2024-12-04 00:13:43","extension":"zip","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":19845,"visible":true,"origin":"","legend":"","description":"","filename":"AdditionalTable.zip","url":"https://assets-eu.researchsquare.com/files/rs-5356060/v1/de714be0a13935a9af5f4380.zip"}],"financialInterests":"No competing interests reported.","formattedTitle":"Tandem LTR-retrotransposon structures are common and highly polymorphic in plant genomes","fulltext":[{"header":"Background","content":"\u003cp\u003eTransposable Elements (TEs) are a major component of eukaryote genomes. 
In plants, TEs frequently account for most of the genome content, as for example in maize where TE-related sequences account for 85% of the genome content [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. TEs contribute to genome evolution in many ways, fulfilling structural roles and generating genome variability that can translate into phenotypic novelty [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. In plants, LTR retrotransposons (LTR-RTs), together with MITEs, are the most prevalent types of TEs [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. As an example, in maize LTR-RTs account for as much as 90% of the total TE content [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. LTR-RT insertions can inactivate genes or result in changes of the expression of genes located nearby and be at the origin of new phenotypic variability [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. In fact, different LTR-RT insertions have been selected during the domestication, local adaptation and breeding of plant crops [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. LTR-RTs move through a replicative process, which leads to increasing their copy number while transposing. Their amplification can be at the origin of rapid increases in genome size, as it has been shown in \u003cem\u003eO. australiensis\u003c/em\u003e, where the genome size doubling in just three million years can be explained by the amplification of three families of LTR-RTs [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. 
However, LTR-RT sequences can also be eliminated from genomes, thus reverting the tendency to genome size expansion [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. The main mechanism for this is illegitimate recombination, either at the LTRs giving rise to the so called solo-LTRs or involving any other repeated sequence thus resulting in truncated LTR-copies [\u003cspan additionalcitationids=\"CR10 CR11\" citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]. In fact, most LTR-RT-related sequences in genomes are deletion derivatives of LTR-RTs and are no longer able to autonomously transpose [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. Moreover, LTR-RTs can also give rise to complex LTR-RT-related structures through their nested insertion in other LTR-RTs. For example, the array of nested LTR-RTs of different families is common in the genomes of maize and barley [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e, \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. This can be explained by the lack of phenotypic consequences of the insertion in these gene-free regions, and therefore the lower selective pressure against these insertions, or by a targeted insertion of certain LTR-RT families. 
For example, the latter could be the case of the two main LTR-RT families of \u003cem\u003ePhyscomitrium patens\u003c/em\u003e, RLG1 and RLC5 that are generally found forming heterochromatic islands composed mainly of a single family LTR-RT elements in the chromosome arms and the centromere, respectively [\u003cspan additionalcitationids=\"CR16\" citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. In addition, a particular type of defective LTR-RT, called Terminal-repeat Retrotransposons in Miniature, TRIMs, has been shown to form tandem repeats of elements sharing an internal LTR [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e, \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]. These structures can be the result of illegitimate recombination or could be generated during the retrotransposition process [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e, \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]. Tandem repeats of LTR-RT sequences sharing internal LTRs have also been found in the centromeres of two species of kangaroos [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e], and it has been proposed that they could be originated by illegitimate recombination when repairing a double-strand break (DSB) at an LTR with the other LTR of the element from the sister chromatid or the homologous chromosome [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. 
Moreover, it has been reported in yeast that LTR-RTs could also generate these tandem structures through an integrase-independent mechanism of integration into preexisting elements [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e].\u003c/p\u003e \u003cp\u003ePrevious data on Drosophila suggests that tandem TE insertions may be relatively frequent in eukaryote genomes [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]. Although the available plant reference genomes probably contain TE tandem repeats, these have not been systematically analyzed or reported due to the difficulty to discard the artefactual nature of some of these structures when the reference genomes are based mainly on short read data. However, as the number of Telomere-to-Telomere and high-quality assemblies based on long-read data increases, it becomes feasible to analyze the structure and the prevalence of tandem-repeat LTR-RT insertions in plant genomes. Here we show that plant genomes frequently have tandem arrays of LTR-RTs sharing the internal LTRs and that these structures are highly variable, thus increasing the potential of LTR-RTs for generating phenotypic variability.\u003c/p\u003e"},{"header":"Methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eDetecting LTR-RT Tandems and intact LTR-RTs\u003c/h2\u003e \u003cp\u003eTo detect LTR-RT tandems in different plant genomes, we developed a bioinformatics pipeline, IDENTAM (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/NMoralesD/IDENTAM\u003c/span\u003e\u003cspan address=\"https://github.com/NMoralesD/IDENTAM\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e), which requires a defined LTR-RT consensus library with internal and LTR regions provided as separate sequences and a reference genome. 
This pipeline runs RepeatMasker with the input library, retains the hits that cover more than 70% of the consensus length and employs two approaches to detect tandems: one based on identifying two nearby internal regions from LTR-RT, and another based on detecting three close LTRs. Multiple filters (flexible parameters set by the user) are applied to limit false positives, and. TEsorter [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e] is then applied to classify the elements into LTR-RT_TR, which are potential LTR-RT tandems with recognized coding domains and associated to known LTR-RT lineages, or LTR-RT-related elements, which are tandemly arranged elements without recognized coding domains. An expanded description of this pipeline is shown in Additional Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. Input LTR-RT libraries for IDENTAM were built with EDTA_raw.pl script [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e], except for rice, in which a previously published TE library was used (ref). Default IDENTAM parameters (Additional Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e) were employed for detecting LTR-RT tandems in all species.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eRice pangenome construction\u003c/h3\u003e\n\u003cp\u003eWe followed two different strategies to build a pangenome of rice. We first created a pangenome using long-read-based genome assemblies of 76 rice varieties, representing the diversity of the species and including also 7 assemblies of the wild rice relatives \u003cem\u003eO. rufipogon\u003c/em\u003e (3), \u003cem\u003eO. barthii\u003c/em\u003e (3) and 1 \u003cem\u003eO. glaberrima\u003c/em\u003e (1) (AdditionalTable \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). We anchored the pangenome to the Nipponbare IRGSP-1.0 genome (Additional Table\u0026nbsp;1). 
Every assembly was aligned to I IRGSP-1.0 using minimap minimap2[\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e] and SVIM-asm[\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e] was used for structural variant detection. The vcf files generated by SVIM-asm (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/lanasushko/Rice_pangenome_TEs\u003c/span\u003e\u003cspan address=\"https://github.com/lanasushko/Rice_pangenome_TEs\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) for the 75 genomes were merged with bcftools merge[\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e] and Truvari[\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e] The pangenome graph was built using vgtools [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e]. To identify the LTR-RT Tandems corresponding to transposon insertion polymorphisms (TIPs), we ran IDENTAM pipeline on the insertion and deletion sequences detected by SVIM-asm. [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e]A second pangenome was obtained using Minigraph-Cactus[\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e] for variant detection in a reduced set of 20 accessions (Additional Table\u0026nbsp;2). The pipeline was run in every chromosome independently, and \u003cem\u003evg deconstruct\u003c/em\u003e[\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e] was used for variant calling using vg 1.58 version Cartari [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e], with the -L parameter set to 0.9 to cluster nearly exact allele transversals -L 0.9. 
Then, large deletions (\u0026gt;\u0026thinsp;1Mb) were removed using vcfbub[\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e] with the option -r 1000000. Bcftools norm with -m- option[\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e] allowed us to split multiallelic sites into biallelic records (-). Only SVs larger than 50bp were considered for further TE analyses. The output files were merged using \u003cem\u003ebcftools concat\u003c/em\u003e, as all the files had the same columns in the same order. The pangenome graph was built again using vgtools [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e].\u003c/p\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003eAn LTR-RT insertion with a tandem structure in the rice genome\u003c/h2\u003e \u003cp\u003eAs a first step to characterize a rice non-reference LTR-RT insertion with potential phenotypic impact, as suggested by the result of a Transposon-Insertion Polymorphism GWAS (TIP-GWAS) previously performed [\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e], we analyzed the available long-read-based genome assemblies of different rice accessions. This analysis confirmed the presence of the LTR-RT insertion in the assembly of NH218 rice accession [\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e] but showed that this insertion is complex. Indeed, the insertion consists of a tandem array of two LTR-RT elements sharing an internal LTR (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). An analysis of the long-reads used to produce the assembly of NH218 showed that the LTR-RT tandem region was covered by 4 long reads that spanned the entire LTR-RT tandem and, at least, 1kb upstream and 5 kb downstream flanking regions. 
This confirmed that the tandem LTR-RT structure is not an assembly artifact and that this structure exists in the genome of the NH218 rice. We designated this insertion as Tandem LTR-RT Insertion 1 (TLI1). A comparison of the sequence with that of the Nipponbare rice reference genome (IRGSP-1.0) [\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e], which does not contain the insertion, shows that the insertion is accompanied by a duplication of 5 nt, which is the canonical length for the target site duplication (TSD) generated by LTR-RTs upon insertion [\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e]. The sequences flanking the tandem LTR-RT insertion show a high degree of sequence identity (99% over 2 kb upstream and 92% over 2 kb downstream), which rules out the tandem being the result of recombination between two nearby independent insertions, as this would have eliminated the intervening sequence. The high identity of the LTR-RT internal regions (94%) and of the LTRs (85\u0026ndash;89%), as well as the absence of additional TSDs, also rule out the possibility of nested insertions of different LTR-RTs. Therefore, all the data suggest that the tandem LTR-RT structure is linked to a single retrotransposition-mediated insertion.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eLTR-RT tandem loci can be highly polymorphic\u003c/b\u003e\u003c/p\u003e \u003cp\u003eThe identification of a tandem LTR-RT structure in the rice genome prompted us to look more closely at other loci that appeared as complex in previous analyses. In particular, we analyzed a complex structure present in chromosome 2 of Nipponbare rice. 
A detailed analysis of this locus showed that it contains a tandem LTR-RT structure, with two internal regions flanked by three LTRs, inserted within a MULE transposon (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). The MULE element is flanked by a 10 nt repeat, which fits the canonical size for MULE TSDs generated upon transposition [\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e] and the tandem LTR-RT is flanked by a direct repeat of 5 nt, typical for TSDs of LTR-RT insertions [\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e]. This suggests that the insertions are the result of two independent transposition events. As for the previous tandem LTR-RT structure analyzed, the identity of the two internal regions (99%) and the three LTRs (99%) is very high. We designated this insertion as TLI2.\u003c/p\u003e \u003cp\u003eAn analysis of 27 additional long-read-based genome assemblies of cultivated and wild rice and related species [\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e, \u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e, \u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e] showed that this locus is present in at least 6 different haplotypes in these genomes (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e and Additional Table\u0026nbsp;3). The insertion of the Mu-related element seems relatively ancient as it is found in one of the two wild rice \u003cem\u003eO. rufipogon\u003c/em\u003e assemblies analyzed, although it is not present in the two assemblies of \u003cem\u003eO. barthii\u003c/em\u003e and in the \u003cem\u003eO. 
glaberrima\u003c/em\u003e assembly analyzed, which all contain the empty site.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eInterestingly, five out of the nine rice accessions belonging to the indica subspecies have the empty site (Hap 1) and all the remaining indica accessions except one (4), as well as the two aromatic japonicas analyzed, have a deletion compatible with the Mu-like excision (Hap 6). We have not found any cultivated rice accession with a simple Mu-like insertion at this location, which may suggest selection for the empty site or the excision of the element. On the contrary, all the japonica accessions (12), as well as one indica accession (LARHA MUGAD, LM), contain the Mu-like insertion with a nested insertion of an LTR-RT-related sequence. A phylogenetic analysis of the regions flanking the insertion site (20 kb upstream and 35 kb downstream; Additional Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e) shows that the sequences of this indica accession (LM) are more similar to those of the japonica accessions than to those of the other indica accessions, which suggests that this region may have been introgressed from japonica into the LM indica accession. Therefore, our results are compatible with the LTR-RT-related insertion happening after the split of indica and japonica and even after the split of the aromatic/circum-basmati group. We have not found any sign of excision of the Mu-like element in japonica accessions, which could suggest that the nested insertion of the LTR-RT may have stabilized the Mu-like insertion. 
The insertion of the LTR-RT-related sequence consists of a tandem of two LTR-RTs sharing the internal LTR (Hap 3), a single LTR-RT insertion (Hap 4) or a solo-LTR (Hap 5).\u003c/p\u003e \u003cp\u003eThe existence of haplotypes with single or tandem LTR-RTs and solo-LTR insertions for the same locus suggests that these structures are highly dynamic. 
Tandem LTR-RT insertions could be inserted as such, and single insertions (as for solo-LTRs) could be the result of illegitimate recombination events at the LTRs using the sister chromatid or the homologous chromosome. Unfortunately, the phylogenetic analysis of the sequences flanking the insertions (Additional Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e) did not allow us to establish the sequence of events and discriminate between the two different mechanisms for the tandem LTR-RT formation.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eTandems of LTR-RTs from different families are widespread in rice\u003c/h3\u003e\n\u003cp\u003eTo analyze how common tandem LTR-RT structures are in the rice genome, we systematically searched for these structures in the Nipponbare rice genome. To this end, we built a pipeline, which we named IDENTAM (see Methods and Additional Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e), that searches for the presence of highly similar repeats of LTR-RT internal sequences interleaved with LTRs, or alternatively highly similar LTRs interleaved with LTR-RT internal sequences. We searched the Nipponbare rice reference genome [\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e] and identified 74 potential tandem LTR-RT structures, of which 66 were clearly related to LTR-RT sequences. A manual inspection of these 66 LTR-RT related insertions showed that 28 had a clear LTR-RT tandem structure (i.e. alternating internal LTR-RT sequences and LTRs, starting and finishing with an LTR) whereas the rest were potentially degenerated LTR-RT tandems, with a more complex array of LTR-RT sequences, or potential nested elements. These more complex structures were not analyzed further and were filtered out from our selection. 
All 28 selected tandem LTR-RT sequences contain regions encoding conserved retrotransposon protein domains, and 10 are flanked by perfect TSD sequences of 5 nt (Additional Table 4).\u003c/p\u003e \u003cp\u003eAn analysis of the 28 LTR-RTs (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e) shows that most of the tandem LTR-RT insertions (82%) are related to the gypsy LTR-RT superfamily. This percentage is slightly higher than the percentage of the intact gypsy LTR-RT elements in the Nipponbare genome (72%), which could indicate that there is a slight bias in the type of elements that generate tandem LTR-RT structures. However, no significant difference was observed between the two groups (p-value\u0026thinsp;=\u0026thinsp;0.2932, Fisher test). A more detailed analysis shows that 82% of the tandem LTR-RT structures related to the gypsy superfamily belong to the Tekay lineage (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e), whereas Tekay elements account for only 28% of the gypsy elements annotated in the rice Nipponbare genome. Although the total number of the analyzed structures is low, a one-tail Fisher\u0026rsquo;s test revealed an enrichment of the Tekay lineage in the LTR-RT Tandem group (p-value\u0026thinsp;=\u0026thinsp;9.81e-08), which suggests that some LTR-RT lineages are more prone to form tandem LTR-RT structures. Alternatively, the bias found could be the consequence of the particular distribution of Tekay elements, which tend to concentrate in pericentromeric regions (Additional Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). 
However, an analysis of the distribution of the relative distance to the centromere of the tandem LTR-RTs suggests that this may not be an important factor explaining the possible preference of Tekay elements to form tandem LTR-RT structures (Additional Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e), as no specific bias is observed towards a shorter distance to the centromere.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe analysis of the Nipponbare IRGSP-1.0 genome suggests that tandem LTR-RT structures are frequent in rice. To further analyze how frequent these structures are within rice and related species we constructed a pangenome using long-read-based genome assemblies of 75 \u003cem\u003eO. sativa\u003c/em\u003e varieties, representing the diversity of the species and including also 7 assemblies of the wild rice relatives \u003cem\u003eO. rufipogon\u003c/em\u003e (3), \u003cem\u003eO. barthii\u003c/em\u003e (3) and \u003cem\u003eO. glaberrima\u003c/em\u003e (1)[\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e, \u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e] (see methods). We found 175,555 SVs in the pangenome, which were annotated for the presence of LTR-RTs and we searched for sequences potentially corresponding to tandem LTR-RT structures using IDENTAM. We identified 241 additional tandem LTR-RT structures that are not present in the assembled genome of Nipponbare rice. On the other hand, we found that 41 out of the 66 tandem LTR-RT structures found in Nipponbare are absent from at least one of the 75 assemblies included in the pangenome (data not shown). These results confirm that tandem LTR-RT structures are frequent and highly polymorphic in rice. 
A comparison of the types of LTR-RT elements forming the potential 307 tandem LTR-RT structures found in the pangenome (241 new non-overlapping insertions plus the 66 previously detected in Nipponbare) (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e) with the LTR-RT annotation of the pangenome (LTR-RTs annotated in the Nipponbare reference genome[\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e] plus the LTR-RTs present in the SVs) shows that gypsy LTR-RTs are overrepresented in tandem LTR-RT structures according to a two-tail Fisher\u0026rsquo;s test (p-value\u0026thinsp;=\u0026thinsp;9.81e-08): they make up 88% of the tandems, while these elements account for 74% of the total LTR-RTs. Among gypsy elements, the Tekay lineage also seems to be significantly enriched in the tandem group according to a one-tail Fisher's test (p-value\u0026thinsp;=\u0026thinsp;2.2e-16). Tekay elements make up 66% of the LTR-RT tandems, while they account for 35% of the total LTR-RTs, in line with what was found analyzing the genome of Nipponbare rice only (see Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eUsing the cactus-minigraph pangenome for characterizing LTR-RT polymorphic structures\u003c/h2\u003e \u003cp\u003eThe pangenome approach described above allowed us to identify many tandem LTR-RT insertions present in rice and related species. However, this approach proved to be of limited use for the correct characterization of the different alleles these structures can produce. Indeed, the analysis of the TLI2 locus, which can be present in up to six different haplotypes, showed that this locus was not satisfactorily resolved in the pangenome. 
The different haplotypes were collapsed, as the different structural variants occur at the same position and have extensive sequence identity. 
Consequently, only two haplotypes were defined at this position, the LTR-RT tandem inserted within the MULE element present in the reference genome, and a deletion corresponding to the absence of insertion of both the MULE and the nested tandem LTR-RT structure (Additional Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e). The accessions presenting other haplotypes were resolved as having one of these two, with the single LTR-RT insertions nested in the MULE (Hap 4, Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e) being resolved as in the reference (which contains a tandem LTR-RT insertion, Hap 3), and the accessions presenting the insertion of the MULE alone (Hap 2), as deletions of the MULE and the nested LTR-RT structures (Hap 6).\u003c/p\u003e \u003cp\u003eThis prompted us to use minigraph-cactus, which does not collapse duplications during the pangenome construction [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e]. This pipeline allowed us to further resolve multiallelic, complex SVs. Figure\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e shows the minigraph-cactus pangenome version graph showing the complex allelic variants defined in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, which could not be defined with the previous approach. 
As the Bandage visualization shows, all haplotypes previously defined are easily characterized using this approach except for the 29 bp deletion, as in the pipeline regions smaller than 50 bp were not considered SV (see \u003cspan refid=\"Sec2\" class=\"InternalRef\"\u003eMethods\u003c/span\u003e section).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eAn analysis of the 28 loci characterized here as containing tandem LTR-RT insertions using the cactus-minigraph pangenome showed that 61% of the tandem LTR-RT loci are fixed, while the rest are polymorphic, often giving rise to multiple haplotypes (up to 7 different haplotypes in a single locus), which highlights the high genomic diversity LTR-RT tandems can generate.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eLTR-RT tandems are common in plant species\u003c/h3\u003e\n\u003cp\u003eTo evaluate how common the presence of tandem LTR-RT structures is in plant genomes we ran the IDENTAM pipeline on the assembled genomes of three other plant species including \u003cem\u003eArabidopsis thaliana\u003c/em\u003e (TAIR 10) [\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e], \u003cem\u003ePrunus dulcis\u003c/em\u003e (almond)[\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e] and the upland cotton \u003cem\u003eGossypium hirsutum\u003c/em\u003e [\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e], which span a wide range of genome sizes, LTR-RT content and have different levels of ploidy. We found tandem LTR-RT structures in all of them, with a lower number in the genomes with a lower content of LTR-RTs (11 tandem LTR-RT structures in \u003cem\u003eA. thaliana\u003c/em\u003e and \u003cem\u003eP. dulcis\u003c/em\u003e) and higher in bigger genomes (e.g. 86 in cotton). 
With respect to the type of LTR-RT forming tandems, the analysis of these genomes shows that in most of them gypsy LTR-RTs seem more prone to form tandem LTR-RT structures (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e). Indeed, tandem LTR-RTs are significantly enriched in gypsy elements in Arabidopsis (Fisher\u0026rsquo;s test p-value\u0026thinsp;=\u0026thinsp;0.009041), as found in rice (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e), whereas in cotton and almond there is no significant enrichment for either of the two main LTR-RT superfamilies, gypsy and copia (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eOur analysis also shows that in most genomes there is a bias towards specific gypsy lineages to form LTR-RT Tandems (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e), but the specific lineage enriched depends on the genome analyzed. Athila elements are highly enriched in the LTR-RT Tandem group in \u003cem\u003eA. thaliana\u003c/em\u003e (p-value\u0026thinsp;=\u0026thinsp;0.02573), \u003cem\u003eP. dulcis\u003c/em\u003e (p-value F0\u0026thinsp;=\u0026thinsp;0.001404) and cotton (p-value\u0026thinsp;=\u0026thinsp;5.205e-16), whereas in rice the tandem LTR-RT structures are enriched in Tekay elements (p-value\u0026thinsp;=\u0026thinsp;2.2e-16; Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThe analysis of the rice pangenome suggested that these tandem LTR-RT structures are highly polymorphic within a species. 
Interestingly, the analysis of the phased genome of almond showed that among the 11 tandem LTR-RT structures identified in the F1 phase, one was not present in the F0 phase which, on the other hand, has one additional tandem LTR-RT structure, which stresses the high variability of these structures (data not shown).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eTEs are widespread in eukaryote genomes and their mobilization and amplification are thought to have an important impact on genome structure and gene regulation. In plants, TEs are known to be a major driving force of genome evolution [\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e] and they can account for most of the genome space. In addition to autonomous elements, genomes contain defective elements, which become increasingly difficult to identify as they accumulate mutations [\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e], and are overrepresented as compared with active TEs. The repetitive nature of TEs has made their identification and study challenging, in particular on genome assemblies based mainly on short-read sequences. With long-read based reference genomes and pangenomes of different plant species becoming available, it is now possible to annotate and study TEs in much more detail. Here we show that LTR-RTs can form tandem arrays of alternating LTRs and LTR-RT internal regions, which are flanked by TSDs. Our results show that these structures are relatively abundant in rice and are also present in other genomes, of both monocot and dicot plants, with different genome size and ploidy levels. This suggests that tandem LTR-RT insertions are widespread in plant genomes, as they also seem to be in other higher eukaryotes such as \u003cem\u003eDrosophila\u003c/em\u003e[\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]. 
Our results suggest that gypsy elements tend to form more LTR-RT tandem structures than copia LTR-RTs, and some biases towards certain gypsy lineages also seem to exist, although different lineages seem prone to form these structures in different genomes. This general trend of gypsy elements could be the result of their longer average LTR size, which may more easily promote illegitimate recombination, or the frequent association of gypsy elements with heterochromatin and pericentromeric regions. However, we have not been able to detect any significant correlation of tandem LTR-RT formation and any of these features.\u003c/p\u003e \u003cp\u003eThe pangenome-based analysis of the variability linked to these structures showed that they are highly dynamic, with more than 66% of the LTR-RT tandems found in Nipponbare being absent in at least one of the other Oryza genomes analyzed. Moreover, when present, LTR-RT tandems can generate many different haplotypes with a variable number of the tandemly repeated unit. This significantly expands the potential of LTR-RTs to generate genome variability within a species, which can translate into phenotypic diversity. However, analyzing LTR-RT tandems at a population scale is complex and requires the use of completely assembled genomes and novel pangenome graph pipelines to properly characterize their genetic variability.\u003c/p\u003e \u003cp\u003eTandem LTR-RT structures like the ones described here have been found in the centromeres of two different species of kangaroos, and it has been proposed that they could arise by illegitimate recombination between the two LTRs of the element sitting in sister chromatids or homologous chromosomes, which could also give rise to solo LTRs [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]. 
Indeed, the same mechanism was proposed to explain tandem arrays of TRIMs in different species, although it was also proposed that these structures could result from the insertion of tandem structures produced during retrotransposition [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e, \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]. Interestingly, it has recently been shown that the retrotransposition process involves the formation of circular LTR-RT DNA containing a single LTR that can be used for transcribing LTR-RT mRNA to initiate a new round of replication [\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e]. Under this scenario, the presence of a weak transcriptional terminator, such as the one described for the tobacco Tnt1 LTR-RT [\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e], could allow the production of tandem LTR-RT transcripts, leading to the transposition of tandem LTR-RT structures. The analysis of the SNPs surrounding the insertion site in the 6 different haplotypes of the chromosome 2 locus did not allow us to establish the complete sequence of events leading to the diversity of structures present, or to determine whether the tandem LTR-RT structure is the result of a complex insertion or of an illegitimate recombination event. At this point, both mechanisms seem possible and not necessarily mutually exclusive. More research will be needed to clarify this point.\u003c/p\u003e"},{"header":"Conclusions","content":"\u003cp\u003eTandem LTR-RT structures are widespread in plant genomes and can give rise to multiple haplotypes. 
The frequent and highly polymorphic nature of tandem LTR-RTs expands the potential of LTR-RTs to generate genome variability with potential phenotypic consequences.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eAvailability of data and materials\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll data generated or analysed during this study are included in this published article [and its supplementary information files].\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no competing interests.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe work done at CRAG was funded by grant PID2022-143167NB-I00 funded by MICIU/AEI/ 10.13039/501100011033 and by \u0026ldquo;ERDF/EU\u0026rdquo; and grant CEX2019-000902-S funded by MICIU/AEI /10.13039/501100011033 to JMC. 
NMD is funded by Grant PRE2020-095111 funded by MCIU/AEI /10.13039/501100011033 and by \u0026ldquo;ESF Investing in your future\u0026rdquo;, LCD is funded by the European Union\u0026rsquo;s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 945043, and RC was partially funded by a Juan de la Cierva contract, grant IJC2020-045949-I funded by MICIU/AEI /10.13039/501100011033 and by \u0026ldquo;European Union NextGenerationEU/PRTR\u0026rdquo;, and is now a Ram\u0026oacute;n y Cajal contract holder, RYC2022-037459-I funded by MICIU/AEI/ https://doi.org/10.13039/501100011033 and by FSE+. AAG was supported by the LOEWE Start Professorship from the Hessian Ministry for Science and the Arts. 
VK was supported by GRK 2843 from the German Research Foundation (DFG).\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eAuthors\u0026apos; contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNMD developed IDENTAM and performed most of the experiments. LCD performed the analysis of tandem LTR-RTs in cotton; SS obtained the rice pangenome based on SVIM-asm, whereas NMD obtained the cactus-minigraph pangenome in the laboratory of AAG with the help of VK. RC and JC conceived and directed the project and wrote the manuscript with the help of all other authors.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe are grateful to all the members of CRAG\u0026rsquo;s lab for useful discussions.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eStitzer MC, Anderson SN, Springer NM V, Ross-Ibarra J. The Genomic Ecosystem of Transposable Elements in Maize. PLoS Genet. 2021;17:e1009768. \u003c/li\u003e\n\u003cli\u003eBennetzen JL, Wang H. The contributions of transposable elements to the structure, function, and evolution of plant genomes. Annu Rev Plant Biol [Internet]. 2014;65:505\u0026ndash;30. Available from: http://www.ncbi.nlm.nih.gov/pubmed/24579996\u003c/li\u003e\n\u003cli\u003ePulido M, Casacuberta JM. Transposable element evolution in plant genome ecosystems. Curr Opin Plant Biol. 2023;75:102418. \u003c/li\u003e\n\u003cli\u003eLisch D. How important are transposons for plant evolution? Nat Rev Genet [Internet]. 2013;14:49\u0026ndash;61. Available from: http://www.ncbi.nlm.nih.gov/pubmed/23247435\u003c/li\u003e\n\u003cli\u003eAndersson L, Purugganan M. Molecular genetic variation of animals and plants under domestication. Proc Natl Acad Sci U S A. 2022;119:e2122150119. \u003c/li\u003e\n\u003cli\u003ePiegu B, Guyot R, Picault N, Roulin A, Saniyal A, Kim H, et al. 
Doubling genome size without polyploidization: Dynamics of retrotransposition-driven genomic expansions in Oryza australiensis, a wild relative of rice. Genome Res. 2006;16:1262\u0026ndash;9. \u003c/li\u003e\n\u003cli\u003eBennetzen JL, Kellogg EA. Do Plants Have a One-Way Ticket to Genomic Obesity? Plant Cell [Internet]. 1997;9:1509\u0026ndash;14. Available from: http://www.plantcell.org/content/9/9/1509.short\u003c/li\u003e\n\u003cli\u003eMunasinghe M, Read A, Stitzer M, Song B, Menard C, Ma K, et al. Combined analysis of transposable elements and structural variation in maize genomes reveals genome contraction outpaces expansion. PLoS Genet. 2023;19:e1011086. \u003c/li\u003e\n\u003cli\u003eDevos KM, Brown JKM, Bennetzen JL. Genome size reduction through illegitimate recombination counteracts genome expansion in Arabidopsis. Genome Res. 2002;12:1075\u0026ndash;9. \u003c/li\u003e\n\u003cli\u003eTian Z, Rizzon C, Du J, Zhu L, Bennetzen JL, Jackson SA, et al. Do genetic recombination and gene density shape the pattern of DNA elimination in rice long terminal repeat retrotransposons? Genome Res. 2009;19:2221\u0026ndash;30. \u003c/li\u003e\n\u003cli\u003eShirasu K, Schulman A, Lahaye T, Schulze-Lefert P. A contiguous 66-kb barley DNA sequence provides evidence for reversible genome expansion. Genome Res. 2000;10:908\u0026ndash;15. \u003c/li\u003e\n\u003cli\u003eVitte C, Panaud O. Formation of solo-LTRs through unequal homologous recombination counterbalances amplifications of LTR retrotransposons in rice Oryza sativa L. Mol Biol Evol. 2003;20:528\u0026ndash;40. \u003c/li\u003e\n\u003cli\u003eMa J, Devos KM, Bennetzen JL. Analyses of LTR-retrotransposon structures reveal recent and rapid genomic DNA loss in rice. Genome Res. 2004;14:860\u0026ndash;9. \u003c/li\u003e\n\u003cli\u003eSanMiguel P, Gaut BS, Tikhonov A, Nakajima Y, Bennetzen JL. The paleontology of intergene retrotransposons of maize. Nat Genet. 1998;20:43\u0026ndash;5. 
\u003c/li\u003e\n\u003cli\u003eLang D, Ullrich KK, Murat F, Fuchs J, Jenkins J, Haas FB, et al. The Physcomitrella patens chromosome-scale assembly reveals moss genome structure and evolution. Plant Journal. 2018;93:515\u0026ndash;33. \u003c/li\u003e\n\u003cli\u003eVendrell-Mir P, L\u0026oacute;pez-Obando M, Nogu\u0026eacute; F, Casacuberta JM. Different Families of Retrotransposons and DNA Transposons Are Actively Transcribed and May Have Transposed Recently in Physcomitrium (Physcomitrella) patens. Front Plant Sci. 2020;11:1274. \u003c/li\u003e\n\u003cli\u003eBi G, Zhao S, Yao J, Wang H, Zhao M, Sun Y, et al. Near telomere-to-telomere genome of the model plant Physcomitrium patens. Nat Plants. 2024;10:327\u0026ndash;343. \u003c/li\u003e\n\u003cli\u003eKalendar R, Raskina O, Belyayev A, Schulman AH. Long Tandem Arrays of Cassandra Retroelements and Their Role in Genome Dynamics in Plants. Int J Mol Sci. 2020;21:2931. \u003c/li\u003e\n\u003cli\u003eWang Q, Huang J, Li Y, Dooner H. The unusual dRemp retrotransposon is abundant, highly mutagenic, and mobilized only in the second pollen mitosis of some maize lines. Proc Natl Acad Sci U S A. 2020;117:18091\u0026ndash;8. \u003c/li\u003e\n\u003cli\u003eKoga A, Nishihara H, Tanabe H, Tanaka R, Kayano R, Matsumoto S, et al. Kangaroo endogenous retrovirus (KERV) forms megasatellite DNA with a simple repetition pattern in which the provirus structure is retained. Virology. 2023;586:56\u0026ndash;66. \u003c/li\u003e\n\u003cli\u003eHayashi S, Honda Y, Kanesaki E, Koga A. Marsupial satellite DNA as faithful reflections of long-terminal repeat retroelement structure. Genome. 2022;65:469\u0026ndash;78. \u003c/li\u003e\n\u003cli\u003eKe N, Voytas D. High frequency cDNA recombination of the saccharomyces retrotransposon Ty5: The LTR mediates formation of tandem elements. Genetics. 1997;147:545\u0026ndash;56. \u003c/li\u003e\n\u003cli\u003eLi F, Lee M, Esnault C, Wendover K, Guo Y, Atkins P, et al. 
Identification of an integrase-independent pathway of retrotransposition. Sci Adv. 2022;8:eabm9390. \u003c/li\u003e\n\u003cli\u003eMcGurk M, Barbash D. Double insertion of transposable elements provides a substrate for the evolution of satellite DNA. Genome Res. 2018;28:714\u0026ndash;25. \u003c/li\u003e\n\u003cli\u003eZhang R-G, Li G-Y, Wang X-L, Dainat J, Wang Z-X, Ou S, et al. TEsorter: An accurate and fast method to classify LTR-retrotransposons in plant genomes. Hortic Res [Internet]. 2022;9:uhac017. Available from: https://doi.org/10.1093/hr/uhac017\u003c/li\u003e\n\u003cli\u003eOu S, Su W, Liao Y, Chougule K, Agda JRA, Hellinga AJ, et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 2019;20:275. \u003c/li\u003e\n\u003cli\u003eLi H. Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094\u0026ndash;100. \u003c/li\u003e\n\u003cli\u003eHeller D, Vingron M. SVIM-asm: structural variant detection from haploid and diploid genome assemblies. Bioinformatics [Internet]. 2021;36:5519\u0026ndash;21. Available from: https://doi.org/10.1093/bioinformatics/btaa1034\u003c/li\u003e\n\u003cli\u003eDanecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, et al. Twelve years of SAMtools and BCFtools. Gigascience [Internet]. 2021;10:giab008. Available from: https://doi.org/10.1093/gigascience/giab008\u003c/li\u003e\n\u003cli\u003eEnglish AC, Menon VK, Gibbs RA, Metcalf GA, Sedlazeck FJ. Truvari: refined structural variant comparison preserves allelic diversity. Genome Biol [Internet]. 2022;23:271. Available from: https://doi.org/10.1186/s13059-022-02840-6\u003c/li\u003e\n\u003cli\u003eGarrison E, Sir\u0026eacute;n J, Novak AM, Hickey G, Eizenga JM, Dawson ET, et al. Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat Biotechnol [Internet]. 2018;36:875\u0026ndash;9. 
Available from: https://doi.org/10.1038/nbt.4227\u003c/li\u003e\n\u003cli\u003eHickey G, Monlong J, Ebler J, Novak A, Eizenga J, Gao Y, et al. Pangenome graph construction from genome alignments with Minigraph-Cactus. Nat Biotechnol. 2024;42:663\u0026ndash;73. \u003c/li\u003e\n\u003cli\u003eLiao W-W, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, et al. A draft human pangenome reference. Nature [Internet]. 2023;617:312\u0026ndash;24. Available from: https://doi.org/10.1038/s41586-023-05896-x\u003c/li\u003e\n\u003cli\u003eGarrison E. vcfbub: popping bubbles in vg deconstruct VCFs. Zenodo. 2022. \u003c/li\u003e\n\u003cli\u003eCastanera R, Vendrell-Mir P, Bardil A, Carpentier MC, Panaud O, Casacuberta JM. The amplification dynamics of MITEs and their impact on rice trait variability. Plant J. 2021;107:118\u0026ndash;35. \u003c/li\u003e\n\u003cli\u003eShang L, Li X, He H, Yuan Q, Song Y, Wei Z, et al. A super pan-genomic landscape of rice. Cell Res. 2022;32:878\u0026ndash;896. \u003c/li\u003e\n\u003cli\u003eKawahara Y, de la Bastide M, Hamilton JP, Kanamori H, McCombie WR, Ouyang S, et al. Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data. Rice. 2013;6:3\u0026ndash;10. \u003c/li\u003e\n\u003cli\u003eNeumann P, Nov\u0026aacute;k P, Ho\u0026scaron;t\u0026aacute;kov\u0026aacute; N, Macas J. Systematic survey of plant LTR-retrotransposons elucidates phylogenetic relationships of their polyprotein domains and provides a reference for element classification. Mob DNA. 2019;10:1. \u003c/li\u003e\n\u003cli\u003eFerguson A, Jiang N. Mutator-like elements with multiple long terminal inverted repeats in plants. Comp Funct Genomics. 2012;2012:695827. \u003c/li\u003e\n\u003cli\u003eZhang F, Xue H, Dong X, Li M, Zheng X, Li Z, et al. Long-read sequencing of 111 rice genomes reveals significantly larger pan-genomes. Genome Res. 2022;32:853\u0026ndash;63.
\u003c/li\u003e\n\u003cli\u003eQin P, Lu H, Du H, Wang H, Chen W, Chen Z, et al. Pan-genome analysis of 33 genetically diverse rice accessions reveals hidden genomic variations. Cell. 2021;184:3542\u0026ndash;58. \u003c/li\u003e\n\u003cli\u003eLamesch P, Berardini TZ, Li D, Swarbreck D, Wilks C, Sasidharan R, et al. The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic Acids Res [Internet]. 2012;40:D1202\u0026ndash;10. Available from: https://doi.org/10.1093/nar/gkr1090\u003c/li\u003e\n\u003cli\u003eCastanera R, de Tom\u0026aacute;s C, Ruggieri V, Vicient C, Eduardo I, Aranzana M, et al. A phased genome of the highly heterozygous \u0026lsquo;Texas\u0026rsquo; almond uncovers patterns of allele-specific expression linked to heterozygous structural variants. Hortic Res. 2024;11:uhae106. \u003c/li\u003e\n\u003cli\u003eYang Z, Ge X, Yang Z, Qin W, Sun G, Wang Z, et al. Extensive intraspecific gene order and gene structural variations in upland cotton cultivars. Nat Commun [Internet]. 2019;10:2989. Available from: https://doi.org/10.1038/s41467-019-10820-x\u003c/li\u003e\n\u003cli\u003eWendel JF, Jackson SA, Meyers BC, Wing RA. Evolution of plant genome architecture. Genome Biol [Internet]. 2016;17:37. Available from: http://dx.doi.org/10.1186/s13059-016-0908-1\u003c/li\u003e\n\u003cli\u003eMaumus F, Quesneville H. Deep investigation of Arabidopsis thaliana junk DNA reveals a continuum between repetitive elements and genomic dark matter. PLoS One [Internet]. 2014;9:e94101. Available from: http://dx.doi.org/10.1371%2Fjournal.pone.0094101\u003c/li\u003e\n\u003cli\u003eYang F, Su W, Chung OW, Tracy L, Wang L, Ramsden DA, et al. Retrotransposons hijack alt-EJ for DNA replication and eccDNA biogenesis. Nature. 2023;620:218\u0026ndash;225. \u003c/li\u003e\n\u003cli\u003eHern\u0026aacute;ndez-Pinz\u0026oacute;n I, De Jes\u0026uacute;s E, Santiago N, Casacuberta JM. 
The frequent transcriptional readthrough of the tobacco tnt1 retrotransposon and its possible implications for the control of resistance genes. J Mol Evol. 2009;68:269\u0026ndash;78. \u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"mobile-dna","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"mdna","sideBox":"Learn more about [Mobile DNA](http://mobilednajournal.biomedcentral.com/)","snPcode":"13100","submissionUrl":"https://submission.nature.com/new-submission/13100/3","title":"Mobile DNA","twitterHandle":"@MobDNAjournal","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"BMC/SO AJ","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-5356060/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-5356060/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBackground\u003c/h2\u003e \u003cp\u003eLTR-retrotransposons (LTR-RT) are a major component of plant genomes and are a major driver of genome evolution. Most LTR-RT copies in plant genomes are defective elements, found as truncated copies, nested insertions or being part of more complex structures. With the availability of highly contiguous plant genome assemblies based on long-read sequences it has become feasible the detailed characterization of these complex structures and the evaluation of their importance for plant genome evolution.\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e \u003cp\u003eThe detailed analysis of two rice loci containing complex LTR-RT structures showed that they consist of tandem arrays of LTR copies sharing internal LTRs. Our analysis show that the tandems are not the result of a single insertion and not of the recombination of two independent LTR-RT elements. Our results suggest that gypsy elements may be more prone to form these structures. We show that these structures are highly polymorphic in rice and have therefore the potential to generate genetic and phenotypic variability. 
We developed a computational pipeline, IDENTAM, that scans genome sequences and identifies tandem LTR-RT candidates and detected 307 tandems in a pangenome built from the genomes of 75 accessions of cultivated and wild rice, showing that tandem LTR-RT structures are frequent in the rice genome and are highly polymorphic in the species. Running IDENTAM in the Arabidopsis, almond and cotton genomes showed that LTR-RT tandems are frequent in plant genomes of different size, complexity and ploidy levels. The complexity of differentiating intra-element variations at the nucleotide level among haplotypes is very high, and we found that graph-based pangenomic methodologies are appropriate to resolve these structures.\u003c/p\u003e\u003ch2\u003eConclusions\u003c/h2\u003e \u003cp\u003eOur results show that LTR-RTs can form tandem arrays of elements. These structures are relatively abundant and highly polymorphic in rice and are widespread in the plant kingdom. Future studies will contribute to understand how these structures originate and if the variability that they generate has a functional impact.\u003c/p\u003e","manuscriptTitle":"Tandem LTR-retrotransposon structures are common and highly polymorphic in plant genomes","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-12-04 00:13:38","doi":"10.21203/rs.3.rs-5356060/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision 
requested","date":"2024-12-17T15:13:48+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2024-12-16T22:21:50+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2024-11-21T14:35:11+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"277347615912968736012359367020099255634","date":"2024-11-18T14:20:25+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"23847813522961526252452869253958417990","date":"2024-11-15T09:32:28+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2024-11-07T16:18:32+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2024-11-06T08:20:35+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2024-11-05T04:40:47+00:00","index":"","fulltext":""},{"type":"submitted","content":"Mobile DNA","date":"2024-10-29T16:40:09+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"mobile-dna","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"mdna","sideBox":"Learn more about [Mobile DNA](http://mobilednajournal.biomedcentral.com/)","snPcode":"13100","submissionUrl":"https://submission.nature.com/new-submission/13100/3","title":"Mobile DNA","twitterHandle":"@MobDNAjournal","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"BMC/SO AJ","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"a982337e-2302-44a0-a98b-0e9f17ee2a52","owner":[],"postedDate":"December 4th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[],"tags":[],"updatedAt":"2025-03-17T16:05:40+00:00","versionOfRecord":{"articleIdentity":"rs-5356060","link":"https://doi.org/10.1186/s13100-025-00347-y","journal":{"identity":"mobile-dna","isVorOnly":false,"title":"Mobile DNA"},"publishedOn":"2025-03-12 15:58:51","publishedOnDateReadable":"March 12th, 2025"},"versionCreatedAt":"2024-12-04 00:13:38","video":"","vorDoi":"10.1186/s13100-025-00347-y","vorDoiUrl":"https://doi.org/10.1186/s13100-025-00347-y","workflowStages":[]},"version":"v1","identity":"rs-5356060","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-5356060","identity":"rs-5356060","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}