The chromosome-level genome assembly of Dryopteris fragrans reveals transposon-mediated genome evolution and adaptation | Research Square

Article

Ying Chang, Weicong Dai, Qia Wang, Yuhan Fang, Xiaojie Qiu, Chunhua Song, and 2 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7268223/v1
This work is licensed under a CC BY 4.0 License
Status: Under Review
Version 1 posted

Abstract

Ferns are ancient vascular plants pivotal to plant evolution research. Although sequencing technologies have advanced fern genomic studies, scarce genomic resources for specialist-habitat ferns limit insights into their genome evolution. Dryopteris fragrans (L.)
Schott is a fern endemic to sun-exposed volcanic-lava habitats; here, we generated its high-quality chromosome-level genome assembly and explored the drivers of its genomic evolution and habitat adaptation via whole-genome duplication (WGD) detection, gene family evolution analysis and other approaches. No recent WGD event was detected in D. fragrans, whereas transposable elements (TEs), the major genomic component and associated with the expansion of environment-adaptive gene families, were identified as the primary evolutionary driver. Specifically, TEs shape gene structure by forming clade-specific long-intron genes, regulate gene expression through promoter insertion, and increase alternative splicing events in host genes. This study reports the first high-quality genome of a volcanic-lava-adapted fern, revealing TEs as potential key drivers of the genomic evolution and habitat adaptation of D. fragrans. Our findings advance understanding of TE functions in non-seed plant evolution and provide valuable genomic resources for researching early land plant adaptation and regulatory innovation.

Biological sciences/Plant sciences/Plant molecular biology
Biological sciences/Plant sciences/Plant evolution

Introduction

Ferns represent one of the oldest surviving lineages of vascular plants and are valuable model groups for investigating plant evolution [1–3]. With approximately 10,600 extant species, ferns constitute the second largest lineage (after angiosperms) among vascular plants [4]. Ferns have notoriously immense genomes (average 1C = 12.3 Gb; maximum 1C = 160.45 Gb) and very high chromosome numbers (average, 40.5; maximum, 720) [5, 6]. For a long time, these features were thought to be associated with whole-genome duplication (WGD) events [7, 8]. However, published fern genomes show that WGD events are less common in ferns than in seed plants [9–13].
On the other hand, in all published fern genomes, repetitive sequences were found to be the predominant component of the genome, with transposable elements (TEs) being the major contributors [9–13]. Consistent with recent comprehensive analyses of WGD in leptosporangiate ferns, these observations suggest that in the absence of frequent WGD events, fern genome evolution may have relied on small-scale duplications [14], in which TEs, as the dominant genomic component, likely play a pivotal role.

TEs act as “jumping genes” in the genome, shaping plant genomic diversity and playing dual roles in genomic structural variation and functional innovation [15–17]. TEs are divided into two major classes: Class I retrotransposons, which mobilize via an RNA intermediate, and Class II DNA transposons, which move through a cut-and-paste mechanism [18]. In seed plants, the insertion of TEs has been shown to be closely associated with genome expansion [19, 20], gene family evolution [21, 22] and environmental adaptation [23]. Moreover, environmental stresses such as ultraviolet stress, drought, or pathogen attack can trigger bursts of TE activity, leading to adaptive genetic variation [24]. TE insertions can regulate gene expression by altering gene structure or via insertional mutagenesis, supplying raw material for evolutionary innovation [25]. Although numerous studies in seed plants support the ability of TEs to promote genome evolution and improve environmental adaptability, research on TEs in ferns remains relatively scarce.

High-quality fern genomes are currently available for only a few lineages: three species from Pteridaceae [9, 10, 13], two from Salviniaceae [11], one from Cyatheaceae [12], four from Cibotiaceae [26, 27], and one from Lygodiaceae [28]. Although these genomic resources have enhanced our understanding of this group to a certain extent, existing studies have mainly focused on ferns inhabiting humid and/or shaded habitats.
Genomic research on ferns from specialized habitats remains relatively scarce, although such species may offer new perspectives for understanding the genomic evolution of this group. D. fragrans, for example, is a fern that grows in direct sunlight on volcanic lava in the northern temperate zone and belongs to the basal lineage of Dryopteridaceae, the most species-rich family of ferns [29, 30]. Currently, research on D. fragrans has mainly focused on terpenoid metabolism and the medicinal value of its active components, with few studies on its genomic evolution and environmental adaptability [31].

In this study, we present a chromosome-level genome assembly of D. fragrans. Annotation analyses revealed that its genome harbors significantly higher TE content than other reported fern genomes, but with lower TE integrity. We explored the evolution of gene families potentially associated with adaptation to specialized habitats and the potential relationship between gene expansion and TEs. Additionally, we investigated the potential impacts of TEs on gene structure and expression regulation. Similar TE-mediated genome evolution has been reported in seed plants [15, 32–34]; our study extends this concept to ferns, laying a foundation for future comparative studies in non-seed plants. These findings suggest that TE-driven genomic dynamics may be a universal feature of land plant evolution rather than being unique to seed plants.

Results

Chromosome-level genome assembly and annotation

Ferns exhibit an alternation of generations between diploid sporophytes and haploid gametophytes. To obtain a high-quality haplotype-resolved genome of D. fragrans, spores were cultured into gametophytes, which were then propagated in culture. The resulting callus was used for DNA extraction, library construction, and sequencing. K-mer analysis estimated the genome size to be approximately 4.48 Gb (Fig. S1a).
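As a rough illustration of how a k-mer-based size estimate such as the ~4.48 Gb figure is commonly derived, the sketch below divides the total number of k-mer occurrences by the depth of the main coverage peak. The function name, the error cutoff `min_depth`, and the toy histogram are hypothetical; the tool and k-mer length actually used in this study are not stated in this section.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer depth histogram.

    histogram maps k-mer depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing errors (an assumed cutoff).
    Returns (estimated_size, peak_depth).
    """
    solid = {d: n for d, n in histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    # Main peak: the depth at which the most distinct k-mers are observed
    peak_depth = max(solid, key=solid.get)
    # Total k-mer occurrences divided by the per-copy depth
    total_kmers = sum(d * n for d, n in solid.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram (hypothetical values, not data from the study)
toy = {1: 1000, 2: 300, 48: 50, 49: 80, 50: 100, 51: 80, 52: 50, 100: 10}
size, peak = estimate_genome_size(toy)
```

Because the sequenced tissue derives from haploid gametophytes, a single main peak is a reasonable simplification here; heterozygous diploid samples usually need a model that also accounts for a half-depth peak.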
The genome assembly was generated using a combination of Oxford Nanopore Technologies (ONT) long reads (average sequencing depth of 58×; Table S1), Illumina short reads (coverage depth of 52×; Table S2), and high-throughput chromosome conformation capture data (Hi-C; average coverage depth of 111×). The final assembly comprised 41 chromosomes with a total size of 4.45 Gb and a contig N50 of 5.70 Mb (Fig. 1a; Table 1; Fig. S1b; Table S3). Furthermore, we identified telomeric sequences at both ends of 16 chromosomes and at one end of 14 chromosomes (Fig. S1c; Table S4).

We annotated the D. fragrans genome using a combined strategy (ab initio, homology-based and transcriptome-based predictions; Tables S5–S8). In total, 33,489 high-confidence gene models were annotated, with a mean gene length of 36,768 bp (Table 1); 75.11% of genes were supported by transcript evidence (Table S8). Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis recovered 1,388 complete BUSCOs (86.0%), including 1,296 single-copy (80.3%) and 92 duplicated (5.7%) complete BUSCOs (Fig. 1b). An additional Core Eukaryotic Genes Mapping Approach (CEGMA) assessment also supported the high reliability of our gene models (Fig. S2). A total of 22,879 protein-coding genes (68.32% of all genes) were assigned functional annotations in public databases (Fig. S3; Table S9). Together, these results indicate that the gene-structure annotation of the D. fragrans genome is of sufficient quality to support downstream analyses.

In the D. fragrans genome, we annotated 3.77 Gb of repetitive sequences, accounting for 85.39% of the assembly (Table S10). Long terminal repeat (LTR) retrotransposons are the dominant component: LTR/Gypsy and LTR/Copia elements constitute 54.35% and 17.75% of the genome, respectively. We also detected asymmetric repeat accumulation at chromosome ends for 20 chromosomes (Fig. 1c; Fig. S4). Notably, 10 of these chromosomes contain telomeric sequences at both ends.
This pattern suggests that the two arms of the same chromosome may differ in repeat content. The underlying mechanisms, including potential differences in insertion or removal dynamics, recombination or DNA repair activity, and selective constraints, require further investigation.

Figure 1. a. Circos plot of the D. fragrans genome, divided into 10 Mb windows, showing (1) chromosome length and number; (2) content of repetitive sequences; (3) content of Solo-LTRs; (4) content of Gypsy; (5) content of Copia; (6) gene density; and (7) GC content, with the innermost circle representing synteny. b. BUSCO assessment of genome completeness across seven fern species. c. Distribution of repeat content along chromosomes visualized via bar plots and heatmaps. Each chromosome is divided into 50 windows (x-axis), and the repeat density within each window is shown on the y-axis and by color intensity.

Recent WGD has not been detected in D. fragrans

WGD is a widespread and important process in plant genome evolution, providing raw gene material for functional diversification and thereby accelerating evolutionary innovation. To investigate the WGD history of D. fragrans, we first surveyed genome-wide synteny. We identified 75 syntenic blocks containing 399 paralogous gene pairs, representing only 2.38% of the gene set, a proportion too low to support a recent WGD (Fig. 2a). The synonymous substitution rate (Ks) distributions of both syntenic pairs and all homologous gene pairs showed a single peak at Ks ≈ 1.2, which was consistent with one ancient duplication event rather than recent polyploidy (Fig. 2b; Fig. S5). Using the multi-taxon polyploidy search (MAPS) framework with genome and transcriptome data from D. fragrans and other ferns, we detected only a single WGD shared broadly among leptosporangiate ferns, likely dating to the Permian–Triassic interval (Fig. 2c). In summary, we found no evidence of recent WGD in D.
fragrans, and its genomic evolution may be driven by other factors.

Figure 2. a. Intragenomic synteny analysis of D. fragrans, illustrating the distribution and retention of homologous gene blocks. b. Ks distribution for non-tandem duplicated homolog pairs. c. MAPS-based detection of WGD signals: the phylogenetic tree of selected fern species is shown below, with nodes N2–N6 indicated; above, the proportion of duplicated genes in each subtree is plotted, where the green, red, and blue curves represent the null simulation, positive simulation, and observed data, respectively. The shaded regions denote confidence intervals.

TE-related dispersed duplicates (DSD) are the main drivers of gene family expansion in D. fragrans

To explore whether the evolution of specific gene families contributes to the genomic evolution of D. fragrans, protein-coding genes from 12 species were clustered into 26,136 orthogroups, to which 31,115 D. fragrans genes (93.8% of the gene set) were assigned. A species tree inferred from 253 single-copy orthogroups and molecular dating data indicates that D. fragrans diverged from the lineage shared with water ferns and maidenhair ferns at ~168.6 Ma (Middle Jurassic; Fig. 3a). To investigate gene-family dynamics potentially linked to adaptation, we conducted gene ontology (GO) enrichment on 761 D. fragrans-specific orthogroups (4,615 genes) and 2,262 significantly expanded orthogroups (8,774 genes). Both lineage-specific and expanded families were enriched for functions related to redox metabolism, terpene biosynthesis, stress perception/defense, and protein homeostasis (Fig. 3b, c). Compared with the other genes, the genes within the D. fragrans-specific and expanded orthogroups presented stronger transcriptional responses to ultraviolet (UV) treatment (Fig. S6) [35]. Taken together, these results suggest that the evolution and expansion of these gene families may have contributed to the adaptation of D. fragrans to its specific habitat.
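For readers unfamiliar with GO enrichment, the following is a minimal sketch of a one-sided over-representation test based on the hypergeometric distribution. This is an assumption about the general approach, not a description of the software used here: the enrichment tool and multiple-testing correction are not named in this section, and the function name and example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_term, n_selected, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_background: annotated genes in the background set
    n_term:       background genes carrying the GO term
    n_selected:   genes in the test set (e.g. an expanded orthogroup set)
    n_overlap:    test-set genes carrying the GO term
    """
    upper = min(n_term, n_selected)
    total = comb(n_background, n_selected)
    # Sum the probabilities of observing n_overlap or more annotated genes
    tail = sum(
        comb(n_term, i) * comb(n_background - n_term, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy counts: 20 genes, 5 with the term, 4 selected, 2 overlapping
p_value = hypergeom_enrichment_p(20, 5, 4, 2)
```

In practice, p-values for many GO terms would be corrected for multiple testing before interpretation.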
To further explore the main modes of gene family expansion in D. fragrans, we identified 28,193 duplicated genes and classified them into five categories: 486 whole-genome duplicates (WGD; 1.7%), 2,456 tandem duplicates (TD; 8.7%), 2,165 proximal duplicates (PD; 7.7%), 980 transposed duplicates (TRD; 3.5%), and 21,747 dispersed duplicates (DSD; 77.1%). Compared with the other classes, the PD and TD pairs presented lower Ks values and relatively higher nonsynonymous/synonymous substitution rate (Ka/Ks) values (Fig. 3d; Fig. S7), indicating a more recent origin and faster sequence divergence, which is consistent with ongoing tandem and proximal duplications. We then compared the overlap between expanded orthogroups and genes derived from the five duplication modes (WGD, TD, PD, TRD, DSD). The majority of genes within expanded orthogroups are derived from DSD (Fig. 3e). In addition, by assessing genome-wide correlations between TE abundance and gene counts of different duplication classes, we observed that DSD gene counts were positively correlated with TE abundance (Pearson’s R = 0.79), a correlation stronger than that between TRD gene counts and TE abundance (Pearson’s R = 0.75; Fig. S8). Taken together, the expanded gene families potentially associated with the adaptation of D. fragrans to its specific habitat are mainly derived from DSD, and DSD is closely related to TEs, suggesting that TEs may be the main driver of genomic evolution in D. fragrans.

Figure 3. a. Species tree with divergence time estimates and numbers of expanded/contracted gene families. The tree was inferred from 253 single-copy orthogroups; divergence times are indicated in Mya. b. GO enrichment of D. fragrans-specific orthogroups. c. GO enrichment of orthogroups significantly expanded in D. fragrans. d. Distributions of Ka/Ks ratios for genes originating from five duplication modes (WGD, TD, PD, TRD, DSD). e.
Overlap analysis between genes in expanded orthogroups and genes derived from each duplication mode (WGD, TD, PD, TRD, DSD). Here, WGDp, TDp, PDp, TRDp, and DSDp refer to positively selected genes. f. GO enrichment for genes originating from WGD pairs.

Low integrity and continuous insertion of TEs in the D. fragrans genome

A comparison with published fern genomes revealed that both the relative proportion and the absolute base count of TEs are elevated in D. fragrans (Fig. 4a). However, the TE complement in this genome is characterized not by the preservation of numerous intact elements but by pervasive fragmentation and rapid turnover. Several lines of evidence support this interpretation. First, the LTR assembly index (LAI) was low (LAI = 9.73), indicating reduced overall integrity of LTR sequences relative to those of other ferns (Fig. 4b). Second, intact LTR retrotransposons represent only ~0.58% of the assembly, which is consistent with widespread truncation or degradation. Third, analyses of Kimura substitution spectra and pairwise sequence identity revealed a pattern of continuous, frequent LTR integration rather than a single, recent burst (Fig. 4c–f); in contrast, several non-LTR TE classes presented signatures of episodic amplification. Notably, we observed peaks at a Kimura substitution level of ~35 and at a sequence identity of ~0.75, which may correspond to the ancient leptosporangiate WGD signal. Subfamily-level classification of ~46,130 intact LTRs further reveals heterogeneous dynamics: Copia elements are dominated by the Ale, Tork and Ivana subfamilies, whereas Gypsy elements are primarily Athila, Reina and Tekay (Fig. 4d; Fig. S9a). When a neutral substitution rate (µ = 1.3 × 10⁻⁸ substitutions per site per year) was used for insertion-time estimation, the Ivana subfamily presented evidence of relatively recent expansion, whereas most other LTR lineages accumulated more gradually (Fig. S9b, c).
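The insertion-time estimate mentioned above is usually obtained from the divergence between the two LTRs of an intact element, which are identical at the moment of insertion. The sketch below assumes the widely used relation T = K / (2µ) with a Jukes-Cantor corrected distance K; the exact distance model and software used by the authors are not given in this section, so the function names and the example identity value are illustrative only. The rate µ is the one stated in the text.

```python
import math

MU = 1.3e-8  # substitutions per site per year, as stated in the text


def jukes_cantor_distance(identity):
    """Convert pairwise LTR identity (0-1) into a Jukes-Cantor distance."""
    p = 1.0 - identity  # observed proportion of differing sites
    if p >= 0.75:
        raise ValueError("divergence too high for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def insertion_time_years(identity, mu=MU):
    """Estimate insertion time as T = K / (2 * mu).

    The factor 2 reflects that both LTR copies accumulate substitutions
    independently after insertion.
    """
    return jukes_cantor_distance(identity) / (2.0 * mu)


# Hypothetical element whose two LTRs are 97.4% identical
age = insertion_time_years(0.974)  # on the order of one million years
```

Under this model, lower LTR identity always maps to an older insertion, so a subfamily with many high-identity LTR pairs (as reported for Ivana) is read as a recent expansion.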
Taken together, these observations are most consistent with a high-turnover regime in which ongoing insertion of LTRs is rapidly followed by processes that fragment or remove copies.

Figure 4. a. Comparison of LTR content and genome size between D. fragrans and several other plant species. b. LAI of the D. fragrans genome, with regions > 10 shown in blue and regions < 10 shown in red. c. Kimura substitution levels of D. fragrans TEs. d. Phylogenetic trees of the six major LTR/Copia (left) and LTR/Gypsy (right) families. e. Identity distribution of TEs within the genome compared with a custom-built TE database. f. Identity distribution of LTRs within the genome compared with a custom-built TE database.

TE insertion shapes gene structure in D. fragrans

TEs act as a "double-edged sword" in genomes: on the one hand, their insertion into genic regions can disrupt gene structure and impair function; on the other hand, they provide genetic variation resources for evolution. Since the most direct effect of TE insertion into genic regions is a change in gene length, we first investigated the associations between TEs and gene structure in D. fragrans through cross-species comparisons. Among the fern lineages, with the exception of two aquatic ferns with small genomes, the other four ferns presented a broader range of gene lengths (Fig. 5a). Further analysis of coding sequence (CDS) and intron regions revealed that CDS length was relatively conserved across all plant groups, whereas intron length varied significantly, with a subset of genes harboring exceptionally long introns (> 10 kb) in the four ferns. Notably, a strong positive correlation was observed between gene length and intron length in D. fragrans (R = 0.99; Fig. S10), indicating that gene length is determined primarily by intron length. Further investigations revealed that TE insertions are prevalent within the introns of D. fragrans genes.
Moreover, compared with TEs in other genomic regions, the TE insertions located in genic regions exhibit higher stability (Fig. S11). All long-intron genes contained TE insertions (Fig. 5b), and a positive correlation was detected between the proportion of TE-derived sequences in introns and intron length (R = 0.71). To further investigate how the proportion of TE-derived sequence in introns affects genes, we classified the genes into four categories according to intron length and the proportion of TE-derived sequence relative to intron length: short-intron no TE-insertion genes (SNG), short-intron low TE-ratio genes (SLG, TE ratio < 0.5), short-intron high TE-ratio genes (SHG), and long-intron TE-insertion genes (LG) (Fig. 5c; Fig. S12). In addition, genes with TE insertions possessed more introns than other genes, with LG genes harboring the largest number of introns (Fig. S13). Further analysis of TE types inserted into introns revealed that various TE families were present in the SLG, SHG, and LG genes. Among these, LTR retrotransposons dominated in both insertion number and total length (Fig. 5d), indicating that LTRs are the primary TE type inserted into genic regions. A comparison of TE sequence identity distributions in introns of the three TE-containing gene categories revealed that SLG genes had significantly lower TE identity than did SHG genes and LG genes (Fig. 5e; Fig. S14), suggesting that SLG genes carry more ancient TE insertions.

Figure 5. a. Comparison of the full gene length, CDS length, and intron length distributions between D. fragrans and other representative plant species. b. Density plot showing the relationship between intron length and the proportion of TE-derived sequences in introns. Red and blue represent long-intron genes and short-intron genes, respectively; solid lines indicate linear regression fits. c.
Classification of genes into five categories on the basis of intron length and TE content proportion, with the number of genes in each category displayed. d. Insertion number and total length of different TE families in introns of TE-containing genes. e. Sequence identity distribution of TEs in introns of TE-containing genes.

TE insertions in promoter regions may affect gene expression

To investigate the potential impacts of TE insertions on gene expression, we compared the expression levels of the four gene categories across different tissues and distinct developmental stages of sporangia in D. fragrans [36]. The results revealed that LG genes generally presented the highest expression levels, whereas SHG genes presented very low expression (Fig. 6a). Since the promoter region plays a key role in gene expression regulation, we examined TE insertions in the promoters of these four gene categories. We found that, compared with the other categories, the SHG genes presented significantly more TE insertions in their promoters (Fig. 6b), which may account for the relatively low expression levels of SHG genes. To further explore the relationship between the length of TE insertions in promoter regions and gene expression levels, we divided the TE insertion length in the promoter regions into four 500-bp intervals and compared their effects on gene expression. Across the four gene categories, gene expression levels were consistently the lowest when the TE insertion length ranged from 1500 to 2000 bp (Fig. 6c). Notably, in SHG genes, TE insertions with lengths ranging from 0 to 1500 bp may promote gene expression. These findings suggest that TEs can influence gene expression by inserting into promoter regions.

We used four tandemly duplicated UDP-glycosyltransferases (UDPGTs) as a representative example to illustrate how TE insertions in the promoter regions of SHG genes affect gene expression levels and the potential underlying mechanisms.
The promoter region of UDPGT1 contained a 1912-bp TE insertion, while UDPGT2 and UDPGT3 had 518-bp and 344-bp TE insertions, respectively, and no TE insertions were detected in the promoter of UDPGT4 (Fig. 6d). The transcriptome data revealed no expression of UDPGT1 or UDPGT4, whereas UDPGT2 and UDPGT3 were expressed in glandular trichomes and spore-bearing leaves. Further investigation revealed that the loss of the TATA-box core element in the promoter of UDPGT1 due to extensive TE insertions likely accounts for the lack of expression. In contrast, the promoter regions of UDPGT2 and UDPGT3 harbored TE insertions that contained additional functional cis-acting elements, such as CAAT-boxes and MYB-binding sites, suggesting that TEs may modulate gene expression by introducing regulatory elements.

Figure 6. a. Expression levels of four gene categories (LG, SHG, SLG, SNG) across different tissues and stages of sporangium development (p < 0.001). b. Length distribution of TE insertions in the promoter regions of the four gene categories. c. Comparison of gene expression levels across different insertion size intervals of TEs in the promoter regions of the four gene categories. The x-axis represents the insertion length of TEs, with intervals of 500 bp. d. Expression statistics for four UDPGT genes. The structure of each gene is shown at the bottom, where wide blocks represent exons, narrow blocks represent introns, and white arrows indicate the promoter regions. The red bars above the gene structures represent TEs. The histograms above show the distribution of transcriptome reads, with the numbers on the left indicating the normalized read peak values.

TE insertions can enhance alternative splicing (AS) of genes

To investigate whether TE insertion into genic regions regulates AS, we performed a genome-wide systematic analysis of alternative splicing in D. fragrans and identified a total of 105,203 alternative splicing events.
The results showed that approximately 66% of the genes exhibited alternative splicing, with an average of 6 alternative splicing isoforms per gene. Among all AS types, alternative first exon (AF) events accounted for the highest proportion, reaching 23.54%. Notably, as many as 80.42% of mutually exclusive exon (MX) events occurred in TE-inserted genes (Fig. S15a), suggesting that TE insertion may be one of the factors driving MX-type AS events. Further comparison of AS characteristics across different gene categories revealed that, compared with SNG genes without TE insertions, TE-inserted genes (LG, SHG and SLG) had significantly higher proportions of alternative splicing genes (ASGs) and greater numbers of AS events. Among these, LG genes had not only the highest proportion of AS occurrence but also the largest number of AS events (Fig. S15b, c).

Discussion

We generated a high-quality chromosome-level genome assembly of D. fragrans, a fern species inhabiting sun-exposed volcanic-lava habitats (Fig. 1; Fig. S1). As the first genome of a specialized-habitat fern within Dryopteridaceae, this assembly not only fills a critical gap in fern genome research but also provides a foundational resource for comparative genomics studies of early land plants.

In this study, we found that the core factor driving genome evolution in D. fragrans is not frequent WGD (Fig. 2; Fig. S5), but rather TEs associated with the expansion of gene families related to adaptation to specialized environments (Fig. 3; Figs. S6–S8). This TE-mediated, DSD-dependent pattern of gene expansion is consistent with the mechanistic models in angiosperms, where mobile elements regulate gene transposition and the acquisition of new genes [37, 38]. Our results extend this evolutionary paradigm to leptosporangiate ferns, suggesting that distantly diverged land plant lineages share convergent evolutionary paths in gene family expansion.
This convergence implies that, in the absence of large-scale WGD, TE-mediated small-scale gene duplication may represent a universal strategy for adaptive evolution, a pattern particularly relevant for early land plant lineages with conserved diploidy [14].

Combined with analyses of TE insertion frequency, this study suggests that TEs in the D. fragrans genome exhibit a distinctive dynamic pattern of continuous insertion and rapid removal (Fig. 4; Fig. S9). The habitat of D. fragrans is characterized by intense ultraviolet radiation and large temperature fluctuations, both of which have been confirmed to induce TE activation in plants [39–42]. Rapid removal of intact TEs can minimize genomic instability caused by TE hyperactivity, while continuous TE insertion events can reserve abundant genetic variation for the adaptive evolution of the species. This dynamic TE pattern may represent a previously unrecognized genomic adaptation strategy of ferns in specialized habitats, a conclusion that merits further validation in other fern species adapted to similar habitats.

Further analyses revealed that TEs can also insert into the genic regions of D. fragrans and affect gene structure, giving rise to a class of genes with longer introns in the D. fragrans genome (Fig. 5; Fig. S10). Furthermore, genes with TE insertions have a greater number of introns (Fig. S13) and more alternative splicing events (Fig. S15), and TEs within genic regions exhibit higher stability (Fig. S11). Meanwhile, TE insertions in gene promoter regions may regulate gene expression by introducing additional cis-acting elements (Fig. 6). This regulatory effect has been previously reported in seed plants [16, 22], and the present study extends this finding to ferns, suggesting that the TE-mediated regulatory innovation mechanism is conserved across clades.

This study still has certain methodological and interpretive limitations.
First, this study used the genome of only a single species, D. fragrans from a specialized habitat, to explore the driving role of TEs in fern genome evolution, which limits how representative the findings are. Second, all analyses in this study were based on experimental materials from a single individual, so it was impossible to assess the variation in TE insertion patterns and copy numbers within natural populations; future population-level sequencing analyses are needed to clarify whether the TE characteristics observed in this study are fixed or variable across different habitats. Third, this study did not analyze the epigenetic landscape regulating TE stability and promoter activity; to further explore the mechanism by which TEs affect gene expression, future studies could investigate relevant epigenetic modifications or use gene editing technology to knock out TE insertions within genes, thereby providing direct experimental evidence for the regulatory functions of TEs.

Materials and Methods

Plant materials and genome sequencing

Plant materials used in this study were collected from the Wudalianchi region, Heilongjiang Province, China. Spores were isolated from sporophytes of D. fragrans and subsequently cultured to the gametophyte stage. Thereafter, gametophyte tissues were propagated through successive generations on sterile solid media containing (per liter) 4.43 g Murashige and Skoog (MS) basal salts, 10 g sucrose, and 7.5 g agar. Cultures were maintained under controlled environmental conditions (22°C, 16-h light/8-h dark photoperiod, 60% humidity) for 6–8 weeks to ensure robust biomass accumulation. DNA was extracted from the gametophytes of D. fragrans via the SDS method, followed by purification with the QIAGEN® Genomic Kit (Cat# 13343). DNA quality was assessed on 1% agarose gels to check for degradation and contamination.
Purity was evaluated with a NanoDrop™ One UV-Vis spectrophotometer (Thermo Fisher Scientific), with OD260/280 ratios between 1.8 and 2.0 and OD260/230 ratios between 2.0 and 2.2. The DNA concentration was measured with a Qubit® 3.0 fluorometer (Invitrogen, USA). For long-read sequencing, 2 µg of qualified DNA per sample was used as input for Oxford Nanopore Technologies (ONT) library preparation. DNA was size-selected with the BluePippin system (Sage Science, USA) before end-repair and A-tailing with the NEBNext Ultra II End Repair/dA-tailing Kit (Cat# E7546). Adapter ligation was performed with an LSK109 kit (Oxford Nanopore Technologies). The library was quantified with a Qubit® 3.0 fluorometer. Sequencing was performed on the GridION X5/PromethION platform (Oxford Nanopore Technologies).

Genome assembly

The genome was assembled from ONT reads using a hybrid approach. First, raw Nanopore reads were basecalled from FAST5 to FASTQ with Guppy (version 3.2.2 + 9fe0a78) [43]. Low-quality reads (mean_qscore_template < 7) were filtered out. De novo genome assembly was conducted with NextDenovo (v2.3.1) [44], which employs an overlap-layout-consensus (OLC) strategy. Given the high error rate of ONT reads, subreads were self-corrected with NextCorrect to generate consensus sequences (CNS reads). After correlation analysis of the CNS reads with NextGraph, a preliminary genome assembly was constructed. To refine the assembly, contigs were corrected with Racon (v1.3.1) [45] (using ONT long reads) and polished with Nextpolish (v1.3.0) [46, 47] (using Illumina short reads). Redundant contigs were removed on the basis of similarity searches (identity ≥ 80%, overlap ≥ 80%). The completeness of the genome was assessed with BUSCO (v4.0.5) [48] and CEGMA (v2) [49]. Assembly accuracy was evaluated by mapping Illumina paired-end reads to the genome with BWA (0.7.12-r1039) [50] and SAMtools (v1.4) [51] to assess the mapping rate and genome coverage. Base accuracy was further calculated with BCFtools (v1.8.0) [52].
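The read-quality filter described above (mean_qscore_template < 7) can be illustrated with a short sketch that recomputes a mean quality score from FASTQ quality strings. It assumes the mean is taken over per-base error probabilities rather than over Phred values, and that qualities use the Phred+33 encoding; the function names and toy reads are hypothetical, and in the actual pipeline the value would come from the basecaller's own summary output.

```python
import math


def mean_qscore(quality_string, offset=33):
    """Mean read quality, averaging error probabilities (not Phred values)."""
    if not quality_string:
        raise ValueError("empty quality string")
    # Convert each Phred+33 character into an error probability
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_reads(records, min_q=7.0):
    """Keep (name, sequence, quality) records whose mean Q is at least min_q."""
    return [rec for rec in records if mean_qscore(rec[2]) >= min_q]


# Hypothetical toy reads: '5' encodes Q20 and '&' encodes Q5
reads = [("read_a", "ACGT", "5555"), ("read_b", "ACGT", "&&&&")]
kept = filter_reads(reads)  # only read_a passes the Q7 threshold
```

Averaging probabilities penalizes reads with a few very poor bases more strongly than a plain average of Phred scores would, which is why a read mixing Q20 and Q5 bases scores below Q8 here.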
Additionally, RNA-seq reads were aligned to the genome to assess gene coverage, and mitochondrial sequences were excluded by screening the draft genome against the NT database 53. For telomere prediction, the quartet_teloexplorer.py script from quarTeT was used with the species set to "plant" and all other parameters left at their default values 54.

Hi-C scaffolding

To anchor the scaffolds to chromosomes, Hi-C libraries were prepared from the genomic DNA of the reference cultivar. Freshly harvested leaves were vacuum infiltrated in nuclei isolation buffer, fixed with formaldehyde, and ground into powder. Nuclei were isolated and digested with DpnII. Biotin-14-dCTP was incorporated into the DNA, and unligated DNA ends were removed via T4 DNA polymerase exonuclease activity. The ligated DNA was sheared into 300–600 bp fragments, repaired, and A-tailed before being purified via biotin-streptavidin pull-down. The Hi-C library was quantified and sequenced on the Illumina NovaSeq/MGI-2000 platform. Raw Hi-C data were quality-controlled with HiC-Pro (v3.1.0) 55, filtering out low-quality sequences (quality score < 20), adaptor sequences, and sequences shorter than 30 bp. Clean paired-end reads were aligned to the draft genome with Bowtie2 (v2.3.2) 56. Valid interaction pairs were identified and retained with HiC-Pro (v3.1.0) for further analysis. The scaffolds were clustered, ordered, and oriented with LACHESIS, with parameters set for the minimum number of restriction sites, link density, and the noninformative ratio. Manual adjustment was performed to correct orientation errors and improve chromosome-level scaffold placement.

Gene annotation and functional annotation

Gene prediction was carried out via three independent approaches: ab initio prediction, homology-based prediction, and RNA-seq-based prediction.
For homology-based gene prediction, GeMoMa (v1.6.1) 57 was used to align homologous peptides from closely related species to the repeat-masked genome assembly and obtain gene structure information. For RNA-seq-based prediction, filtered mRNA-seq reads were aligned to the reference genome with STAR (v2.7.3a) 58. Transcripts were assembled with StringTie (v1.3.4d) 59, and open reading frames (ORFs) were predicted with PASA (v2.3.3) 60. Additionally, RNA-seq reads were assembled de novo via StringTie, and the resulting transcripts were analyzed with PASA to generate a training set. Augustus (v3.3.1) 61 was then trained on this set and used for ab initio gene prediction. The final gene set was generated by integrating the predictions from the three approaches with EVidenceModeler (v1.1.1) 60. For functional annotation, the predicted proteins were compared against several public databases: SwissProt, the Non-Redundant Protein Database (NR), the Kyoto Encyclopedia of Genes and Genomes (KEGG), Clusters of Orthologous Groups of proteins (KOG), and Gene Ontology (GO). Protein domains and GO terms were identified with InterProScan (5.32–71.0) 62. For the other four databases, BLASTP (v2.7.1) 63 was used to compare the predicted protein sequences against the public protein databases, and the best hit (the one with the lowest E value) was retained. The results from all five databases were combined to provide comprehensive functional annotations for the gene models.

Phylogenetic reconstruction and gene-family evolution

We retrieved publicly available genome assemblies and annotations for the species included in this study. Orthogroups were inferred with OrthoFinder (v2.5.5) 64 using default settings. Single-copy orthologs identified by OrthoFinder were aligned with MAFFT (v7.525) 65, and poorly covered alignment positions (coverage < 60%) were removed with trimAl (v1.4) 66.
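The column-coverage criterion used for alignment trimming (positions with coverage < 60% removed) can be sketched as follows. This is only an illustration of the criterion, not trimAl's implementation, and the exact trimAl options used by the authors are not stated in the text. The function name, the dictionary input format, and the toy alignment are assumptions made for the example.

```python
def trim_low_coverage_columns(alignment, min_coverage=0.6, gap_chars="-"):
    """Drop alignment columns whose fraction of non-gap residues is < min_coverage.

    `alignment` maps sequence names to aligned sequences of equal length.
    """
    if not alignment:
        return {}
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    n_cols = lengths.pop()
    # Keep a column when enough sequences carry a residue (not a gap) there.
    keep = [
        col
        for col in range(n_cols)
        if sum(seq[col] not in gap_chars for seq in alignment.values()) / n_seqs
        >= min_coverage
    ]
    return {name: "".join(seq[c] for c in keep) for name, seq in alignment.items()}


if __name__ == "__main__":
    toy = {
        "sp1": "AC-GT",
        "sp2": "AC-GT",
        "sp3": "A--GT",
        "sp4": "AT-G-",
        "sp5": "A-TG-",
    }
    # The third column has a residue in only 1 of 5 sequences and is removed.
    for name, seq in trim_low_coverage_columns(toy).items():
        print(name, seq)
```

Columns sitting exactly at the threshold (3 of 5 sequences here) are kept; whether the boundary case is retained in the original analysis depends on the tool settings, which the text does not specify.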
Model selection and maximum-likelihood tree inference were performed with IQ-TREE (v2.3.3) 67, and node support was assessed with 1,000 ultrafast bootstrap replicates. Divergence times were estimated with MCMCtree in the PAML (v4.10.7) package 68; analyses used a burn-in of 200,000 iterations followed by 10,000 sampled iterations with a sampling frequency of 5. Gene family expansion and contraction were inferred with CAFE (v5.0) 69. Species trees were visualized and annotated with FigTree (v1.4.4).

TE Annotation and LTR Analysis

For TE annotation, a D. fragrans TE library was constructed with EDTA (v2.2) 70. EDTA incorporates both structure- and homology-based detection programs to annotate the predominant TE classes found in plant genomes. Unknown TEs were isolated separately and reclassified with DeepTE 71. The complete TEs were further categorized with TEsorter (v1.4.6) 72. Together, these steps yielded a comprehensive TE database for D. fragrans. Subsequent summary statistics and visualizations were produced with the R packages tidyverse and ggplot2.

Gene Structure Analysis

First, the GFF3 files of the species under analysis were filtered to retain only protein-coding genes. The positions of exons and genes were extracted, and intron positions were identified by calculating the complement with BEDTools (v2.31.1) 73. Gene length, exon length, and intron length were then quantified, with visualizations created with the R packages ggplot2 and ggridges. Genes with log10-transformed intron lengths greater than four (that is, more than 10,000 bp) were defined as having long introns. TE insertions within these long-intron genes and other genes were assessed with BEDTools. The types and lengths of TEs inserted into introns were then summarized.

Identification of Solo/Intact LTRs

To evaluate the dynamics of LTR retrotransposons, intact and solo LTR elements in the D. fragrans genome were identified and classified.
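For the Gene Structure Analysis step described above, the derivation of introns as the complement of exons within a gene, together with the long-intron threshold (log10 length > 4), can be sketched as follows. This is a minimal illustration and not the authors' BEDTools workflow. It assumes half-open, BED-style coordinates and applies the threshold to the total intron length per gene; the text does not say whether the total or an individual intron length was used. Function names are hypothetical.

```python
import math


def introns_from_exons(gene_start, gene_end, exons):
    """Return intron intervals as the gaps between merged exons of one gene.

    Coordinates are half-open (start, end) pairs, BED-style.
    """
    merged = []
    for start, end in sorted(exons):
        # Clip exons to the gene boundaries and skip empty intervals.
        start, end = max(start, gene_start), min(end, gene_end)
        if start >= end:
            continue
        if merged and start <= merged[-1][1]:
            # Overlapping or adjacent exons are merged into one block.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(prev[1], nxt[0]) for prev, nxt in zip(merged, merged[1:])]


def has_long_introns(introns, log10_threshold=4.0):
    """True when log10 of the total intron length exceeds the threshold."""
    total = sum(end - start for start, end in introns)
    return total > 0 and math.log10(total) > log10_threshold


if __name__ == "__main__":
    exons = [(0, 500), (12_000, 12_500), (29_000, 30_000)]
    introns = introns_from_exons(0, 30_000, exons)
    print(introns)
    print(has_long_introns(introns))
```

A single-exon gene yields an empty intron list and is therefore never classified as a long-intron gene.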
An intact LTR was defined as a retrotransposon containing two recognizable LTR sequences (both ends) within 1,000 bp of the flanking regions, along with an internal coding region, typically including domains such as gag, pol, and reverse transcriptase. In contrast, a solo LTR was defined as a partial element retaining only a single LTR sequence, either with a truncated internal region or lacking one entirely. These elements were identified with EDTA and further validated through sequence homology and structure-based annotation.

WGD analysis

To assess the WGD history of D. fragrans, conserved collinear blocks within the assembled genome were first identified through intragenomic synteny analysis with JCVI 74. Only gene pairs located within syntenic blocks containing at least five collinear genes were retained for further analysis. The number and size of these blocks were used to evaluate large-scale duplication events. To estimate the timing of gene duplications, Ks was calculated for paralogous gene pairs with ParaAT (v2.0) 75 and KaKs_Calculator (v3.0) 76. The Ks distribution was fitted with a Gaussian model to detect putative peaks indicative of ancient or recent WGD events. In addition, the Multi-tAxon Paleopolyploidy Search (MAPS) 77 approach was applied to identify phylogenetically congruent duplication events across ferns. Gene family trees were reconstructed from genome and transcriptome data of multiple species with OrthoFinder (v2.5.5). To reduce transcript redundancy, transcriptome datasets were clustered with cd-hit (v4.8.1) 78 at a 90% identity threshold prior to analysis. Duplication signals were then mapped onto internal nodes of the species phylogeny. To assess the significance of the observed duplication patterns, we performed simulation-based comparisons against both null and positive models.

Full-Length Transcriptome Sequencing and Alternative Splicing Analysis

To characterize the full-length transcriptome and identify alternative splicing (AS) events in D. fragrans, third-generation sequencing was performed on the PacBio CCS platform using samples collected from various tissues and developmental stages. HiFi reads were processed with the Iso-Seq3 (v4.2.0) pipeline to generate full-length nonconcatemer (FLNC) reads. These FLNC reads were then aligned to the reference genome with Minimap2 (v2.27-r1193) 79, and the resulting alignments were integrated to produce a preliminary GTF annotation file. To refine the transcript models, we used SQANTI3 (v5.3.6) 80 to compare the full-length transcriptome-based annotations with the existing genome annotations, resulting in a final, high-confidence gene annotation set. For AS analysis, SUPPA2 (v2.4) 81 was used to identify AS events on the basis of the full-length transcriptome annotation. The inclusion levels (percent spliced-in, PSI) of splicing events were calculated by integrating second-generation (Illumina) transcriptome data.

Declarations

Acknowledgements

This work was supported by the National Natural Science Foundation of China (no. 32270394 to Y Chang and no. 32370243 to YH Fang) and a startup fund from Linyi University (LYDX2019BS039). We gratefully acknowledge financial support from the "Double First-Class" initiative of Heilongjiang Province for the advantageous and characteristic discipline of Chinese Materia Medica Biogenetics. NGS, ONT, and Hi-C sequencing of the genome were conducted via grandomic methods. Full-length transcriptome sequencing was performed by BerryGenomics Co., Ltd.

Data availability

The D. fragrans genome assembly and all of the raw sequencing data have been deposited at the China National Center for Bioinformation (CNCB) 82, 83 under the BioProject accession number PRJCA042196.

Authors' contributions

Y.C. and S.W. conceived the study. W.D., Q.W. and Y.F. designed and managed the major scientific objectives. D.Z. and C.S. managed the plant materials. X.Q. assembled the genome and estimated the genome size. W.D.
annotated the genome and transposons and performed the data analysis. D.Z. contributed to the full-length transcriptome sequencing. W.D., Q.W., Y.F. and S.W. led the manuscript preparation. All the authors read and approved the final manuscript.

References

1. Testo, W., Sundue, M.: A 4000-species dataset provides new insight into the evolution of ferns. Mol. Phylogenet. Evol. 105, 200–211 (2016)
2. Kenrick, P., Crane, P.R.: The origin and early evolution of plants on land. Nature 389, 33–39 (1997)
3. Niklas, K.J., Tiffney, B.H.: Patterns in vascular land plant diversification. (1983)
4. PPG I: A community-derived classification for extant lycophytes and ferns. J. Syst. Evol. 54, 563–603 (2016)
5. Sessa, E.B., Der, J.P.: Chapter Seven - Evolutionary Genomics of Ferns and Lycophytes. In: Rensing, S.A. (ed.) Advances in Botanical Research, vol. 78, pp. 215–254. Academic (2016)
6. Fernández, P., et al.: A 160 Gbp fork fern genome shatters size record for eukaryotes. iScience 27 (2024)
7. Huang, C.-H., Qi, X., Chen, D., Qi, J., Ma, H.: Recurrent genome duplication events likely contributed to both the ancient and recent rise of ferns. J. Integr. Plant Biol. 62, 433–455 (2020)
8. Pelosi, J.A., Kim, E.H., Barbazuk, W.B., Sessa, E.B.: Phylotranscriptomics Illuminates the Placement of Whole Genome Duplications and Gene Retention in Ferns. Front. Plant Sci. 13 (2022)
9. Fang, Y., et al.: The genome of homosporous maidenhair fern sheds light on the euphyllophyte evolution and defences. Nat. Plants 8, 1024–1037 (2022)
10. Marchant, D.B., et al.: Dynamic genome evolution in a model fern. Nat. Plants 8, 1038–1051 (2022)
11. Li, F.-W., et al.: Fern genomes elucidate land plant evolution and cyanobacterial symbioses. Nat. Plants 4, 460–472 (2018)
12. Huang, X., et al.: The flying spider-monkey tree fern genome provides insights into fern evolution and arborescence. Nat. Plants 8, 500–512 (2022)
13. Zhong, Y., et al.: Genomic Insights into Genetic Diploidization in the Homosporous Fern Adiantum nelumboides. Genome Biol. Evol. 14, evac127 (2022)
14. Chen, H., et al.: Revisiting ancient polyploidy in leptosporangiate ferns. New Phytol. 237, 1405–1417 (2023)
15. Alseekh, S., Scossa, F., Fernie, A.R.: Mobile Transposable Elements Shape Plant Genome Diversity. Trends Plant Sci. 25, 1062–1064 (2020)
16. Hirsch, C.D., Springer, N.M.: Transposable element influences on gene expression in plants. Biochim. et Biophys. Acta (BBA) - Gene Regul. Mech. 1860, 157–165 (2017)
17. Hassan, A.H., Mokhtar, M.M., El Allali, A.: Transposable elements: multifunctional players in the plant genome. Front. Plant Sci. 14 (2024)
18. Wicker, T., et al.: A unified classification system for eukaryotic transposable elements. Nat. Rev. Genet. 8, 973–982 (2007)
19. Zhang, X., Wessler, S.R.: Genome-wide comparative analysis of the transposable elements in the related species Arabidopsis thaliana and Brassica oleracea. Proceedings of the National Academy of Sciences 101, 5589–5594 (2004)
20. Staton, S.E., Burke, J.M.: Evolutionary transitions in the Asteraceae coincide with marked shifts in transposable element abundance. BMC Genom. 16, 623 (2015)
21. Wicker, T., et al.: Impact of transposable elements on genome structure and evolution in bread wheat. Genome Biol. 19, 103 (2018)
22. LTR-retrotransposons in plants: Engines of evolution. Gene 626, 14–25 (2017)
23. Li, Z.-W., et al.: Transposable Elements Contribute to the Adaptation of Arabidopsis thaliana. Genome Biol. Evol. 10, 2140–2150 (2018)
24. Makarevitch, I., et al.: Transposable elements contribute to activation of maize genes in response to abiotic stress. PLoS Genet. 11, e1004915 (2015)
25. Lisch, D.: How important are transposons for plant evolution? Nat. Rev. Genet. 14, 49–61 (2013)
26. Qin, G., et al.: Chromosome-Scale Genome of the Fern Cibotium barometz Unveils a Genetic Resource of Medicinal Value. Horticulturae 10, 1191 (2024)
27. Wei, Z.: Resolving the Stasis-Dynamism Paradox: Genome Evolution in Tree Ferns
28. Pelosi, J., et al.: The genome of the vining fern Lygodium microphyllum highlights genomic and functional differences between life phases of an invasive plant. Preprint at https://doi.org/10.1101/2025.03.06.640867 (2025)
29. Sessa, E.B., Zimmer, E.A., Givnish, T.J.: Phylogeny, divergence times, and historical biogeography of New World Dryopteris (Dryopteridaceae). Am. J. Bot. 99, 730–750 (2012)
30. Zuo, Z.-Y., et al.: A revised classification of Dryopteridaceae based on plastome phylogenomics and morphological evidence, with the description of a new genus, Pseudarachniodes. Plant Divers. 47, 34–52 (2025)
31. Chen, L., et al.: Microbial-type terpene synthases significantly contribute to the terpene profile of glandular trichomes of the fern Dryopteris fragrans (L). Plant J. 121, e70079 (2025)
32. Hassan, A.H., Mokhtar, M.M., El Allali, A.: Transposable elements: multifunctional players in the plant genome. Front. Plant Sci. 14, 1330127 (2024)
33. Zhang, Y., et al.: Evolutionary rewiring of the wheat transcriptional regulatory network by lineage-specific transposable elements. Genome Res. 31, 2276–2289 (2021)
34. Choulet, F., et al.: Megabase Level Sequencing Reveals Contrasted Organization and Evolution Patterns of the Wheat Gene and Transposable Element Spaces. Plant Cell 22, 1686–1701 (2010)
35. Song, C., Guan, Y., Zhang, D., Tang, X., Chang, Y.: Integrated mRNA and miRNA Transcriptome Analysis Suggests a Regulatory Network for UV–B-Controlled Terpenoid Synthesis in Fragrant Woodfern (Dryopteris fragrans). IJMS 23, 5708 (2022)
36. Lu, Z., Huang, Q., Zhang, T., Hu, B., Chang, Y.: Global transcriptome analysis and characterization of Dryopteris fragrans (L.) Schott sporangium in different developmental stages. BMC Genom. 19, 471 (2018)
37. Xiao, Y., Wang, J.: Understanding the Regulation Activities of Transposons in Driving the Variation and Evolution of Polyploid Plant Genome. Plants 14, 1160 (2025)
38. Oliver, K.R., McComb, J.A., Greene, W.K.: Transposable Elements: Powerful Contributors to Angiosperm Evolution and Diversity. Genome Biol. Evol. 5, 1886–1901 (2013)
39. Mhiri, C., Borges, F., Grandbastien, M.-A.: Specificities and Dynamics of Transposable Elements in Land Plants. Biology 11, 488 (2022)
40. Negi, P., Rai, A.N., Suprasanna, P.: Moving through the Stressed Genome: Emerging Regulatory Roles for Transposons in Plant Stress Response. Front. Plant Sci. 7 (2016)
41. Roquis, D., et al.: Genomic impact of stress-induced transposable element mobility in Arabidopsis. Nucleic Acids Res. 49, 10431–10447 (2021)
42. Thieme, M., et al.: Experimentally heat-induced transposition increases drought tolerance in Arabidopsis thaliana. New Phytol. 236, 182–194 (2022)
43. Wick, R.R., Judd, L.M., Holt, K.E.: Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol. 20, 129 (2019)
44. Hu, J., et al.: An efficient error correction and accurate assembly tool for noisy long reads. Preprint at https://doi.org/10.1101/2023.03.09.531669 (2023)
45. Vaser, R., Sović, I., Nagarajan, N., Šikić, M.: Fast and accurate de novo genome assembly from long uncorrected reads. Preprint at https://doi.org/10.1101/068122 (2016)
46. Hu, J., Fan, J., Sun, Z., Liu, S.: NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics 36, 2253–2255 (2020)
47. Chen, S., Zhou, Y., Chen, Y., Gu, J.: fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018)
48. Simão, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V., Zdobnov, E.M.: BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015)
49. Parra, G., Bradnam, K., Korf, I.: CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23, 1061–1067 (2007)
50. Li, H., Durbin, R.: Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009)
51. Li, H., et al.: The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009)
52. Danecek, P., McCarthy, S.A.: BCFtools/csq: haplotype-aware variant consequences. Bioinformatics 33, 2037–2039 (2017)
53. Kim, D., Langmead, B., Salzberg, S.L.: HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015)
54. Lin, Y., et al.: quarTeT: a telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification. Hortic. Res. 10, uhad127 (2023)
55. Servant, N., et al.: HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015)
56. Langmead, B., Salzberg, S.L.: Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012)
57. Keilwagen, J., et al.: Using intron position conservation for homology-based gene prediction. Nucleic Acids Res. 44, e89 (2016)
58. Dobin, A., et al.: STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013)
59. Kovaka, S., et al.: Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20, 278 (2019)
60. Haas, B.J., et al.: Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008)
61. Stanke, M., Diekhans, M., Baertsch, R., Haussler, D.: Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637–644 (2008)
62. Zdobnov, E.M., Apweiler, R.: InterProScan – an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17, 847–848 (2001)
63. McGinnis, S., Madden, T.L.: BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res. 32, W20–W25 (2004)
64. Emms, D.M., Kelly, S.: OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019)
65. Katoh, K.: MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066 (2002)
66. Capella-Gutiérrez, S., Silla-Martínez, J.M., Gabaldón, T.: trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009)
67. Minh, B.Q., et al.: IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Mol. Biol. Evol. 37, 1530–1534 (2020)
68. Yang, Z.: PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007)
69. Mendes, F.K., Vanderpool, D., Fulton, B., Hahn, M.W.: CAFE 5 models variation in evolutionary rates among gene families. Bioinformatics 36, 5516–5518 (2021)
70. Ou, S., et al.: Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 20, 275 (2019)
71. Yan, H., Bombarely, A., Li, S.: DeepTE: a computational method for de novo classification of transposons with convolutional neural network
72. Zhang, R.-G., Wang, Z.-X., Ou, S., Li, G.-Y.: TEsorter: lineage-level classification of transposable elements using conserved protein domains. Preprint at https://doi.org/10.1101/800177 (2019)
73. Quinlan, A.R.: BEDTools: The Swiss-Army Tool for Genome Feature Analysis. CP Bioinf. 47 (2014)
74. Tang, H., et al.: JCVI: A versatile toolkit for comparative genomics analysis. iMeta 3, e211 (2024)
75. Zhang, Z., et al.: ParaAT: A parallel tool for constructing multiple protein-coding DNA alignments. Biochem. Biophys. Res. Commun. 419, 779–781 (2012)
76. Zhang, Z.: KaKs_Calculator 3.0: Calculating Selective Pressure on Coding and Non-Coding Sequences. Genom. Proteom. Bioinform. 20, 536–540 (2022)
77. Li, Z., et al.: Early genome duplications in conifers and other seed plants. Sci. Adv. (2015). https://doi.org/10.1126/sciadv.1501084
78. Fu, L., Niu, B., Zhu, Z., Wu, S., Li, W.: CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012)
79. Li, H.: Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics
34, 3094–3100 (2018)
80. Pardo-Palacios, F.J., et al.: SQANTI3: curation of long-read transcriptomes for accurate identification of known and novel isoforms. Nat. Methods 21, 793–797 (2024)
81. Trincado, J.L., et al.: SUPPA2: fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions. Genome Biol. 19, 40 (2018)
82. Chen, T., et al.: The Genome Sequence Archive Family: Toward Explosive Data Growth and Diverse Data Types. Genom. Proteom. Bioinform. 19, 578–583 (2021)
83. CNCB-NGDC Members and Partners: Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2025. Nucleic Acids Res. 53, D30–D44 (2025)

Tables

Table 1. Genome assembly statistics of D. fragrans.

Genome size (Gb)                         4.45
Contig N50 (Mb)                          5.70
Scaffold N50 (Mb)                        108.12
Contig number                            1,740
Sequence anchored on chromosome (%)      99.33
GC content (%)                           40.53
High-copy repeat content (%)             85.39

Protein-coding gene
Mean gene length (bp)                    36,768
Mean CDS length (bp)                     1,301
Mean exon length (bp)                    251
Mean intron length (bp)                  8,497
Mean exon number per gene                5.17
Gene number                              33,489

Noncoding        Number        Total (bp)
miRNA            116           13,673
tRNA             3,811         287,962
rRNA             514           721,680

Additional Declarations

There is no competing interest.

Supplementary Files

SupplementaryTable.xlsx: Supplementary Tables S1–S10
SupplementaryFigure.docx: Supplementary Figures S1–S15
Author affiliations

ying chang (corresponding author): College of Life Sciences, Northeast Agricultural University
weicong DAI: College of Life Sciences, Northeast Agricultural University
Qia Wang: Kunming Institute of Botany, Chinese Academy of Sciences
Yuhan Fang: South China Botanical Garden
xiaojie qiu: College of Life Sciences, Northeast Agricultural University
chunhua song: College of Life Sciences and Technology, Harbin Normal University
dongrui zhang: College of Life Sciences, Northeast Agricultural University
shucai wang: Laboratory of Plant Molecular Genetics & Crop Gene Editing, School of Life Sciences, Linyi University
07:05:10","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-7268223/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7268223/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":103794543,"identity":"96c4933d-f513-42df-b523-198177026388","added_by":"auto","created_at":"2026-03-03 03:32:15","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":3539972,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eChromosome-level genome assembly of \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eD. fragrans\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ea.\u003c/strong\u003e Circos plot of the \u003cem\u003eD. fragrans\u003c/em\u003egenome, divided into 10 Mb windows, showing (1) chromosome length and number; (2) content of repetitive sequences; (3) content of Solo-LTRs; (4) content of Gypsy; (5) content of Copia; (6) gene density; and (7) GC content, with the innermost circle representing synteny. \u003cstrong\u003eb. \u003c/strong\u003eBUSCO assessment of genome completeness across seven fern species. \u003cstrong\u003ec.\u003c/strong\u003e Distribution of repeat content along chromosomes visualized viabar plots and heatmaps. 
Each chromosome is divided into 50 windows (x-axis), and the repeat density within each window is shown on the y-axis and by color intensity.\u003c/p\u003e","description":"","filename":"image1.png","url":"https://assets-eu.researchsquare.com/files/rs-7268223/v1/0b247717c6545265d90f35ca.png"},{"id":103794547,"identity":"0449537f-ea6e-43de-99c4-7806777787a8","added_by":"auto","created_at":"2026-03-03 03:32:15","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":2183282,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eWGD analysis.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ea.\u003c/strong\u003e Intragenomic synteny analysis of \u003cem\u003eD.\u003c/em\u003e \u003cem\u003efragrans\u003c/em\u003e, illustrating the distribution and retention of homologous gene blocks. \u003cstrong\u003eb. \u003c/strong\u003eKs distribution for non‐tandem duplicated homolog pairs. \u003cstrong\u003ec. \u003c/strong\u003eMAPS‐based detection of WGD signals: the phylogenetic tree of selected fern species is shown below, with nodes N2–N6 indicated; above, the proportion of duplicated genes in each subtree is plotted, where the green, red, and blue curves represent the nullsimulation, positivesimulation, and observeddata, respectively. 
The shaded regions denote confidence intervals.\u003c/p\u003e","description":"","filename":"image2.png","url":"https://assets-eu.researchsquare.com/files/rs-7268223/v1/30dd0dae36f6ff4d78ac3678.png"},{"id":104400703,"identity":"f27e7ea9-0c09-4f1b-99d2-dffba03a3a95","added_by":"auto","created_at":"2026-03-11 12:10:45","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":3915842,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003ePhylogeny and gene-family evolution.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ea.\u003c/strong\u003e Species tree with divergence time estimates and numbers of expanded/contracted gene families. The tree was inferred from 253 single-copy orthogroups; divergence times are indicated in Mya. \u003cstrong\u003eb.\u003c/strong\u003e GO enrichment of \u003cem\u003eD. fragrans\u003c/em\u003e-specific orthogroups. \u003cstrong\u003ec.\u003c/strong\u003e GO enrichment oforthogroups significantly enrichedin \u003cem\u003eD. fragrans\u003c/em\u003e. \u003cstrong\u003ed.\u003c/strong\u003e Distributions of Ka/Ks ratios for genes originating from five duplication modes (WGD, TD, PD, TRD, DSD). \u003cstrong\u003ee.\u003c/strong\u003e Overlap analysis between genes in expanded orthogroups and genes derived from each duplication mode (WGD, TD, PD, TRD, DSD). Among them, WGDp, TDp, PDp, TRDp, and DSDp refer to positively selected genes. 
\u003cstrong\u003ef.\u003c/strong\u003e GO enrichment for genes originating from WGD pairs.\u003c/p\u003e","description":"","filename":"image3.png","url":"https://assets-eu.researchsquare.com/files/rs-7268223/v1/c668ba3aab0ca4c200500ef1.png"},{"id":103794548,"identity":"e20a2b40-a05e-420f-8a27-ebbcf9b6fbe9","added_by":"auto","created_at":"2026-03-03 03:32:15","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":1172827,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eAnnotation and characteristics of TEs in \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eD. fragrans\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ea\u003c/strong\u003e. Comparison of LTR content and genome size between \u003cem\u003eD. fragrans\u003c/em\u003e and several other plant species. \u003cstrong\u003eb\u003c/strong\u003e. LAI index of the \u003cem\u003eD. fragrans\u003c/em\u003e genome, with regions \u0026gt;10 shown in blue and regions \u0026lt;10 shown in red. \u003cstrong\u003ec\u003c/strong\u003e. Kimura substitution levels of \u003cem\u003eD. fragransTEs\u003c/em\u003e. \u003cstrong\u003ed\u003c/strong\u003e. Phylogenetic trees of the six major LTR/Copia (left) and LTR/Gypsy (right) families. \u003cstrong\u003ee\u003c/strong\u003e. Identity distribution of TEs within the genome compared with a custom-built TE database. \u003cstrong\u003ef\u003c/strong\u003e. 
Identity distribution of LTRs within the genome compared with a custom-built TE database.\u003c/p\u003e","description":"","filename":"image4.png","url":"https://assets-eu.researchsquare.com/files/rs-7268223/v1/df38f7050b9f2f0b1d14206b.png"},{"id":103794544,"identity":"208a2950-4450-4d44-9871-c0a8e6519566","added_by":"auto","created_at":"2026-03-03 03:32:15","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":486664,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eEffects of TEs on gene structure in \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eD. fragrans.\u003c/strong\u003e\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ea.\u003c/strong\u003e Comparison of the full gene length, CDS length, and intron length distributions between \u003cem\u003eD. fragrans\u003c/em\u003e and other representative plant species. \u003cstrong\u003eb.\u003c/strong\u003e Density plot showing the relationship between intron length and the proportion of TE-derived sequences in introns. Red and blue represent long-intron genes and short-intron genes, respectively; solid lines indicate linear regression fits. \u003cstrong\u003ec.\u003c/strong\u003e Classification of genes into five categories on the basis of intron length and TE content proportion, with the number of genes in each category displayed. \u003cstrong\u003ed.\u003c/strong\u003e Insertion number and total length of different TE families in introns of TE-containing genes. 
\u003cstrong\u003ee.\u003c/strong\u003e Sequence identity distribution of TEs in introns of TE-containing genes.\u003c/p\u003e","description":"","filename":"image5.png","url":"https://assets-eu.researchsquare.com/files/rs-7268223/v1/71437db125ac082ed4dd665b.png"},{"id":103794545,"identity":"7308f8d6-a6f5-4922-b4fd-b4cedc3263b5","added_by":"auto","created_at":"2026-03-03 03:32:15","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":3331871,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eImpact of TE insertion in promoter regions on gene expression.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ea.\u003c/strong\u003e Expression levels of four gene categories (LG, SHG, SLG, SNG) across different tissues and stages of sporangium development (p \u0026lt; 0.001). \u003cstrong\u003eb.\u003c/strong\u003e Length distribution of TE insertions in the promoter regions of the four gene categories. \u003cstrong\u003ec.\u003c/strong\u003e Comparison of gene expression levels across different insertion size intervals of TEs in the promoter regions of the four gene categories. The x-axis represents the insertion length of TEs, with intervals of 500 bp. \u003cstrong\u003ed.\u003c/strong\u003e Expression statistics for four \u003cem\u003eUDPGT\u003c/em\u003e genes. The structure of each gene is shown at the bottom, where wide blocks represent exons, narrow blocks represent introns, and white arrows indicate the promoter regions. The red bars above the gene structures represent TEs. 
The histograms above show the distribution of transcriptome reads, with the numbers on the left indicating the normalized read peak values.\u003c/p\u003e","description":"","filename":"image6.png","url":"https://assets-eu.researchsquare.com/files/rs-7268223/v1/5662db3288005f8e365d60c8.png"},{"id":104410621,"identity":"ecd04f36-4902-44e0-9f07-3f0045e5e0db","added_by":"auto","created_at":"2026-03-11 12:53:04","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":15791312,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7268223/v1/fdf52237-20f7-4b77-94ed-8af045cabce5.pdf"},{"id":103794542,"identity":"acec6f21-775a-445b-9919-90170497ea6c","added_by":"auto","created_at":"2026-03-03 03:32:15","extension":"xlsx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":26250,"visible":true,"origin":"","legend":"Supplementary Table.S1-S10","description":"","filename":"SupplementaryTable.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-7268223/v1/6d299a80b4e2e6b36153f265.xlsx"},{"id":103794549,"identity":"6abb2ef0-ac1a-4318-83f6-c5abc36e305a","added_by":"auto","created_at":"2026-03-03 03:32:15","extension":"docx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":9404910,"visible":true,"origin":"","legend":"Supplementary Figure.S1-S15","description":"","filename":"SupplementaryFigure.docx","url":"https://assets-eu.researchsquare.com/files/rs-7268223/v1/128d4617b07df9e43accc092.docx"}],"financialInterests":"There is \u003cb\u003eNO\u003c/b\u003e Competing Interest.","formattedTitle":"The chromosome-level genome assembly of Dryopteris fragrans reveals transposon-mediated genome evolution and adaptation","fulltext":[{"header":"Introduction","content":"\u003cp\u003eFerns represent one of the oldest surviving lineages of vascular plants and are valuable model groups for 
investigating plant evolution\u003csup\u003e\u003cspan additionalcitationids=\"CR2\" citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u003c/sup\u003e. With approximately 10,600 extant species, ferns constitute the second largest lineage (after angiosperms) among vascular plants\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e. Ferns have notoriously immense genomes (average 1C\u0026thinsp;=\u0026thinsp;12.3\u0026nbsp;billion bases (Gb); maximum 1C\u0026thinsp;=\u0026thinsp;160.45 Gb) and very high chromosome numbers (average, 40.5; maximum, 720)\u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e,\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e. For a long time, these features were thought to be associated with whole-genome duplication (WGD) events\u003csup\u003e\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e,\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e. However, published fern genomes indicate that WGD events are less common in ferns than in seed plants\u003csup\u003e\u003cspan additionalcitationids=\"CR10 CR11 CR12\" citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e. In contrast, repetitive sequences are the predominant component of every published fern genome, with transposable elements (TEs) being the major contributors\u003csup\u003e\u003cspan additionalcitationids=\"CR10 CR11 CR12\" citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e. 
Consistent with recent comprehensive analyses of WGD in leptosporangiate ferns, these observations suggest that in the absence of frequent WGD events, fern genome evolution may have relied on small-scale duplications\u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e, in which TEs, the dominant genomic component, likely play a pivotal role.\u003c/p\u003e \u003cp\u003eTEs act as \u0026ldquo;jumping genes\u0026rdquo; in the genome, shaping plant genomic diversity and playing dual roles in genomic structural variation and functional innovation\u003csup\u003e\u003cspan additionalcitationids=\"CR16\" citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u003c/sup\u003e. TEs are divided into two major classes: Class I retrotransposons that mobilize via an RNA intermediate and Class II DNA transposons that move through a cut-and-paste mechanism\u003csup\u003e\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e. In seed plants, the insertion of TEs has been shown to be closely associated with genome expansion\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e,\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u003c/sup\u003e, gene family evolution\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e,\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e and environmental adaptation\u003csup\u003e\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e\u003c/sup\u003e. 
Moreover, environmental stresses such as ultraviolet stress, drought, or pathogen attack can trigger bursts of TE activity, leading to adaptive genetic variation\u003csup\u003e\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e\u003c/sup\u003e. TE insertions can regulate gene expression by altering gene structure or via insertional mutagenesis, supplying raw material for evolutionary innovation\u003csup\u003e\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u003c/sup\u003e. Although numerous studies in seed plants support the ability of TEs to promote genome evolution and improve environmental adaptability, research on TEs in ferns remains relatively scarce.\u003c/p\u003e \u003cp\u003eThe current availability of high-quality fern genomes remains limited to only a few lineages: three species from Pteridaceae\u003csup\u003e\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e,\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e,\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e, two from Salviniaceae\u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e, one from Cyatheaceae\u003csup\u003e\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e, four from Cibotiaceae\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e,\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e\u003c/sup\u003e, and one from Lygodiaceae\u003csup\u003e\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e. Although these fern genomic resources have enhanced our understanding of this group to a certain extent, existing studies have mainly focused on ferns inhabiting humid and/or shaded habitats. 
Genomic research on ferns from specialized habitats remains scarce, although such species may offer new perspectives on the genomic evolution of this group. One example is \u003cem\u003eD. fragrans\u003c/em\u003e, a fern that grows in direct sunlight on volcanic lava in the northern temperate zone and belongs to the basal lineage of Dryopteridaceae, the most species-rich family of ferns\u003csup\u003e\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e,\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e\u003c/sup\u003e. Currently, research on \u003cem\u003eD. fragrans\u003c/em\u003e has mainly focused on terpenoid metabolism and the medicinal value of its active components, with few studies on its genomic evolution and environmental adaptability\u003csup\u003e\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eIn this study, we present a chromosome-level genome assembly of \u003cem\u003eD. fragrans\u003c/em\u003e. Annotation analyses revealed that its genome harbors significantly higher TE content than other reported fern genomes, but with lower TE integrity. We explored the evolution of gene families potentially associated with adaptation to specialized habitats and the potential relationship between gene expansion and TEs. Additionally, we investigated the potential impacts of TEs on gene structure and expression regulation. Similar TE-mediated genome evolution has been reported in seed plants\u003csup\u003e\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e,\u003cspan additionalcitationids=\"CR33\" citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e\u003c/sup\u003e; our study extends this concept to ferns, laying a foundation for future comparative studies in non-seed plants. 
These findings suggest that TE-driven genomic dynamics may be a universal feature of land plant evolution, rather than being unique to seed plants.\u003c/p\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eChromosome-level genome assembly and annotation\u003c/h2\u003e \u003cp\u003eFerns exhibit an alternation of generations between diploid sporophytes and haploid gametophytes. To obtain a high-quality haplotype-resolved genome of \u003cem\u003eD. fragrans\u003c/em\u003e, spores were cultured into gametophytes, which were then subjected to propagation and culture. The resulting callus was used for DNA extraction, library construction, and sequencing. K-mer analysis estimated the genome size to be approximately 4.48 Gb (Fig. \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003ea). The new genome was generated using a combination of Oxford Nanopore Technologies (ONT) long reads (average sequencing depth of 58\u0026times;; Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e), Illumina short reads (coverage depth of 52\u0026times;; Table \u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003eS2\u003c/span\u003e), and high-throughput chromosome conformation capture data (Hi-C; average coverage depth of 111\u0026times;). The final assembly comprised 41 chromosomes with a total size of 4.45 Gb and a contig N50 of 5.70 Mb (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003ea; Table\u0026nbsp;1; Fig. \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003eb; Table S3). Furthermore, we identified telomeric sequences at both ends of 16 chromosomes and at one end of 14 chromosomes (Fig. \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003ec; Table S4).\u003c/p\u003e \u003cp\u003eWe annotated the \u003cem\u003eD. 
fragrans\u003c/em\u003e genome using a combined strategy (ab initio, homology-based and transcriptome-based predictions; Tables S5\u0026ndash;S8). In total, 33,489 high-confidence gene models were annotated, with a mean gene length of 36,768 bp (Table\u0026nbsp;1); 75.11% of genes were supported by transcript evidence (Table S8). Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis recovered 1,388 complete BUSCOs (86.0%), including 1,296 single-copy (80.3%) and 92 duplicated (5.7%) complete BUSCOs (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eb). Additional Core Eukaryotic Genes Mapping Approach (CEGMA) assessment also supported the high reliability of our gene models (Fig. \u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003eS2\u003c/span\u003e). A total of 22,879 protein-coding genes (68.32% of all genes) were assigned functional annotations in public databases (Fig. S3; Table S9). Together, these results indicate that the gene-structure annotation of the \u003cem\u003eD. fragrans\u003c/em\u003e genome is of sufficient quality to support downstream analyses.\u003c/p\u003e \u003cp\u003eIn the \u003cem\u003eD. fragrans\u003c/em\u003e genome, we annotated 3.77 Gb of repetitive sequences, accounting for 85.39% of the assembly (Table S10). Long terminal repeat (LTR) retrotransposons are the dominant component: LTR/Gypsy and LTR/Copia elements constitute 54.35% and 17.75% of the genome, respectively. We also detected asymmetric repeat accumulation at chromosome ends for 20 chromosomes (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003ec; Fig. S4). Notably, 10 of these chromosomes contain telomeric sequences at both ends. This pattern suggests that the two chromosomal arms within the same chromosome may differ in terms of repeat content. 
The underlying mechanisms, including potential differences in insertion or removal dynamics, recombination or DNA repair activity, and selective constraints, require further investigation.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv description=\"\" class=\"Drawing\" id=\"896075505\" name=\"图片 1\"\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ea.\u003c/b\u003e Circos plot of the \u003cem\u003eD. fragrans\u003c/em\u003e genome, divided into 10 Mb windows, showing (1) chromosome length and number; (2) content of repetitive sequences; (3) content of Solo-LTRs; (4) content of Gypsy; (5) content of Copia; (6) gene density; and (7) GC content, with the innermost circle representing synteny. \u003cb\u003eb.\u003c/b\u003e BUSCO assessment of genome completeness across seven fern species. \u003cb\u003ec.\u003c/b\u003e Distribution of repeat content along chromosomes visualized via bar plots and heatmaps. Each chromosome is divided into 50 windows (x-axis), and the repeat density within each window is shown on the y-axis and by color intensity.\u003c/p\u003e \u003cp\u003e \u003cb\u003eRecent WGD has not been detected in\u003c/b\u003e \u003cb\u003eD. fragrans\u003c/b\u003e.\u003c/p\u003e \u003cp\u003eWGD is a widespread and important process in plant genome evolution, providing raw gene material for functional diversification and thereby accelerating evolutionary innovation. To investigate the WGD history of \u003cem\u003eD. fragrans\u003c/em\u003e, we first surveyed genome-wide synteny. We identified 75 syntenic blocks containing 399 paralogous gene pairs, representing only 2.38% of the gene set, a proportion too low to support a recent WGD (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ea). 
The synonymous substitution rate (Ks) distributions of both syntenic pairs and all homologous gene pairs showed a single peak at Ks\u0026thinsp;\u0026asymp;\u0026thinsp;1.2, which was consistent with one ancient duplication event rather than recent polyploidy (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eb; Fig. S5). Using the multi-taxon polyploidy search (MAPS) framework with genome and transcriptome data from \u003cem\u003eD. fragrans\u003c/em\u003e and other ferns, we detected only a single WGD shared broadly among leptosporangiate ferns, likely dating to the Permian\u0026ndash;Triassic interval (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ec). In summary, we found no evidence of recent WGD in \u003cem\u003eD. fragrans\u003c/em\u003e, and its genomic evolution may be driven by other factors.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ea.\u003c/b\u003e Intragenomic synteny analysis of \u003cem\u003eD. fragrans\u003c/em\u003e, illustrating the distribution and retention of homologous gene blocks. \u003cb\u003eb.\u003c/b\u003e Ks distribution for non-tandem duplicated homolog pairs. \u003cb\u003ec.\u003c/b\u003e MAPS‐based detection of WGD signals: the phylogenetic tree of selected fern species is shown below, with nodes N2\u0026ndash;N6 indicated; above, the proportion of duplicated genes in each subtree is plotted, where the green, red, and blue curves represent the null simulation, positive simulation, and observed data, respectively. The shaded regions denote confidence intervals.\u003c/p\u003e \u003cp\u003e \u003cb\u003eTE-related dispersed duplicates (DSD) are the main drivers of gene family expansion in\u003c/b\u003e \u003cb\u003eD. fragrans\u003c/b\u003e\u003c/p\u003e \u003cp\u003eTo explore whether the evolution of specific gene families contributes to the genomic evolution of \u003cem\u003eD. 
fragrans\u003c/em\u003e, protein-coding genes from 12 species were clustered into 26,136 orthogroups, to which 31,115 \u003cem\u003eD. fragrans\u003c/em\u003e genes (93.8% of the gene set) were assigned. A species tree inferred from 253 single-copy orthogroups and molecular dating data indicates that \u003cem\u003eD. fragrans\u003c/em\u003e diverged from the lineage shared with water ferns and maidenhair ferns at ~\u0026thinsp;168.6 Ma (Middle Jurassic; Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ea). To investigate gene-family dynamics potentially linked to adaptation, we conducted gene ontology (GO) enrichment on 761 \u003cem\u003eD. fragrans\u003c/em\u003e-specific orthogroups (4,615 genes) and 2,262 significantly expanded orthogroups (8,774 genes). Both lineage-specific and expanded families were enriched for functions related to redox metabolism, terpene biosynthesis, stress perception/defense, and protein homeostasis (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eb, c). Compared with the other genes, the genes within the \u003cem\u003eD. fragrans\u003c/em\u003e-specific and expanded orthogroups presented stronger transcriptional responses to ultraviolet (UV) treatment (Fig. S6)\u003csup\u003e\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e\u003c/sup\u003e. Taken together, these results suggest that the evolution and expansion of these gene families may have contributed to the adaptation of \u003cem\u003eD. fragrans\u003c/em\u003e to its specific habitat.\u003c/p\u003e \u003cp\u003eTo further explore the main modes of gene family expansion in \u003cem\u003eD. 
fragrans\u003c/em\u003e, we identified 28,193 duplicated genes and classified them into five categories: 486 whole-genome duplicates (WGD; 1.7%), 2,456 tandem duplicates (TD; 8.7%), 2,165 proximal duplicates (PD; 7.7%), 980 transposed duplicates (TRD; 3.5%), and 21,747 dispersed duplicates (DSD; 77.1%). Compared with the other classes, the PD and TD pairs presented lower Ks values and relatively higher nonsynonymous/synonymous substitution rate (Ka/Ks) values (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ed; Fig. S7), indicating a more recent origin and faster sequence divergence, which is consistent with ongoing tandem and proximal duplications. We then compared the overlap between expanded orthogroups and genes derived from the five duplication modes (WGD, TD, PD, TRD, DSD). The majority of genes within expanded orthogroups are derived from DSD (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ee). In addition, by assessing genome-wide correlations between TE abundance and gene counts of different duplication classes, we observed that DSD gene counts were positively correlated with TEs (Pearson\u0026rsquo;s R\u0026thinsp;=\u0026thinsp;0.79), a correlation stronger than that between TRD gene counts and TEs (Pearson\u0026rsquo;s R\u0026thinsp;=\u0026thinsp;0.75; Fig. S8). Taken together, the expanded gene families potentially associated with the adaptation of \u003cem\u003eD. fragrans\u003c/em\u003e to its specific habitat are mainly derived from DSD, and DSD is closely related to TEs, suggesting that TEs may be the main driver of genomic evolution in \u003cem\u003eD. fragrans\u003c/em\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ea.\u003c/b\u003e Species tree with divergence time estimates and numbers of expanded/contracted gene families. The tree was inferred from 253 single-copy orthogroups; divergence times are indicated in Mya. 
\u003cb\u003eb.\u003c/b\u003e GO enrichment of \u003cem\u003eD. fragrans\u003c/em\u003e-specific orthogroups. \u003cb\u003ec.\u003c/b\u003e GO enrichment of orthogroups significantly enriched in \u003cem\u003eD. fragrans\u003c/em\u003e. \u003cb\u003ed.\u003c/b\u003e Distributions of Ka/Ks ratios for genes originating from five duplication modes (WGD, TD, PD, TRD, DSD). \u003cb\u003ee.\u003c/b\u003e Overlap analysis between genes in expanded orthogroups and genes derived from each duplication mode (WGD, TD, PD, TRD, DSD). Among them, WGDp, TDp, PDp, TRDp, and DSDp refer to positively selected genes. \u003cb\u003ef.\u003c/b\u003e GO enrichment for genes originating from WGD pairs.\u003c/p\u003e \u003cp\u003e \u003cb\u003eLow integrity and continuous insertion of TEs in the\u003c/b\u003e \u003cb\u003eD. fragrans\u003c/b\u003e \u003cb\u003egenome.\u003c/b\u003e\u003c/p\u003e \u003cp\u003eA comparison with published fern genomes revealed that both the relative proportion and the absolute base count of TEs are elevated in \u003cem\u003eD. fragrans\u003c/em\u003e (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ea). However, the TE complement in this genome is characterized not by the preservation of numerous intact elements but by pervasive fragmentation and rapid turnover. Several lines of evidence support this interpretation. First, the LTR assembly index (LAI) was low (LAI\u0026thinsp;=\u0026thinsp;9.73), indicating reduced overall integrity of LTR sequences relative to those of other ferns (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eb). Second, intact LTR retrotransposons represent only\u0026thinsp;~\u0026thinsp;0.58% of the assembly, which is consistent with widespread truncation or degradation. 
Third, analyses of Kimura substitution spectra and pairwise sequence identity revealed a pattern of continuous, frequent LTR integration rather than a single, recent burst (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ec\u0026ndash;f); in contrast, several non-LTR TE classes presented signatures of episodic amplification. Notably, we observed peaks at a Kimura substitution level of ~\u0026thinsp;35 and at a sequence identity of ~\u0026thinsp;0.75, which may correspond to the ancient leptosporangiate WGD signal.\u003c/p\u003e \u003cp\u003eSubfamily-level classification of ~\u0026thinsp;46,130 intact LTRs further reveals heterogeneous dynamics: Copia elements are dominated by the Ale, Tork and Ivana subfamilies, whereas Gypsy elements are primarily Athila, Reina and Tekay (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ed; Fig. S9a). When a neutral substitution rate (\u0026micro;\u0026thinsp;=\u0026thinsp;1.3 \u0026times; 10⁻⁸ substitutions site⁻\u0026sup1; year⁻\u0026sup1;) was used for insertion-time estimation, the Ivana subfamily presented evidence of relatively recent expansion, whereas most other LTR lineages accumulated more gradually (Fig. S9b,c). Taken together, these observations are most consistent with a high-turnover regime in which ongoing insertion of LTRs is rapidly followed by processes that fragment or remove copies.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ea\u003c/b\u003e. Comparison of LTR content and genome size between \u003cem\u003eD. fragrans\u003c/em\u003e and several other plant species. \u003cb\u003eb\u003c/b\u003e. LAI index of the \u003cem\u003eD. fragrans\u003c/em\u003e genome, with regions\u0026thinsp;\u0026gt;\u0026thinsp;10 shown in blue and regions\u0026thinsp;\u0026lt;\u0026thinsp;10 shown in red. \u003cb\u003ec\u003c/b\u003e. Kimura substitution levels of \u003cem\u003eD. fragrans\u003c/em\u003e TEs. \u003cb\u003ed\u003c/b\u003e. 
Phylogenetic trees of the six major LTR/Copia (left) and LTR/Gypsy (right) families. \u003cb\u003ee\u003c/b\u003e. Identity distribution of TEs within the genome compared with a custom-built TE database. \u003cb\u003ef\u003c/b\u003e. Identity distribution of LTRs within the genome compared with a custom-built TE database.\u003c/p\u003e \u003cp\u003e \u003cb\u003eTE insertion shapes gene structure in\u003c/b\u003e \u003cb\u003eD. fragrans\u003c/b\u003e.\u003c/p\u003e \u003cp\u003eTEs act as a \"double-edged sword\" in genomes: on the one hand, their insertion into genic regions can disrupt gene structure and impair function; on the other hand, they provide genetic variation resources for evolution. Since the most direct effect of TE insertion into genic regions is altering gene length, we first investigated the associations between TEs and gene structure in \u003cem\u003eD. fragrans\u003c/em\u003e through cross-species comparisons. The results revealed that among the fern lineages, with the exception of two aquatic ferns with small genomes, the other four ferns presented a broader range of gene lengths (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ea). Further analysis of coding sequence (CDS) and intron regions revealed that CDS length was relatively conserved across all plant groups, whereas intron length varied significantly, with a subset of genes harboring exceptionally long introns (\u0026gt;\u0026thinsp;10 kb) in the four ferns. Notably, a strong positive correlation was observed between gene length and intron length in \u003cem\u003eD. fragrans\u003c/em\u003e (R\u0026thinsp;=\u0026thinsp;0.99; Fig. S10), indicating that gene length is determined primarily by intron length.\u003c/p\u003e \u003cp\u003eFurther investigations revealed that TE insertions are prevalent within the introns of \u003cem\u003eD. fragrans\u003c/em\u003e genes. 
Moreover, compared with TEs in other genomic regions, TE insertions located in gene regions exhibit higher stability (Fig. S11). All long-intron genes contained TE insertions (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eb), and a positive correlation was detected between the proportion of TE-derived sequences in introns and intron length (R\u0026thinsp;=\u0026thinsp;0.71). To further investigate how the proportion of TE-derived sequence in introns affects genes, we classified the genes into four categories according to intron length and the proportion of intronic sequence derived from TEs: short-intron no TE-insertion genes (SNG), short-intron low TE-ratio genes (SLG, TE ratio\u0026thinsp;\u0026lt;\u0026thinsp;0.5), short-intron high TE-ratio genes (SHG), and long-intron TE-insertion genes (LG) (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ec; Fig. S12). In addition, genes with TE insertions possessed more introns than other genes, with LG genes harboring the largest number of introns (Fig. S13).\u003c/p\u003e \u003cp\u003eFurther analysis of TE types inserted into introns revealed that various TE families were present in the SLG, SHG, and LG genes. Among these, LTR retrotransposons dominated in both insertion number and total length (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ed), indicating that LTRs are the primary TE type inserted into genic regions. A comparison of TE sequence identity distributions in introns of the three TE-containing gene categories revealed that SLG genes had significantly lower TE identity than did SHG genes and LG genes (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ee; Fig. 
S14), suggesting that more ancient TE insertions occur in SLG genes.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ea.\u003c/b\u003e Comparison of the full gene length, CDS length, and intron length distributions between \u003cem\u003eD. fragrans\u003c/em\u003e and other representative plant species. \u003cb\u003eb.\u003c/b\u003e Density plot showing the relationship between intron length and the proportion of TE-derived sequences in introns. Red and blue represent long-intron genes and short-intron genes, respectively; solid lines indicate linear regression fits. \u003cb\u003ec.\u003c/b\u003e Classification of genes into five categories on the basis of intron length and TE content proportion, with the number of genes in each category displayed. \u003cb\u003ed.\u003c/b\u003e Insertion number and total length of different TE families in introns of TE-containing genes. \u003cb\u003ee.\u003c/b\u003e Sequence identity distribution of TEs in introns of TE-containing genes.\u003c/p\u003e \u003cp\u003e \u003cb\u003eTE insertions in promoter regions may affect gene expression.\u003c/b\u003e \u003c/p\u003e \u003cp\u003eTo investigate the potential impacts of TE insertions on gene expression, we compared the expression levels of the four gene categories across different tissues and distinct developmental stages of sporangia in \u003cem\u003eD. fragrans\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e\u003c/sup\u003e. The results revealed that LG genes generally presented the highest expression levels, whereas SHG genes presented very low expression (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003ea). Since the promoter region plays a key role in gene expression regulation, we examined TE insertions in the promoters of these four gene categories. 
We found that, compared with the other categories, the SHG genes presented significantly more TE insertions in their promoters (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eb), which may account for the relatively low expression levels of SHG genes.\u003c/p\u003e \u003cp\u003eTo further explore the relationship between the length of TE insertions in promoter regions and gene expression levels, we binned promoter TE insertion lengths into four 500-bp intervals and compared their effects on gene expression. Across the four gene categories, gene expression levels were consistently the lowest when the TE insertion length ranged from 1500 to 2000 bp (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003ec). Notably, in SHG genes, TE insertions with lengths ranging from 0 to 1500 bp may promote gene expression. These findings suggest that TEs can influence gene expression by inserting into promoter regions.\u003c/p\u003e \u003cp\u003eWe used four tandemly duplicated UDP-glycosyltransferases (UDPGTs) as a representative example to illustrate how TE insertions in the promoter regions of SHG genes affect gene expression levels and the potential underlying mechanisms. The promoter region of \u003cem\u003eUDPGT1\u003c/em\u003e contained a 1912-bp TE insertion, while \u003cem\u003eUDPGT2\u003c/em\u003e and \u003cem\u003eUDPGT3\u003c/em\u003e had 518-bp and 344-bp TE insertions, respectively, and no TE insertions were detected in the promoter of \u003cem\u003eUDPGT4\u003c/em\u003e (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003ed). The transcriptome data revealed no expression of \u003cem\u003eUDPGT1\u003c/em\u003e or \u003cem\u003eUDPGT4\u003c/em\u003e, whereas \u003cem\u003eUDPGT2\u003c/em\u003e and \u003cem\u003eUDPGT3\u003c/em\u003e were expressed in glandular trichomes and spore-bearing leaves. 
Further investigation revealed that the loss of the TATA-box core element in the promoter of \u003cem\u003eUDPGT1\u003c/em\u003e due to extensive TE insertions likely accounts for the lack of expression. In contrast, the promoter regions of \u003cem\u003eUDPGT2\u003c/em\u003e and \u003cem\u003eUDPGT3\u003c/em\u003e harbored TE insertions that contained additional functional cis-acting elements, such as CAAT-boxes and MYB-binding sites, suggesting that TEs may modulate gene expression by introducing regulatory elements.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ea.\u003c/b\u003e Expression levels of four gene categories (LG, SHG, SLG, SNG) across different tissues and stages of sporangium development (p\u0026thinsp;\u0026lt;\u0026thinsp;0.001). \u003cb\u003eb.\u003c/b\u003e Length distribution of TE insertions in the promoter regions of the four gene categories. \u003cb\u003ec.\u003c/b\u003e Comparison of gene expression levels across different insertion size intervals of TEs in the promoter regions of the four gene categories. The x-axis represents the insertion length of TEs, with intervals of 500 bp. \u003cb\u003ed.\u003c/b\u003e Expression statistics for four \u003cem\u003eUDPGT\u003c/em\u003e genes. The structure of each gene is shown at the bottom, where wide blocks represent exons, narrow blocks represent introns, and white arrows indicate the promoter regions. The red bars above the gene structures represent TEs. The histograms above show the distribution of transcriptome reads, with the numbers on the left indicating the normalized read peak values.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eTE insertions can enhance alternative splicing (AS) of genes\u003c/h3\u003e\n\u003cp\u003eTo investigate whether TE insertion into genic regions regulates AS, we performed a genome-wide systematic analysis of alternative splicing in \u003cem\u003eD. fragrans\u003c/em\u003e and identified a total of 105,203 alternative splicing events. 
The results showed that approximately 66% of the genes exhibited alternative splicing characteristics, with an average of 6 alternative splicing isoforms per gene. Among all AS types, alternative first exon (AF) events accounted for the highest proportion, reaching 23.54%. Notably, as many as 80.42% of mutually exclusive exon (MX) events occurred in TE-inserted genes (Fig. S15a), suggesting that TE insertion may be one of the factors driving MX-type AS events. Further comparison of AS characteristics across different gene categories revealed that, compared with SNG genes without TE insertions, TE-inserted genes (LG, SHG and SLG) had significantly higher proportions of alternative splicing genes (ASGs) and greater numbers of AS events. Among these, LG genes not only had the highest proportion of AS occurrence but also the largest number of AS events (Fig. S15b, c).\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eWe generated a high-quality chromosome-level genome assembly of \u003cem\u003eD. fragrans\u003c/em\u003e\u0026mdash;a fern species inhabiting sun-exposed volcanic-lava habitats (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e; Fig. \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e). As the first genome of a specialized-habitat fern within Dryopteridaceae, this assembly not only fills a critical gap in fern genome research but also provides a foundational resource for comparative genomics studies of early land plants.\u003c/p\u003e \u003cp\u003eIn this study, we found that the core factor driving genome evolution in \u003cem\u003eD. fragrans\u003c/em\u003e is not frequent WGD (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e; Fig. S5), but rather TEs associated with the expansion of gene families related to adaptation to specialized environments (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e; Fig. S6-8). 
This TE-mediated gene expansion pattern dependent on DSDs is consistent with the mechanistic models in angiosperms, where mobile elements regulate gene transposition and the acquisition of new genes\u003csup\u003e\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e,\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e\u003c/sup\u003e. Our results extend this evolutionary paradigm to leptosporangiate ferns, suggesting that distantly diverged land plant lineages share convergent evolutionary paths in gene family expansion. This convergence implies that, in the absence of large-scale WGD, TE-mediated small-scale gene duplication may represent a universal strategy for adaptive evolution\u0026mdash;a pattern particularly relevant for early land plant lineages with conserved diploidy\u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eCombined with analyses of TE insertion frequency, this study suggests that TEs in the \u003cem\u003eD. fragrans\u003c/em\u003e genome exhibit a more distinct dynamic evolutionary pattern of continuous insertion and rapid removal (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e; Fig. S9). The habitat of \u003cem\u003eD. fragrans\u003c/em\u003e is characterized by intense ultraviolet radiation and large temperature fluctuations, all of which have been confirmed to induce TE activation in plants\u003csup\u003e\u003cspan additionalcitationids=\"CR40 CR41\" citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e\u003c/sup\u003e. Rapid removal of intact TEs can minimize genomic instability caused by TE hyperactivity, while continuous TE insertion events can reserve abundant genetic variation for the adaptive evolution of the species. 
This dynamic TE pattern may represent a previously unrecognized genomic adaptation strategy of ferns in specialized habitats, a conclusion that merits further validation in other fern species adapted to similar habitats.\u003c/p\u003e \u003cp\u003eFurther analyses revealed that TEs can also insert into the genic regions of \u003cem\u003eD. fragrans\u003c/em\u003e and affect gene structure, giving rise to a class of genes with longer introns in the \u003cem\u003eD. fragrans\u003c/em\u003e genome (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e; Fig. S10). Furthermore, genes with TE insertions have a greater number of introns (Fig. S13) and more alternative splicing events (Fig. S15), and TEs within genic regions exhibit higher stability (Fig. S11). Meanwhile, TE insertions in gene promoter regions may regulate gene expression by introducing additional cis-acting elements (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e). This regulatory effect has been previously reported in seed plants \u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e,\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e, and the present study extends this finding to ferns, suggesting that the TE-mediated regulatory innovation mechanism is conserved across clades.\u003c/p\u003e \u003cp\u003eThis study has several methodological and interpretive limitations. First, it relied on the genome of a single species from a specialized habitat, \u003cem\u003eD. fragrans\u003c/em\u003e, to explore the driving role of TEs in fern genome evolution, which limits how representative the conclusions are. 
Second, all analyses in this study were based on experimental materials from a single individual, so variation in TE insertion patterns and copy numbers within natural populations could not be assessed; future population-level sequencing analyses are needed to clarify whether the TE characteristics observed in this study are fixed or variable across different habitats. Third, this study did not analyze the epigenetic landscape regulating TE stability and promoter activity; to further explore the mechanism by which TEs affect gene expression, future studies could investigate relevant epigenetic modification characteristics or use gene editing technology to knock out TE insertions within genes, thereby providing direct experimental evidence for the regulatory functions of TEs.\u003c/p\u003e"},{"header":"Materials and Methods","content":"\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003ePlant materials and genome sequencing\u003c/h2\u003e \u003cp\u003ePlant materials used in this study were collected from the Wudalianchi region, Heilongjiang Province, China. Spores were isolated from sporophytes of \u003cem\u003eD. fragrans\u003c/em\u003e and subsequently cultured to the gametophyte stage. Thereafter, gametophyte tissues were propagated through successive generations on sterile solid media containing (per liter) 4.43 g Murashige and Skoog (MS) basal salts, 10 g sucrose, and 7.5 g agar. Cultures were maintained under controlled environmental conditions (22\u0026deg;C, 16-h light/8-h dark photoperiod, 60% humidity) for 6\u0026ndash;8 weeks to ensure robust biomass accumulation.\u003c/p\u003e \u003cp\u003eDNA was extracted from the gametophytes of \u003cem\u003eD. fragrans\u003c/em\u003e via the SDS method, followed by purification with the QIAGEN\u0026reg; Genomic Kit (Cat# 13343). DNA quality was assessed on 1% agarose gels to check for degradation and contamination. 
Purity was evaluated via a NanoDrop\u0026trade; One UV‒Vis spectrophotometer (Thermo Fisher Scientific), with OD260/280 ratios between 1.8 and 2.0 and OD260/230 ratios between 2.0 and 2.2. The DNA concentration was measured via a Qubit\u0026reg; 3.0 fluorometer (Invitrogen, USA).\u003c/p\u003e \u003cp\u003eFor long-read sequencing, 2 \u0026micro;g of qualified DNA per sample was used as input for Oxford Nanopore Technologies (ONT) library preparation. DNA was size-selected via the BluePippin system (Sage Science, USA) before end-repair and A-tailing with the NEBNext Ultra II End Repair/dA-tailing Kit (Cat# E7546). Adapter ligation was performed via an LSK109 kit (Oxford Nanopore Technologies). The library was quantified with a Qubit\u0026reg; 3.0 fluorometer. Sequencing was performed on the GridION X5/PromethION platform (Oxford Nanopore Technologies).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eGenome Assembly\u003c/h2\u003e \u003cp\u003eThe genome was assembled from ONT reads using a hybrid approach. First, raw Nanopore reads were basecalled from FAST5 to FASTQ via Guppy (version 3.2.2\u0026thinsp;+\u0026thinsp;9fe0a78)\u003csup\u003e43\u003c/sup\u003e. Low-quality reads (mean_qscore_template\u0026thinsp;\u0026lt;\u0026thinsp;7) were filtered out. De novo genome assembly was conducted via NextDenovo (v2.3.1)\u003csup\u003e44\u003c/sup\u003e, which employs an overlap-layout-consensus (OLC) strategy. Given the high error rate of ONT reads, subreads were self-corrected via NextCorrect to generate consensus sequences (CNS reads). After correlation analysis of the CNS reads with NextGraph, a preliminary genome assembly was constructed.\u003c/p\u003e \u003cp\u003eTo refine the assembly, contigs were corrected via Racon (v1.3.1)\u003csup\u003e45\u003c/sup\u003e (with ONT long reads) and polished via NextPolish (v1.3.0)\u003csup\u003e46,47\u003c/sup\u003e (using Illumina short reads). 
Redundant contigs were removed via similarity searches (identity\u0026thinsp;\u0026ge;\u0026thinsp;80%, overlap\u0026thinsp;\u0026ge;\u0026thinsp;80%). The completeness of the genome was assessed via BUSCO (v4.0.5)\u003csup\u003e48\u003c/sup\u003e and CEGMA (v2)\u003csup\u003e\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e\u003c/sup\u003e. Assembly accuracy was evaluated by mapping Illumina paired-end reads to the genome with BWA (0.7.12-r1039)\u003csup\u003e\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e\u003c/sup\u003e and SAMtools (v1.4)\u003csup\u003e51\u003c/sup\u003e to assess the mapping rate and genome coverage. Base accuracy was further calculated with BCFtools (v1.8.0)\u003csup\u003e52\u003c/sup\u003e. Additionally, RNA-seq reads were aligned to the genome to assess gene coverage, and mitochondrial sequences were excluded by submitting the draft genome to the NT library for sequence filtering\u003csup\u003e\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e\u003c/sup\u003e. For telomere prediction, the quartet_teloexplorer.py script from quarTeT was utilized, in which the species was specified as \"plant\", and all other parameters were set to default values\u003csup\u003e\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eHi-C scaffolding\u003c/h3\u003e\n\u003cp\u003eTo anchor the scaffolds to chromosomes, Hi-C libraries were prepared from the genomic DNA of the reference cultivar. Freshly harvested leaves were vacuum infiltrated in nuclei isolation buffer, fixed with formaldehyde, and ground into powder. Nuclei were isolated and digested with DpnII. Biotin-14-dCTP was incorporated into the DNA, and unligated DNA ends were removed via T4 DNA polymerase exonuclease activity. 
The ligated DNA was sheared into 300\u0026thinsp;\u0026minus;\u0026thinsp;600 bp fragments, repaired, and A-tailed before being purified via biotin‒streptavidin pull-down. The Hi-C library was quantified and sequenced on the Illumina NovaSeq/MGI-2000 platform.\u003c/p\u003e \u003cp\u003eHi-C raw data quality was controlled via Hi-C-Pro (v3.1.0)\u003csup\u003e55\u003c/sup\u003e, with filtering for low-quality sequences (quality score\u0026thinsp;\u0026lt;\u0026thinsp;20), adaptor sequences, and sequences shorter than 30 bp. Clean paired-end reads were aligned to the draft genome via Bowtie2 (v2.3.2)\u003csup\u003e56\u003c/sup\u003e. Valid interaction pairs were identified and retained via Hi-C-Pro (v3.1.0) for further analysis. The scaffolds were clustered, ordered, and oriented via LACHESIS, with parameters set for minimum resites, link density, and the noninformative ratio. Manual adjustment was performed to correct any orientation errors and improve chromosome-level scaffold placement.\u003c/p\u003e\n\u003ch3\u003eGene annotation and functional annotation\u003c/h3\u003e\n\u003cp\u003eGene prediction was carried out via three independent approaches: ab initio prediction, homology-based prediction, and RNA-seq-based prediction. For homology-based gene prediction, the GeMoMa (v1.6.1)\u003csup\u003e57\u003c/sup\u003e tool was used to align homologous peptides from closely related species to the repeat-masked genome assembly, allowing us to obtain gene structure information. For RNA-seq-based prediction, filtered mRNA-seq reads were aligned to the reference genome via STAR (v2.7.3a)\u003csup\u003e\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e\u003c/sup\u003e. Transcripts were assembled with StringTie (v1.3.4d)\u003csup\u003e\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e\u003c/sup\u003e, and open reading frames (ORFs) were predicted via PASA (v2.3.3)\u003csup\u003e60\u003c/sup\u003e. 
Additionally, RNA-seq reads were assembled de novo via StringTie, and the resulting transcripts were analyzed via PASA to generate a training set. Augustus (v3.3.1)\u003csup\u003e61\u003c/sup\u003e was then employed for ab initio gene prediction via this training set. The final gene set was generated by integrating the predictions from the three approaches via EVidenceModeler (v1.1.1)\u003csup\u003e60\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eFor functional annotation, gene models were annotated by comparing the predicted proteins against several public databases: SwissProt, Non-Redundant Protein Database (NR), Kyoto Encyclopedia of Genes and Genomes (KEGG), Clusters of Orthologous Groups of proteins (KOG), and GO. Protein domains and GO terms were identified via InterProScan (5.32\u0026ndash;71.0)\u003csup\u003e62\u003c/sup\u003e. For the other four databases, BLASTP (v2.7.1)\u003csup\u003e63\u003c/sup\u003e was used to compare the predicted protein sequences against the public protein databases, and the best hits (with the lowest E value) were retained. The results from all five databases were concatenated to provide comprehensive functional annotations for the gene models.\u003c/p\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003ePhylogenetic reconstruction and gene-family evolution\u003c/h2\u003e \u003cp\u003eWe retrieved publicly available genome assemblies and annotations for the species included in this study. Orthogroups were inferred with OrthoFinder (v2.5.5)\u003csup\u003e64\u003c/sup\u003e via default settings. Single-copy orthologs identified by OrthoFinder were aligned with MAFFT (v7.525)\u003csup\u003e\u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e65\u003c/span\u003e\u003c/sup\u003e, and poorly covered alignment positions (coverage\u0026thinsp;\u0026lt;\u0026thinsp;60%) were removed via trimAl (v1.4)\u003csup\u003e66\u003c/sup\u003e. 
Model selection and maximum-likelihood tree inference were performed with IQ-TREE (v2.3.3)\u003csup\u003e67\u003c/sup\u003e, and node support was assessed with 1,000 ultrafast bootstrap replicates. Divergence times were estimated with MCMCtree in the PAML (v4.10.7) package\u003csup\u003e\u003cspan citationid=\"CR68\" class=\"CitationRef\"\u003e68\u003c/span\u003e\u003c/sup\u003e; analyses used a burn-in of 200,000 iterations followed by 10,000 sampled iterations with a sampling frequency of 5. Gene family expansion and contraction were inferred via CAFE (v5.0)\u003csup\u003e69\u003c/sup\u003e. Species trees were visualized and annotated with FigTree (v1.4.4).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eTE Annotation and LTR Analysis\u003c/h2\u003e \u003cp\u003eFor TE annotation, a \u003cem\u003eD. fragrans\u003c/em\u003e TE library was constructed via EDTA (v2.2)\u003csup\u003e70\u003c/sup\u003e. EDTA incorporates both structure- and homology-based detection programs to annotate the predominant TE classes found in plant genomes. Unknown TEs were isolated separately and reclassified via DeepTE\u003csup\u003e\u003cspan citationid=\"CR71\" class=\"CitationRef\"\u003e71\u003c/span\u003e\u003c/sup\u003e. The complete TEs were further categorized via TEsorter (v1.4.6)\u003csup\u003e72\u003c/sup\u003e. Ultimately, a comprehensive TE database for \u003cem\u003eD. fragrans\u003c/em\u003e was established. Subsequent summary statistics and visualizations were performed via the R packages tidyverse and ggplot2.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eGene Structure Analysis\u003c/h2\u003e \u003cp\u003eFirst, the GFF3 files of the species under analysis were filtered to retain only protein-coding genes. The positions of exons and genes were extracted, and intron positions were identified by calculating the complement via BEDTools (v2.31.1)\u003csup\u003e73\u003c/sup\u003e. 
Gene length, exon length, and intron length were then quantified, with visualizations created via the R packages ggplot2 and ggridges. Genes with log10-transformed intron lengths greater than four (i.e., introns longer than 10,000 bp) were defined as having long introns. TE insertions within these long-intron genes and other genes were assessed via BEDTools. The types and lengths of TEs inserted into introns were then summarized.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003eIdentification of Solo/Intact LTRs\u003c/h2\u003e \u003cp\u003eTo evaluate the dynamics of LTR retrotransposons, intact and solo LTR elements in the \u003cem\u003eD. fragrans\u003c/em\u003e genome were identified and classified. An intact LTR retrotransposon was defined as an element containing two recognizable LTR sequences (both ends) within 1,000 bp of the flanking regions, along with an internal coding region, typically including domains such as gag, pol, and reverse transcriptase.\u003c/p\u003e \u003cp\u003eIn contrast, a solo LTR was defined as a partial element retaining only a single LTR sequence, either with a truncated internal region or with no internal region at all. These elements were identified via EDTA and further validated through sequence homology and structure-based annotation.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eWGD analysis\u003c/h2\u003e \u003cp\u003eTo assess the WGD history of \u003cem\u003eD. fragrans\u003c/em\u003e, conserved collinear blocks within the assembled genome were first identified by intragenomic synteny analysis with JCVI\u003csup\u003e\u003cspan citationid=\"CR74\" class=\"CitationRef\"\u003e74\u003c/span\u003e\u003c/sup\u003e. Only gene pairs located within syntenic blocks containing at least five collinear genes were retained for further analysis. 
The number and size of these blocks were used to evaluate large-scale duplication events.\u003c/p\u003e \u003cp\u003eTo estimate the timing of gene duplications, Ks was calculated for paralogous gene pairs via ParaAT (v2.0)\u003csup\u003e75\u003c/sup\u003e and KaKs_Calculator (v3.0)\u003csup\u003e76\u003c/sup\u003e. The Ks distribution was fitted with a Gaussian model to detect putative peaks indicative of ancient or recent WGD events.\u003c/p\u003e \u003cp\u003eIn addition, the Multi-tAxon Paleopolyploidy Search (MAPS)\u003csup\u003e\u003cspan citationid=\"CR77\" class=\"CitationRef\"\u003e77\u003c/span\u003e\u003c/sup\u003e approach was applied to identify phylogenetically congruent duplication events across ferns. Gene family trees were reconstructed from genome and transcriptome data of multiple species via OrthoFinder (v2.5.5). To reduce transcript redundancy, transcriptome datasets were clustered via cd-hit (v4.8.1)\u003csup\u003e78\u003c/sup\u003e at a 90% identity threshold prior to analysis. Duplication signals were then mapped onto internal nodes of the species phylogeny. To assess the significance of the observed duplication patterns, we performed simulation-based comparisons against both null and positive models.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003eFull-Length Transcriptome Sequencing and Alternative Splicing Analysis\u003c/h2\u003e \u003cp\u003eTo characterize the full-length transcriptome and identify alternative splicing (AS) events in \u003cem\u003eD. fragrans\u003c/em\u003e, third-generation sequencing was performed via the PacBio CCS platform on samples collected from various tissues and developmental stages. HiFi reads were processed with the Iso-Seq3 (v4.2.0) pipeline to generate full-length nonconcatemer (FLNC) reads. 
These FLNC reads were then aligned to the reference genome via Minimap2 (v2.27-r1193)\u003csup\u003e\u003cspan citationid=\"CR79\" class=\"CitationRef\"\u003e79\u003c/span\u003e\u003c/sup\u003e, and the resulting alignments were integrated to produce a preliminary GTF annotation file. To refine the transcript models, we used SQANTI3 (v5.3.6)\u003csup\u003e80\u003c/sup\u003e to compare the full-length transcriptome-based annotations with the existing genome annotations, resulting in a final, high-confidence gene annotation set. For AS analysis, SUPPA2 (v2.4)\u003csup\u003e81\u003c/sup\u003e was used to identify AS events on the basis of the full-length transcriptome annotation. The inclusion levels (percent spliced-in, PSI) of splicing events were calculated by integrating second-generation (Illumina) transcriptome data.\u003c/p\u003e \u003c/div\u003e"},{"header":"Declarations","content":"\u003ch2\u003eAcknowledgements\u003c/h2\u003e \u003cp\u003eThis work was supported by the National Natural Science Foundation of China (no. 32270394 to Y Chang and no. 32370243 to YH Fang) and a startup fund from Linyi University (LYDX2019BS039). We gratefully acknowledge financial support from the \u0026ldquo;Double First-Class\u0026rdquo; initiative of Heilongjiang Province for the advantageous and characteristic discipline of Chinese Materia Medica Biogenetics. NGS, ONT, and Hi-C sequencing of the genome were conducted by GrandOmics. Full-length transcriptome sequencing was performed by BerryGenomics Co., Ltd.\u003c/p\u003e\n \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003eData availability\u003c/h2\u003e \u003cp\u003eThe \u003cem\u003eD. 
fragrans\u003c/em\u003e genome assembly and all of the raw sequencing data have been deposited at the China National Center for Bioinformation (CNCB)\u003csup\u003e\u003cspan citationid=\"CR82\" class=\"CitationRef\"\u003e82\u003c/span\u003e,\u003cspan citationid=\"CR83\" class=\"CitationRef\"\u003e83\u003c/span\u003e\u003c/sup\u003e, under the BioProject accession number PRJCA042196.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003eAuthors\u0026rsquo; contributions\u003c/h2\u003e \u003cp\u003eY.C. and S.W. conceived the study. W.D., Q.W. and Y.F. designed and managed the major scientific objectives. D.Z. and C.S. managed the plant materials. X.Q. assembled the genome and estimated the genome size. W.D. annotated the genome and transposons and performed the data analysis. D.Z. contributed to the full-length transcriptome sequencing. W.D., Q.W., Y.F. and S.W. led the manuscript preparation. All the authors read and approved the final manuscript.\u003c/p\u003e \u003c/div\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eTesto, W., Sundue, M.: A 4000-species dataset provides new insight into the evolution of ferns. Mol. Phylogenet. Evol. \u003cb\u003e105\u003c/b\u003e, 200\u0026ndash;211 (2016)\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKenrick, P., Crane, P.R.: The origin and early evolution of plants on land. Nature. \u003cb\u003e389\u003c/b\u003e, 33\u0026ndash;39 (1997)\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNiklas, K.J., Tiffney, B.H., Knoll, A.H.: Patterns in vascular land plant diversification. Nature. \u003cb\u003e303\u003c/b\u003e, 614\u0026ndash;616 (1983)\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePPG I: A community-derived classification for extant lycophytes and ferns. J. Syst. Evol. 
\u003cb\u003e54\u003c/b\u003e, 563\u0026ndash;603 (2016)\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSessa, E.B., Der, J.P.: Chapter Seven - Evolutionary Genomics of Ferns and Lycophytes. 
In: Rensing, S.A. (ed.) Advances in Botanical Research, vol. 78, pp. 215–254. Academic (2016)
Fernández, P., et al.: A 160 Gbp fork fern genome shatters size record for eukaryotes. iScience 27 (2024)
Huang, C.-H., Qi, X., Chen, D., Qi, J., Ma, H.: Recurrent genome duplication events likely contributed to both the ancient and recent rise of ferns. J. Integr. Plant Biol. 62, 433–455 (2020)
Pelosi, J.A., Kim, E.H., Barbazuk, W.B., Sessa, E.B.: Phylotranscriptomics Illuminates the Placement of Whole Genome Duplications and Gene Retention in Ferns. Front. Plant. Sci. 13 (2022)
Fang, Y., et al.: The genome of homosporous maidenhair fern sheds light on the euphyllophyte evolution and defences. Nat. Plants. 8, 1024–1037 (2022)
Marchant, D.B., et al.: Dynamic genome evolution in a model fern. Nat. Plants. 8, 1038–1051 (2022)
Li, F.-W., et al.: Fern genomes elucidate land plant evolution and cyanobacterial symbioses. Nat. Plants. 4, 460–472 (2018)
Huang, X., et al.: The flying spider-monkey tree fern genome provides insights into fern evolution and arborescence. Nat. Plants. 8, 500–512 (2022)
Zhong, Y., et al.: Genomic Insights into Genetic Diploidization in the Homosporous Fern Adiantum nelumboides. Genome Biol. Evol. 14, evac127 (2022)
Chen, H., et al.: Revisiting ancient polyploidy in leptosporangiate ferns. New Phytol. 237, 1405–1417 (2023)
Alseekh, S., Scossa, F., Fernie, A.R.: Mobile Transposable Elements Shape Plant Genome Diversity. Trends Plant Sci. 25, 1062–1064 (2020)
Hirsch, C.D., Springer, N.M.: Transposable element influences on gene expression in plants. Biochim. et Biophys. Acta (BBA) - Gene Regul. Mech. 1860, 157–165 (2017)
Hassan, A.H., Mokhtar, M.M., El Allali, A.: Transposable elements: multifunctional players in the plant genome. Front. Plant. Sci. 14 (2024)
Wicker, T., et al.: A unified classification system for eukaryotic transposable elements. Nat. Rev. Genet. 8, 973–982 (2007)
Zhang, X., Wessler, S.R.: Genome-wide comparative analysis of the transposable elements in the related species Arabidopsis thaliana and Brassica oleracea. Proceedings of the National Academy of Sciences 101, 5589–5594 (2004)
Staton, S.E., Burke, J.M.: Evolutionary transitions in the Asteraceae coincide with marked shifts in transposable element abundance. BMC Genom. 16, 623 (2015)
Wicker, T., et al.: Impact of transposable elements on genome structure and evolution in bread wheat. Genome Biol. 19, 103 (2018)
LTR-retrotransposons in plants: Engines of evolution. Gene. 626, 14–25 (2017)
Li, Z.-W., et al.: Transposable Elements Contribute to the Adaptation of Arabidopsis thaliana. Genome Biol. Evol. 10, 2140–2150 (2018)
Makarevitch, I., et al.: Transposable elements contribute to activation of maize genes in response to abiotic stress. PLoS Genet. 11, e1004915 (2015)
Lisch, D.: How important are transposons for plant evolution? Nat. Rev. Genet. 14, 49–61 (2013)
Qin, G., et al.: Chromosome-Scale Genome of the Fern Cibotium barometz Unveils a Genetic Resource of Medicinal Value. Horticulturae. 10, 1191 (2024)
Wei, Z.: Resolving the Stasis-Dynamism Paradox: Genome Evolution in Tree Ferns
Pelosi, J., et al.: The genome of the vining fern Lygodium microphyllum highlights genomic and functional differences between life phases of an invasive plant. Preprint at https://doi.org/10.1101/2025.03.06.640867 (2025)
Sessa, E.B., Zimmer, E.A., Givnish, T.J.: Phylogeny, divergence times, and historical biogeography of New World Dryopteris (Dryopteridaceae). Am. J. Bot. 99, 730–750 (2012)
Zuo, Z.-Y., et al.: A revised classification of Dryopteridaceae based on plastome phylogenomics and morphological evidence, with the description of a new genus, Pseudarachniodes. Plant. Divers. 47, 34–52 (2025)
Chen, L., et al.: Microbial-type terpene synthases significantly contribute to the terpene profile of glandular trichomes of the fern Dryopteris fragrans (L). Plant J. 121, e70079 (2025)
Hassan, A.H., Mokhtar, M.M., El Allali, A.: Transposable elements: multifunctional players in the plant genome. Front. Plant. Sci. 14, 1330127 (2024)
Zhang, Y., et al.: Evolutionary rewiring of the wheat transcriptional regulatory network by lineage-specific transposable elements. Genome Res. 31, 2276–2289 (2021)
Choulet, F., et al.: Megabase Level Sequencing Reveals Contrasted Organization and Evolution Patterns of the Wheat Gene and Transposable Element Spaces. Plant. Cell. 22, 1686–1701 (2010)
Song, C., Guan, Y., Zhang, D., Tang, X., Chang, Y.: Integrated mRNA and miRNA Transcriptome Analysis Suggests a Regulatory Network for UV-B-Controlled Terpenoid Synthesis in Fragrant Woodfern (Dryopteris fragrans). IJMS 23, 5708 (2022)
Lu, Z., Huang, Q., Zhang, T., Hu, B., Chang, Y.: Global transcriptome analysis and characterization of Dryopteris fragrans (L.) Schott sporangium in different developmental stages. BMC Genom. 19, 471 (2018)
Xiao, Y., Wang, J.: Understanding the Regulation Activities of Transposons in Driving the Variation and Evolution of Polyploid Plant Genome. Plants. 14, 1160 (2025)
Oliver, K.R., McComb, J.A., Greene, W.K.: Transposable Elements: Powerful Contributors to Angiosperm Evolution and Diversity. Genome Biol. Evol. 5, 1886–1901 (2013)
Mhiri, C., Borges, F., Grandbastien, M.-A.: Specificities and Dynamics of Transposable Elements in Land Plants. Biology. 11, 488 (2022)
Negi, P., Rai, A.N., Suprasanna, P.: Moving through the Stressed Genome: Emerging Regulatory Roles for Transposons in Plant Stress Response. Front. Plant. Sci. 7 (2016)
Roquis, D., et al.: Genomic impact of stress-induced transposable element mobility in Arabidopsis. Nucleic Acids Res. 49, 10431–10447 (2021)
Thieme, M., et al.: Experimentally heat-induced transposition increases drought tolerance in Arabidopsis thaliana. New Phytol. 236, 182–194 (2022)
Wick, R.R., Judd, L.M., Holt, K.E.: Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol. 20, 129 (2019)
Hu, J., et al.: An efficient error correction and accurate assembly tool for noisy long reads. Preprint at https://doi.org/10.1101/2023.03.09.531669 (2023)
Vaser, R., Sović, I., Nagarajan, N., Šikić, M.: Fast and accurate de novo genome assembly from long uncorrected reads. Preprint at https://doi.org/10.1101/068122 (2016)
Hu, J., Fan, J., Sun, Z., Liu, S.: NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics. 36, 2253–2255 (2020)
Chen, S., Zhou, Y., Chen, Y., Gu, J.: fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 34, i884–i890 (2018)
Simão, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V., Zdobnov, E.M.: BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 31, 3210–3212 (2015)
Parra, G., Bradnam, K., Korf, I.: CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics. 23, 1061–1067 (2007)
Li, H., Durbin, R.: Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 25, 1754–1760 (2009)
Li, H., et al.: The Sequence Alignment/Map format and SAMtools. Bioinformatics. 25, 2078–2079 (2009)
Danecek, P., McCarthy, S.A.: BCFtools/csq: haplotype-aware variant consequences. Bioinformatics. 33, 2037–2039 (2017)
Kim, D., Langmead, B., Salzberg, S.L.: HISAT: a fast spliced aligner with low memory requirements. Nat. Methods. 12, 357–360 (2015)
Lin, Y., et al.: quarTeT: a telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification. Hortic. Res. 10, uhad127 (2023)
Servant, N., et al.: HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015)
Langmead, B., Salzberg, S.L.: Fast gapped-read alignment with Bowtie 2. Nat. Methods. 9, 357–359 (2012)
Keilwagen, J., et al.: Using intron position conservation for homology-based gene prediction. Nucleic Acids Res. 44, e89 (2016)
Dobin, A., et al.: STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 29, 15–21 (2013)
Kovaka, S., et al.: Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20, 278 (2019)
Haas, B.J., et al.: Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008)
Stanke, M., Diekhans, M., Baertsch, R., Haussler, D.: Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 24, 637–644 (2008)
Zdobnov, E.M., Apweiler, R.: InterProScan – an integration platform for the signature-recognition methods in InterPro. Bioinformatics. 17, 847–848 (2001)
McGinnis, S., Madden, T.L.: BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res. 32, W20–W25 (2004)
Emms, D.M., Kelly, S.: OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019)
Katoh, K.: MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066 (2002)
Capella-Gutiérrez, S., Silla-Martínez, J.M., Gabaldón, T.: trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 25, 1972–1973 (2009)
Minh, B.Q., et al.: IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Mol. Biol. Evol. 37, 1530–1534 (2020)
Yang, Z.: PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007)
Mendes, F.K., Vanderpool, D., Fulton, B., Hahn, M.W.: CAFE 5 models variation in evolutionary rates among gene families. Bioinformatics. 36, 5516–5518 (2021)
Ou, S., et al.: Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 20, 275 (2019)
Yan, H., Bombarely, A., Li, S.: DeepTE: a computational method for de novo classification of transposons with convolutional neural network
Zhang, R.-G., Wang, Z.-X., Ou, S., Li, G.-Y.: TEsorter: lineage-level classification of transposable elements using conserved protein domains. Preprint at https://doi.org/10.1101/800177 (2019)
Quinlan, A.R.: BEDTools: The Swiss-Army Tool for Genome Feature Analysis. CP Bioinf. 47 (2014)
Tang, H., et al.: JCVI: A versatile toolkit for comparative genomics analysis. iMeta 3, e211 (2024)
Zhang, Z., et al.: ParaAT: A parallel tool for constructing multiple protein-coding DNA alignments. Biochem. Biophys. Res. Commun. 419, 779–781 (2012)
Zhang, Z.: KaKs_Calculator 3.0: Calculating Selective Pressure on Coding and Non-Coding Sequences. Genom. Proteom. Bioinform. 20, 536–540 (2022)
Li, Z., et al.: Early genome duplications in conifers and other seed plants. Sci. Adv. (2015). https://doi.org/10.1126/sciadv.1501084
Fu, L., Niu, B., Zhu, Z., Wu, S., Li, W.: CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 28, 3150–3152 (2012)
Li, H.: Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 34, 3094–3100 (2018)
Pardo-Palacios, F.J., et al.: SQANTI3: curation of long-read transcriptomes for accurate identification of known and novel isoforms. Nat. Methods. 21, 793–797 (2024)
Trincado, J.L., et al.: SUPPA2: fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions. Genome Biol. 19, 40 (2018)
Chen, T., et al.: The Genome Sequence Archive Family: Toward Explosive Data Growth and Diverse Data Types. Genom. Proteom. Bioinform. 19, 578–583 (2021)
CNCB-NGDC Members and Partners: Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2025. Nucleic Acids Res.
53, D30–D44 (2025)

Tables

Table 1. Genome assembly statistics of D. fragrans.

Genome
  Size (Gb)                              4.45
  Contig N50 (Mb)                        5.70
  Scaffold N50 (Mb)                      108.12
  Contig number                          1,740
  Sequence anchored on chromosome (%)    99.33
  GC content (%)                         40.53
  High-copy repeat content (%)           85.39
Protein-coding gene
  Mean gene length (bp)                  36,768
  Mean CDS length (bp)                   1,301
  Mean exon length (bp)                  251
  Mean intron length (bp)                8,497
  Mean exon number per gene              5.17
  Gene number                            33,489
Noncoding                                Number    Total (bp)
  miRNA                                  116       13,673
  tRNA                                   3,811     287,962
  rRNA                                   514       721,680

Posted: March 3rd, 2026 (version 1). Status: Under Review. Journal: Communications Biology (Nature Portfolio). DOI: https://doi.org/10.21203/rs.3.rs-7268223/v1. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/).