Chromosome-Level Genome Assembly Unveils the Molecular Mechanisms Underlying Disease Resistance in Ulmus parvifolia

doi:10.21203/rs.3.rs-4754772/v1


2024 · doi:10.21203/rs.3.rs-4754772/v1

preprint OA: closed


Research Article

Yun-Zhou Lyu, Hai-Nan Sun, Rui-Chang Yan, Jiang-tao Shi, Li-Bin Huang, and 3 more

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-4754772/v1

This work is licensed under a CC BY 4.0 License.

Status: Posted (Version 1)

Abstract

The absence of a comprehensive genome assembly for Ulmus parvifolia hinders advancements in scientific research and practical breeding efforts, ultimately affecting the cultivation of elm varieties with enhanced resistance to diseases. In this study, we presented a high-quality chromosome-level genome assembly of U. parvifolia by integrating various sequencing approaches. We discovered that the U.
parvifolia genome is more than twice the size of that of Ulmus americana, primarily due to the large-scale amplification of long terminal repeat (LTR) retrotransposons. Phylogenetic analysis placed U. parvifolia closest to Moraceae, followed by Cannabaceae, Rhamnaceae, and Rosaceae. Notably, gene families associated with disease resistance and immune response were significantly expanded in U. parvifolia, pointing to adaptive evolution in response to various biotic and abiotic stresses. Chromosomal evolution analysis indicated a possible whole-genome triplication event in the evolutionary history of U. parvifolia. To study the differing susceptibility of U. parvifolia and U. americana to Dutch elm disease, we inoculated both elms with Ceratocystis ulmi and performed comparative transcriptome analyses at 48, 96, and 144 hours post-inoculation. The results showed that several plant defense and immune response pathways were more highly expressed in U. parvifolia at 48 and 96 hours post-inoculation, implying a potential genetic basis for its higher resistance to Dutch elm disease. Our study advances the genomic understanding of U. parvifolia, sheds light on the genetic underpinnings of disease resistance in elms, and provides a foundation for future research into elm breeding for disease resistance and conservation efforts.

Keywords: Ulmus parvifolia, de novo genome assembly, Dutch elm disease

Introduction

Ulmus parvifolia, known as the Chinese elm, is a deciduous tree species native to East Asia, found particularly in China, Korea, and Japan, where it thrives in climates ranging from warm subtropical regions to cooler temperate zones. U. parvifolia is recognized for its ecological and economic value, especially in urban forestry and horticulture.
It is appreciated for its ability to tolerate pollution and harsh urban conditions, making it a favored option for city landscapes, where it can provide shade, reduce noise, and improve air quality. Furthermore, its aesthetic appeal, including its attractive peeling bark and dense foliage, makes it a popular choice for urban landscaping and as a bonsai subject [1–3].

U. parvifolia has demonstrated a significant level of resistance to Dutch elm disease, a fungal infection caused by Ceratocystis ulmi, Ophiostoma ulmi and O. novo-ulmi that has led to a widespread decline of elm populations, particularly of Ulmus americana, the American elm. The resistance of U. parvifolia is of particular interest to arborists and plant pathologists, as it offers potential avenues for developing resistant strains through breeding programs. Unlike U. americana, which is highly susceptible to the disease and has suffered substantial losses, U. parvifolia has shown a capacity to withstand or tolerate the pathogen, making it a valuable resource in the fight against Dutch elm disease [4–9]. Understanding the contrasting susceptibility of U. parvifolia and U. americana to Dutch elm disease can therefore provide insights into the genetic and molecular foundations of disease resistance, which are essential for developing resistant elm varieties through breeding or genetic engineering. Such research is also important for conservation, given the ecological significance of elms and the impact that their decline due to disease could have on ecosystems. In addition, the variation in susceptibility between the two elm species can offer insights into elm evolution and the possible effects of climate change on disease spread. However, the availability of only a single, low-quality assembly for Ulmus species (U.
americana) poses challenges for functional genomics, gene discovery, and the application of modern biotechnology to breeding. Comparative genomics research is also hindered, limiting broader scientific insights into plant evolution and disease resistance mechanisms.

In this study, we present a high-quality chromosome-level genome assembly for U. parvifolia built by combining multiple sequencing strategies. Using phylogenetic analysis, we delineated the relationships of U. parvifolia with other species of Rosales. Comparative transcriptome analysis revealed potential mechanisms accounting for the differing susceptibility of U. parvifolia and U. americana to Dutch elm disease. Our research contributes to a deeper understanding of the genetic basis of resistance to Dutch elm disease and of elm-pathogen interactions, which are critical for the conservation of elms and their ecosystem services.

Method

Sample collection and library preparation

Samples of U. parvifolia were collected from a superior clonal line at the improved seed base of the Jiangsu Academy of Forestry, from a 30-year-old tree with a crown width of 8 m × 10 m. Young leaf samples were promptly placed in a liquid nitrogen tank, brought back to the laboratory, and stored in an ultra-low temperature freezer for subsequent DNA/RNA extraction. The same batch of fresh young leaf material can also be used for further analyses such as Hi-C sequencing. The frozen young leaf samples were ground in liquid nitrogen, and DNA was then extracted using the CTAB method. The extracted DNA was sent to Guangzhou BGI Genomics Co., Ltd. for sequencing. The data for genome assessment were generated by Illumina HiSeq PE150 sequencing of the extracted DNA.
The data for genome assembly were generated on the PacBio Sequel II platform in Circular Consensus Sequencing mode. Additionally, Hi-C sequencing was performed on the Illumina HiSeq PE150 platform.

Genome assembly and completeness evaluation

Adapters and low-quality reads were removed using fastp and Trimmomatic, and FastQC (v0.11.9) was then used to assess the quality of the clean sequencing data [10–12]. K-mer frequency analysis was performed with Jellyfish (jellyfish count) at a k-mer length of 21, and the resulting profile was passed to GenomeScope to estimate the genome size, heterozygosity, and repeat content [13, 14]. The clean HiFi reads were de novo assembled using hifiasm (v0.16.1-r375) with default parameters [15]. The Hi-C sequencing data were filtered with HiC-Pro to eliminate single-mapped, multiple-mapped, and duplicated reads [16]. Scaffolding, ordering, and orienting of the genome were performed using Juicer and 3D-DNA, yielding a chromosome-level genome assembly [17, 18]. Finally, Juicebox was used to manually adjust chromosome boundaries and correct any misjoins, inversions, or translocations present in the assembly [19]. Benchmarking Universal Single-Copy Orthologs (BUSCO, v5.5.0) with the embryophyta_odb10, eudicots_odb10, eukaryota_odb10, and viridiplantae_odb10 databases was applied to evaluate the completeness of the genome [20]. The LTR assembly index (LAI) was calculated to assess the quality of the genome [21].

Genome annotation

Gene structure prediction was conducted using GETA (v2.5.7) (https://github.com/chenlianfu/GETA), which integrates three distinct gene prediction strategies: the de novo method, the homology-based method, and the transcriptome-based method. A variety of tools were employed in this process, including PASA [22], GeneWise [23], AUGUSTUS [24], HISAT2 [25], hmmsearch [26], BGM2AT, and exonerate, among others.
To predict gene functions, InterProScan and eggNOG-mapper were used, and multiple gene function databases were referenced to provide comprehensive functional annotations. These databases comprised Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), Pfam, Clusters of Orthologous Groups (COG), and the Swiss-Prot protein database [27–33].

Repeat Annotation

To construct a repeat library for the U. parvifolia genome, de novo identification of repeat elements was performed using RepeatModeler (v2.0.3) [34]. The identified repeat consensus sequences were then used as seeds for RepeatMasker (v4.1.0) [35] to identify and analyze all corresponding repeat regions throughout the genome. To predict full-length LTR retrotransposons (LTR-RTs), a combination of tools was employed, including LTR_FINDER (v1.07) [36], LTRharvest (v1.6.2) [37] and LTR_retriever (v2.9.0) [38]. To trace the evolution of LTR-RTs, the reverse transcriptase (RT) domains of the intact LTR-RTs were extracted and used to construct a phylogenetic tree with IQ-TREE [39]. LTR-RT subfamilies were annotated using TEsorter [40].

Phylogenetic analysis

To investigate the phylogenetic relationships of U. parvifolia, eight published genome assemblies from the order Rosales were used, comprising Morus notabilis, Cannabis sativa, Parasponia andersonii, Trema orientale, Rhamnella rubrinervis, Ziziphus jujuba, Malus domestica, and Prunus persica, with Populus trichocarpa designated as the outgroup (as detailed in Table S2). OrthoFinder (v2.5.2) and OrthoMCL were employed to identify single-copy orthologous genes across these species [41, 42]. Multiple sequence alignment was performed with MUSCLE (v3.8.31), and Gblocks (v0.91b) was used to extract conserved regions with the parameters '-b4=10 -b5=h -t=p' [43, 44].
The optimal substitution model was determined using ProtTest3; both the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) selected the LG + I + G + F model. The phylogenetic tree was constructed using RAxML (v8.2.12), employing both Maximum Likelihood and Bayesian algorithms [45, 46]. Divergence times between U. parvifolia and the other selected species were estimated using MCMCTree with default parameters [47]. These estimates were cross-referenced with data from TimeTree5 for the divergence times between Morus notabilis and Trema orientale (48.9 to 70.9 million years ago) and between Rhamnella rubrinervis and Prunus persica (76.0 to 95.5 million years ago) [48]. The analysis of gene family expansion and contraction was performed using CAFE (v5.5.1.0) [49].

Prediction of whole-genome polyploidy event

Collinearity analysis between genomes was conducted using JCVI (https://github.com/tanghaibao/jcvi) with the default settings [50]. Orthologous and paralogous gene pairs were identified based on the presence of syntenic blocks. Pairwise synonymous substitution rates (Ks) were calculated using TBtools [51].

RNA-seq analysis

The raw reads were processed with fastp [10] for quality control. The clean reads were mapped to the U. parvifolia genome using HISAT2 [52] and SAMtools [53]. StringTie [54] and featureCounts [55] were used to quantify gene expression. Differential expression analysis was performed using DESeq2 [56]. An FDR below 0.05 and |log2 fold change (log2FC)| > 1 were set as the thresholds to define differentially expressed genes (DEGs). In total, 7280 DEGs were selected for construction of a co-expression network using WGCNA [57]. The power adjacency function was used to transform the similarity matrix into an adjacency matrix. According to the scale-free topology criterion, the power β = 10 was selected. Overall, 21 modules with a minimum gene number of 50 were identified.
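The power adjacency transform mentioned above can be illustrated with a short sketch. It assumes an unsigned network, in which the adjacency between genes i and j is |cor(x_i, x_j)|^β with β = 10; the text does not state whether a signed or unsigned network was used, and the gene names and expression values below are hypothetical. This is plain Python for illustration only and is not the WGCNA R implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def soft_threshold_adjacency(expr, beta=10):
    """Unsigned power adjacency: a_ij = |cor(x_i, x_j)| ** beta.

    expr maps a gene name to its expression values across samples.
    Assumption: unsigned network; beta = 10 is the power reported in the text.
    """
    genes = list(expr)
    return {
        g: {h: abs(pearson(expr[g], expr[h])) ** beta for h in genes}
        for g in genes
    }


# Hypothetical toy expression profiles (four samples per gene).
toy_expr = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],   # perfectly correlated with g1
    "g3": [4, 3, 2, 1],   # perfectly anti-correlated with g1
    "g4": [1, 3, 2, 4],   # correlation 0.8 with g1
}
adjacency = soft_threshold_adjacency(toy_expr, beta=10)
```

Raising the correlation to a high power suppresses moderate correlations far more than strong ones: in the toy data a correlation of 0.8 shrinks to roughly 0.107, while a correlation of 1 stays at 1.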
Samples of U. parvifolia and U. americana collected at 48, 96 and 144 hours after inoculation with Ceratocystis ulmi were assigned the ‘Resistance’ and ‘Susceptibility’ traits, respectively. The correlations between the modules and either the samples or the traits were then calculated, and p-values less than 0.01 were treated as indicating significant correlation. An adjusted p-value < 0.05 was used to define significantly enriched GO terms and KEGG pathways.

qRT-PCR assay

Six DEGs were randomly selected to validate gene expression patterns during infection by C. ulmi. PCR primers were designed using Primer3Plus (Table S3). The housekeeping gene Upa02186 of U. parvifolia was used as the reference. Total RNA was extracted using the Trizol method, and cDNA synthesis was performed using the Hifair® II 1st Strand cDNA Synthesis SuperMix for qPCR (gDNA digester plus) kit from YEASEN. qRT-PCR analysis was conducted using cDNA as the template. Each experiment was performed with 3 replicates, and the relative gene expression levels were calculated using the 2^(−ΔΔCT) method.

Result

Genome assembly and annotation

The k-mer analysis (k = 21) using ~150 Gb of NGS short reads (89-fold coverage) indicated that the estimated genome size of Ulmus parvifolia was about 1.68 Gb, with a high level of heterozygosity (2.57%) (Fig. 1A). We employed different sequencing platforms to develop a chromosome-level genome assembly for U. parvifolia. Initially, 49.8 Gb of HiFi sequencing data (29.6-fold coverage) generated on the PacBio Sequel II platform was used to assemble a draft genome with a total size of 1.79 Gb, 1,051 contigs, a contig N50 of 10,755,710 bp, and a GC content of 34.2% (Table 1). Using 130.2 Gb of Hi-C sequencing data (73-fold coverage), a total of 149 contigs, covering 94.3% of the draft genome, were anchored into 14 pseudochromosomes (Fig. 1B, D).
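For readers less familiar with the assembly statistics quoted above, the following sketch shows how a contig N50 (or N90) and a fold-coverage value are commonly computed. The contig lengths are hypothetical toy values, and the coverage check assumes that the reported HiFi depth is expressed relative to the k-mer estimated genome size of 1.68 Gb; neither the function names nor the toy data come from the study itself.

```python
def nx(lengths, x=50):
    """Return Nx: the largest length L such that contigs of length >= L
    together contain at least x percent of all assembled bases."""
    if not lengths:
        raise ValueError("no contig lengths given")
    threshold = sum(lengths) * x / 100
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


def fold_coverage(sequenced_bases, genome_size):
    """Average sequencing depth: total sequenced bases per genome base."""
    return sequenced_bases / genome_size


# Hypothetical toy assembly of seven contigs (300 bases in total).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = nx(toy_contigs, 50)
toy_n90 = nx(toy_contigs, 90)

# Assumption: depth is relative to the 1.68 Gb k-mer estimate.
hifi_depth = fold_coverage(49.8e9, 1.68e9)
```

In the toy assembly the two longest contigs already hold half of the bases, so the N50 equals the length of the second contig; N90 requires five contigs and is therefore much smaller.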
BUSCO analysis based on the Embryophyta, Eudicots, Eukaryota, and Viridiplantae databases revealed that 98.2%, 97.2%, 98.9% and 99.0% of the BUSCO gene models were complete, respectively, indicating high completeness of the U. parvifolia genome (Fig. 1C). Furthermore, the LTR assembly index (LAI) was calculated to assess the genome assembly quality. The LAI of the whole genome was about 16.33, and the average LAIs of the 14 pseudochromosomes ranged from 15.41 in Chr3 to 17.15 in Chr8 (Fig. 1E). A combined strategy encompassing ab initio, homology-based, and transcript-based prediction was used to annotate gene structure in the U. parvifolia genome, yielding a total of 34,351 gene models. A subsequent BUSCO analysis in protein mode indicated a high level of completeness, with 98.9%, 97.8%, 99.6% and 100% of the conserved proteins identified according to the Embryophyta, Eudicots, Eukaryota, and Viridiplantae databases, respectively (Fig. S1). A total of 15239, 9382, 25559, 653 and 26235 genes were annotated with at least one term from the GO, KEGG, Pfam, CAZy and COG databases, respectively, and more than 80% of all genes received at least one annotation. These results collectively demonstrate the high assembly quality, completeness, and continuity of the U. parvifolia genome.

Table 1 Genomic characteristics of U. parvifolia

Genome feature              Value
Genome size (bp)            1,795,583,205
Contig number               1051
Contig N50 (bp)             10,755,710
Contig N90 (bp)             2,802,378
Chr number                  14
Repeat content              71.79%
Gene number                 34,351
GC content                  34.20%
LTR Assembly Index (LAI)    16.33
BUSCO                       98.20%

Among the Rosales species with an available reference genome, U. parvifolia has a large genome, second only to that of Humulus lupulus. Many previous studies have reported a correlation between genome size and TE content [58–60]. Approximately 71.79% of the U.
parvifolia genome was covered by repeat elements, the majority of which were LTR retrotransposons (LTR-RTs), representing 63.65% of the assembly (Table S1). Estimation of TE insertion times based on 12334 intact LTR-RT sequences revealed a recent large-scale amplification of LTR-RTs approximately 124 thousand years ago (Fig. 1F). Phylogenetic analysis of LTR-RTs showed that the Ty3/Gypsy Retand subfamily was predominant, followed by the Tekay and Athila subfamilies (Fig. 2). In aggregate, the recent burst of LTR-RTs, especially Ty3/Gypsy elements, may partially account for the large genome size of U. parvifolia.

Population history

We conducted a Pairwise Sequentially Markovian Coalescent (PSMC) analysis to infer the historical population dynamics of U. parvifolia. Our findings indicated that the effective population size of U. parvifolia started to increase around one million years ago, reaching its maximum approximately 0.7 million years ago. Thereafter, the effective population size showed a significant and ongoing decline up to the present day (Fig. 3A). We speculate that this reduction can be attributed to a variety of factors. For example, significant climatic shifts, such as glacial and interglacial periods, can lead to habitat loss or fragmentation, affecting the distribution and abundance of species. Moreover, the expansion of human populations, deforestation, urban development, and agriculture have led to the loss of natural habitats, directly affecting the population size of U. parvifolia. Other factors, such as natural selection and genetic drift, disease, fragmentation and isolation, may also have contributed to the reduction in population size.

Comparative genomics analysis

To delineate the evolutionary history of U. parvifolia, we performed a phylogenetic analysis comparing it with eight other Rosales species, with Populus trichocarpa, which belongs to Malpighiales, selected as the outgroup (Table S2).
In total, we identified 8685 single-copy orthologous genes across these species, which were then used to establish the phylogenetic relationships. The phylogenetic tree organized the species into five major clusters by family: Ulmaceae (U. parvifolia), Moraceae (Morus notabilis), Cannabaceae (Cannabis sativa, Parasponia andersonii, Trema orientale), Rhamnaceae (Rhamnella rubrinervis, Ziziphus jujuba), and Rosaceae (Malus domestica, Prunus persica). Notably, U. parvifolia exhibited a closer phylogenetic affinity to Morus notabilis than to any other species examined. Moreover, U. parvifolia, Morus notabilis and the Cannabaceae species diverged from their common ancestor approximately 73.25 million years ago (MYA). Comparative analysis of gene families among these species revealed 1100 and 1019 gene families that were significantly expanded and contracted in U. parvifolia, respectively (p-value < 0.01, Fig. 3B). Interestingly, KEGG enrichment analysis of the 1100 expanded gene families revealed a strong association with plant disease resistance and immune response, with enriched pathways such as ‘Brassinosteroid biosynthesis’, ‘Metabolism of xenobiotics by cytochrome P450’, ‘Plant-pathogen interaction’, ‘MAPK signaling pathway’, and ‘Isoflavonoid biosynthesis’ (Fig. 3C). Consistently, GO enrichment analysis showed many terms associated with plant defense response, including ‘detection of biotic stimulus’, ‘defense response by cell wall thickening’, ‘isoprenoid biosynthetic process’, and ‘response to reactive oxygen species’ (Fig. 3D). These results collectively reflect an adaptation of U. parvifolia to diverse biotic and abiotic stresses that enhances its defensive capabilities and increases allelic diversity and the redundancy of defense mechanisms, which may also partially account for its resistance to Dutch elm disease.

We analyzed the chromosomal collinearity within U. parvifolia to understand its chromosome evolution.
Notably, we discovered several chromosomal segments that were collinear across chromosomes 5, 9, and 13, suggesting a potential whole-genome triplication event in the evolutionary history of U. parvifolia (Fig. 3E). To substantiate this finding, we measured the synonymous substitution rate (Ks) to infer whole-genome polyploidy events. Our analysis showed that the Ks values of U. parvifolia and M. notabilis exhibited a concordant peak, indicative of a shared ancestral whole-genome polyploidy event. Importantly, the peak Ks value for U. parvifolia was considerably lower than the values between U. parvifolia and either M. notabilis or P. trichocarpa, suggesting that the polyploidy event possibly occurred prior to the divergence of U. parvifolia, M. notabilis and P. trichocarpa from their common ancestor (Fig. 3F). Considering the phylogenetic relationships and the estimated divergence times among these species (approximately 39.6 to 95.6 MYA between U. parvifolia and M. notabilis, and around 99.0 to 111.3 MYA between U. parvifolia and P. trichocarpa), we propose that the polyploidy event in U. parvifolia could correspond to the γ whole-genome triplication (γ-WGT) event, which happened around 120 million years ago and has been shown to be associated with the early diversification of the core eudicots.

Gene Expression Profiling of U. parvifolia in Response to Ceratocystis ulmi Infection

Dutch elm disease, a blight caused by the fungus Ceratocystis ulmi, is one of the significant threats to U. parvifolia and has had a significant impact on elm populations worldwide. To investigate the interaction between U. parvifolia and C. ulmi, we inoculated U. parvifolia with C. ulmi and extracted RNA at 0, 48, 96 and 144 hours post inoculation (hpi) for sequencing. Across the time course, we identified a total of 7280 differentially expressed genes (DEGs) by comparing gene expression between each pair of timepoints.
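As a minimal sketch of how such a DEG set can be assembled, the snippet below applies the thresholds given in the Method section (FDR < 0.05 and |log2FC| > 1) to each comparison and takes the union of the genes that pass. It assumes that "each pair of timepoints" means all six pairwise comparisons among 0, 48, 96 and 144 hpi; the gene identifiers and statistics are hypothetical and do not reproduce the DESeq2 output of the study.

```python
from itertools import combinations

FDR_CUTOFF = 0.05     # threshold stated in the Method section
LOG2FC_CUTOFF = 1.0   # threshold stated in the Method section


def is_deg(fdr, log2fc):
    """A gene is called a DEG when FDR < 0.05 and |log2FC| > 1."""
    return fdr < FDR_CUTOFF and abs(log2fc) > LOG2FC_CUTOFF


def union_of_degs(results):
    """Collect every gene that passes the thresholds in at least one comparison.

    results maps a (timepoint_a, timepoint_b) pair to a table of
    gene -> (fdr, log2fc).
    """
    degs = set()
    for table in results.values():
        for gene, (fdr, log2fc) in table.items():
            if is_deg(fdr, log2fc):
                degs.add(gene)
    return degs


# Assumption: all pairwise comparisons among the four timepoints (hpi).
timepoints = [0, 48, 96, 144]
comparisons = list(combinations(timepoints, 2))

# Hypothetical toy statistics; most comparisons are left empty for brevity.
toy_results = {pair: {} for pair in comparisons}
toy_results[(0, 48)] = {"geneA": (0.001, 2.3), "geneB": (0.20, 3.0)}
toy_results[(48, 96)] = {"geneA": (0.01, -1.5), "geneC": (0.03, 0.8)}
toy_results[(0, 144)] = {"geneD": (0.04, -1.2)}

toy_degs = union_of_degs(toy_results)
```

A gene that is significant in several comparisons is counted once, so the size of the union is the number of distinct DEGs across the time course.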
Principal component analysis (PCA) revealed that the expression pattern of the uninfected samples was distinct from that at either 48 hpi or 96 hpi but relatively similar to that at 144 hpi, suggesting that U. parvifolia responds to C. ulmi infection mainly at 48 and 96 hpi (Fig. S2). K-means clustering analysis uncovered four major gene expression patterns: genes that were down-regulated after inoculation with C. ulmi (cluster1), and genes that were predominantly up-regulated at 48 hpi (cluster2), 96 hpi (cluster3) and 144 hpi (cluster4), respectively (Fig. 4A, B). The genes in cluster1 were significantly enriched in the pathways ‘Carotenoid biosynthesis’, ‘Cutin, suberine and wax biosynthesis’, and ‘Diterpenoid biosynthesis’ (Fig. 4C). Unexpectedly, these pathways are known to play a role in plant defense mechanisms yet were downregulated during the infection stages. We speculate that U. parvifolia may prioritize the production of certain defensive compounds over others when responding to fungal infection, which could be a strategic shift to ensure short-term survival over long-term benefits. Notably, the genes in cluster2 and cluster3 were involved in several pathways associated with plant defense and immune response, such as ‘Metabolism of xenobiotics by cytochrome P450’, ‘Brassinosteroid biosynthesis’, ‘Monoterpenoid biosynthesis’, ‘MAPK signaling pathway - plant’, and ‘Phenylpropanoid biosynthesis’ (Fig. 4C). These results imply that U. parvifolia may prioritize these pathways to defend against C. ulmi infection. The genes in cluster4 were mainly associated with ‘Protein processing in endoplasmic reticulum’, ‘RNA transport’, and ‘Spliceosome’, indicating an urgent need for protein processing and transport to ensure an adequate supply of compounds that can be deployed in the plant's immune response (Fig. 4C). In summary, our analysis characterized the expression patterns of U. parvifolia under infection by C.
ulmi at different timepoints, and the results showed that U. parvifolia responds to infection mainly at 48 and 96 hpi by upregulating several pathways that play crucial roles in plant defense responses.

Comparative transcriptome analysis unveiled the disease resistance of U. parvifolia

Variations in susceptibility to Dutch elm disease have been reported between U. parvifolia and U. americana, with U. parvifolia showing some resistance and U. americana being more susceptible. To analyze the molecular basis of this difference, we inoculated U. parvifolia and U. americana with C. ulmi and performed comparative transcriptome analysis at 48, 96 and 144 hpi. The PCA result showed that the overall expression patterns at both the uninfected and infected stages differed between the two elms. However, the gene expression trajectories during the infection process were similar, implying that the two species may use analogous strategies to defend against pathogen infection (Fig. 5A). Weighted gene co-expression network analysis (WGCNA) was used to explore the differential gene expression patterns that potentially explain the differing disease resistance observed in the two Ulmus species. In total, we identified 21 modules based on gene expression trends (Fig. 5B). Among these modules, ‘MEyellow’ attracted particular attention because the genes in this module were highly expressed in U. parvifolia at 48 hpi and 96 hpi. Moreover, analysis of module-trait correlation revealed that ‘MEyellow’ is the only module significantly correlated with the ‘resistance’ trait of Ulmus (correlation = 0.68, p-value = 2e-04) (Fig. 5C). The heatmap showed that the genes in ‘MEyellow’ were predominantly expressed in U. parvifolia at 96 hpi, followed by 48 hpi (Fig. 5D). Taken together, the results imply that the differential expression of genes in ‘MEyellow’ may account for the contrasting susceptibility of U. parvifolia and U.
americana to infection. KEGG enrichment analysis showed that these genes were significantly enriched in the pathways ‘Metabolism of xenobiotics by cytochrome P450’, ‘Brassinosteroid biosynthesis’, and ‘Phenylpropanoid biosynthesis’, as well as in some important secondary metabolism pathways, such as ‘Tryptophan metabolism’ and ‘Glutathione metabolism’ (Fig. 5E). These pathways have been reported to play important roles in plant defense and immune response. For instance, the ‘Metabolism of xenobiotics by cytochrome P450’ pathway is pivotal in the plant's reaction to fungal invasion, facilitating processes such as the detoxification of fungal byproducts, the synthesis of defense-associated compounds, and the regulation of plant hormone levels. Consistently, GO enrichment analysis also showed many enriched terms associated with plant responses to biotic and abiotic stress (Fig. 5F). In aggregate, our comparative transcriptome analysis indicated that various pathways associated with plant defense and immune response were highly expressed in U. parvifolia at 48 and 96 hours after infection with C. ulmi, which may be a key factor contributing to the differing susceptibility of U. parvifolia and U. americana to Dutch elm disease.

Discussion

Unlike U. americana, which is highly susceptible to Dutch elm disease, U. parvifolia has shown a certain ability to tolerate infection by C. ulmi. However, the absence of genomic resources for U. parvifolia represents a significant gap that impedes a comprehensive understanding of its genetic resistance to Dutch elm disease, limiting our ability to use this knowledge for breeding resistant varieties and advancing elm conservation efforts. Therefore, obtaining a high-quality genome assembly for U. parvifolia is crucial for illuminating the molecular mechanisms of resistance, guiding biotechnological interventions, and preserving the ecological significance of elms in the face of disease challenges.
In this study, we integrated HiFi and Hi-C sequencing to assemble a high-quality chromosome-level genome of U. parvifolia, with 14 pseudochromosomes covering over 94% of the total genome size. The U. parvifolia genome assembly showed an extremely high contig N50 (10.75 Mb), together with high assembly quality and genome completeness, as indicated by its LAI (16.33) and BUSCO score (98.20%). In contrast, the assembly of U. americana comprised more than 500 thousand contigs, with a contig N50 of only 2.5 kb, reflecting low assembly quality and continuity. Notably, the genome size of U. parvifolia (~1.79 Gb) is more than twice that of U. americana (~865 Mb). Considering the high proportion of LTR-RTs in the U. parvifolia genome, we presume that amplification of transposable elements is one of the factors behind the large difference in genome size. LTR retrotransposons make up more than 60% of the genome and nearly 90% of the total repeat content, suggesting that LTR-RTs have been significantly amplified in the U. parvifolia genome. Among the LTR-RTs, the Ty3/Gypsy Retand subfamily was predominant and showed a recent large-scale amplification, possibly reflecting the adaptation of U. parvifolia to turbulent environments. Consistent with previous studies, U. parvifolia displayed a closer phylogenetic relationship with Moraceae, followed by Cannabaceae, Rhamnaceae and Rosaceae [61–63]. Investigation of the gene families expanded in U. parvifolia revealed many terms and pathways associated with plant defense and immune response, indicating that U. parvifolia has evolved a complex immune system that possibly contributes to enhanced pathogen recognition and adaptation to environmental stress, helping it combat a variety of pathogens. Analysis of chromosomal collinearity within the U. parvifolia genome hinted at a potential whole-genome triplication event in the evolutionary history of U. parvifolia. Through investigation of the Ks distribution, we discovered one polyploidy event in U.
parvifolia that happened prior to the divergence of U. parvifolia and P. trichocarpa from their common ancestor. According to the divergence times estimated by TimeTree5, U. parvifolia and M. notabilis diverged around 39.6–95.6 MYA, and U. parvifolia and P. trichocarpa diverged around 99.0–111.3 MYA. We therefore deduced that the polyploidy event in U. parvifolia possibly corresponds to the gamma WGT event that happened approximately 120 million years ago.

Due to the lack of a comprehensive genome assembly of U. parvifolia, our knowledge of the differing susceptibility of U. parvifolia and U. americana to Dutch elm disease is very limited. A previous study revealed distinct transcriptomic changes in resistant versus susceptible U. americana genotypes under infection by Ophiostoma novo-ulmi, highlighting genetic factors that may contribute to disease resistance in U. americana [8]. However, without an accurate and complete genomic blueprint of U. americana, such a study may leave gaps in understanding the complex genetic basis of resistance and susceptibility to O. novo-ulmi, because the identification, annotation, and functional analysis of genes and their expression patterns can be less precise. To better compare gene expression patterns between U. parvifolia and U. americana under infection by C. ulmi, we mapped RNA-seq reads of U. americana to the U. parvifolia genome. The mapping results showed that most of the U. americana RNA-seq data could be successfully aligned, with an average mapping ratio of 80.2%, demonstrating the feasibility of our method. Comparative transcriptome analysis using WGCNA revealed that several enriched pathways involved in plant defense and immune response were more highly expressed in U. parvifolia, especially at 96 hpi.
Among the enriched pathways, ‘Metabolism of xenobiotics by cytochrome P450’ is the most significant. This pathway plays a critical role in plant responses to fungal infection through detoxification of fungal metabolites, production of defense-related compounds, and modulation of plant hormone levels. Phenylpropanoid biosynthesis is also a critical component of the plant's arsenal against fungal infection, contributing to the production of antimicrobial compounds, signaling molecules, and structural components that enhance resistance. Brassinosteroids play a multifaceted role in plant defense against fungal infection by modulating immune responses, interacting with other hormones, influencing cell wall properties, and regulating growth and development to enhance resistance, while also potentially affecting fungal growth directly. The activation of these pathways in U. parvifolia under C. ulmi infection is part of the elm's defense strategy against the pathogen, which includes detoxifying harmful substances, producing antimicrobial compounds and structural defenses, and enhancing immune and stress responses.

Data Records

The genome sequence and gene sequences were deposited in the NCBI database under accession number JBDNDV000000000. Raw HiFi, Illumina and Hi-C data used for genome assembly were deposited at NCBI under BioProject PRJNA1068684. RNA-seq data of U. parvifolia and U. americana under infection by C. ulmi were deposited at NCBI under BioProject PRJNA1093576.

Declarations

Acknowledgements

This work was supported by the Independent Research Project of Jiangsu Academy of Forestry (ZZKY202104). We are grateful to Jianping Yi from the Animals, Plants and Foods Inspection and Quarantine Technology Center (Shanghai, China) for providing the Ceratocystis ulmi strain used in this experiment, and to Guangzhou Genedenovo Biotechnology Co., Ltd for assistance with sequencing.
Author contributions

Yun-Zhou Lyu and Wei Xing conceived the idea, analyzed the data, wrote the original draft, revised the manuscript, and acquired the funding; Hai-Nan Sun, Rui-Chang Yan and Gang Wang analyzed the data; Jiang-tao Shi, Li-Bin Huang and Xiao-Yun Dong prepared the materials; Wei Xing supervised the project. All authors have read and agreed to the published version of the manuscript.

Competing interests

The authors declare no conflict of interest.

Code availability

No specific script was used in this work. The codes and pipelines used for data processing were all executed according to the manuals and protocols of the corresponding bioinformatics software.

References

1. Fu, L. & Xin, Y. 33. ULMACEAE in Higher Plants Of China Vol. 4: ANGIOSPERMAE Vol. 1 1-25 (Qingdao Publishing Group, 2000).
2. Fragniere, Y. et al. Biogeographic Overview of Ulmaceae: Diversity, Distribution, Ecological Preferences, and Conservation Status. Plants 10, http://dx.doi.org/10.3390/plants10061111 (2021).
3. Lu, P. et al. Ancestors of Ulmus parvifolia from late Miocene sediments in Yunnan, Southwest China and its future distribution. 313, 104879 (2023).
4. Strobel, G.A. & Lanier, G.N. Dutch elm disease. Scientific American 245, 56-67 (1981).
5. Hubbes, M. The American elm and Dutch elm disease. The Forestry Chronicle 75, 265-273 (1999).
6. Karnosky, D.F. Dutch elm disease: a review of the history, environmental implications, control, and research needs. Environmental Conservation 6, 311-322 (1979).
7. Scheffer, R., Voeten, J. & Guries, R. Biological control of Dutch elm disease. Plant Disease 92, 192-200 (2008).
8. Islam, M.T. et al. Deciphering the Genome-Wide Transcriptomic Changes during Interactions of Resistant and Susceptible Genotypes of American Elm with Ophiostoma novo-ulmi. J Fungi (Basel) 8, http://dx.doi.org/10.3390/jof8020120 (2022).
9. de Oliveira, T.C. et al. Unraveling the transcriptional features and gene expression networks of pathogenic and saprotrophic Ophiostoma species during the infection of Ulmus americana. Microbiology spectrum 12, e0369423 http://dx.doi.org/10.1128/spectrum.03694-23 (2024).
10. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884-i890 http://dx.doi.org/10.1093/bioinformatics/bty560 (2018).
11. Andrews, S. FastQC: a quality control tool for high throughput sequence data. (Babraham Bioinformatics, Babraham Institute, Cambridge, United Kingdom, 2010).
12. Bolger, A.M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-20 http://dx.doi.org/10.1093/bioinformatics/btu170 (2014).
13. Ranallo-Benavidez, T.R., Jaron, K.S. & Schatz, M.C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nature Communications 11, 1432 http://dx.doi.org/10.1038/s41467-020-14998-3 (2020).
14. Marcais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764-70 http://dx.doi.org/10.1093/bioinformatics/btr011 (2011).
15. Cheng, H., Concepcion, G.T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature methods 18, 170-175 http://dx.doi.org/10.1038/s41592-020-01056-5 (2021).
16. Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biology 16, 259 http://dx.doi.org/10.1186/s13059-015-0831-x (2015).
17. Durand, N.C. et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Systems 3, 95-8 http://dx.doi.org/10.1016/j.cels.2016.07.002 (2016).
18. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92-95 http://dx.doi.org/10.1126/science.aal3327 (2017).
19. Robinson, J.T. et al. Juicebox.js Provides a Cloud-Based Visualization System for Hi-C Data. Cell Systems 6, 256-258 e1 http://dx.doi.org/10.1016/j.cels.2018.01.001 (2018).
20. Simao, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V. & Zdobnov, E.M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210-2 http://dx.doi.org/10.1093/bioinformatics/btv351 (2015).
21. Ou, S., Chen, J. & Jiang, N. Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic acids research 46, e126 http://dx.doi.org/10.1093/nar/gky730 (2018).
22. Haas, B.J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biology 9, R7 http://dx.doi.org/10.1186/gb-2008-9-1-r7 (2008).
23. Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome Research 14, 988-95 http://dx.doi.org/10.1101/gr.1865504 (2004).
24. Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Research 34, W435-9 http://dx.doi.org/10.1093/nar/gkl200 (2006).
25. Kim, D., Paggi, J.M., Park, C., Bennett, C. & Salzberg, S.L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature biotechnology 37, 907-915 http://dx.doi.org/10.1038/s41587-019-0201-4 (2019).
26. Johnson, L.S., Eddy, S.R. & Portugaly, E. Hidden Markov model speed heuristic and iterative HMM search procedure. BMC bioinformatics 11, 431 http://dx.doi.org/10.1186/1471-2105-11-431 (2010).
27. Huerta-Cepas, J. et al. Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper. Molecular Biology and Evolution 34, 2115-2122 http://dx.doi.org/10.1093/molbev/msx148 (2017).
28. Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236-40 http://dx.doi.org/10.1093/bioinformatics/btu031 (2014).
29. Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Research 28, 27-30 http://dx.doi.org/10.1093/nar/28.1.27 (2000).
30. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature genetics 25, 25-9 http://dx.doi.org/10.1038/75556 (2000).
31. Boeckmann, B. et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Research 31, 365-370 http://dx.doi.org/10.1093/nar/gkg095 (2003).
32. Punta, M. et al. The Pfam protein families database. Nucleic Acids Research 40, D290-301 http://dx.doi.org/10.1093/nar/gkr1065 (2012).
33. Tatusov, R.L., Galperin, M.Y., Natale, D.A. & Koonin, E.V. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Research 28, 33-6 http://dx.doi.org/10.1093/nar/28.1.33 (2000).
34. Flynn, J.M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proceedings of the National Academy of Sciences of the United States of America 117, 9451-9457 http://dx.doi.org/10.1073/pnas.1921046117 (2020).
35. Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Current protocols in bioinformatics Chapter 4, Unit 4 10 http://dx.doi.org/10.1002/0471250953.bi0410s05 (2004).
36. Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic acids research 35, W265-8 http://dx.doi.org/10.1093/nar/gkm286 (2007).
37. Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC bioinformatics 9, 18 http://dx.doi.org/10.1186/1471-2105-9-18 (2008).
38. Ou, S. & Jiang, N. LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons. Plant physiology 176, 1410-1422 http://dx.doi.org/10.1104/pp.17.01310 (2018).
39. Nguyen, L.T., Schmidt, H.A., von Haeseler, A. & Minh, B.Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular biology and evolution 32, 268-74 http://dx.doi.org/10.1093/molbev/msu300 (2015).
40. Zhang, R.G. et al. TEsorter: an accurate and fast method to classify LTR-retrotransposons in plant genomes. Horticulture research 9, http://dx.doi.org/10.1093/hr/uhac017 (2022).
41. Emms, D.M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biology 20, 238 http://dx.doi.org/10.1186/s13059-019-1832-y (2019).
42. Li, L., Stoeckert, C.J., Jr. & Roos, D.S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Research 13, 2178-89 http://dx.doi.org/10.1101/gr.1224503 (2003).
43. Edgar, R.C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32, 1792-7 http://dx.doi.org/10.1093/nar/gkh340 (2004).
44. Castresana, J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Molecular Biology and Evolution 17, 540-52 http://dx.doi.org/10.1093/oxfordjournals.molbev.a026334 (2000).
45. Rokas, A. Phylogenetic analysis of protein sequence data using the Randomized Axelerated Maximum Likelihood (RAXML) Program. Current protocols in molecular biology Chapter 19, Unit19 11 http://dx.doi.org/10.1002/0471142727.mb1911s96 (2011).
46. Darriba, D., Taboada, G.L., Doallo, R. & Posada, D. ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics 27, 1164-5 http://dx.doi.org/10.1093/bioinformatics/btr088 (2011).
47. Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution 24, 1586-91 http://dx.doi.org/10.1093/molbev/msm088 (2007).
48. Kumar, S. et al. TimeTree 5: An Expanded Resource for Species Divergence Times. Molecular Biology and Evolution 39, http://dx.doi.org/10.1093/molbev/msac174 (2022).
49. De Bie, T., Cristianini, N., Demuth, J.P. & Hahn, M.W. CAFE: a computational tool for the study of gene family evolution. Bioinformatics 22, 1269-71 http://dx.doi.org/10.1093/bioinformatics/btl097 (2006).
50. Tang, H. et al. Synteny and collinearity in plant genomes. Science 320, 486-8 http://dx.doi.org/10.1126/science.1153917 (2008).
51. Chen, C. et al. TBtools: An Integrative Toolkit Developed for Interactive Analyses of Big Biological Data. Molecular plant 13, 1194-1202 http://dx.doi.org/10.1016/j.molp.2020.06.009 (2020).
52. Kim, D., Paggi, J.M., Park, C., Bennett, C. & Salzberg, S.L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature biotechnology 37, 907-915 http://dx.doi.org/10.1038/s41587-019-0201-4 (2019).
53. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-9 http://dx.doi.org/10.1093/bioinformatics/btp352 (2009).
54. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature biotechnology 33, 290-5 http://dx.doi.org/10.1038/nbt.3122 (2015).
55. Liao, Y., Smyth, G.K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923-30 http://dx.doi.org/10.1093/bioinformatics/btt656 (2014).
56. Love, M.I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome biology 15, 550 http://dx.doi.org/10.1186/s13059-014-0550-8 (2014).
57. Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC bioinformatics 9, 559 http://dx.doi.org/10.1186/1471-2105-9-559 (2008).
58. Flavell, R.B., Bennett, M.D., Smith, J.B. & Smith, D.B. Genome size and the proportion of repeated nucleotide sequence DNA in plants. Biochemical genetics 12, 257-69 http://dx.doi.org/10.1007/BF00485947 (1974).
59. Wang, D. et al. Which factors contribute most to genome size variation within angiosperms? Ecology and evolution 11, 2660-2668 http://dx.doi.org/10.1002/ece3.7222 (2021).
60. Kreiner, J.M., Hnatovska, S., Stinchcombe, J.R. & Wright, S.I. Quantifying the role of genome size and repeat content in adaptive variation and the architecture of flowering time in Amaranthus tuberculatus. PLoS genetics 19, e1010865 http://dx.doi.org/10.1371/journal.pgen.1010865 (2023).
61. Li, M., Chen, Q., Zhang, L., Guo, P. & Wang, Y. The complete chloroplast genome sequence of Ulmus parvifolia (Ulmaceae). Mitochondrial DNA. Part B, Resources 5, 2957-2958 http://dx.doi.org/10.1080/23802359.2020.1791006 (2020).
62. Lyu, Y., Zhai, M., Jiang, Z. & Chen, Q. The complete chloroplast genome of Ulmus parvifolia, an important landscaping tree. Mitochondrial DNA. Part B, Resources 5, 3071-3072 http://dx.doi.org/10.1080/23802359.2020.1797586 (2020).
63. Zuo, L.H. et al. The first complete chloroplast genome sequences of Ulmus species by de novo sequencing: Genome comparative and taxonomic position analysis. PloS one 12, e0171264 http://dx.doi.org/10.1371/journal.pone.0171264 (2017).

Additional Declarations

No competing interests reported.

Supplementary Files

SupplementaryFigureandTable.zip
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-4754772","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":338678797,"identity":"45f52d13-f08c-4dee-80be-13494434d684","order_by":0,"name":"Yun-Zhou Lyu","email":"","orcid":"","institution":"Jiangsu Academy of Forestry","correspondingAuthor":false,"prefix":"","firstName":"Yun-Zhou","middleName":"","lastName":"Lyu","suffix":""},{"id":338678798,"identity":"ad27e25e-1017-43c9-bd2d-4b887a7c3e4f","order_by":1,"name":"Hai-Nan Sun","email":"","orcid":"","institution":"Jiangsu Academy of Forestry","correspondingAuthor":false,"prefix":"","firstName":"Hai-Nan","middleName":"","lastName":"Sun","suffix":""},{"id":338678799,"identity":"a3a66055-ce5e-4f00-9213-e6bfe174bac5","order_by":2,"name":"Rui-Chang Yan","email":"","orcid":"","institution":"Jiangsu Academy of Forestry","correspondingAuthor":false,"prefix":"","firstName":"Rui-Chang","middleName":"","lastName":"Yan","suffix":""},{"id":338678800,"identity":"c077a4ff-b06f-4833-8584-b41dcf131bd2","order_by":3,"name":"Jiang-tao Shi","email":"","orcid":"","institution":"Nanjing Forestry University","correspondingAuthor":false,"prefix":"","firstName":"Jiang-tao","middleName":"","lastName":"Shi","suffix":""},{"id":338678801,"identity":"3d340ced-bbdd-4726-9175-f23e44a750c3","order_by":4,"name":"Li-Bin Huang","email":"","orcid":"","institution":"Jiangsu Academy of Forestry","correspondingAuthor":false,"prefix":"","firstName":"Li-Bin","middleName":"","lastName":"Huang","suffix":""},{"id":338678802,"identity":"f15ee2b8-81fd-4872-b4b6-dfd032800205","order_by":5,"name":"Gang 
Wang","email":"","orcid":"","institution":"Yancheng Teachers University","correspondingAuthor":false,"prefix":"","firstName":"Gang","middleName":"","lastName":"Wang","suffix":""},{"id":338678803,"identity":"7153c671-390e-4cbd-9195-70037a192a52","order_by":6,"name":"Xiao-Yun Dong","email":"","orcid":"","institution":"Jiangsu Academy of Forestry","correspondingAuthor":false,"prefix":"","firstName":"Xiao-Yun","middleName":"","lastName":"Dong","suffix":""},{"id":338678804,"identity":"b9fadc7a-4a21-44bc-a7ef-05da1c0e5075","order_by":7,"name":"Wei Xing","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA3ElEQVRIiWNgGAWjYLCCBDDJfABIWDAw8BCvhQ1IJUgQqQUCeAyI02LO3vvsw8MdtQwGN3I+Pub9ISHHz3OA8cPHHNxaLHuOG89IPHOcQXJG7mZjngQJY8neBmbJmdtwazG4kcbMkNh2jIFfOnebNFBL4obzDGzMvPi03H8G0cImnfOMSC032EBaaoC25LBBtJxtIKDlDNhhBxgk5z8zNpyTBvRLz8Fm/H45foyZ8WdbHVDv4YcP3tjYAEMs+eCHj3i0QMHh+gYEh7EBlzJkUEeMolEwCkbBKBipAACV50lh2RC0QAAAAABJRU5ErkJggg==","orcid":"","institution":"Jiangsu Academy of Forestry","correspondingAuthor":true,"prefix":"","firstName":"Wei","middleName":"","lastName":"Xing","suffix":""}],"badges":[],"createdAt":"2024-07-17 08:39:25","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-4754772/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-4754772/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":62306204,"identity":"5ee70a88-b0d5-4fe3-9b44-59ae58211a6b","added_by":"auto","created_at":"2024-08-12 18:23:13","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":2333379,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eA\u003c/strong\u003e) Genome survey of \u003cem\u003eU. parvifolia\u003c/em\u003e genome. \u003cstrong\u003eB\u003c/strong\u003e) Hi-C heatmap of 14 pseudochromosomes. \u003cstrong\u003eC\u003c/strong\u003e) BUSCO analysis for evaluation of genome completeness. 
\u003cstrong\u003eD\u003c/strong\u003e) Circos plot of the \u003cem\u003eU. parvifolia\u003c/em\u003egenome assembly. From the outside to the inside, it includes chromosome, gene density, TE density, GC content, gene expression, syntenic relationships. \u003cstrong\u003eE\u003c/strong\u003e) The LAI value of each chromosome. \u003cstrong\u003eF\u003c/strong\u003e) Distribution of intact LTR-RTs insertion time.\u003c/p\u003e","description":"","filename":"Figure1.png","url":"https://assets-eu.researchsquare.com/files/rs-4754772/v1/fcc56baeac2e7aa5368e5711.png"},{"id":62305930,"identity":"665f1532-ddf1-4e60-897a-e9fc735a13e9","added_by":"auto","created_at":"2024-08-12 18:15:13","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":508580,"visible":true,"origin":"","legend":"\u003cp\u003ePhylogenetic tree of LTR-RTs by using the reverse transcriptase (RT) domains of the intact LTR-RTs. The LTR-RTs were divided into two major clades, \u003cem\u003eGypsy\u003c/em\u003e and \u003cem\u003eCopia\u003c/em\u003e.\u003c/p\u003e","description":"","filename":"Figure2.png","url":"https://assets-eu.researchsquare.com/files/rs-4754772/v1/a04adacf4fda916921e4f5b5.png"},{"id":62305663,"identity":"6ea64c7e-9562-45a0-a074-d5731db0cb3a","added_by":"auto","created_at":"2024-08-12 18:07:13","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":1252227,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eA\u003c/strong\u003e) Effective population size of \u003cem\u003eU. parvifolia\u003c/em\u003e. \u003cstrong\u003eB\u003c/strong\u003e) Phylogenetic tree of \u003cem\u003eU. parvifolia\u003c/em\u003e and other species. Red and blue numbers indicate the expanded and contracted gene families that, and numbers at the nodes of tree indicate the estimated divergence time. \u003cstrong\u003eC\u003c/strong\u003e) GO enrichment of the expanded gene families in \u003cem\u003eU. 
parvifolia\u003c/em\u003e. \u003cstrong\u003eD\u003c/strong\u003e) KEGG enrichment of the expanded gene families in \u003cem\u003eU. parvifolia\u003c/em\u003e. \u003cstrong\u003eE\u003c/strong\u003e) Chromosome collinearity within the \u003cem\u003eU. parvifolia\u003c/em\u003e genome. Red boxes indicate the potential polyploidy fragments. \u003cstrong\u003eF\u003c/strong\u003e) Distributions of synonymous substitution rate (Ks).\u003c/p\u003e","description":"","filename":"Figure3.png","url":"https://assets-eu.researchsquare.com/files/rs-4754772/v1/15c36470d3ea8042d8beccd0.png"},{"id":62305659,"identity":"1066948d-d6c3-4667-9507-1b98c08345a2","added_by":"auto","created_at":"2024-08-12 18:07:13","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":1903939,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eA\u003c/strong\u003e) Gene expression tendency in \u003cem\u003eU. parvifolia\u003c/em\u003e under infection of \u003cem\u003eC. ulmi\u003c/em\u003e at different timepoints. \u003cstrong\u003eB\u003c/strong\u003e) K-means method was applied to cluster the gene expression. \u003cstrong\u003eC\u003c/strong\u003e) KEGG enrichment analysis for the four clusters of genes.\u003c/p\u003e","description":"","filename":"Figure4.png","url":"https://assets-eu.researchsquare.com/files/rs-4754772/v1/bb585c8a60ac5572258cba84.png"},{"id":62305932,"identity":"ac48db2e-fc54-4adb-9295-4267fde957d9","added_by":"auto","created_at":"2024-08-12 18:15:13","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":1207563,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eA\u003c/strong\u003e) Principal component analysis using gene expression of \u003cem\u003eU. parvifolia\u003c/em\u003e and \u003cem\u003eU. americana\u003c/em\u003e under infection of \u003cem\u003eC. ulmi\u003c/em\u003e at different timepoints. ‘U’ and ‘A’ represent\u003cem\u003e U. 
parvifolia\u003c/em\u003e and \u003cem\u003eU. americana\u003c/em\u003e, respectively. \u003cstrong\u003eB\u003c/strong\u003e) WGCNA uncovered 21 major modules. \u003cstrong\u003eC\u003c/strong\u003e) The correlation between modules and either samples or traits (‘Resistance’ and ‘Susceptibility’). \u003cstrong\u003eD\u003c/strong\u003e) Expression heatmap of genes in ‘MEyellow’ module. \u003cstrong\u003eE, F\u003c/strong\u003e) KEGG and GO enrichment and analysis for the genes in ‘MEyellow’ module.\u003c/p\u003e","description":"","filename":"Figure5.png","url":"https://assets-eu.researchsquare.com/files/rs-4754772/v1/ade5181473aae2307e2a3532.png"},{"id":72640239,"identity":"9cdf1134-4e15-47f4-8dbc-ce8b95d15b18","added_by":"auto","created_at":"2024-12-30 16:02:03","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":7056384,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4754772/v1/10098cd5-6f13-44d9-950e-9caf4def2232.pdf"},{"id":62305665,"identity":"76534fd0-59ec-40f4-82f4-6c3826258f97","added_by":"auto","created_at":"2024-08-12 18:07:13","extension":"zip","order_by":8,"title":"","display":"","copyAsset":false,"role":"supplement","size":1146454,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryFigureandTable.zip","url":"https://assets-eu.researchsquare.com/files/rs-4754772/v1/d7941a80df0fbca2242b308d.zip"}],"financialInterests":"No competing interests reported.","formattedTitle":"Chromosome-Level Genome Assembly Unveils the Molecular Mechanisms Underlying Disease Resistance in Ulmus parvifolia","fulltext":[{"header":"Introduction","content":"\u003cp\u003e \u003cem\u003eUlmus parvifolia\u003c/em\u003e, known as the Chinese elm, is a deciduous tree species native to East Asia, particularly found in regions such as China, Korea, and Japan, where it thrives in various climatic conditions, from the 
warm subtropical regions to the cooler temperate zones. \u003cem\u003eU. parvifolia\u003c/em\u003e is recognized for its ecological and economic value, especially in urban forestry and horticulture. It is appreciated for its ability to tolerate pollution and harsh urban conditions, making it a favored option for city landscapes where it can provide shade, reduce noise, and improve air quality. Furthermore, its aesthetic appeal, including its attractive, peeling bark and dense foliage, enabling it a popular choice for urban landscaping and as a bonsai subject \u003csup\u003e\u003cspan additionalcitationids=\"CR2\" citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003e \u003cem\u003eU. parvifolia\u003c/em\u003e has demonstrated a significant level of resistance to Dutch elm disease, a fungal infection caused by \u003cem\u003eCeratocystis ulmi\u003c/em\u003e, \u003cem\u003eOphiostoma ulmi\u003c/em\u003e and \u003cem\u003eO. novo-ulmi\u003c/em\u003e, that has led to widespread decline of elm populations, particularly \u003cem\u003eUlmus americana\u003c/em\u003e, or the American elm. The resistance of \u003cem\u003eU. parvifolia\u003c/em\u003e is of particular interest to arborists and plant pathologists, as it offers potential avenues for the development of resistant strains through breeding programs. Different from \u003cem\u003eU. americana\u003c/em\u003e, which is highly susceptible to the disease and has suffered substantial losses, \u003cem\u003eU. 
parvifolia\u003c/em\u003e has shown a capacity to withstand or tolerate the pathogen, making it a valuable resource in the fight against Dutch elm disease \u003csup\u003e\u003cspan additionalcitationids=\"CR5 CR6 CR7 CR8\" citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u003c/sup\u003e. Therefore, understanding the contrasting susceptibility of \u003cem\u003eU. parvifolia\u003c/em\u003e and \u003cem\u003eU. americana\u003c/em\u003e to Dutch elm disease can provide insights into the genetic and molecular foundations of disease resistance, which is essential for the development of resistant elm varieties through breeding or genetic engineering techniques. Such research efforts are important for conservation, given the ecological significance of elms and the potential for their decline due to disease to significantly impact ecosystems. Understanding the variations in susceptibility between the two elm species can offer valuable insights into elm evolution and the possible effects of climate change on disease spread. However, the current limitation of having only one single, low-quality assembly for Ulmus species (\u003cem\u003eU. americana\u003c/em\u003e) poses challenges for functional genomics, gene discovery, and application of modern biotechnology for breeding. Furthermore, comparative genomics research is also hindered, affecting broader scientific insights into plant evolution and disease resistance mechanisms.\u003c/p\u003e \u003cp\u003eIn this study, we presented a high-quality chromosome-level genome assembly for \u003cem\u003eU. parvifolia\u003c/em\u003e by incorporating multiple sequencing strategies. Applying phylogenetic analysis, we delineated the phylogenetic relationships of \u003cem\u003eU. parvifolia\u003c/em\u003e with other species of Rosales. 
Comparative transcriptome analysis unveiled the potential mechanisms accounting for the differing susceptibility of \u003cem\u003eU. parvifolia\u003c/em\u003e and \u003cem\u003eU. americana\u003c/em\u003e to Dutch elm disease. Our research contributes to a deeper understanding of the genetic basis of resistance to Dutch elm disease and advancing our understanding of elm-pathogen interactions, which are critical for the conservation of elms and their ecosystem services.\u003c/p\u003e"},{"header":"Method","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eSample collection and library preparation\u003c/h2\u003e \u003cp\u003eSamples of \u003cem\u003eU. parvifolia\u003c/em\u003e were collected from a superior asexual line at the good seed base of the Jiangsu Academy of Forestry, where a 30-year-old tree with a crown width of 8m x 10m was located. The young leaf samples of the tree were promptly preserved in a liquid nitrogen tank and brought back to the laboratory for storage in an ultra-low temperature freezer for subsequent DNA/RNA extraction. The same batch of fresh young leaf material can also be utilized for further analysis, such as Hi-C sequencing. The frozen \u003cem\u003eU. parvifolia\u003c/em\u003e young leaf samples were ground after being rapidly frozen using liquid nitrogen, and then DNA was extracted according to the CTAB method. The extracted DNA was sent to Guangzhou BGI Genomics Co., Ltd. for sequencing. The genomic assessment data were generated by performing Illumina HiSeq PE150 sequencing on the extracted DNA. The genomic assembly data were generated using the high-throughput third-generation sequencing platform (PacBio Sequel II) in the Circular Consensus Sequencing mode. 
Additionally, Hi-C sequencing was performed using the Illumina HiSeq PE150 platform.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003eGenome assembly and completeness evaluation\u003c/h2\u003e \u003cp\u003eThe adapter and low-quality sequencing reads were removed using fastp and Trimmomatic, and then FastQC (v0.11.9) was utilized to assess the quality of the clean sequencing data \u003csup\u003e\u003cspan additionalcitationids=\"CR11\" citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e–\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e. The k-mer frequency analysis by jellyfishcount led to the selection of a k-mer length of 21 for GenomeScope, which was then utilized to estimate the genome size, heterozygosity, and repeat content \u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e,\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e. The clean HiFi reads were \u003cem\u003ede novo\u003c/em\u003e assembled using Hifiasm (v0.16.1-r375) software with the default parameters \u003csup\u003e\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u003c/sup\u003e. The Hi-C sequencing data underwent filtration with Hic-pro software to eliminate single-mapped, multiple mapped, and duplicated reads \u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u003c/sup\u003e. Scaffolding, sorting, and orienting of the genome were performed using Juicer and 3D-DNA, culminating in the generation of a chromosome-level genome assembly \u003csup\u003e\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e,\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e. 
Finally, Juicebox was used to manually adjust chromosome boundaries and correct any misjoins, inversions, or translocations present in the assembly \u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e. Benchmarking Universal Single-Copy Orthologs (BUSCO, v5.5.0) with the embryophyta_odb10, eudicots_odb10, eukaryota_odb10, and viridiplantae_odb10 databases was applied to evaluate the completeness of the genome \u003csup\u003e\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u003c/sup\u003e. The LTR assembly index (LAI) was calculated to assess the quality of the genome assembly \u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003eGenome annotation\u003c/h2\u003e \u003cp\u003eGene structure prediction was conducted using the GETA software (v2.5.7) (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/chenlianfu/GETA\u003c/span\u003e\u003cspan address=\"https://github.com/chenlianfu/GETA\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e), which integrates three distinct gene prediction strategies: the de novo method, the homology-based method, and the transcriptome-based method. A variety of tools were employed in this process, including PASA \u003csup\u003e\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e, genewise \u003csup\u003e\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e\u003c/sup\u003e, AUGUSTUS \u003csup\u003e\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e\u003c/sup\u003e, Hisat2 \u003csup\u003e25\u003c/sup\u003e, hmmsearch \u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e, BGM2AT, and exonerate, among others. 
To predict gene functions, InterProScan and eggNOG-mapper were used, and multiple gene function databases were referenced to provide comprehensive functional annotations. These databases encompassed Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), Pfam, Clusters of Orthologous Groups (COG), and the Swiss-Prot protein database \u003csup\u003e\u003cspan additionalcitationids=\"CR28 CR29 CR30 CR31 CR32\" citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e–\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003eRepeat Annotation\u003c/h2\u003e \u003cp\u003eTo construct a repeat library for the \u003cem\u003eU. parvifolia\u003c/em\u003e genome, \u003cem\u003ede novo\u003c/em\u003e identification of repeat elements was performed using RepeatModeler (v2.0.3)\u003csup\u003e\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e\u003c/sup\u003e. Subsequently, the identified repeat consensus sequences were used as the library for RepeatMasker (v4.1.0)\u003csup\u003e\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e\u003c/sup\u003e to identify all corresponding repeat regions throughout the genome. To predict full-length LTR retrotransposons (LTR-RTs), a combination of tools was employed, including LTR_finder (v1.07)\u003csup\u003e\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e\u003c/sup\u003e, LTRharvest (v1.6.2)\u003csup\u003e\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e\u003c/sup\u003e and LTR_retriever (v2.9.0)\u003csup\u003e\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e\u003c/sup\u003e. 
To investigate the evolution of LTR-RTs, the reverse transcriptase (RT) domains of the intact LTR-RTs were extracted to construct a phylogenetic tree using IQ-TREE\u003csup\u003e\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e\u003c/sup\u003e. The subfamily information of the LTR-RTs was annotated using TEsorter \u003csup\u003e\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003ePhylogenetic analysis\u003c/h2\u003e \u003cp\u003eTo investigate the phylogenetic relationships of \u003cem\u003eU. parvifolia\u003c/em\u003e, eight published genome assemblies from the order Rosales were used, comprising \u003cem\u003eMorus notabilis\u003c/em\u003e, \u003cem\u003eCannabis sativa\u003c/em\u003e, \u003cem\u003eParasponia andersonii\u003c/em\u003e, \u003cem\u003eTrema orientale\u003c/em\u003e, \u003cem\u003eRhamnella rubrinervis\u003c/em\u003e, \u003cem\u003eZiziphus jujuba\u003c/em\u003e, \u003cem\u003eMalus domestica\u003c/em\u003e, and \u003cem\u003ePrunus persica\u003c/em\u003e, with \u003cem\u003ePopulus trichocarpa\u003c/em\u003e designated as the outgroup (Table S2). OrthoFinder (v2.5.2) and OrthoMCL were employed to identify single-copy orthologous genes across these species \u003csup\u003e\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e,\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e\u003c/sup\u003e. Multiple sequence alignment was performed with MUSCLE (v3.8.31), and Gblocks (v0.91b) was used to extract conserved regions with the parameters '-b4=10 -b5=h -t=p' \u003csup\u003e\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e,\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e\u003c/sup\u003e. 
The optimal substitution model was determined using ProtTest3, with both the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) pointing to the LG + I + G + F model. The phylogenetic tree was constructed using RAxML (v8.2.12), employing both Maximum Likelihood and Bayesian algorithms \u003csup\u003e\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e,\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eDivergence times among \u003cem\u003eU. parvifolia\u003c/em\u003e and the other selected species were estimated using MCMCTree with default parameters \u003csup\u003e\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e\u003c/sup\u003e. These estimates were cross-referenced with data from TimeTree5 for the divergence times between \u003cem\u003eMorus notabilis\u003c/em\u003e and \u003cem\u003eTrema orientale\u003c/em\u003e (ranging from 48.9 to 70.9\u0026nbsp;million years ago) and between \u003cem\u003eRhamnella rubrinervis\u003c/em\u003e and \u003cem\u003ePrunus persica\u003c/em\u003e (ranging from 76.0 to 95.5\u0026nbsp;million years ago) \u003csup\u003e\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e\u003c/sup\u003e. 
The analysis of gene family expansion and contraction was performed using the CAFE software (V5.5.1.0) \u003csup\u003e\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003ePrediction of whole-genome polyploidy event\u003c/h2\u003e \u003cp\u003eCollinearity analysis between genomes was conducted using JCVI (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/tanghaibao/jcvi\u003c/span\u003e\u003cspan address=\"https://github.com/tanghaibao/jcvi\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) with the default settings \u003csup\u003e\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e\u003c/sup\u003e. Orthologous and paralogous gene pairs were identified based on the presence of syntenic blocks. Pairwise synonymous substitution rates (Ks) were calculated using TBtools \u003csup\u003e\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003eRNA-seq analysis\u003c/h2\u003e \u003cp\u003eThe raw reads were processed with fastp \u003csup\u003e\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u003c/sup\u003e for quality control. The clean reads were mapped to the \u003cem\u003eU. parvifolia\u003c/em\u003e genome using Hisat2 \u003csup\u003e52\u003c/sup\u003e and samtools \u003csup\u003e\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e\u003c/sup\u003e. StringTie \u003csup\u003e\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e\u003c/sup\u003e and featureCounts \u003csup\u003e\u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e\u003c/sup\u003e were used to quantify gene expression. 
Differential expression analysis was performed using DESeq2 \u003csup\u003e56\u003c/sup\u003e. An FDR less than 0.05 and |log2 fold change (log2FC)| \u0026gt; 1 were set as the thresholds to define differentially expressed genes (DEGs).\u003c/p\u003e \u003cp\u003eIn total, 7280 DEGs were used to construct a co-expression network using WGCNA \u003csup\u003e\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e\u003c/sup\u003e. The power adjacency function was used to transform the similarity matrix into an adjacency matrix. According to the scale-free topology criterion, the power β = 10 was selected. Overall, 21 modules with a minimum gene number of 50 were identified. Samples of \u003cem\u003eU. parvifolia\u003c/em\u003e and \u003cem\u003eU. americana\u003c/em\u003e collected at 48, 96 and 144 hours after inoculation with \u003cem\u003eCeratocystis ulmi\u003c/em\u003e were assigned to the ‘Resistance’ and ‘Susceptibility’ traits, respectively. Then, the correlations between the modules and either the samples or the traits were calculated, and correlations with p-values less than 0.01 were considered significant. An adjusted p-value \u0026lt; 0.05 was used to define significantly enriched GO terms and KEGG pathways.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003eqRT-PCR assay\u003c/h2\u003e \u003cp\u003eSix DEGs were randomly selected to validate gene expression patterns during infection by \u003cem\u003eC. ulmi\u003c/em\u003e. PCR primers were designed using Primer3Plus (\u003cb\u003eTable S3\u003c/b\u003e). The housekeeping gene Upa02186 in \u003cem\u003eU. parvifolia\u003c/em\u003e was used as the reference. Total RNA was extracted using the TRIzol method, and cDNA synthesis was then performed using the Hifair® II 1st Strand cDNA Synthesis SuperMix for qPCR (gDNA digester plus) kit from YEASEN. qRT-PCR analysis was conducted using cDNA as a template. 
Each experiment was performed with three replicates, and the relative gene expression levels were calculated using the 2\u003csup\u003e−ΔΔCT\u003c/sup\u003e method.\u003c/p\u003e \u003c/div\u003e "},{"header":"Result","content":"\u003ch2\u003eGenome assembly and annotation\u003c/h2\u003e\u003cp\u003eThe k-mer analysis (k = 21) using ~150 Gb of NGS short reads (89-fold coverage) indicated that the estimated genome size of \u003cem\u003eUlmus parvifolia\u003c/em\u003e was about 1.68 Gb, accompanied by a high level of heterozygosity (2.57%) (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e1\u003c/span\u003eA). We employed different sequencing platforms to develop a chromosome-level genome assembly for \u003cem\u003eU. parvifolia\u003c/em\u003e. Initially, 49.8 Gb of HiFi sequencing data (29.6-fold coverage) generated on the PacBio Sequel II platform were used to assemble a draft genome with a total size of 1.79 Gb, 1,051 contigs, a contig N50 of 10,755,710 bp, and a GC content of 34.2% (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Using 130.2 Gb of Hi-C sequencing data (73-fold coverage), a total of 149 contigs, covering 94.3% of the draft genome, were anchored into 14 pseudo-chromosomes (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e1\u003c/span\u003eB, D). Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis based on the Embryophyte, Eudicots, Eukaryota, and Viridiplantae databases revealed that 98.2%, 97.2%, 98.9% and 99.0% of the conserved orthologs were complete, respectively, suggesting the high completeness of the \u003cem\u003eU. parvifolia\u003c/em\u003e genome (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e1\u003c/span\u003eC). Furthermore, the LTR assembly index (LAI) was calculated to assess the genome assembly quality. 
The LAI of the whole genome was about 16.33, and the average LAIs of the 14 pseudochromosomes ranged from 15.41 in Chr3 to 17.15 in Chr8 (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e1\u003c/span\u003eE).\u003c/p\u003e\u003cp\u003eA combined strategy that encompassed ab initio prediction, homology-based prediction, and transcript-based prediction methods was used to annotate gene structure in the \u003cem\u003eU. parvifolia\u003c/em\u003e genome, yielding a total of 34,351 gene models. A subsequent BUSCO analysis, conducted in protein mode, indicated a high level of completeness, with 98.9%, 97.8%, 99.6% and 100% of the conserved proteins identified according to the Embryophyte, Eudicots, Eukaryota, and Viridiplantae databases, respectively (\u003cb\u003eFig. \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e\u003c/b\u003e). A total of 15,239, 9,382, 25,559, 653 and 26,235 genes were successfully annotated with at least one term from the GO, KEGG, Pfam, CAZy and COG databases, respectively, together accounting for more than 80% of the total genes. These results collectively demonstrated the high assembly quality, completeness, and continuity of the \u003cem\u003eU. parvifolia\u003c/em\u003e genome.\u003c/p\u003e\u003cdiv class=\"gridtable\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eGenomic characteristics of \u003cem\u003eU. 
parvifolia\u003c/em\u003e\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e\u003ccolgroup cols=\"2\"\u003e\u003c/colgroup\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGenome Features\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eValues\u003c/p\u003e \u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGenome size (bp)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1,795,583,205\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eContig Number\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1051\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eContig N50\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e10,755,710\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eContig N90\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2,802,378\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eChr Number\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e14\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRepeat Content\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e71.79%\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGene number\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e34,351\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd 
align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGC\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e34.20%\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLTR Assembly Index (LAI)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e16.33\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBUSCO\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e98.20%\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/table\u003e\u003c/div\u003e\u003cp\u003eAmong the Rosales species with an available reference genome, \u003cem\u003eU. parvifolia\u003c/em\u003e possesses a large genome, second in size only to that of \u003cem\u003eHumulus lupulus\u003c/em\u003e. Many previous studies have revealed a correlation between genome size and TE content \u003csup\u003e\u003cspan additionalcitationids=\"CR59\" citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e–\u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e\u003c/sup\u003e. Approximately 71.79% of the \u003cem\u003eU. parvifolia\u003c/em\u003e genome was covered by repeat elements, the majority of which were LTR retrotransposons (LTR-RTs), representing 63.65% of the assembly (\u003cb\u003eTable \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e\u003c/b\u003e). Estimation of TE insertion times based on 12,334 intact LTR-RT sequences revealed a recent large-scale amplification of LTR-RTs, which occurred approximately 124 thousand years ago (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e1\u003c/span\u003eF). 
Phylogenetic analysis of LTR-RTs showed that the \u003cem\u003eTy3/Gypsy\u003c/em\u003e Retand subfamily was predominant, followed by the Tekay and Athila subfamilies (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e2\u003c/span\u003e). In aggregate, the recent burst of LTR-RTs, especially \u003cem\u003eTy3/Gypsy\u003c/em\u003e elements, may partially account for the large genome size of \u003cem\u003eU. parvifolia\u003c/em\u003e.\u003c/p\u003e\u003ch2\u003ePopulation history\u003c/h2\u003e\u003cp\u003eWe conducted Pairwise Sequentially Markovian Coalescent (PSMC) analysis to infer the historical population dynamics of \u003cem\u003eU. parvifolia\u003c/em\u003e. Our findings indicated that the effective population size of \u003cem\u003eU. parvifolia\u003c/em\u003e started to increase around one million years ago, reaching its maximum approximately 0.7\u0026nbsp;million years ago. Thereafter, the effective population size showed a significant and ongoing decline up to the present day (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e3\u003c/span\u003eA). We speculate that the reduction in the population size of \u003cem\u003eU. parvifolia\u003c/em\u003e can be attributed to a variety of factors. For example, significant climatic shifts, such as the glacial and interglacial periods, can lead to habitat loss or fragmentation, affecting the distribution and abundance of species. Moreover, the expansion of human populations, deforestation, urban development, and agriculture have led to the loss of natural habitats, directly impacting the population size of \u003cem\u003eU. parvifolia\u003c/em\u003e. Other factors, such as natural selection and genetic drift, disease, fragmentation and isolation, may also have contributed to the reduction in population size.\u003c/p\u003e\u003ch2\u003eComparative genomics analysis\u003c/h2\u003e\u003cp\u003eTo delineate the evolutionary history of \u003cem\u003eU. 
parvifolia\u003c/em\u003e, we performed phylogenetic analysis by comparing it with eight other Rosales species, with \u003cem\u003ePopulus trichocarpa\u003c/em\u003e, which belongs to Malpighiales, selected as the outgroup (Table S2). In total, we identified 8685 single-copy orthologous genes across these species, which were then employed to establish the phylogenetic relationships. The phylogenetic tree organized the species into five major clusters based on family, including Ulmaceae (\u003cem\u003eU. parvifolia\u003c/em\u003e), Moraceae (\u003cem\u003eMorus notabilis\u003c/em\u003e), Cannabaceae (\u003cem\u003eCannabis sativa\u003c/em\u003e, \u003cem\u003eParasponia andersonii\u003c/em\u003e, \u003cem\u003eTrema orientale\u003c/em\u003e), Rhamnaceae (\u003cem\u003eRhamnella rubrinervis\u003c/em\u003e, \u003cem\u003eZiziphus jujuba\u003c/em\u003e), and Rosaceae (\u003cem\u003eMalus domestica\u003c/em\u003e, \u003cem\u003ePrunus persica\u003c/em\u003e). Notably, \u003cem\u003eU. parvifolia\u003c/em\u003e exhibited the closest phylogenetic affinity to \u003cem\u003eMorus notabilis\u003c/em\u003e among all the species examined. Moreover, \u003cem\u003eU. parvifolia\u003c/em\u003e, \u003cem\u003eMorus notabilis\u003c/em\u003e and the Cannabaceae species diverged from their common ancestor approximately 73.25\u0026nbsp;million years ago (MYA).\u003c/p\u003e\u003cp\u003eComparative analysis of gene families among these species unveiled a total of 1100 and 1019 gene families that were significantly expanded and contracted in \u003cem\u003eU. parvifolia\u003c/em\u003e, respectively (p-value \u0026lt; 0.01, Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e3\u003c/span\u003eB). 
Interestingly, KEGG enrichment analysis for the 1100 expanded gene families revealed the strong correlation with plant disease resistance and immune response, such as ‘Brassinosteroid biosynthesis’, ‘Metabolism of xenobiotics by cytochrome P450’, ‘Plant-pathogen interaction’, ‘MAPK signaling pathway’, ‘Isoflavonoid biosynthesis’ (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e3\u003c/span\u003eC). Consistently, GO enrichment analysis showed many terms associated with plant defense response, including ‘detection of biotic stimulus’, ‘defense response by cell wall thickening’, ‘isoprenoid biosynthetic process’, ‘response to reactive oxygen species’ (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e3\u003c/span\u003eD). These results collectively reflected an adaptation of \u003cem\u003eU. parvifolia\u003c/em\u003e to diverse biotic and abiotic stresses, enhancing its defensive capabilities, increasing allelic diversity and defense mechanism redundancy, which may also partially account for its resistance to Dutch Elm Disease.\u003c/p\u003e\u003cp\u003eWe analyzed the chromosomal collinearity within \u003cem\u003eU. parvifolia\u003c/em\u003e to understand its chromosome evolution. Notably, we discovered several chromosomal segments that were collinear across chromosomes 5, 9, and 13, suggesting a potential whole-genome triplication event in the evolutionary history of \u003cem\u003eU. parvifolia\u003c/em\u003e (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e3\u003c/span\u003eE). To substantiate this finding, we measured the synonymous substitution rate (Ks) to infer whole-genome polyploidy event. Our analysis demonstrated that the Ks values of \u003cem\u003eU. parvifolia\u003c/em\u003e and \u003cem\u003eM. notabilis\u003c/em\u003e exhibited a concordant peak, indicative of a shared ancestral whole-genome polyploidy event. Importantly, the peak Ks value for \u003cem\u003eU. 
parvifolia\u003c/em\u003e was considerably lower compared to the values between \u003cem\u003eU. parvifolia\u003c/em\u003e and either \u003cem\u003eM. notabilis\u003c/em\u003e or \u003cem\u003eP. trichocarpa\u003c/em\u003e, suggesting that the polyploidy event possibly occurred prior to the divergence of \u003cem\u003eU. parvifolia\u003c/em\u003e, \u003cem\u003eM. notabilis\u003c/em\u003e and \u003cem\u003eP. trichocarpa\u003c/em\u003e from their common ancestor (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e3\u003c/span\u003eF). Considering the phylogenetic relationships and the estimated divergence time among these species (approximately 39.6 to 95.6 MYA between \u003cem\u003eU. parvifolia\u003c/em\u003e and \u003cem\u003eM. notabilis\u003c/em\u003e, and around 99.0 to 111.3 MYA between \u003cem\u003eU. parvifolia\u003c/em\u003e and \u003cem\u003eP. trichocarpa\u003c/em\u003e), we proposed that the polyploidy event in \u003cem\u003eU. parvifolia\u003c/em\u003e could be associated with the γ whole-genome triplication (γ-WGT) event, which happened around 120\u0026nbsp;million years ago and has been proved to be associated with the early diversification of the core eudicots.\u003c/p\u003e\u003cp\u003e \u003cb\u003eGene Expression Profiling of\u003c/b\u003e \u003cb\u003eU. parvifolia\u003c/b\u003e \u003cb\u003ein Response to\u003c/b\u003e \u003cb\u003eCeratocystis ulmi\u003c/b\u003e \u003cb\u003eInfection\u003c/b\u003e\u003c/p\u003e\u003cp\u003eAs one of the significant threats to \u003cem\u003eU. parvifolia\u003c/em\u003e, the blight, commonly known as Dutch Elm Disease, that is caused by the fungus \u003cem\u003eCeratocystis ulmi\u003c/em\u003e has had a significant impact on elm populations worldwide. To investigate the interaction between \u003cem\u003eU. parvifolia\u003c/em\u003e and \u003cem\u003eC. ulmi\u003c/em\u003e, we inoculated \u003cem\u003eU. parvifolia\u003c/em\u003e with \u003cem\u003eC. 
ulmi\u003c/em\u003e and extracted RNA at 0, 48, 96 and 144 hours post inoculation (hpi) for sequencing. Across the process, we identified a total of 7280 differentially expressed genes (DEGs) by comparing gene expression between each pair of timepoints. Principal component analysis (PCA) revealed that the expression pattern of the uninfected samples was distinct from that of either 48 hpi or 96 hpi but relatively similar to that of 144 hpi, suggesting that \u003cem\u003eU. parvifolia\u003c/em\u003e mainly responds to \u003cem\u003eC. ulmi\u003c/em\u003e infection at 48 and 96 hpi (\u003cb\u003eFig. S2\u003c/b\u003e).\u003c/p\u003e\u003cp\u003eK-means clustering analysis uncovered four major gene expression patterns, including the genes that were down-regulated after inoculation of \u003cem\u003eC. ulmi\u003c/em\u003e (cluster1), and the genes that were predominantly up-regulated at 48 hpi (cluster2), 96 hpi (cluster3) and 144 hpi (cluster4), respectively (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e4\u003c/span\u003eA, B). The genes in cluster1 were significantly enriched in the pathways ‘Carotenoid biosynthesis’, ‘Cutin, suberine and wax biosynthesis’, and ‘Diterpenoid biosynthesis’ (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e4\u003c/span\u003eC). Unexpectedly, these pathways, which are known to play a role in plant defense mechanisms, were downregulated during the infection stages. We speculate that \u003cem\u003eU. parvifolia\u003c/em\u003e may prioritize the production of certain defensive compounds over others when responding to fungal infection, which could be a strategic shift to ensure short-term survival over long-term benefits. 
Notably, the genes in cluster2 and cluster3 were involved in several pathways associated with plant defense and immune response, such as ‘Metabolism of xenobiotics by cytochrome P450’, ‘Brassinosteroid biosynthesis’, ‘Monoterpenoid biosynthesis’, ‘MAPK signaling pathway − plant’, and ‘Phenylpropanoid biosynthesis’ (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e4\u003c/span\u003eC). The results implied that \u003cem\u003eU. parvifolia\u003c/em\u003e may prioritize these pathways to defend against \u003cem\u003eC. ulmi\u003c/em\u003e infection. The genes in cluster4 were mainly associated with ‘Protein processing in endoplasmic reticulum’, ‘RNA transport’, and ‘Spliceosome’, indicating an urgent need for protein processing and transport to ensure an adequate supply of useful compounds that can be deployed in the plant's immune response (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e4\u003c/span\u003eC).\u003c/p\u003e\u003cp\u003eIn summary, our analysis characterized the expression patterns of \u003cem\u003eU. parvifolia\u003c/em\u003e under infection by \u003cem\u003eC. ulmi\u003c/em\u003e at different timepoints, and the results showed that \u003cem\u003eU. parvifolia\u003c/em\u003e mainly responds to infection at 48 and 96 hpi by upregulating several pathways that play crucial roles in plant defense responses.\u003c/p\u003e\u003cp\u003e \u003cb\u003eComparative transcriptome analysis unveiled the disease resistance of\u003c/b\u003e \u003cb\u003eU. parvifolia\u003c/b\u003e\u003c/p\u003e\u003cp\u003eVariations in susceptibility to Dutch elm disease have been reported between \u003cem\u003eU. parvifolia\u003c/em\u003e and \u003cem\u003eU. americana\u003c/em\u003e, with \u003cem\u003eU. parvifolia\u003c/em\u003e showing some resistance and \u003cem\u003eU. americana\u003c/em\u003e being more susceptible. To analyze the molecular basis underlying their distinct susceptibility, we inoculated \u003cem\u003eU. 
parvifolia\u003c/em\u003e and \u003cem\u003eU. americana\u003c/em\u003e with \u003cem\u003eC. ulmi\u003c/em\u003e and performed comparative transcriptome analysis at 48, 96 and 144 hpi. The PCA results showed that the overall expression patterns at both uninfected and infected stages differed between the two elms. However, the gene expression trajectories during the infection process were similar, implying that the two species may employ analogous strategies to defend against pathogen infection (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e5\u003c/span\u003eA).\u003c/p\u003e\u003cp\u003eWeighted gene co-expression network analysis (WGCNA) was used to explore the differential gene expression patterns that potentially explain the differing disease resistance observed in the two Ulmus species. In total, we identified 21 modules based on gene expression tendency (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e5\u003c/span\u003eB). Among these modules, ‘MEyellow’ attracted particular attention because the genes in the module were highly expressed in \u003cem\u003eU. parvifolia\u003c/em\u003e at 48 hpi and 96 hpi. Moreover, analysis of module-trait correlation revealed that ‘MEyellow’ is the only module that is significantly correlated with the ‘resistance’ trait of Ulmus (correlation = 0.68, p-value = 2e-04) (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e5\u003c/span\u003eC). The heatmap showed that the genes in ‘MEyellow’ were predominantly expressed in \u003cem\u003eU. parvifolia\u003c/em\u003e at 96 hpi, followed by 48 hpi (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e5\u003c/span\u003eD). Taken together, the results implied that the differential expression of genes in ‘MEyellow’ may account for the contrasting susceptibility of \u003cem\u003eU. parvifolia\u003c/em\u003e and \u003cem\u003eU. americana\u003c/em\u003e to infection. 
KEGG enrichment analysis showed that these genes were significantly enriched in the pathways ‘Metabolism of xenobiotics by cytochrome P450’, ‘Brassinosteroid biosynthesis’, and ‘Phenylpropanoid biosynthesis’, as well as some important secondary metabolism pathways, such as ‘Tryptophan metabolism’ and ‘Glutathione metabolism’ (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e5\u003c/span\u003eE). These pathways have been reported to play important roles in plant defense and immune response. For instance, the 'Metabolism of xenobiotics by cytochrome P450' pathway is pivotal in the plant's reaction to fungal invasion, facilitating processes such as the detoxification of fungal byproducts, synthesis of compounds associated with defense, and regulation of plant hormone levels. Consistently, GO enrichment analysis also displayed many enriched terms associated with plant response to biotic and abiotic stress (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e5\u003c/span\u003eF). In aggregate, our findings from comparative transcriptome analysis indicated that genes in various pathways associated with plant defense and immune response were highly expressed in \u003cem\u003eU. parvifolia\u003c/em\u003e under infection by \u003cem\u003eC. ulmi\u003c/em\u003e at 48 and 96 hours, which may be a key factor contributing to the differing susceptibility of \u003cem\u003eU. parvifolia\u003c/em\u003e and \u003cem\u003eU. americana\u003c/em\u003e to Dutch elm disease.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eUnlike \u003cem\u003eU. americana\u003c/em\u003e, which is highly susceptible to Dutch elm disease, \u003cem\u003eU. parvifolia\u003c/em\u003e has shown some ability to tolerate infection by \u003cem\u003eC. ulmi\u003c/em\u003e. However, the absence of genomic resources for \u003cem\u003eU. 
parvifolia\u003c/em\u003e represents a significant gap that impedes a comprehensive understanding of its genetic resistance to Dutch elm disease, limiting our ability to use this knowledge for breeding resistant varieties and advancing elm conservation efforts. Therefore, obtaining a high-quality genome assembly for \u003cem\u003eU. parvifolia\u003c/em\u003e is crucial for illuminating the molecular mechanisms of resistance, guiding biotechnological interventions, and preserving the ecological significance of elms in the face of disease challenges. In this study, we integrated HiFi and Hi-C sequencing to assemble a high-quality chromosome-level genome of \u003cem\u003eU. parvifolia\u003c/em\u003e, with 14 pseudochromosomes covering over 94% of the total genome size. The \u003cem\u003eU. parvifolia\u003c/em\u003e genome assembly showed an extremely high contig N50 (10.75 Mb), as well as high assembly quality and genome completeness, as indicated by its LAI (16.33) and BUSCO score (98.20%). In contrast, the assembly of \u003cem\u003eU. americana\u003c/em\u003e comprised more than 500 thousand contigs, with a contig N50 of only 2.5 kb, indicating low assembly quality and continuity. Notably, the genome size of \u003cem\u003eU. parvifolia\u003c/em\u003e (~\u0026thinsp;1.79 Gb) is more than twice that of \u003cem\u003eU. americana\u003c/em\u003e (~\u0026thinsp;865 Mb). Considering the high proportion of LTR-RTs in the \u003cem\u003eU. parvifolia\u003c/em\u003e genome, we presume that amplification of transposable elements could be one of the factors leading to the large difference in their genome sizes. LTR retrotransposons account for more than 60% of the genome and nearly 90% of the total repeat content, suggesting that LTR-RTs have been significantly amplified in the \u003cem\u003eU. parvifolia\u003c/em\u003e genome. 
Among the LTR-RTs, the \u003cem\u003eTy3/Gypsy\u003c/em\u003e Retand subfamily was predominant and showed a recent large-scale amplification, possibly reflecting the adaptation of \u003cem\u003eU. parvifolia\u003c/em\u003e to turbulent environments.\u003c/p\u003e \u003cp\u003eConsistent with previous studies, \u003cem\u003eU. parvifolia\u003c/em\u003e displayed a closer phylogenetic relationship with Moraceae, followed by Cannabaceae, Rhamnaceae and Rosaceae \u003csup\u003e\u003cspan additionalcitationids=\"CR62\" citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e\u003c/sup\u003e. Investigation of the gene families expanded in \u003cem\u003eU. parvifolia\u003c/em\u003e revealed many terms or pathways associated with plant defense and immune response, indicating that \u003cem\u003eU. parvifolia\u003c/em\u003e has evolved a complex immune system, possibly contributing to enhanced pathogen recognition and adaptation to environmental stress, and thereby to effective defense against a variety of pathogens. Analysis of chromosomal collinearity within the \u003cem\u003eU. parvifolia\u003c/em\u003e genome hinted at a potential whole-genome triplication event in the evolutionary history of \u003cem\u003eU. parvifolia.\u003c/em\u003e Through investigation of the Ks distribution, we discovered one polyploidization event in \u003cem\u003eU. parvifolia\u003c/em\u003e that happened prior to the divergence of \u003cem\u003eU. parvifolia\u003c/em\u003e and \u003cem\u003eP. trichocarpa\u003c/em\u003e from their common ancestor. According to the divergence times estimated by TimeTree 5, \u003cem\u003eU. parvifolia\u003c/em\u003e and \u003cem\u003eM. notabilis\u003c/em\u003e diverged around 39.6\u0026ndash;95.6 MYA, and \u003cem\u003eU. parvifolia\u003c/em\u003e and \u003cem\u003eP. trichocarpa\u003c/em\u003e diverged around 99.0\u0026ndash;111.3 MYA. We therefore deduced that the polyploidization event in \u003cem\u003eU. 
parvifolia\u003c/em\u003e possibly corresponds to the Gamma WGT event that happened approximately 120\u0026nbsp;million years ago.\u003c/p\u003e \u003cp\u003eDue to the lack of a comprehensive genome assembly of \u003cem\u003eU. parvifolia\u003c/em\u003e, our knowledge of the differing susceptibility of \u003cem\u003eU. parvifolia\u003c/em\u003e and \u003cem\u003eU. americana\u003c/em\u003e to Dutch elm disease is very limited. A previous study revealed distinct transcriptomic changes in resistant versus susceptible \u003cem\u003eU. americana\u003c/em\u003e genotypes under infection by \u003cem\u003eOphiostoma novo-ulmi\u003c/em\u003e, highlighting genetic factors that may contribute to disease resistance in \u003cem\u003eU. americana\u003c/em\u003e \u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e. However, without an accurate and complete genomic blueprint of \u003cem\u003eU. americana\u003c/em\u003e, such a study may leave gaps in understanding the complex genetic basis of resistance and susceptibility to \u003cem\u003eO. novo-ulmi\u003c/em\u003e, because the identification, annotation, and functional analysis of genes and their expression patterns can be less precise. To better compare gene expression patterns between \u003cem\u003eU. parvifolia\u003c/em\u003e and \u003cem\u003eU. americana\u003c/em\u003e under infection by \u003cem\u003eC. ulmi\u003c/em\u003e, we mapped RNA-seq reads of \u003cem\u003eU. americana\u003c/em\u003e to the \u003cem\u003eU. parvifolia\u003c/em\u003e genome. The mapping results showed that most of the \u003cem\u003eU. americana\u003c/em\u003e RNA-seq data could be successfully aligned, with an average mapping ratio of 80.2%, demonstrating the feasibility of our method. Comparative transcriptome analysis using WGCNA revealed that several enriched pathways involved in plant defense and immune response were more highly expressed in \u003cem\u003eU. 
parvifolia\u003c/em\u003e, especially at 96 hpi. Among these pathways, \u0026lsquo;Metabolism of xenobiotics by cytochrome P450\u0026rsquo; is the most significant; it plays a critical role in plant responses to fungal infections through detoxification of fungal metabolites, production of defense-related compounds, and modulation of plant hormone levels. Moreover, phenylpropanoid biosynthesis is a critical component of the plant's arsenal against fungal infection, contributing to the production of antimicrobial compounds, signaling molecules, and structural components that enhance resistance. Brassinosteroids play a multifaceted role in plant defense against fungal infections by modulating immune responses, interacting with other hormones, influencing cell wall properties, and regulating growth and development to enhance resistance, while also potentially affecting fungal growth directly. The activation of these pathways in \u003cem\u003eU. parvifolia\u003c/em\u003e under \u003cem\u003eC. ulmi\u003c/em\u003e infection is part of the elm's defense strategy to counteract the pathogenic threat, including detoxification of harmful substances, production of antimicrobial compounds and structural defenses, and enhancement of the immune system and stress response.\u003c/p\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003eData Records\u003c/h2\u003e \u003cp\u003eThe genome sequence and gene sequences were deposited in the NCBI database under the accession number JBDNDV000000000. Raw HiFi, Illumina and Hi-C data used for genome assembly were deposited at NCBI under BioProject PRJNA1068684. RNA-seq data of \u003cem\u003eU. parvifolia\u003c/em\u003e and \u003cem\u003eU. americana\u003c/em\u003e under infection by \u003cem\u003eC. 
ulmi\u003c/em\u003e were deposited at NCBI under BioProject PRJNA1093576.\u003c/p\u003e \u003c/div\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis work was supported by the Independent Research Project of Jiangsu Academy of Forestry (ZZKY202104). We are grateful to Jianping Yi from the Animals, Plants and Foods Inspection and Quarantine Technology Center (Shanghai, China) for providing the fungus \u003cem\u003eCeratocystis ulmi\u003c/em\u003e strain for this experiment, and to Guangzhou Genedenovo Biotechnology Co., Ltd for assistance with sequencing.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eYun-Zhou Lyu and Wei Xing conceived the idea, analyzed the data, wrote the original draft, revised the manuscript, and acquired the funding; Hai-Nan Sun, Rui-Chang Yan, and Gang Wang analyzed the data; Jiang-tao Shi, Li-Bin Huang, and Xiao-Yun Dong prepared the materials; Wei Xing supervised the project. All authors have read and agreed to the published version of the manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare no conflict of interest.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCode availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNo custom scripts were used in this work. All code and pipelines used for data processing were run according to the manuals and protocols of the corresponding bioinformatics software.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eFu, L. \u0026amp; Xin, Y. 33. ULMACEAE in Higher Plants Of China Vol. 4: ANGIOSPERMAE Vol. 1 1-25 (Qingdao Publishing Group, 2000).\u003c/li\u003e\n\u003cli\u003eFragniere, Y.\u003cem\u003e et al.\u003c/em\u003e Biogeographic Overview of Ulmaceae: Diversity, Distribution, Ecological Preferences, and Conservation Status. 
\u003cem\u003ePlants\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, http://dx.doi.org/10.3390/plants10061111 (2021).\u003c/li\u003e\n\u003cli\u003eLu, P.\u003cem\u003e et al.\u003c/em\u003e Ancestors of Ulmus parvifolia from late Miocene sediments in Yunnan, Southwest China and its future distribution. \u003cstrong\u003e313\u003c/strong\u003e, 104879 (2023).\u003c/li\u003e\n\u003cli\u003eStrobel, G.A. \u0026amp; Lanier, G.N. Dutch elm disease. \u003cem\u003eScientific American\u003c/em\u003e \u003cstrong\u003e245\u003c/strong\u003e, 56-67 (1981).\u003c/li\u003e\n\u003cli\u003eHubbes, M. The American elm and Dutch elm disease. \u003cem\u003eThe Forestry Chronicle\u003c/em\u003e \u003cstrong\u003e75\u003c/strong\u003e, 265-273 (1999).\u003c/li\u003e\n\u003cli\u003eKarnosky, D.F. Dutch elm disease: a review of the history, environmental implications, control, and research needs. \u003cem\u003eEnvironmental Conservation\u003c/em\u003e \u003cstrong\u003e6\u003c/strong\u003e, 311-322 (1979).\u003c/li\u003e\n\u003cli\u003eScheffer, R., Voeten, J. \u0026amp; Guries, R. Biological control of Dutch elm disease. \u003cem\u003ePlant Disease\u003c/em\u003e \u003cstrong\u003e92\u003c/strong\u003e, 192-200 (2008).\u003c/li\u003e\n\u003cli\u003eIslam, M.T.\u003cem\u003e et al.\u003c/em\u003e Deciphering the Genome-Wide Transcriptomic Changes during Interactions of Resistant and Susceptible Genotypes of American Elm with Ophiostoma novo-ulmi. \u003cem\u003eJ Fungi (Basel)\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, http://dx.doi.org/10.3390/jof8020120 (2022).\u003c/li\u003e\n\u003cli\u003ede Oliveira, T.C.\u003cem\u003e et al.\u003c/em\u003e Unraveling the transcriptional features and gene expression networks of pathogenic and saprotrophic Ophiostoma species during the infection of Ulmus americana. \u003cem\u003eMicrobiology spectrum\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, e0369423 http://dx.doi.org/10.1128/spectrum.03694-23 (2024).\u003c/li\u003e\n\u003cli\u003eChen, S., Zhou, Y., Chen, Y. \u0026amp; Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. 
\u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e34\u003c/strong\u003e, i884-i890 http://dx.doi.org/10.1093/bioinformatics/bty560 (2018).\u003c/li\u003e\n\u003cli\u003eAndrews, S. FastQC: a quality control tool for high throughput sequence data. (Babraham Bioinformatics, Babraham Institute, Cambridge, United Kingdom, 2010).\u003c/li\u003e\n\u003cli\u003eBolger, A.M., Lohse, M. \u0026amp; Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e30\u003c/strong\u003e, 2114-20 http://dx.doi.org/10.1093/bioinformatics/btu170 (2014).\u003c/li\u003e\n\u003cli\u003eRanallo-Benavidez, T.R., Jaron, K.S. \u0026amp; Schatz, M.C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. \u003cem\u003eNature Communications\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 1432 http://dx.doi.org/10.1038/s41467-020-14998-3 (2020).\u003c/li\u003e\n\u003cli\u003eMarcais, G. \u0026amp; Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e27\u003c/strong\u003e, 764-70 http://dx.doi.org/10.1093/bioinformatics/btr011 (2011).\u003c/li\u003e\n\u003cli\u003eCheng, H., Concepcion, G.T., Feng, X., Zhang, H. \u0026amp; Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. \u003cem\u003eNature methods\u003c/em\u003e \u003cstrong\u003e18\u003c/strong\u003e, 170-175 http://dx.doi.org/10.1038/s41592-020-01056-5 (2021).\u003c/li\u003e\n\u003cli\u003eServant, N.\u003cem\u003e et al.\u003c/em\u003e HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. 
\u003cem\u003eGenome Biology\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, 259 http://dx.doi.org/10.1186/s13059-015-0831-x (2015).\u003c/li\u003e\n\u003cli\u003eDurand, N.C.\u003cem\u003e et al.\u003c/em\u003e Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. \u003cem\u003eCell Systems\u003c/em\u003e \u003cstrong\u003e3\u003c/strong\u003e, 95-8 http://dx.doi.org/10.1016/j.cels.2016.07.002 (2016).\u003c/li\u003e\n\u003cli\u003eDudchenko, O.\u003cem\u003e et al.\u003c/em\u003e De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. \u003cem\u003eScience\u003c/em\u003e \u003cstrong\u003e356\u003c/strong\u003e, 92-95 http://dx.doi.org/10.1126/science.aal3327 (2017).\u003c/li\u003e\n\u003cli\u003eRobinson, J.T.\u003cem\u003e et al.\u003c/em\u003e Juicebox.js Provides a Cloud-Based Visualization System for Hi-C Data. \u003cem\u003eCell Systems\u003c/em\u003e \u003cstrong\u003e6\u003c/strong\u003e, 256-258 e1 http://dx.doi.org/10.1016/j.cels.2018.01.001 (2018).\u003c/li\u003e\n\u003cli\u003eSimao, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V. \u0026amp; Zdobnov, E.M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e31\u003c/strong\u003e, 3210-2 http://dx.doi.org/10.1093/bioinformatics/btv351 (2015).\u003c/li\u003e\n\u003cli\u003eOu, S., Chen, J. \u0026amp; Jiang, N. Assessing genome assembly quality using the LTR Assembly Index (LAI). \u003cem\u003eNucleic acids research\u003c/em\u003e \u003cstrong\u003e46\u003c/strong\u003e, e126 http://dx.doi.org/10.1093/nar/gky730 (2018).\u003c/li\u003e\n\u003cli\u003eHaas, B.J.\u003cem\u003e et al.\u003c/em\u003e Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. 
\u003cem\u003eGenome Biology\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, R7 http://dx.doi.org/10.1186/gb-2008-9-1-r7 (2008).\u003c/li\u003e\n\u003cli\u003eBirney, E., Clamp, M. \u0026amp; Durbin, R. GeneWise and Genomewise. \u003cem\u003eGenome Research\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 988-95 http://dx.doi.org/10.1101/gr.1865504 (2004).\u003c/li\u003e\n\u003cli\u003eStanke, M.\u003cem\u003e et al.\u003c/em\u003e AUGUSTUS: ab initio prediction of alternative transcripts. \u003cem\u003eNucleic Acids Research\u003c/em\u003e \u003cstrong\u003e34\u003c/strong\u003e, W435-9 http://dx.doi.org/10.1093/nar/gkl200 (2006).\u003c/li\u003e\n\u003cli\u003eKim, D., Paggi, J.M., Park, C., Bennett, C. \u0026amp; Salzberg, S.L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. \u003cem\u003eNature biotechnology\u003c/em\u003e \u003cstrong\u003e37\u003c/strong\u003e, 907-915 http://dx.doi.org/10.1038/s41587-019-0201-4 (2019).\u003c/li\u003e\n\u003cli\u003eJohnson, L.S., Eddy, S.R. \u0026amp; Portugaly, E. Hidden Markov model speed heuristic and iterative HMM search procedure. \u003cem\u003eBMC bioinformatics\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 431 http://dx.doi.org/10.1186/1471-2105-11-431 (2010).\u003c/li\u003e\n\u003cli\u003eHuerta-Cepas, J.\u003cem\u003e et al.\u003c/em\u003e Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper. \u003cem\u003eMolecular Biology and Evolution\u003c/em\u003e \u003cstrong\u003e34\u003c/strong\u003e, 2115-2122 http://dx.doi.org/10.1093/molbev/msx148 (2017).\u003c/li\u003e\n\u003cli\u003eJones, P.\u003cem\u003e et al.\u003c/em\u003e InterProScan 5: genome-scale protein function classification. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e30\u003c/strong\u003e, 1236-40 http://dx.doi.org/10.1093/bioinformatics/btu031 (2014).\u003c/li\u003e\n\u003cli\u003eKanehisa, M. \u0026amp; Goto, S. 
KEGG: kyoto encyclopedia of genes and genomes. \u003cem\u003eNucleic Acids Research\u003c/em\u003e \u003cstrong\u003e28\u003c/strong\u003e, 27-30 http://dx.doi.org/10.1093/nar/28.1.27 (2000).\u003c/li\u003e\n\u003cli\u003eAshburner, M.\u003cem\u003e et al.\u003c/em\u003e Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. \u003cem\u003eNature genetics\u003c/em\u003e \u003cstrong\u003e25\u003c/strong\u003e, 25-9 http://dx.doi.org/10.1038/75556 (2000).\u003c/li\u003e\n\u003cli\u003eBoeckmann, B.\u003cem\u003e et al.\u003c/em\u003e The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. \u003cem\u003eNucleic Acids Research\u003c/em\u003e \u003cstrong\u003e31\u003c/strong\u003e, 365-370 http://dx.doi.org/10.1093/nar/gkg095 (2003).\u003c/li\u003e\n\u003cli\u003ePunta, M.\u003cem\u003e et al.\u003c/em\u003e The Pfam protein families database. \u003cem\u003eNucleic Acids Research\u003c/em\u003e \u003cstrong\u003e40\u003c/strong\u003e, D290-301 http://dx.doi.org/10.1093/nar/gkr1065 (2012).\u003c/li\u003e\n\u003cli\u003eTatusov, R.L., Galperin, M.Y., Natale, D.A. \u0026amp; Koonin, E.V. The COG database: a tool for genome-scale analysis of protein functions and evolution. \u003cem\u003eNucleic Acids Research\u003c/em\u003e \u003cstrong\u003e28\u003c/strong\u003e, 33-6 http://dx.doi.org/10.1093/nar/28.1.33 (2000).\u003c/li\u003e\n\u003cli\u003eFlynn, J.M.\u003cem\u003e et al.\u003c/em\u003e RepeatModeler2 for automated genomic discovery of transposable element families. \u003cem\u003eProceedings of the National Academy of Sciences of the United States of America\u003c/em\u003e \u003cstrong\u003e117\u003c/strong\u003e, 9451-9457 http://dx.doi.org/10.1073/pnas.1921046117 (2020).\u003c/li\u003e\n\u003cli\u003eChen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. 
\u003cem\u003eCurrent protocols in bioinformatics\u003c/em\u003e \u003cstrong\u003eChapter 4\u003c/strong\u003e, Unit 4 10 http://dx.doi.org/10.1002/0471250953.bi0410s05 (2004).\u003c/li\u003e\n\u003cli\u003eXu, Z. \u0026amp; Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. \u003cem\u003eNucleic acids research\u003c/em\u003e \u003cstrong\u003e35\u003c/strong\u003e, W265-8 http://dx.doi.org/10.1093/nar/gkm286 (2007).\u003c/li\u003e\n\u003cli\u003eEllinghaus, D., Kurtz, S. \u0026amp; Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. \u003cem\u003eBMC bioinformatics\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 18 http://dx.doi.org/10.1186/1471-2105-9-18 (2008).\u003c/li\u003e\n\u003cli\u003eOu, S. \u0026amp; Jiang, N. LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons. \u003cem\u003ePlant physiology\u003c/em\u003e \u003cstrong\u003e176\u003c/strong\u003e, 1410-1422 http://dx.doi.org/10.1104/pp.17.01310 (2018).\u003c/li\u003e\n\u003cli\u003eNguyen, L.T., Schmidt, H.A., von Haeseler, A. \u0026amp; Minh, B.Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. \u003cem\u003eMolecular biology and evolution\u003c/em\u003e \u003cstrong\u003e32\u003c/strong\u003e, 268-74 http://dx.doi.org/10.1093/molbev/msu300 (2015).\u003c/li\u003e\n\u003cli\u003eZhang, R.G.\u003cem\u003e et al.\u003c/em\u003e TEsorter: an accurate and fast method to classify LTR-retrotransposons in plant genomes. \u003cem\u003eHorticulture research\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, http://dx.doi.org/10.1093/hr/uhac017 (2022).\u003c/li\u003e\n\u003cli\u003eEmms, D.M. \u0026amp; Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. 
\u003cem\u003eGenome Biology\u003c/em\u003e \u003cstrong\u003e20\u003c/strong\u003e, 238 http://dx.doi.org/10.1186/s13059-019-1832-y (2019).\u003c/li\u003e\n\u003cli\u003eLi, L., Stoeckert, C.J., Jr. \u0026amp; Roos, D.S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. \u003cem\u003eGenome Research\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, 2178-89 http://dx.doi.org/10.1101/gr.1224503 (2003).\u003c/li\u003e\n\u003cli\u003eEdgar, R.C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. \u003cem\u003eNucleic Acids Research\u003c/em\u003e \u003cstrong\u003e32\u003c/strong\u003e, 1792-7 http://dx.doi.org/10.1093/nar/gkh340 (2004).\u003c/li\u003e\n\u003cli\u003eCastresana, J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. \u003cem\u003eMolecular Biology and Evolution\u003c/em\u003e \u003cstrong\u003e17\u003c/strong\u003e, 540-52 http://dx.doi.org/10.1093/oxfordjournals.molbev.a026334 (2000).\u003c/li\u003e\n\u003cli\u003eRokas, A. Phylogenetic analysis of protein sequence data using the Randomized Axelerated Maximum Likelihood (RAXML) Program. \u003cem\u003eCurrent protocols in molecular biology\u003c/em\u003e \u003cstrong\u003eChapter 19\u003c/strong\u003e, Unit19 11 http://dx.doi.org/10.1002/0471142727.mb1911s96 (2011).\u003c/li\u003e\n\u003cli\u003eDarriba, D., Taboada, G.L., Doallo, R. \u0026amp; Posada, D. ProtTest 3: fast selection of best-fit models of protein evolution. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e27\u003c/strong\u003e, 1164-5 http://dx.doi.org/10.1093/bioinformatics/btr088 (2011).\u003c/li\u003e\n\u003cli\u003eYang, Z. PAML 4: phylogenetic analysis by maximum likelihood. 
\u003cem\u003eMolecular Biology and Evolution\u003c/em\u003e \u003cstrong\u003e24\u003c/strong\u003e, 1586-91 http://dx.doi.org/10.1093/molbev/msm088 (2007).\u003c/li\u003e\n\u003cli\u003eKumar, S.\u003cem\u003e et al.\u003c/em\u003e TimeTree 5: An Expanded Resource for Species Divergence Times. \u003cem\u003eMolecular Biology and Evolution\u003c/em\u003e \u003cstrong\u003e39\u003c/strong\u003e, http://dx.doi.org/10.1093/molbev/msac174 (2022).\u003c/li\u003e\n\u003cli\u003eDe Bie, T., Cristianini, N., Demuth, J.P. \u0026amp; Hahn, M.W. CAFE: a computational tool for the study of gene family evolution. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e22\u003c/strong\u003e, 1269-71 http://dx.doi.org/10.1093/bioinformatics/btl097 (2006).\u003c/li\u003e\n\u003cli\u003eTang, H.\u003cem\u003e et al.\u003c/em\u003e Synteny and collinearity in plant genomes. \u003cem\u003eScience\u003c/em\u003e \u003cstrong\u003e320\u003c/strong\u003e, 486-8 http://dx.doi.org/10.1126/science.1153917 (2008).\u003c/li\u003e\n\u003cli\u003eChen, C.\u003cem\u003e et al.\u003c/em\u003e TBtools: An Integrative Toolkit Developed for Interactive Analyses of Big Biological Data. \u003cem\u003eMolecular plant\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, 1194-1202 http://dx.doi.org/10.1016/j.molp.2020.06.009 (2020).\u003c/li\u003e\n\u003cli\u003eKim, D., Paggi, J.M., Park, C., Bennett, C. \u0026amp; Salzberg, S.L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. \u003cem\u003eNature biotechnology\u003c/em\u003e \u003cstrong\u003e37\u003c/strong\u003e, 907-915 http://dx.doi.org/10.1038/s41587-019-0201-4 (2019).\u003c/li\u003e\n\u003cli\u003eLi, H.\u003cem\u003e et al.\u003c/em\u003e The Sequence Alignment/Map format and SAMtools. 
\u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e25\u003c/strong\u003e, 2078-9 http://dx.doi.org/10.1093/bioinformatics/btp352 (2009).\u003c/li\u003e\n\u003cli\u003ePertea, M.\u003cem\u003e et al.\u003c/em\u003e StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. \u003cem\u003eNature biotechnology\u003c/em\u003e \u003cstrong\u003e33\u003c/strong\u003e, 290-5 http://dx.doi.org/10.1038/nbt.3122 (2015).\u003c/li\u003e\n\u003cli\u003eLiao, Y., Smyth, G.K. \u0026amp; Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e30\u003c/strong\u003e, 923-30 http://dx.doi.org/10.1093/bioinformatics/btt656 (2014).\u003c/li\u003e\n\u003cli\u003eLove, M.I., Huber, W. \u0026amp; Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. \u003cem\u003eGenome biology\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, 550 http://dx.doi.org/10.1186/s13059-014-0550-8 (2014).\u003c/li\u003e\n\u003cli\u003eLangfelder, P. \u0026amp; Horvath, S. WGCNA: an R package for weighted correlation network analysis. \u003cem\u003eBMC bioinformatics\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 559 http://dx.doi.org/10.1186/1471-2105-9-559 (2008).\u003c/li\u003e\n\u003cli\u003eFlavell, R.B., Bennett, M.D., Smith, J.B. \u0026amp; Smith, D.B. Genome size and the proportion of repeated nucleotide sequence DNA in plants. \u003cem\u003eBiochemical genetics\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 257-69 http://dx.doi.org/10.1007/BF00485947 (1974).\u003c/li\u003e\n\u003cli\u003eWang, D.\u003cem\u003e et al.\u003c/em\u003e Which factors contribute most to genome size variation within angiosperms? 
\u003cem\u003eEcology and evolution\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 2660-2668 http://dx.doi.org/10.1002/ece3.7222 (2021).\u003c/li\u003e\n\u003cli\u003eKreiner, J.M., Hnatovska, S., Stinchcombe, J.R. \u0026amp; Wright, S.I. Quantifying the role of genome size and repeat content in adaptive variation and the architecture of flowering time in Amaranthus tuberculatus. \u003cem\u003ePLoS genetics\u003c/em\u003e \u003cstrong\u003e19\u003c/strong\u003e, e1010865 http://dx.doi.org/10.1371/journal.pgen.1010865 (2023).\u003c/li\u003e\n\u003cli\u003eLi, M., Chen, Q., Zhang, L., Guo, P. \u0026amp; Wang, Y. The complete chloroplast genome sequence of Ulmus parvifolia (Ulmaceae). \u003cem\u003eMitochondrial DNA. Part B, Resources\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, 2957-2958 http://dx.doi.org/10.1080/23802359.2020.1791006 (2020).\u003c/li\u003e\n\u003cli\u003eLyu, Y., Zhai, M., Jiang, Z. \u0026amp; Chen, Q. The complete chloroplast genome of Ulmus parvifolia, an important landscaping tree. \u003cem\u003eMitochondrial DNA. Part B, Resources\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, 3071-3072 http://dx.doi.org/10.1080/23802359.2020.1797586 (2020).\u003c/li\u003e\n\u003cli\u003eZuo, L.H.\u003cem\u003e et al.\u003c/em\u003e The first complete chloroplast genome sequences of Ulmus species by de novo sequencing: Genome comparative and taxonomic position analysis. 
\u003cem\u003ePloS one\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, e0171264 http://dx.doi.org/10.1371/journal.pone.0171264 (2017).\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Ulmus parvifolia, de novo genome assembly, Dutch elm disease","lastPublishedDoi":"10.21203/rs.3.rs-4754772/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4754772/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThe absence of a comprehensive genome assembly for \u003cem\u003eUlmus parvifolia\u003c/em\u003e hinders advancements in scientific research and practical breeding efforts, ultimately affecting the cultivation of elm varieties with enhanced resistance to diseases. In this study, we presented a high-quality chromosome-level genome assembly of \u003cem\u003eU. parvifolia\u003c/em\u003e by integrating various sequencing approaches. We discovered that the \u003cem\u003eU. 
parvifolia\u003c/em\u003e genome is more than twice the size of that of \u003cem\u003eUlmus americana\u003c/em\u003e, primarily due to the large-scale amplification of long terminal repeat (LTR) retrotransposons. Phylogenetic analysis positioned \u003cem\u003eU. parvifolia\u003c/em\u003e in a closer evolutionary relationship with Moraceae, followed by Cannabaceae, Rhamnaceae, and Rosaceae. Notably, gene families associated with disease resistance and immune response were significantly expanded in \u003cem\u003eU. parvifolia\u003c/em\u003e, pointing to adaptive evolution in response to various biotic and abiotic stresses. Chromosomal evolution analysis indicated a possible whole-genome triplication event in the evolutionary history of \u003cem\u003eU. parvifolia\u003c/em\u003e. To study the differing susceptibility of \u003cem\u003eU. parvifolia\u003c/em\u003e and \u003cem\u003eU. americana\u003c/em\u003e to Dutch elm disease, we inoculated both elms with \u003cem\u003eCeratocystis ulmi\u003c/em\u003e and performed comparative transcriptome analyses at 48, 96, and 144 hours post-inoculation. The results showed that several plant defense and immune response pathways were more highly expressed in \u003cem\u003eU. parvifolia\u003c/em\u003e at 48 and 96 hours post-inoculation, implying a potential genetic basis for its higher resistance to Dutch elm disease. Our study represents an advancement in the genomic understanding of \u003cem\u003eU. 
parvifolia\u003c/em\u003e, and especially sheds light on the genetic underpinnings of disease resistance in elms, and provides a foundation for future research into elm breeding for disease resistance and conservation efforts.\u003c/p\u003e","manuscriptTitle":"Chromosome-Level Genome Assembly Unveils the Molecular Mechanisms Underlying Disease Resistance in Ulmus parvifolia","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-08-12 18:07:08","doi":"10.21203/rs.3.rs-4754772/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"d4571433-b799-4002-b9e6-ffac4b3268a7","owner":[],"postedDate":"August 12th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2024-12-30T15:53:45+00:00","versionOfRecord":[],"versionCreatedAt":"2024-08-12 18:07:08","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-4754772","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4754772","identity":"rs-4754772","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
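
The Discussion contrasts a contig N50 of 10.75 Mb for U. parvifolia with 2.5 kb for U. americana. As a reminder of what that statistic measures, the following is a minimal sketch of the usual N50 definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size. The function name `contig_n50` and the toy lengths are illustrative assumptions. They are not taken from the paper, whose authors state that no custom scripts were used.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (same unit as the input)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    # Walk the contigs from longest to shortest, accumulating their lengths.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the running sum to half the total is the N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly: eight contigs totalling 32 units.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(contig_n50(toy_contigs))
```

Because N50 is weighted by length, a few very long contigs dominate it, so a fragmented assembly made of hundreds of thousands of short contigs yields a small value even when its total size is large.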

Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00