From Genome to Gene Expression: The Genomic Landscape of a Hybrid Species of Eucalyptus urophylla × Eucalyptus grandis and Its Divergence from Parental Species Hybrid

Research Article
Guo Liu, Jianzhong Luo, Wanhong Lu, Yan Lin, Lei Zhang, Jingyi Pan, and 2 more
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6912338/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 27 Oct, 2025; the published version appears in BMC Plant Biology.
Version 1 posted 14

Abstract

Background
Eucalyptus urophylla × Eucalyptus grandis (E. urograndis) is a globally significant forest tree species renowned for its rapid growth, high yield, and exceptional wood production efficiency.
A comparative analysis of its parental genomes, coupled with an in-depth investigation of the expression patterns of wood-related genes, will provide critical genomic resources for research on and utilization of this superior hybrid eucalyptus.

Results
In this study, we present a draft genome assembly of 592.09 Mb, with 99.91% of the sequence anchored to 11 pseudochromosomes. The assembly has a contig N50 of 3.73 Mb and a scaffold N50 of 58.62 Mb. The E. urograndis genome contains 32,151 predicted protein-coding genes, and Benchmarking Universal Single-Copy Orthologs (BUSCO) assessment showed that 93.5% of conserved single-copy orthologs were completely represented in the annotation. Evolutionary analysis estimated that E. grandis and E. urograndis diverged approximately 2.9 million years ago (Mya). Additionally, 131 gene families were significantly expanded, and 475 positively selected genes (PSGs) were identified in the E. urograndis genome. Furthermore, RNA sequencing (RNA-seq) was used to analyze allele-specific expression patterns of key enzymes involved in cellulose, xylan, and lignin biosynthesis, and several allele-specific expression genes (ASEGs) potentially associated with heterosis in E. urograndis were identified.

Conclusions
The chromosome-level genome assembly of E. urograndis presented in this study is a valuable genomic resource for eucalyptus molecular breeding. It provides novel insights into the evolution, wood formation, and adaptability of this hybrid and enhances our understanding of the genetic and molecular mechanisms underlying heterosis in Eucalyptus hybrids.
Keywords: Genome; Chromosome-scale assembly; Eucalyptus urophylla × Eucalyptus grandis; Cellulose and lignin biosynthesis; Allele-specific expression

Background
Eucalyptus is one of the most important fast-growing timber species for the fiber, energy, and paper industries worldwide, and improvements in its genetic constitution largely determine the competitiveness of these industries [1]. Moreover, eucalyptus is widely recognized as an effective reforestation species owing to its fast growth and high adaptability to various environments [2]. As the most important means of genetic improvement, hybrid breeding has been widely used to achieve major gains in eucalyptus. Currently, Eucalyptus spp. and their hybrids are among the world's leading sources of woody biomass and are the main hardwoods used for pulpwood and timber [3]. In particular, almost all the main afforestation varieties in major eucalyptus plantations worldwide are hybrids [4, 5]. In general, hybrids are superior to their parents in growth rate, yield, disease resistance, and viability [6, 7]. Among the various Eucalyptus species, the hybrid E. urophylla × E. grandis (E. urograndis) is the most popular and has a fast growth rate [8]. It also shows strong heterosis, high yield, straight stems, uniform stands, and wide adaptability; its standing volume can reach 213.8 m³/hm² at 5.5 years, and its annual growth can reach 40.95 m³/hm². E. urograndis timber is straight-grained, the basic density of the finished material is 500 ± 20 kg/m³ [9], and the mechanical strength is about 80.82% [10, 11]. Furthermore, E. urograndis has a wide range of uses, such as building materials, furniture, agricultural tools, fuelwood, communication poles, and pillars, serving the construction, furniture, agriculture, communication, and manufacturing industries [12].
Considering the above, obtaining a high-quality chromosome-level genome of E. urograndis is of great commercial importance and essential for understanding the basis of its superior properties, so that these attributes can be extended to other hybrids. Hybrid formation is highly important for the environmental adaptability of species. Hybridization is an evolutionary phenomenon that has fascinated biologists for centuries [13]. Even before the advent of whole-genome sequencing, hybridization was known to have played a clear role in the evolutionary history of many extant taxa, particularly plants [14]. A high-quality genome assembly would be a valuable resource for elucidating the genetic basis of the fast growth and adaptive evolution of E. urograndis. The classical genetic concepts of heterosis (the dominance model, the overdominance model, and the epistasis concept) were developed before DNA had even been identified as the carrier of genetic information and therefore cannot be directly linked to molecular principles [15]. Differences in gene expression constitute a critical source of phenotypic diversity, and the specificity of diploid hybrids arises from the co-expression of parental alleles at specific loci. The expression of alleles at specific loci may differ between hybrids and their parents, and five primary patterns have been identified: (1) alleles expressed in both parents but silenced in the hybrid (parental co-silencing); (2) alleles expressed in only one parent and not in the hybrid (parent-specific expression); (3) alleles expressed only in the hybrid and absent from both parents (hybrid-specific expression); (4) alleles expressed in the hybrid and one parent (single-parent expression, SPE); and (5) alleles expressed in both parents and the hybrid [16].
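The five presence/absence patterns listed above can be written as a small decision rule. The sketch below is illustrative only: it assumes that expression calls have already been made for each parent and the hybrid (here from TPM values with a hypothetical cutoff of 1.0), and the names `ExpressionPattern`, `classify_presence`, `classify_from_tpm`, and `TPM_CUTOFF` are introduced for illustration, not taken from the study's pipeline.

```python
from enum import Enum
from typing import Optional

# Hypothetical expression cutoff; the study does not state its threshold.
TPM_CUTOFF = 1.0


class ExpressionPattern(Enum):
    PARENTAL_CO_SILENCING = 1     # expressed in both parents, silent in hybrid
    PARENT_SPECIFIC = 2           # expressed in one parent only, silent in hybrid
    HYBRID_SPECIFIC = 3           # expressed in hybrid only
    SINGLE_PARENT_EXPRESSION = 4  # expressed in hybrid and exactly one parent (SPE)
    SHARED = 5                    # expressed in both parents and the hybrid


def classify_presence(p1: bool, p2: bool, hybrid: bool) -> Optional[ExpressionPattern]:
    """Map presence/absence calls in parent 1, parent 2 and the hybrid to a pattern."""
    if hybrid:
        if p1 and p2:
            return ExpressionPattern.SHARED
        if p1 or p2:
            return ExpressionPattern.SINGLE_PARENT_EXPRESSION
        return ExpressionPattern.HYBRID_SPECIFIC
    if p1 and p2:
        return ExpressionPattern.PARENTAL_CO_SILENCING
    if p1 or p2:
        return ExpressionPattern.PARENT_SPECIFIC
    return None  # not expressed in any sample: no pattern assigned


def classify_from_tpm(tpm_p1: float, tpm_p2: float,
                      tpm_hybrid: float) -> Optional[ExpressionPattern]:
    """Call presence with the hypothetical TPM cutoff, then classify."""
    return classify_presence(tpm_p1 >= TPM_CUTOFF, tpm_p2 >= TPM_CUTOFF,
                             tpm_hybrid >= TPM_CUTOFF)
```

For example, a gene with hypothetical TPM values of 0, 12.5, and 8.0 in E. grandis, E. urophylla, and E. urograndis would be classified as SPE, the pattern later invoked for genes silenced in E. grandis.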
The first four patterns reflect qualitative differences in gene expression, specifically presence/absence variation (PAV), whereas the fifth reflects quantitative differences in expression level. Relative to the parental genotypes, gene expression in the F1 hybrid can be categorized as additive or non-additive. Additive expression refers to the case in which the hybrid shows expression intermediate between those of the two parents, whereas non-additive expression comprises four patterns: overdominance, high-parent dominance, low-parent dominance, and underdominance. In recent years, there has been increasing support for the notion that differences in transcriptional regulation between hybrids and their parental inbred lines are related to the superior performance of hybrids [17–19]. In this study, we report a chromosome-level genome assembly of E. urograndis (DH32-29) generated with PacBio single-molecule real-time (SMRT) long reads and high-throughput chromosome conformation capture (Hi-C) data. We then used it to investigate the genomic basis of heterosis through genome annotation and gene family analyses. Furthermore, the evolutionary trajectories of E. urograndis and 12 other representative plant species, including E. grandis [20, 21], E. melliodora (GCA_004368105.3), E. pauciflora (GCA_007663325.1), Solanum tuberosum [22], Zea mays [23], Oryza sativa [24], Phoenix dactylifera [25], Populus trichocarpa [26], Physcomitrella patens [27], Arabidopsis thaliana [28], Vitis vinifera [29], and Theobroma cacao [30], were explored by comparative genomic analysis. Additionally, the E. grandis and E. urograndis genomes were used for collinearity analysis, and the E. grandis and V. vinifera genomes were used together with the E. urograndis genome to analyze its whole-genome duplication (WGD) history.
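To make the additive and non-additive categories concrete, the sketch below classifies a hybrid's expression value against its two parents. It is a simplification under stated assumptions: published analyses usually decide each category with statistical tests between the F1 and each parent (and the mid-parent value), whereas here a hypothetical relative tolerance `tol` stands in for those tests. The function name and thresholds are illustrative, not the study's criteria.

```python
def classify_hybrid_expression(p1: float, p2: float, f1: float,
                               tol: float = 0.2) -> str:
    """Classify F1 expression relative to its two parents.

    Values are assumed positive (e.g., mean TPM). `tol` is a hypothetical
    relative tolerance standing in for formal statistical tests.
    """
    high, low = max(p1, p2), min(p1, p2)
    mid = (high + low) / 2  # mid-parent value
    if f1 > high * (1 + tol):
        return "overdominance"          # clearly above the higher parent
    if f1 < low * (1 - tol):
        return "underdominance"         # clearly below the lower parent
    if abs(f1 - mid) <= tol * mid:
        return "additive"               # close to the mid-parent value
    # Otherwise the hybrid resembles one parent more than the other.
    if abs(f1 - high) <= abs(f1 - low):
        return "high-parent dominance"
    return "low-parent dominance"
```

With hypothetical parental values of 10 and 20, a hybrid value of 30 is classified as overdominance, 15 as additive, 21 as high-parent dominance, 10 as low-parent dominance, and 5 as underdominance. The checks for values outside the parental range come first so that overdominance and underdominance are never absorbed by the dominance classes.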
Based on these comparative genomic results, we performed functional annotation and identified expanded gene families and positively selected genes (PSGs). In contrast to the E. urograndis genome reported by Shen et al. [31], the improved E. urograndis variety sequenced in this study was derived from identified paternal (G46) and maternal (U16) parents. Furthermore, using second-generation high-throughput transcriptome sequencing, we investigated the biosynthetic pathways of cellulose, xylan, and lignin, which are closely associated with the heterosis of wood properties in E. urograndis. Overall, the E. urograndis genome presented here provides valuable genomic resources for further research on and utilization of this excellent hybrid eucalyptus. Moreover, it could expand our understanding of hybridization in woody plants, provide a powerful tool to accelerate crossbreeding, and enhance our understanding of comparative biology and biotechnology.

Results

Complete Genome Survey
Sequencing on the BGISEQ-500 platform (BGI, Shenzhen, China) yielded 93.82 Gb of clean stLFR (single-tube long fragment read) data after filtering. Based on the clean data, 17-mer analysis at a k-mer depth of 135 estimated the genome size, heterozygosity, and repeat content at 566.72 Mb, 2.71%, and 59%, respectively. The GenomeScope profile showed a heterozygous peak and a repeat peak, reflecting heterozygous loci and repeat sequences in the E. urograndis genome, respectively (Supplementary Fig. S1A), indicating that E. urograndis is highly heterozygous. SOAPdenovo was used for a preliminary assembly of the short-read data (Supplementary Table S1): the assembly totaled 903.56 Mb, with a contig N50 of 331 bp, a scaffold N50 of 1.63 kb, and a GC content of 39.19%.
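The survey and preliminary assembly statistics above rest on two simple calculations: k-mer-based genome size (total error-filtered k-mers divided by the depth of the main homozygous peak) and N50. The sketch below is a minimal version, assuming a precomputed k-mer depth histogram and a hypothetical error cutoff `min_depth`; the peak depth is passed in explicitly because in a highly heterozygous genome the tallest peak can be the heterozygous (half-depth) one. It does not model heterozygosity or repeats, which GenomeScope fits explicitly.

```python
from __future__ import annotations


def estimate_genome_size(kmer_hist: dict[int, int], peak_depth: int,
                         min_depth: int = 5) -> float:
    """Genome size = total k-mers / depth of the main (homozygous) peak.

    kmer_hist maps depth -> number of distinct k-mers observed at that depth.
    Depths below `min_depth` (a hypothetical cutoff) are treated as
    sequencing errors and excluded from the total.
    """
    total_kmers = sum(depth * count for depth, count in kmer_hist.items()
                      if depth >= min_depth)
    return total_kmers / peak_depth


def n50(lengths: list[int]) -> int:
    """Largest length L such that sequences of length >= L cover half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


# Toy histogram: low-depth error k-mers, a heterozygous peak (10), a main peak (20).
hist = {1: 1000, 2: 300, 10: 50, 20: 100, 40: 5}
print(estimate_genome_size(hist, peak_depth=20))  # 135.0
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))          # 8
```

Read the other way, a genome size of 566.72 Mb at a k-mer depth of 135 corresponds to roughly 7.65 × 10¹⁰ counted 17-mers (566.72 Mb × 135).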
Using assembled contigs (≥ 500 bp) as windows, the average GC content and average sequencing depth of non-duplicate fragments were calculated and plotted; the plot showed no evidence of contamination from other species (Supplementary Fig. S1B). Additionally, BLAST searches of the filtered data against the NCBI NT database showed that the top six matched species were all eucalyptus species (Supplementary Table S2). Therefore, the E. urograndis samples used for genome sequencing were relatively pure and suitable for subsequent analysis.

Chromosomal-level Genome Assembly
PacBio SMRT sequencing generated 8,179,761 reads totaling 124.40 Gb. The resulting contig assembly of E. urograndis was 591.88 Mb long, with a contig N50 of 3.73 Mb and a GC content of 39.44%. The shortest contig was 1,380 bp, and the longest was 12.70 Mb (Supplementary Table S1). To assess the accuracy and completeness of the assembly, the GC distribution and sequencing depth/coverage were determined, and BUSCO analysis was performed. The GC content and sequencing depth were concentrated, with an average GC content of 39.44%, and the scatter plot distribution approximated a Poisson distribution (Supplementary Fig. S1B). About 96.36% of the short reads aligned to the genome, covering about 99.80% of it. BUSCO analysis showed that 94.9% of the conserved single-copy orthologs were captured in the assembly, comprising 92.9% complete and 2.0% fragmented BUSCOs (Supplementary Table S3). These results indicate good consistency and completeness, confirming the high quality of the E. urograndis assembly. Hi-C sequencing yielded 652,905,028 raw read pairs (130.58 Gb). After the Hi-C reads were mapped to the E. urograndis assembly, 154.51 million valid read pairs, accounting for 70.88% of the mapped pairs, were used for Hi-C analysis. The contigs in the draft assembly were then anchored and oriented into a chromosome-scale assembly using a Hi-C scaffolding approach, with Juicer and 3D-DNA used to construct chromosome-level scaffolds (Supplementary Fig. S1C). The final E. urograndis genome totaled 592.09 Mb, with a scaffold N50 of 58.62 Mb. Eleven chromosomes, ranging from 39.56 to 72.11 Mb in length, accounted for 99.91% of the assembly (Supplementary Table S4).

Annotation of the E. urograndis Genome
In this study, a non-redundant consensus repeat library was built by combining known, novel (de novo), and tandem repeats. After redundancy was removed, a total of 281.02 Mb of repetitive sequences was identified, accounting for 47.48% of the genome assembly (Supplementary Table S5). Long terminal repeat (LTR) retrotransposons were the most abundant, totaling 205.45 Mb (34.68% of the genome). The second most common repeat type was DNA elements (43.55 Mb, 7.36% of the genome), followed by long interspersed nuclear elements (LINEs; 17.73 Mb, 3.00%) and unknown repeats (11.79 Mb, 1.99%). Additionally, short interspersed nuclear elements (SINEs) occupied 0.32 Mb (0.05%) of the E. urograndis genome. The distribution of these repeat sequences varied across the 11 chromosomes, with high-density regions located differently on each chromosome (Fig. 1). Recent studies have revealed that about 90% of eukaryotic genomes is transcribed, yet only 1–2% encodes proteins, and most transcripts are ncRNAs [32, 33]. Although ncRNAs are not translated into proteins, they have important biological functions, for example in plant immunity [34]. In the E.
urograndis genome, ncRNA annotation identified 538 tRNAs, 144 snRNAs, 297 snoRNAs, 994 miRNAs, and 747 rRNAs, with total lengths of 40,196, 20,226, 51,140, 93,638, and 162,195 bp, respectively (Supplementary Table S6). miRNAs were the most abundant (994 copies) and play vital roles in diverse biological processes, such as plant growth and development and hormone and stress responses [33, 35–37], whereas rRNAs had the greatest total length among the five types of ncRNAs. Eukaryotic rRNAs are generally classified into four types (5S, 5.8S, 18S, and 28S) [38]. Among these, 5S rRNA genes were the most abundant (365 copies, with a total length of 43,303 bp, 0.0073% of the genome), whereas 18S rRNA genes had the greatest total length (89,142 bp, 0.0151% of the genome). C/D box snoRNAs, which are associated with methylation, outnumbered H/ACA box snoRNAs, which are associated with pseudouridylation [39]. Combining homology-based, de novo, and RNA-seq-based gene prediction, we predicted a total of 32,151 protein-coding genes in the E. urograndis genome. The average gene and CDS lengths were 4,776.80 bp and 1,322.38 bp, respectively, and the average number of exons per gene, average exon length, and average intron length were 4.99, 265.25 bp, and 866.77 bp, respectively (Supplementary Fig. S2, Supplementary Table S7). As shown in Supplementary Table S7, the average gene, exon, and intron lengths were greater in E. urograndis than in the other four species (E. grandis, P. granatum, P. trichocarpa, and H. brasiliensis). BUSCO analysis showed that 93.50% of the BUSCOs were complete in the annotated gene set, comprising 88.80% complete single-copy and 4.70% complete duplicated BUSCOs (Supplementary Table S3).
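The reported gene-structure averages are mutually consistent, which gives a quick sanity check on the annotation: mean exon number × mean exon length approximates the mean CDS length, and adding (exons − 1) × mean intron length approximates the mean gene length. The check below is an approximation under stated assumptions (that the exon lengths refer to coding exons and that a product of means stands in for the mean of per-gene products), not a calculation taken from the study.

```python
from __future__ import annotations


def approx_gene_structure(exons_per_gene: float, mean_exon_len: float,
                          mean_intron_len: float) -> tuple[float, float]:
    """Approximate mean coding length and mean gene length from per-gene means.

    Assumes exon lengths refer to coding exons and that each gene has
    (exons - 1) introns; multiplying means only approximates the mean
    of per-gene totals.
    """
    coding = exons_per_gene * mean_exon_len
    gene = coding + (exons_per_gene - 1) * mean_intron_len
    return coding, gene


cds, gene = approx_gene_structure(4.99, 265.25, 866.77)
print(f"CDS ~ {cds:.1f} bp, gene ~ {gene:.1f} bp")  # CDS ~ 1323.6 bp, gene ~ 4782.0 bp
```

Both estimates fall within about 0.1% of the reported averages (1,322.38 bp and 4,776.80 bp).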
These results indicate that the annotated gene set of the E. urograndis genome is largely complete. The predicted protein sequences were compared against five protein databases; a total of 31,019 genes (96.48% of all predicted genes) were functionally annotated using the SwissProt, KEGG, TrEMBL, InterPro, and GO databases (E-value ≤ 10⁻⁵), and 18,099 genes were annotated in all five. KEGG pathway analysis assigned 25,010 genes to 136 pathways, most of which were involved in metabolic pathways (ko01100, 4,089 genes) and the biosynthesis of secondary metabolites (ko01110, 2,633 genes) (Supplementary Fig. S3A). The phenylpropanoid biosynthesis pathway (ko00940), which is closely related to lignin biosynthesis, contained 338 genes. In particular, 690 genes were assigned to the plant-pathogen interaction pathway (ko04626), a pathway likely related to disease resistance in E. urograndis. GO analysis was performed to classify the functions of E. urograndis genes: 19,906 genes were assigned to 3,478 GO terms and 45 subcategories (Supplementary Fig. S3B, Supplementary Table S8). The biological process (BP) category comprised 1,920 GO terms, among which metabolic process (GO:0008152, 8,473 genes), cellular process (GO:0009987, 6,385 genes), and single-organism process (GO:0044699, 4,915 genes) contained the most genes. The cellular component (CC) category included membrane (GO:0016020, 2,319 genes), cell part (GO:0044464, 2,085 genes), cell (GO:0005623, 2,085 genes), and 422 other GO terms. The molecular function (MF) category included binding (GO:0005488, 11,898 genes), catalytic activity (GO:0003824, 9,509 genes), transporter activity (GO:0005215, 1,032 genes), and 1,130 other GO terms.

Gene Family Analysis
The protein sequences of 13 species (P. patens, P. trichocarpa, A. thaliana, T. cacao, E. grandis, E. urograndis, E. melliodora, E. pauciflora, V. vinifera, S. tuberosum, Z. mays, O. sativa, and P. dactylifera) were clustered into gene families using OrthoMCL. Of the 32,151 E. urograndis genes, 28,078 were clustered into gene families, including 142 E. urograndis-specific families, whereas 4,073 genes remained unclustered (Fig. 2A, Supplementary Table S9). In addition, 1,454 gene families were specific to the four Eucalyptus species, 428 were specific to E. grandis and E. urograndis, and 577 were shared by all 13 species. Among the four eucalyptus species (E. urograndis, E. grandis, E. melliodora, and E. pauciflora), 13,098 gene families were shared by all four, and 236 were unique to E. urograndis. Moreover, E. urograndis shared 14,738, 14,997, and 15,249 gene families with E. grandis, E. pauciflora, and E. melliodora, respectively (Fig. 2B).

Phylogenetic Analysis and Divergence Time Estimation
To determine the phylogenetic position of E. urograndis, a phylogenetic tree was constructed from 577 single-copy gene families in the 13 sequenced plant genomes. E. urograndis and E. grandis were more closely related to E. melliodora than to E. pauciflora within the genus Eucalyptus, consistent with their morphology-based classification. The Myrtales lineage, represented by the four Eucalyptus species, formed a sister clade to the Malvids, with Vitales as the basal rosid lineage, whereas P. trichocarpa grouped with the Malvids (Fig. 2C). These results are consistent with the genomic analysis of E. grandis [21]. Moreover, based on the gene family clustering results and the phylogenetic relationships among species, divergence times were estimated under the assumption that the hybrid E. urograndis arose naturally.
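Counts such as "shared by all four eucalypts" or "unique to E. urograndis" are set operations over gene family membership. The sketch below uses toy, hypothetical family IDs (parsing of OrthoMCL output is not shown) and illustrates why the number of "unique" families depends on the comparison set: a family absent from the other eucalypts may still occur in a more distant species, which is consistent with fewer E. urograndis-specific families in the 13-species comparison (142) than in the four-eucalypt comparison (236).

```python
from __future__ import annotations


def shared_by_all(families: dict[str, set[str]]) -> set[str]:
    """Gene families present in every species of the comparison."""
    return set.intersection(*families.values())


def unique_to(species: str, families: dict[str, set[str]]) -> set[str]:
    """Gene families present in `species` and absent from all other species."""
    others = set().union(*(fams for sp, fams in families.items() if sp != species))
    return families[species] - others


def shared_pairwise(a: str, b: str, families: dict[str, set[str]]) -> int:
    """Number of gene families shared by two species."""
    return len(families[a] & families[b])


# Toy family memberships (hypothetical IDs, not the study's data).
families = {
    "E_urograndis": {"OG1", "OG2", "OG3", "OG5"},
    "E_grandis": {"OG1", "OG2", "OG4"},
    "E_melliodora": {"OG1", "OG2", "OG4"},
    "E_pauciflora": {"OG1", "OG6"},
    "V_vinifera": {"OG1", "OG3"},
}
eucalypts = {sp: fams for sp, fams in families.items() if sp.startswith("E_")}

print(sorted(shared_by_all(families)))               # ['OG1']
print(sorted(unique_to("E_urograndis", families)))   # ['OG5']
print(sorted(unique_to("E_urograndis", eucalypts)))  # ['OG3', 'OG5']
```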
The results indicated that E. urograndis and E. grandis diverged about 2.9 million years ago (Mya), implying that, in the absence of human intervention, the hybrid E. urograndis might have formed about 2.9 Mya.

Expansion and Contraction of Gene Families
Gene families expand and contract in plants under selection pressure during evolution [40], and these processes play major roles in the phenotypic diversification of plants [41, 42]. In this study, the expansion and contraction of E. urograndis gene families were analyzed by comparison with the other 12 representative species (Fig. 2C). Analysis of 6,899 gene families on the phylogeny revealed 242 expanded gene families encompassing 344 genes and 483 contracted gene families encompassing 618 genes in E. urograndis. Of the 242 expanded families, 131 were significantly expanded (p < 0.05). Compared with the other three eucalyptus species, E. urograndis showed the fewest gene family expansions and the most contractions. We then annotated the 131 significantly expanded gene families (828 genes) using the KEGG and GO databases, and KEGG analysis assigned them to 79 pathways (Supplementary Table S10). In particular, 20 genes were enriched in the phenylpropanoid biosynthesis pathway (ko00940), 16 of which encoded cinnamyl alcohol dehydrogenase (CAD; EC:1.1.1.195). CAD catalyzes the final step of monolignol biosynthesis, producing lignin monomers, and is closely related to plant growth and development and to resistance against pathogen invasion [43]. The remaining four genes encoded the cytochrome P450 enzyme C3’H (EC:1.14.14.96), which plays a key role in plant development and defense. Additionally, 18 significantly expanded genes encoding sucrose synthase (SUSY; EC:2.4.1.13) were enriched in the starch and sucrose metabolism pathway (ko00500).
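The source does not name its expansion/contraction tool; CAFE is a common choice, and it models gene family size along each branch as a linear birth-death process. The sketch below computes only the per-branch transition probability under the single-rate model with equal birth and death rates, plus a tail probability for an observed gain. The rate `lam`, branch length `t`, and family sizes are hypothetical, and the sketch does not reproduce CAFE's whole-tree likelihood, ancestral-size reconstruction, or family-wide p-values.

```python
from math import comb


def bd_transition_prob(s: int, c: int, lam: float, t: float) -> float:
    """P(family size c at the end of a branch | size s at its start).

    Linear birth-death model with equal per-gene birth and death rate
    `lam` acting over branch length `t`.
    """
    if s == 0:
        return 1.0 if c == 0 else 0.0  # an extinct family stays extinct
    a = lam * t / (1 + lam * t)
    return sum(
        comb(s, j) * comb(s + c - j - 1, s - 1)
        * a ** (s + c - 2 * j) * (1 - 2 * a) ** j
        for j in range(min(s, c) + 1)
    )


def expansion_tail_prob(s: int, c: int, lam: float, t: float) -> float:
    """P(size >= c | s): small values flag an unusually large gain on the branch."""
    return 1.0 - sum(bd_transition_prob(s, k, lam, t) for k in range(c))


# Hypothetical values: ancestral size 2, observed size 8, lam * t = 0.1.
print(f"{expansion_tail_prob(2, 8, lam=0.001, t=100):.2e}")
```

For s = 1 the expression reduces to a geometric distribution (extinction probability a, then a^(c−1)(1 − a)² for c ≥ 1), which is a convenient check that the probabilities sum to one.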
The SUSY enzyme plays a central role in source-sink coordination and carbon flow in trees. Gessler et al. [44] and Dominguez et al. [45] demonstrated that SUSY activity influences carbon (C) allocation to developing woody tissues and maintains the C balance of the whole tree. Moreover, seven significantly expanded genes were enriched in the plant-pathogen interaction pathway (ko04626), including three genes encoding PTI1-like tyrosine-protein kinase 3 (PTK3; EC:2.7.11.1) and four genes encoding calmodulin (CALM); this expansion may partly explain the wide adaptability and high disease resistance of E. urograndis.

Positive Selection of Genes in the E. urograndis Genome
Screening and analyzing PSGs can help in understanding the specific evolutionary adaptations of a species [46]. A total of 475 PSGs were identified in the E. urograndis genome. KEGG pathway annotation revealed that PSGs were highly enriched in metabolic pathways (64 genes, ko01100), biosynthesis of secondary metabolites (35 genes, ko01110), and biosynthesis of amino acids (10 genes, ko01230) (Supplementary Table S11). The 20 significantly enriched pathways and their p-values are shown in Supplementary Fig. S4. Four PSGs were enriched in the phenylpropanoid biosynthesis pathway (ko00940). Among them, weijuan_GLEAN_10017689 encodes 4-coumarate-CoA ligase (4CL; EC:6.2.1.12), a key enzyme of the phenylpropanoid pathway that participates in monolignol biosynthesis by producing p-coumaroyl-CoA. Furthermore, weijuan_GLEAN_10016490 encodes caffeic acid 3-O-methyltransferase (COMT; EC:2.1.1.68), which catalyzes multi-step methylation reactions of hydroxylated monolignol precursors and is believed to occupy a pivotal position in the lignin biosynthetic pathway.
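The source reports enriched KEGG pathways with p-values but does not state the test used. A one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) with Benjamini-Hochberg correction is a common choice; the sketch below implements both with the standard library. How the background is defined (which genes count as annotated) strongly affects the result and is an assumption here.

```python
from __future__ import annotations

from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """One-sided enrichment p-value P(X >= k), X ~ Hypergeometric(N, K, n).

    N: annotated background genes; K: background genes in the pathway;
    n: genes in the test set (e.g., PSGs); k: test-set genes in the pathway.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Python's arbitrary-precision integers keep `comb` exact even for genome-scale backgrounds, so no log-space tricks are needed here.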
The gene weijuan_GLEAN_10029076 encodes cinnamoyl-CoA reductase (CCR; EC:1.2.1.44), the first enzyme in the monolignol-specific branch of the lignin biosynthetic pathway, which converts feruloyl-CoA to coniferaldehyde [47]. In addition, six PSGs were enriched in the plant-pathogen interaction pathway (ko04626): three encoding calcium-binding proteins (CMLs), and weijuan_GLEAN_10010339, weijuan_GLEAN_10032007, and weijuan_GLEAN_10022765 encoding WRKY41, an LRR receptor-like serine/threonine-protein kinase, and nitric oxide synthase (NOS; EC:1.14.13.39), respectively. Positive selection on genes in this pathway may contribute to the strong environmental adaptability and disease resistance of E. urograndis.

Collinearity and Whole Genome Duplication (WGD) Analysis
To investigate the natural evolutionary history of E. urograndis, we first performed a collinearity analysis between the E. urograndis and E. grandis genomes (Fig. 3A). This analysis revealed 39,042 collinear gene pairs (59.60%) in 636 syntenic regions, indicating high synteny between the two genomes. Chromosomes 1, 2, 5, 9, and 11 of E. urograndis showed high collinearity with chromosomes 2, 3, 4, 8, and 9 of E. grandis, respectively (Fig. 3B). By contrast, the other six E. urograndis chromosomes appear to have undergone fusion events relative to the E. grandis chromosomes; for example, chromosome 8 of E. urograndis shares many collinear gene pairs with both chromosome 11 and chromosome 1 of E. grandis (Fig. 3B). These observations are consistent with chromosomal rearrangements during the evolution of tree species and the formation of hybrids. In addition, more translocation events were detected in chromosomes 3, 4, 6, 7, 8, and 10 of E. urograndis than in the E. grandis chromosomes. Thus, E. urograndis may have experienced chromosomal fusion, inversion, or other rearrangement events.
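Collinearity results like these come from chaining homologous gene pairs that occur in the same order on two chromosomes. The source does not name its tool (MCScanX is widely used and scores chains by dynamic programming with gap penalties); the sketch below is a much-simplified greedy version for illustration only. It assumes gene pairs have already been identified (e.g., by all-against-all protein alignment) and converted to gene ranks along each chromosome; `Anchor`, `max_gap`, and `min_size` are hypothetical names.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class Anchor(NamedTuple):
    """A homologous gene pair; idx_* are gene ranks along each chromosome."""
    chr_a: str
    idx_a: int
    chr_b: str
    idx_b: int


def collinear_blocks(anchors: list[Anchor], max_gap: int = 5,
                     min_size: int = 5) -> list[list[Anchor]]:
    """Greedily chain anchors into same-orientation collinear blocks.

    Consecutive anchors join a block when both gene ranks increase by
    1..max_gap; blocks with fewer than min_size anchors are discarded.
    """
    by_pair = defaultdict(list)
    for anchor in anchors:
        by_pair[(anchor.chr_a, anchor.chr_b)].append(anchor)

    blocks = []
    for pair_anchors in by_pair.values():
        pair_anchors.sort(key=lambda a: (a.idx_a, a.idx_b))
        current = [pair_anchors[0]]
        for anchor in pair_anchors[1:]:
            prev = current[-1]
            da, db = anchor.idx_a - prev.idx_a, anchor.idx_b - prev.idx_b
            if 0 < da <= max_gap and 0 < db <= max_gap:
                current.append(anchor)  # extends the current collinear run
            else:
                if len(current) >= min_size:
                    blocks.append(current)
                current = [anchor]      # start a new candidate block
        if len(current) >= min_size:
            blocks.append(current)
    return blocks
```

Inverted (reverse-orientation) blocks, which signal inversions, could be found by rerunning the function with `idx_b` negated; a chromosome whose blocks map to two different partner chromosomes is the pattern read above as a possible fusion.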
To characterize the WGD events in the evolutionary history of E. urograndis, the E. grandis and V. vinifera genomes were analyzed together with the E. urograndis genome to explore the WGD history of the genus Eucalyptus (Fig. 3C). Based on the distributions of synonymous substitution rates (Ks) of syntenic ortholog pairs between genomes, species divergence peaks were found at Ks ≈ 0.03 for E. urograndis versus E. grandis, ≈ 1.21 for E. grandis versus V. vinifera, and ≈ 1.23 for E. urograndis versus V. vinifera. The Ks distributions of paralogous pairs within E. urograndis, E. grandis, and V. vinifera each showed two clear peaks, corresponding to a recent and an ancient WGD. The Ks peaks of the ancient WGD were very close in the three genomes, indicating that all three share the gamma (γ) event, the hexaploidization shared by core eudicots [47, 48]. Using the γ time (117 ± 1 Mya) [48] and the peak Ks value of V. vinifera (1.11), the synonymous substitution rate for Eucalyptus was estimated as r = Ks/(2T) ≈ 4.75 × 10⁻⁹ substitutions per site per year. Applying T = Ks/(2r), the speciation of E. urograndis and E. grandis was dated to about 2.68 Mya, close to the divergence time estimated above (2.9 Mya). The ancient WGD of E. urograndis was estimated to have occurred about 127.79 Mya, and that of E. grandis about 114.87 Mya, close to the value (105.9–113.9 Mya) reported by Myburg et al. [21]. However, the recent peaks had relatively low densities in all three genomes, suggesting small-scale duplication events.

Allele-Specific Expression Associated with Cellulose, Xylan, and Lignin Biosynthesis in E. urograndis and Its Parental Species
Shao et al. [49] noted that allele-specific expression (ASE), an imbalance between the expression levels of the two parental alleles in a hybrid, has been proposed as a mechanism of heterosis. Cellulose and lignin are the major components of wood and non-food biomass [50]; lignin is the second most abundant polymer in nature, and together these polymers are key cell wall components that give wood its remarkable strength. Understanding the mechanisms of cellulose and lignin biosynthesis in E. urograndis, the most popular hybrid eucalyptus, is therefore important. In this study, analysis of wood chemical components showed that the levels of three chemical components in E. urograndis were closer to those of the maternal parent (E. urophylla) and significantly different from those of the paternal parent (E. grandis) (Supplementary Fig. S5). These findings are consistent with those of Shen et al. [31], who, in a genetic analysis of the F1 generation of E. urograndis, reported greater maternal than paternal heritability of wood chemical components. Based on a comparative genomic analysis of E. urograndis and 12 other species, we identified putative functional homologs of genes encoding six enzymatic steps in cellulose biosynthesis from sucrose (Fig. 4 and Supplementary Table S12). The CesA gene family comprises 29 homologous genes representing nine CesA genes (CesA1 and CesA3 to CesA10), including 10 homologs of CesA1 and four of CesA5. CesA1 and CesA5 are associated with primary wall formation in Arabidopsis [51]. Notably, SUSY is the first key enzyme in the direct pathway of UDP-glucose production; 23 homologous genes were identified in the SUSY gene family, 18 of which were significantly expanded compared with the other 12 representative plant species.
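ASE, as defined at the start of this section, is usually tested per heterozygous site by asking whether the reads carrying each parental allele in the hybrid deviate from the 1:1 ratio expected under balanced expression. The source does not describe its ASE test; the sketch below uses an exact two-sided binomial test, a common choice, with hypothetical thresholds (`alpha`, `min_depth`). Probabilities are computed in log space so that deep sites do not overflow. Real analyses also need to handle reference-mapping bias and multiple testing.

```python
from __future__ import annotations

from math import exp, lgamma, log
from typing import Optional


def binom_pmf(i: int, n: int, p: float) -> float:
    """Binomial probability P(X = i), computed in log space (requires 0 < p < 1)."""
    log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
               + i * log(p) + (n - i) * log(1 - p))
    return exp(log_pmf)


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided test: sum the probabilities of all outcomes no more
    likely than the observed count k."""
    probs = [binom_pmf(i, n, p) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


def ase_call(maternal_reads: int, paternal_reads: int, alpha: float = 0.05,
             min_depth: int = 10) -> Optional[tuple[float, float, bool]]:
    """Test allelic imbalance at one heterozygous site in the hybrid.

    Returns (maternal allele fraction, p-value, imbalanced?) or None when the
    site has fewer than `min_depth` reads (a hypothetical cutoff).
    """
    n = maternal_reads + paternal_reads
    if n < min_depth:
        return None
    p_value = binom_two_sided_p(maternal_reads, n)
    return maternal_reads / n, p_value, p_value < alpha
```

For example, 30 maternal and 10 paternal reads give a maternal allele fraction of 0.75 and a p-value well below 0.05, so the site would be called imbalanced.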
In this study, we analyzed differentially expressed genes (DEGs) involved in the cellulose biosynthesis pathway in E. urograndis and its parental species. Among the 14 DEGs of the SUSY gene family, seven exhibited overdominance and four showed high-parent dominance. These differential expression patterns may be associated with the differences in cellulose content observed among the three taxa. The indirect pathway of UDP-glucose production comprises four key enzymes, INV (beta-fructofuranosidase; EC:3.2.1.26), HEX (hexokinase; EC:2.7.1.1), PGM (phosphoglucomutase; EC:5.4.2.2), and UGP (UTP-glucose-1-phosphate uridylyltransferase; EC:2.7.7.9), with 35, 7, 4, and 5 homologous genes, respectively, in the E. urograndis genome. Transcriptomic analysis of immature xylem in E. urograndis and its parental species identified a total of 50 DEGs in the cellulose biosynthesis pathway: 25 showed overdominance, six high-parent dominance, eight low-parent dominance, and 11 underdominance. Notably, among the 19 DEGs of the CesA gene family, 14 showed overdominance, and seven were silenced in E. grandis, suggesting that these seven conform to the SPE model. These findings may be associated with the higher cellulose content of hybrid E. urograndis compared with its parental species. The hemicellulose xylan is a plant cell wall polysaccharide widely considered the second most abundant plant biopolymer on Earth after cellulose [52]. Heteroxylan synthesis occurs in the Golgi apparatus through the coordinated action of several enzyme classes (Fig. 4). In this study, one IRX7 (irregular xylem 7) gene (weijuan_GLEAN_10024396) was identified as a PSG.
Similar to the cellulose biosynthesis pathway, 44 DEGs involved in xylan biosynthesis showed significant differential expression between E. urograndis and its parents. Among these, 24 exhibited overdominance, 14 showed high-parent dominance, five exhibited low-parent dominance, and one was underdominant. Notably, only one DEG from the GATL (galacturonosyltransferase-like) gene family (weijuan_GLEAN_10029009) and one DEG from the XYL4 (xylan 1,4-beta-xylosidase 4) gene family (weijuan_GLEAN_10017065) showed relatively high expression in E. grandis. Furthermore, three DEGs were silenced in E. grandis, suggesting that they conform to the SPE model. This pattern is consistent with the observed differences in hemicellulose content among E. urograndis and its parents. Lignin is synthesized from phenylalanine through a complex process involving 12 enzymes and a series of deamination, hydroxylation, methylation, and reduction steps (Fig. 5 and Supplementary Table S12). Lignin monomers (the precursors of H, G, and S lignin) are produced in the cytoplasm and transported across the plasma membrane to be polymerized in the cell wall [53]. CCR and CAD catalyze the last two steps of the monolignol synthesis pathway; one gene of the CCR gene family (30 homologous genes) was identified as a PSG, and 14 genes of the CAD gene family (36 homologous genes) and four genes of the C3'H gene family were identified as significantly expanded. In addition, one gene each from the 4CL (20 homologous genes; weijuan_GLEAN_10020758) and COMT (28 homologous genes; weijuan_GLEAN_10022151) gene families was identified as a PSG. Among the DEGs in the lignin biosynthesis pathway of immature xylem from E. urograndis and its parental species, 17 exhibited underdominance and 16 showed low-parent dominance.
Additionally, nine DEGs exhibited overdominance and seven exhibited high-parent dominance. Notably, E. grandis had the largest number of upregulated genes, which may contribute to its significantly higher lignin content compared with E. urophylla and E. urograndis. Specifically, one CAD DEG (NewGene_9791) and one CCR DEG (weijuan_GLEAN_10023993) were silenced in E. urograndis, suggesting that these two DEGs conform to the parental co-silencing model (Fig. 5). These findings may be associated with the lower lignin content of E. urograndis and the higher lignin content of E. grandis.
Discussion
Genomic data help characterize the history of hybridization and the genetic basis of speciation [54]. Eucalypt species and their interspecific hybrids are important in forestry programs owing to their wood quality and adaptability to diverse environmental conditions [55]. In this study, a chromosome-scale genome assembly of E. urograndis was generated using the BGISEQ-500 platform, PacBio SMRT sequencing, and Hi-C-assisted scaffolding. The resulting high-quality reference genome has a scaffold N50 of 58.62 Mb, with 99.91% of the sequences anchored to 11 pseudochromosomes. The genome survey revealed high heterozygosity in E. urograndis, implying high genetic variability. Quality assessment indicated a high-quality assembly: 96.36% of the short reads mapped to the genome, covering 99.80% of it, and completeness was further assessed with BUSCO using conserved single-copy orthologs. The assembled genome can serve as a reference for molecular breeding, genetic, and evolutionary studies of E. urophylla × E. grandis hybrids and of the genus Eucalyptus.
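Contig and scaffold N50 values such as those reported here summarize assembly continuity. As a minimal sketch (the helper name `nx` and the toy lengths are illustrative, not taken from the study), N50 can be computed by sorting sequence lengths in descending order and returning the length at which the cumulative sum first reaches half of the total assembly size:

```python
def nx(lengths: list[int], fraction: float = 0.5) -> int:
    """Return the Nx statistic of an assembly (N50 when fraction=0.5).

    Nx is the largest length L such that sequences of length >= L
    together cover at least `fraction` of the total assembled length.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= fraction * total:
            return length
    return min(lengths)  # not reached for valid input; keeps a return path


print(nx([100, 80, 50, 30, 20, 10]))  # 80
```

In the toy example the total is 290, and the two longest sequences (100 + 80) are the first to reach half of it, so N50 is 80.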
The genome size of hybrid offspring is generally expected to reflect the genome sizes of both parents [56]. The assembly size of E. urograndis in this study was 592.09 Mb, close to the mid-parent value (588.03 Mb) of E. urophylla (559.53 Mb, provided by the Guangdong Academy of Forestry) and E. grandis (616.53 Mb, published on April 2, 2021). This is consistent with good hybridization affinity between E. grandis and E. urophylla during the formation of offspring. Compared with the draft assembly of E. grandis, the E. urograndis assembly showed better continuity and greater coverage: its contig N50 (3.73 Mb) and scaffold N50 (58.62 Mb) were greater than those of E. grandis (contig N50, 0.61 Mb; scaffold N50, 58.49 Mb). The percentage of embryophyte BUSCO genes in E. urograndis (94.90%) was slightly higher than that in E. grandis (92.3%). The number of protein-coding genes in E. urograndis (32,151) was lower than that in E. grandis v2.0 (36,349), which might reflect the smaller genome size of E. urograndis. Transposable elements, essential components of plant genomes, can move around the genome by either "cut-and-paste" (DNA transposons) or "copy-and-paste" (RNA transposons) mechanisms [57]. Transposable elements accounted for 45.01% of the E. urograndis genome, slightly more than in E. grandis (44.5%). A substantial portion of eukaryotic genomes consists of highly repeated non-coding DNA sequences [58]. The repeat content of the E. urograndis genome was 47.48%, close to that reported for the genomes of flowering cherry (Prunus yedoensis, 47.2%) [59], E. melliodora (47.97%), and E. pauciflora (46.66%). Interspecific hybridization is one of the main mechanisms of plant speciation.
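The mid-parent genome-size comparison earlier in this paragraph is simple arithmetic. The sketch below reproduces it from the three genome sizes stated in the text and, as an additional illustrative measure borrowed from mid-parent heterosis calculations, expresses the hybrid assembly size as a percentage deviation from the mid-parent value. The function names are hypothetical, and the deviation is not a statistic reported by the study.

```python
def mid_parent(parent_a: float, parent_b: float) -> float:
    """Arithmetic mean of the two parental values."""
    return (parent_a + parent_b) / 2


def percent_deviation_from_mid_parent(hybrid: float, parent_a: float,
                                      parent_b: float) -> float:
    """(hybrid - MP) / MP * 100, as in mid-parent heterosis formulas."""
    mp = mid_parent(parent_a, parent_b)
    return (hybrid - mp) / mp * 100


urophylla_mb = 559.53   # genome sizes (Mb) quoted in the text
grandis_mb = 616.53
urograndis_mb = 592.09

print(f"mid-parent: {mid_parent(urophylla_mb, grandis_mb):.2f} Mb")
print(f"deviation: {percent_deviation_from_mid_parent(urograndis_mb, urophylla_mb, grandis_mb):+.2f}%")
# mid-parent: 588.03 Mb
# deviation: +0.69%
```

A deviation of under 1% is what "close to the mid-parent value" amounts to numerically.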
The merging of two genomes from different subspecies, species, or even genera is frequently accompanied by WGD [13]. In this study, we analyzed collinearity and WGD events between E. urograndis and E. grandis. Collinearity analysis revealed a high degree of collinearity between the two genomes, suggesting that E. urograndis inherited many genes from its male parent (E. grandis). Although the E. urograndis used in this study was generated by artificial crossing, the WGD analysis indicated that the hybrid might have arisen through natural crossing. The divergence of E. urograndis from E. grandis was estimated to have occurred at the end of the Pliocene epoch (~2.9 Mya). During the Pliocene (5.3–2.59 Mya), the biological world resembled the modern one: plant taxa were largely the same as those found today, and the first human-like animals appeared at the end of this period. This suggests that a natural E. urograndis hybrid may have formed at the end of the Pliocene. In addition, the rate of synonymous substitutions per site per year was estimated at 4.75 × 10⁻⁹ for Eucalyptus, slightly higher than that for Laurales (4.21 × 10⁻⁹) [41] and lower than that for Ranunculales (6.98 × 10⁻⁹) [60]. The ancient γ WGD event might have been related mainly to environmental adaptation in the Cretaceous [40], whereas the recent small-scale duplication events may have been related mainly to artificial breeding and environmental change. E. urograndis is widely used in the forestry industry because of its timber potential, short rotation, high basic density, high lignin and cellulose contents, and high mechanical strength. In general, angiosperm wood contains 42–55% cellulose and 20–25% lignin [61]; however, E. urograndis wood contains 68.41% cellulose and 29.94% lignin [11].
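Ks-based dating of this kind usually converts a synonymous distance into time with T = Ks / (2r), where r is the synonymous substitution rate per site per year and the factor 2 accounts for substitutions accumulating along both diverging lineages. The sketch below assumes that formula and uses the Eucalyptus rate stated above (4.75 × 10⁻⁹); the Ks value is a hypothetical input chosen only to show that, under this rate, Ks ≈ 0.0276 corresponds to about 2.9 Mya. It is not a Ks value reported in this section.

```python
def divergence_time_years(ks: float, rate_per_site_per_year: float) -> float:
    """T = Ks / (2r): synonymous changes accrue on both lineages."""
    if ks < 0 or rate_per_site_per_year <= 0:
        raise ValueError("Ks must be >= 0 and the rate must be > 0")
    return ks / (2 * rate_per_site_per_year)


EUCALYPTUS_RATE = 4.75e-9   # synonymous substitutions/site/year (from the text)
illustrative_ks = 0.02755   # hypothetical Ks peak, for illustration only

t = divergence_time_years(illustrative_ks, EUCALYPTUS_RATE)
print(f"{t / 1e6:.2f} Mya")  # 2.90 Mya
```

Because T scales inversely with r, using the Laurales or Ranunculales rate instead would shift the same Ks to an older or younger date, respectively.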
Given the high cellulose and lignin contents of E. urograndis wood, it is necessary to characterize its cellulose and lignin biosynthetic pathways from the perspective of genome evolution. In this study, we identified expanded and contracted gene families and PSGs in the E. urograndis genome, as well as the genes of the cellulose and lignin biosynthetic pathways. Six enzymatic steps involving 126 genes were identified in cellulose biosynthesis from sucrose. Genomic analysis revealed 29 CesA genes in E. urograndis, whereas the E. grandis genome has 16 CesA genes [21]. These 29 CesA genes represent 9 variants (CesA1, CesA3–10), comparable to the 10 CesA genes in the A. thaliana genome [28], except that CesA2 was absent. Studies have shown that CesA2, CesA5, CesA6, and CesA9 are functionally redundant and involved in primary wall cellulose synthesis. Burn et al. [62] reported that the function of CesA2 is less evident in Arabidopsis, which may be consistent with the absence of CesA2 from the E. urograndis genome. In E. urograndis, lignin biosynthesis from phenylalanine involves 12 enzymes encoded by 317 genes. Among them, four gene families were contracted, and two gene families were under positive selection. The evolution of these candidate genes might be related to the high cellulose and lignin contents of E. urograndis. Higher cellulose content is associated with higher biomass energy yield [63]. However, lignin is an important factor affecting pulp yield and quality when wood is used for papermaking [64]. Modifying lignin-related genes and/or lignin composition in wood would improve our knowledge of lignification and facilitate chemical pulping and bleaching, thereby lowering their energy demand and environmental impact. The analysis of cellulose and lignin biosynthetic pathways in this study will provide a theoretical basis for changing the chemical composition and structure of E.
urograndis wood through future genetic engineering, yielding new varieties that conserve energy and reduce pollution. Compared with other eucalypt hybrids, E. urograndis shows greater environmental adaptability, which might be due to the evolution of intricate mechanisms to recognize and defend against potential pathogens. Evolutionary analysis of the plant-pathogen interaction pathway in E. urograndis showed that 6 of the 43 enzymes were encoded by significantly expanded genes or PSGs; all six belong to pattern-triggered immunity (PTI), the primary plant response to pathogen invasion. Rojas et al. [65] showed that primary metabolism is involved in regulating plant defense against pathogens. In this study, analysis of the expanded gene families and PSGs revealed that carbohydrate, amino acid, and lipid metabolism pathways were more enriched in E. urograndis than in the other 12 plants. These results may help explain the greater adaptability of E. urograndis and provide a useful direction for genetic improvement and breeding using molecular techniques. As a globally important pulpwood species, eucalyptus is characterized by low production costs, high pulp yields, and excellent papermaking properties. The primary chemical components of plant fiber raw materials are cellulose, hemicellulose, and lignin. In this study, gene families related to cellulose, xylan, and lignin biosynthesis were systematically analyzed by integrating gene annotation with mRNA-seq expression profiling. Cellulose is the primary component of plant raw materials and the most valuable fraction in pulping and papermaking [66]. The cellulose content of fibrous raw materials is therefore the key indicator of their suitability for pulping and papermaking.
Lignin, a major constituent of the plant cell wall, reinforces the cell wall and binds fibers together. However, for pulping, a lower lignin content in wood generally correlates with better pulping and papermaking performance. Hemicellulose is the collective term for the non-cellulosic polysaccharides of plant cell walls, and retaining hemicellulose during pulping is generally desirable because it increases pulp yield. The results of this study showed that the hybrid (E. urograndis) had higher cellulose and hemicellulose contents and a lower lignin content than its parental species. Moreover, gene expression profiling revealed that DEGs in the cellulose and hemicellulose biosynthesis pathways predominantly displayed overdominance and high-parent dominance, whereas most DEGs in the lignin biosynthesis pathway showed underdominance and low-parent dominance. These findings are largely consistent with the hypothesis of "direction-shifting ASEGs (genes showing ASE)" proposed by Shao et al. [49], which suggests that under specific spatiotemporal conditions, hybrids can selectively express advantageous alleles at particular loci, thereby conferring heterosis.
Conclusions
In this study, we constructed a high-quality chromosome-level genome assembly of E. urograndis (DH32-29) using the BGISEQ-500 platform, PacBio SMRT sequencing, and Hi-C-assisted scaffolding. The assembly spans 592.09 Mb, with 99.91% of the sequence anchored onto 11 pseudochromosomes, and has a contig N50 of 3.73 Mb and a scaffold N50 of 58.62 Mb. Gene annotation predicted 32,151 genes in the E. urograndis genome, and BUSCO assessment indicated 93.5% completeness for the gene set.
In addition, 47.48% of the genome consists of repeat sequences, and functions could be predicted for 96.48% of the genes. Evolutionary analysis estimated that E. grandis and E. urograndis diverged approximately 2.9 Mya. Furthermore, 131 gene families were significantly expanded and 475 PSGs were identified in the E. urograndis genome; among these, 48 significantly expanded genes and 15 PSGs potentially associated with the fast growth and high disease resistance of E. urograndis were identified. Comparative genomic analysis revealed highly conserved synteny between the genomes of E. urograndis and E. grandis, together with chromosomal fusions, inversions, and other rearrangements in the E. urograndis genome. RNA-seq was also used to analyze allele-specific expression patterns of key enzymes involved in cellulose, xylan, and lignin biosynthesis, and several genes displaying allele-specific expression were identified that may contribute to heterosis in E. urograndis. Collectively, this study provides a foundational resource and offers insights into the genetic and molecular mechanisms underlying Eucalyptus heterosis, which may help accelerate selective breeding in Eucalyptus species.
Materials and Methods
Plant Material
The improved E. urograndis variety No. Gui S-SC-EUG-001-2009, the well-known dominant hybrid clone DH32-29 provided by the Guangxi State-owned Dongmen Forest Farm, was used for whole-genome sequencing in this study. Young leaves were collected in autumn from approximately 3-year-old trees for genome sequencing, and young plantlets were used for Hi-C library construction and sequencing. Three fresh tissues (leaves, xylem, and cambium) of Gui S-SC-EUG-001-2009 were collected for RNA-seq to support genome annotation.
In addition, for the expression analysis, immature xylem at diameter at breast height (DBH) was collected from each tree of E. urograndis and its identified paternal (G46) and maternal (U16) parents, with three biological replicates per sample.
Sequencing, Assembly, and Assessment
Genomic DNA was isolated from fresh leaves using a QIAamp DNA purification kit (Qiagen, Germany) according to the manufacturer's instructions. DNA integrity and quality were evaluated by 1% gel electrophoresis, and DNA concentration was measured using a Pultton DNA/Protein Analyzer (Plextech, USA). DNA samples with a total amount ≥ 20 µg, 1.8 < OD260/280 12.5 ng/µL were used to construct the sequencing libraries. For long-read sequencing, a SMRTbell library with a 20-kb fragment size was constructed using the SMRTbell Template Prep Kit 1.0 (PacBio, USA). The PacBio Sequel II continuous long read (CLR) library was sequenced on a PacBio Sequel system (Pacific Biosciences, Menlo Park, CA, USA) with version 3.0 chemistry, generating data from one SMRT cell. To obtain a chromosome-scale assembly, a Hi-C library was constructed: the leaf samples were cross-linked with 1% formaldehyde, the reaction was quenched with 0.2 M glycine, and the library was prepared following the Hi-C protocol [67] and sequenced on a BGISEQ-500 platform (BGI, Shenzhen, China).
Tissue Collection and RNA Sequencing
To facilitate protein-coding gene prediction, total RNA was extracted from three tissues of E. urograndis (leaves, xylem, and cambium) using TRIzol reagent (Invitrogen, CA, USA). RNA integrity and quantity were evaluated using an Agilent 2100 Bioanalyzer (Agilent, USA).
The three RNA-seq libraries, prepared using the NEBNext Ultra RNA Library Prep Kit for Illumina (NEB, USA) following the manufacturer's protocol, were sequenced on a BGISEQ-500 platform (BGI, Shenzhen, China), producing 97.65 Gb of raw data. Quality control (QC) of the raw reads was performed with the RNA-seq QC pipeline RNA-QC-Chain, yielding 93.82 Gb of clean RNA-seq data for further analyses.
Genome Assembly and Quality Assessment
Based on the quality-filtered reads, the genome size, heterozygosity, and repeat content of E. urograndis were estimated by k-mer analysis. K-mer frequencies were counted using Jellyfish (v2.2.10) [68] with k = 17 and a maximum k-mer count of 10,000, and the k-mer distribution was plotted using GenomeScope [69]. The genome size was calculated as G = N_17-mer / D_17-mer, where N_17-mer is the total number of 17-mers and D_17-mer is the peak depth of the 17-mer frequency distribution. The short reads were assembled de novo into contigs and scaffolds using SOAPdenovo2 [70]. Gaps in the initial assembly were filled using TGS-GapCloser (v1.12) [71] with the parameters "avg_ins = 364, max_ins = 500, and min_ins = 260". The draft assembly was then anchored and oriented into a chromosome-scale assembly by Hi-C scaffolding. First, raw Hi-C reads were filtered using HiC-Pro (v2.8.0) [72]. Then, 3D-DNA (v170123) [73] with the parameters "-m haploid -s 0 -c 24" was used to anchor the primary contigs and scaffolds into chromosomes. Inter- and intrachromosomal contact maps were built and visualized using Juicebox [74]. To further improve the integrity and accuracy of the assembly, we used TGS-GapCloser, which closes gaps in a draft assembly using low-depth (≥ 10×) single-molecule long reads without error correction.
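The k-mer formula G = N / D above can be sketched as follows. This is a simplified illustration, not a replacement for GenomeScope: it assumes the two-column `depth count` text format written by `jellyfish histo`, discards low-depth k-mers (assumed to be sequencing errors) below an arbitrary `min_depth` cutoff, and takes the most frequent remaining depth as the peak D. The histogram values are made up and the helper names are hypothetical.

```python
def parse_jellyfish_histo(text: str) -> dict[int, int]:
    """Parse 'depth count' lines (assumed jellyfish histo format)."""
    histogram = {}
    for line in text.splitlines():
        if line.strip():
            depth, count = line.split()
            histogram[int(depth)] = int(count)
    return histogram


def estimate_genome_size(histogram: dict[int, int], min_depth: int = 5) -> float:
    """G = N / D over k-mers with depth >= min_depth (errors excluded)."""
    solid = {d: c for d, c in histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    total_kmers = sum(depth * count for depth, count in solid.items())  # N
    peak_depth = max(solid, key=solid.get)                              # D
    return total_kmers / peak_depth


toy_histo = "1 5000\n2 800\n3 200\n18 300\n19 600\n20 900\n21 600\n22 300\n"
print(estimate_genome_size(parse_jellyfish_histo(toy_histo)))  # 2700.0
```

For a highly heterozygous genome such as E. urograndis, the heterozygous peak at roughly half the homozygous depth can be the tallest, so a real analysis must pick the homozygous peak explicitly (which is one reason model-based tools like GenomeScope are used).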
For TGS-GapCloser gap filling, the long reads were split into three groups, namely, total reads (options --min_idy 0.2 --min_match 200 --r_round 1), reads ≥ 20 kb (options --min_idy 0 --min_match 0 --r_round 3), and reads of 2–20 kb (options --min_idy 0 --min_match 0 --r_round 3), and each group was used to fill the corresponding aligned gaps. Assembly completeness was assessed using Benchmarking Universal Single-Copy Orthologs (BUSCO v5.4.3) and GC content analyses. The single-copy orthologs of Embryophyta_odb9 (BUSCO v2.0) [75] were searched against the assembled genome using the BUSCO tool. GC content and average sequencing depth across the genome were also measured in 10-kb non-overlapping windows, and windows containing more than 50% N bases were filtered out. No external contamination was detected in the genome.
Repetitive Sequence Annotation
Repetitive sequences in the E. urograndis genome were annotated by combining homology searches against known repeat databases with de novo prediction. Known repeats were identified using RepeatMasker (v3.3.0) [76] and the RepBase TE library (v14.06) [77], and RepeatProteinMask (v3.2.2), implemented in RepeatMasker, was used to detect TE-related proteins. Novel repeats were predicted using RepeatModeler (http://www.repeatmasker.org) based on a de novo repeat library constructed with LTR_Finder [78] and RepeatScout (v1.0.6) [79]. In addition, Tandem Repeats Finder (TRF, v4.09) [80] was used to identify tandem repeats with the parameters "Match = 2, Mismatch = 7, Delta = 7, PM = 80, PI = 10, Minscore = 50, and MaxPeriod = 2000".
Gene Prediction and Annotation
Based on the repeat-masked genome, we combined de novo, homology-based, and transcriptome-assisted predictions to identify protein-coding genes. De novo gene prediction was performed using Augustus (v2.7) [81] and Genscan [82] with default settings.
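The windowed GC-content check described above can be sketched as below. The sketch assumes a nucleotide string in any letter case, computes GC over A/C/G/T bases only (excluding N and other ambiguity codes), and skips windows in which more than half of the bases are N; the function name and toy sequence are illustrative. Per-window sequencing depth, which the study also measured, would additionally require read alignments and is omitted.

```python
def gc_by_window(seq: str, window: int = 10_000,
                 max_n_fraction: float = 0.5) -> list[tuple[int, float]]:
    """Return (window_start, GC fraction) for non-overlapping windows."""
    results = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        if chunk.count("N") / len(chunk) > max_n_fraction:
            continue  # window dominated by assembly gaps
        acgt = sum(chunk.count(base) for base in "ACGT")
        if acgt == 0:
            continue  # only ambiguity codes left; nothing to measure
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, gc / acgt))
    return results


toy = "GC" * 5 + "AT" * 5 + "N" * 20
print(gc_by_window(toy, window=10))  # [(0, 1.0), (10, 0.0)]
```

Plotting these GC fractions against per-window depth is a common way to spot contamination, which typically appears as a separate cluster of windows.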
For homology-based prediction, protein sequences of E. grandis, Citrus sinensis, Populus trichocarpa, Prunus persica, Punica granatum, Solanum lycopersicum, Theobroma cacao, and Hevea brasiliensis were downloaded from NCBI and aligned to the E. urograndis genome using tBLASTn (E value ≤ 1e-5). The matching genomic regions were then aligned against the corresponding proteins using GeneWise (v2.4.0) [83] to obtain accurate spliced alignments. For transcriptome-assisted prediction, a total of 36.45 Gb of clean data from the three RNA-seq libraries (leaves, xylem, and cambium) were aligned to the assembled genome using HISAT2 (v2.0.10) [84], and putative transcript structures were assembled using StringTie (v2.1.1) [85]. Candidate protein-coding regions within the transcripts were then predicted using TransDecoder (v5.5.0) (https://github.com/TransDecoder/TransDecoder/). Finally, the gene models predicted by these methods were merged into a consensus gene set using GLEAN [86]. Gene set completeness was evaluated with BUSCO v5.4.3 using the embryophyte (land plant) dataset as the reference. For functional annotation, all E. urograndis genes were aligned to the TrEMBL, SwissProt, Kyoto Encyclopedia of Genes and Genomes (KEGG), and Gene Ontology (GO) databases and analyzed with InterProScan. For non-coding RNA prediction, 500 tRNAs were identified using tRNAscan-SE 2.0 [87], 438 snRNAs and 994 miRNAs were annotated using Infernal [88] against the Rfam database [89], and 747 rRNAs were identified by BLASTN homology searches against closely related species.
Comparative and Evolutionary Genomic Analysis
To construct the phylogenetic tree, we identified orthologous gene families by comparing the protein and cDNA sequences of E. urograndis and 12 sequenced plant genomes: E. grandis, S. tuberosum, Z.
mays, O. sativa, P. dactylifera, Populus trichocarpa, P. patens, A. thaliana, E. melliodora, E. pauciflora, V. vinifera, and Theobroma cacao. First, the protein sequences of the 13 species were compared using BLASTP with an E value threshold of 1e-5, and gene families were clustered using OrthoMCL [90]. Gene sets were downloaded from Ensembl or NCBI; genes with frameshifts or encoding fewer than 50 amino acids were removed, and for genes with alternative splicing isoforms, only the longest predicted protein was retained as the representative. The E. urograndis assembly was also compared with the published genomes of E. grandis, E. melliodora, and E. pauciflora. Second, the protein sequences of single-copy gene families were aligned using MUSCLE [91], the protein alignments were back-translated into coding sequence (CDS) alignments, and phase 1 sites were extracted and concatenated into a supergene. MrBayes 3.2.4 (https://sourceforge.net/projects/mrbayes/files/mrbayes/3.2.4/) was used to define the orthologous and paralogous relationships among all organisms. Using the single-copy orthologous genes, a phylogenetic tree was generated on the basis of the Bayes model using PhyML (v3.0) [92] with 500 bootstrap replicates. Divergence times were estimated with the MCMCTree program in the PAML package [93] using the approximate likelihood method and fossil calibrations from the TimeTree database (http://www.timetree.org/). Third, we identified orthologous groups among the 13 species using all-against-all BLAST (E value ≤ 1e-5, identity ≥ 80%) and identified expanded and contracted gene families using CAFÉ 5 [94].
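The gene-set filtering described above (removing problematic or short proteins, then keeping the longest isoform per gene) can be sketched as follows. The study does not specify how frameshifts were detected; this sketch uses an internal stop codon (`*` before the C-terminus) as a hypothetical proxy, and the input dictionaries, identifiers, and function name are illustrative.

```python
def select_representatives(proteins: dict[str, str], gene_of: dict[str, str],
                           min_length: int = 50) -> dict[str, str]:
    """Return {gene_id: isoform_id} for the longest acceptable isoform.

    proteins: isoform_id -> amino-acid sequence (a trailing '*' is allowed)
    gene_of:  isoform_id -> gene_id
    """
    best: dict[str, str] = {}
    best_len: dict[str, int] = {}
    for isoform, seq in proteins.items():
        core = seq.rstrip("*")
        if "*" in core:             # internal stop: hypothetical frameshift proxy
            continue
        if len(core) < min_length:  # fewer than 50 amino acids
            continue
        gene = gene_of[isoform]
        if len(core) > best_len.get(gene, -1):  # keep the longest isoform
            best[gene] = isoform
            best_len[gene] = len(core)
    return best


proteins = {"g1.t1": "M" * 60, "g1.t2": "M" * 80, "g2.t1": "M" * 30}
gene_of = {"g1.t1": "g1", "g1.t2": "g1", "g2.t1": "g2"}
print(select_representatives(proteins, gene_of))  # {'g1': 'g1.t2'}
```

Filtering before choosing the longest isoform means a gene is dropped only when none of its isoforms passes, rather than when its longest isoform happens to fail.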
GO and KEGG enrichment analyses were performed to identify the functional implications of the expanded and contracted genes (Fisher's exact test, adjusted p value < 0.05). Positive selection was analyzed with CodeML in PAML using the branch-site model, with E. urograndis and E. grandis as foreground branches and E. melliodora, E. pauciflora, and P. trichocarpa as background branches.
Collinearity and Whole-Genome Duplication Analysis
To reveal collinearity between E. urograndis and E. grandis, we aligned the chromosomes of the two species using LASTZ (v1.04.22) [95] with default options. Chromosomal collinearity was constructed from mapped regions > 2 kb and visualized using Circos (v0.69) [96]. In addition, syntenic blocks within the E. urograndis and E. grandis genomes were identified using MCScanX [97] with default parameters, based on similar gene pairs identified by comparing the gene sequences of the two species with Diamond v0.9.29.130 [98] (E value cutoff 1e-5, C score ≥ 0.5, with C scores filtered using JCVI v0.9.13 [99]). Ks (synonymous substitutions per site) values between collinear genes were estimated using CodeML in PAML. The Ks and 4DTv (fourfold synonymous third-codon transversion rate) methods were used to identify WGD events. WGD v1.1.1 [100] and a custom script (https://github.com/JinfengChen/Scripts) were used to identify WGD events in E. urograndis, E. grandis, E. pauciflora, C. citriodora, and V. vinifera.
RNA Sequencing and Differentially Expressed Gene Analysis
Total RNA was isolated from the nine immature xylem samples (three biological replicates for each of the three species) using an RNAprep Pure Plant Kit (TIANGEN Biotech, Beijing, China) with DNase I treatment to remove genomic DNA.
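For the GO and KEGG enrichment step described above, a one-sided Fisher's exact test on a 2 × 2 table is equivalent to the upper tail of the hypergeometric distribution. The sketch below implements that tail with `math.comb` and applies a Benjamini–Hochberg adjustment; the section states only that adjusted p values < 0.05 were used, so the choice of Benjamini–Hochberg is an assumption, and the function names are hypothetical.

```python
from math import comb


def hypergeometric_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): k of n test genes carry the term; K of N background genes do."""
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("invalid counts")
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Step-up BH adjustment, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# 3 of 3 test genes annotated, 3 of 10 background genes annotated
print(hypergeometric_enrichment_p(3, 3, 3, 10))  # 1/120
```

Terms would then be reported as enriched when their adjusted value falls below 0.05.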
The qualified xylem RNA was then used for library construction. To ensure library quality, a Qubit 2.0 fluorometer and an Agilent 2100 Bioanalyzer were used to examine cDNA concentration and insert size, respectively. Qualified libraries were sequenced on a high-throughput platform in PE150 mode. Gene expression levels were quantified as FPKM (fragments per kilobase of transcript per million mapped fragments) using StringTie, which assembles transcripts with a maximum flow algorithm [101]. Differential expression analysis was performed using DESeq2 (v1.31.16) [102], and DEGs were identified using the criteria fold change (FC) ≥ 2 and false discovery rate (FDR) < 0.01. GO and KEGG enrichment analyses were subsequently conducted for the identified DEGs.
Abbreviations
4CL: 4-coumarate-CoA ligase; ASE: allele-specific expression; ASEGs: genes showing allele-specific expression; BP: biological process; BUSCOs: benchmarking universal single-copy orthologs; C3'H: 5-O-(4-coumaroyl)-D-quinate 3'-monooxygenase; C4H: trans-cinnamate 4-monooxygenase; CAD: cinnamyl-alcohol dehydrogenase; CALM: calmodulin; CC: cellular component; CCoAOMT: caffeoyl-CoA O-methyltransferase; CCR: cinnamoyl-CoA reductase; CesA: cellulose synthase A; COMT: caffeic acid 3-O-methyltransferase; DEGs: differentially expressed genes; DUF: domain of unknown function; E.
urograndis: Eucalyptus urophylla × Eucalyptus grandis; F5H: ferulate-5-hydroxylase; GATL: galacturonosyltransferase-like; GUX: xylan alpha-glucuronosyltransferase; GXM: glucuronoxylan 4-O-methyltransferase; HCT: shikimate O-hydroxycinnamoyltransferase; HEX: hexokinase; Hi-C: high-throughput chromosome conformation capture; INV: beta-fructofuranosidase; IRX: irregular xylem; LINEs: long interspersed nuclear elements; LTRs: long terminal repeats; MF: molecular function; Mya: million years ago; PAL: phenylalanine ammonia-lyase; PAV: presence/absence variation; PGM: phosphoglucomutase; PSGs: positively selected genes; PTK3: PTI1-like tyrosine-protein kinase 3; RWA: reduced wall acetylation; SINEs: short interspersed nuclear elements; SMRT: single-molecule real-time (long-read sequencing); SNPs: single nucleotide polymorphisms; SPE: single-parent expression; SUSY: sucrose synthase; UGDH: UDP-glucose 6-dehydrogenase; UGP: UTP-glucose-1-phosphate uridylyltransferase; UXS: UDP-glucuronate decarboxylase; WGD: whole-genome duplication; XYL: xylan 1,4-beta-xylosidase; XYS: 1,4-beta-D-xylan synthase.
Declarations
Acknowledgement
We appreciate the comments from Prof. Jihua Ding (Huazhong Agricultural University, College of Horticulture and Forestry Sciences) and the information about the E. urophylla genome provided by Prof. Weihua Zhang (Guangdong Academy of Forestry). We also gratefully acknowledge four anonymous reviewers, whose valuable comments helped improve our manuscript.
Author Contributions
G.L. and J.L. designed and managed the project. W.L., L.Z., Y.L., J.P., J.B., and A.H. participated in material collection and processing. W.L. and Y.L. performed bioinformatics analyses. G.L. and J.Z. wrote the manuscript. J.P. and A.H. contributed to validation. J.L., G.L., and W.L. revised the manuscript. All authors read and approved the final manuscript.
Funding
This work was financially supported by the Fundamental Research Funds of CAF (CAFYBB2023MB034) and the National Key R&D Program of China (Grant Nos. 2022YFD2200203 and 2023YFD2201001).
Data availability
The genomic and transcriptomic data generated in this study have been deposited in the Genome Sequence Archive (GSA) of the China National Center for Bioinformation (CNCB) under accession numbers CRA017352 and CRA024878, respectively. All data generated or analyzed during this study are included in this manuscript and its supplementary files.
Ethics approval and consent to participate
Not applicable.
Clinical trial number
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
References
Vilasboa J, Da Costa CT, Fett-Neto AG. Rooting of eucalypt cuttings as a problem-solving oriented model in plant biology. Prog Biophys Mol Biol. 2019;146:85–97.
Du K, Xia Y, Zhan D, Xu T, Lu T, Yang J, et al. Genome-wide identification of the Eucalyptus urophylla GATA gene family and its diverse roles in chlorophyll biosynthesis. Int J Mol Sci. 2022;23(9):5251.
Paiva JA, Prat E, Vautrin S, Santos MD, San-Clemente H, Brommonschenkel S, et al. Advancing Eucalyptus genomics: identification and sequencing of lignin biosynthesis genes from deep-coverage BAC libraries. BMC Genomics. 2011;12:137.
Grattapaglia D, Silva-Junior OB, Kirst M, de Lima BM, Faria DA, Pappas GJ Jr. High-throughput SNP genotyping in the highly heterozygous genome of Eucalyptus: assay success, polymorphism and transferability across species. BMC Plant Biol. 2011;11:65.
Retief ECL, Stanger TK. Genetic parameters of pure and hybrid populations of Eucalyptus grandis and E. urophylla and implications for hybrid breeding strategy. South Forests. 2009;71(2):133–40.
Hochholdinger F, Hoecker N. Towards the molecular basis of heterosis. Trends Plant Sci. 2007;12(9):427–32.
Wu X, Liu Y, Zhang Y, Gu R.
Advances in research on the mechanism of heterosis in plants. Front Plant Sci. 2021;12:745726. Shinya T, Iwata E, Nakahama K, Fukuda Y, Hayashi K, Nanto K, et al. Transcriptional profiles of hybrid Eucalyptus genotypes with contrasting lignin content reveal that monolignol biosynthesis-related gnes regulate wood composition. Front Plant Sci. 2016;7:443. Shi D. Analysis of characteristics of several Eucalyptus urophlla × E. grandis species and their direction of production. Forestry Sci Technol. 2019;44(6):41–3. Gonçalves FG, Oliveira JTS, Lucia RMD, Sartório RC. Estudo de algumas propriedades mecânicas da madeira de um híbrido clonal de Eucalyptus urophylla x Eucalyptus grandis . Rev Árvore. 2009;33(3):501–9. Rocha MEL, Ristau ACP, Cruz MSFV, Oliveira Neto CFd M, Malavasi MM. Growth dynamics of container seedlings of Eucalyptus grandis x Eucalyptus urophylla and Hymenaea courbaril L. Rev Ceres. 2022;69(4):425–35. Wang Y, Chen F, Ma Y, Zhang T, Sun P, Lan M, et al. An ancient whole-genome duplication event and its contribution to flavor compounds in the tea plant ( Camellia sinensis ). Hortic Res. 2021;8(1):176. Glombik M, Bačovský V, Hobza R, Kopecký D. Competition of parental genomes in plant hybrids. Front Plant Sci. 2020;11:200. Taylor SA, Larson EL. Insights from genomes into the evolutionary importance and prevalence of hybridization in nature. Nat Ecol Evol. 2019;3(2):170–7. Gu Z, Gong J, Zhu Z, Li Z, Feng Q, Wang C, et al. Structure and function of rice hybrid genomes reveal genetic basis and optimal performance of heterosis. Nat Genet. 2023;55(10):1745–56. Xu CL, Sun XM, Zhang SG. Mechanism on differential gene expression and heterosis formation. Hereditas (Beijing). 2013;35(6):714–26. Hochholdinger F, Yu P. Molecular concepts to explain heterosis in crops. Trends Plant Sci. 2025;30(1):95–104. Xie J, Wang W, Yang T, Zhang Q, Zhang Z, Zhu X, et al. Large-scale genomic and transcriptomic profiles of rice hybrids reveal a core mechanism underlying heterosis. 
Genome Biol. 2022;23(1):264. Botet R, Keurentjes JJB. The role of transcriptional regulation in hybrid vigor. Front Plant Sci. 2020;11:410. Bartholomé J, Mandrou E, Mabiala A, Jenkins J, Nabihoudine I, Klopp C, et al. High-resolution genetic maps of Eucalyptus improve Eucalyptus grandis genome assembly. New Phytol. 2015;206(4):1283–96. Myburg AA, Grattapaglia D, Tuskan GA, Hellsten U, Hayes RD, Grimwood J, et al. The genome of Eucalyptus grandis . Nature. 2014;510:356–62. Kyriakidou M, Achakkagari SR, Gálvez López JH, Zhu X, Tang CY, Tai HH, et al. Structural genome analysis in cultivated potato taxa. Theor Appl Genet. 2020;133(3):951–66. Nie S, Wang B, Ding H, Lin H, Zhang L, Li Q, et al. Genome assembly of the Chinese maize elite inbred line RP125 and its EMS mutant collection provide new resources for maize genetics research and crop improvement. Plant J. 2021;108(1):40–54. Wang L, Zhao L, Zhang X, Zhang Q, Jia Y, Wang G, et al. Large-scale identification and functional analysis of NLR genes in blast resistance in the Tetep rice genome sequence. Proc Natl Acad Sci U S A. 2019;116(37):18479–87. Fang Y, Wu H, Zhang T, Yang M, Yin Y, Pan L, et al. A complete sequence and transcriptomic analyses of date palm ( Phoenix dactylifera L.) mitochondrial genome. PLoS ONE. 2012;7(5):e37164. Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, et al. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science. 2006;313(5793):1596–604. Lang D, Ullrich KK, Murat F, Fuchs J, Jenkins J, Haas FB, et al. The physcomitrella patens chromosome-scale assembly reveals moss genome structure and evolution. Plant J. 2018;93(3):515–33. Hou X, Wang D, Cheng Z, Wang Y, Jiao Y. A near-complete assembly of an Arabidopsis thaliana genome. Mol Plant. 2022;15(8):1247–50. Magris G, Jurman I, Fornasiero A, Paparelli E, Schwope R, Marroni F, et al. The genomes of 204 Vitis vinifera accessions reveal the origin of European wine grapes. Nat Commun. 2021;12(1):7240. 
30. Motamayor JC, Mockaitis K, Schmutz J, Haiminen N, Livingstone D 3rd, Cornejo O, et al. The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color. Genome Biol. 2013;14(6):r53.
31. Shen C, Li L, Ouyang L, Su M, Guo K. E. urophylla × E. grandis high-quality genome and comparative genomics provide insights on evolution and diversification of eucalyptus. BMC Genomics. 2023;24(1):223.
32. Rai MI, Alam M, Lightfoot DA, Gurha P, Afzal AJ. Classification and experimental identification of plant long non-coding RNAs. Genomics. 2019;111(5):997–1005.
33. Yu Y, Zhang Y, Chen X, Chen Y. Plant noncoding RNAs: hidden players in development and stress responses. Annu Rev Cell Dev Biol. 2019;35:407–31.
34. Nadarajah KK, Abdul Rahman NSN. The role of non-coding RNA in rice immunity. Agronomy. 2021;12(1):39.
35. D'Ario M, Griffiths-Jones S, Kim M. Small RNAs: big impact on plant development. Trends Plant Sci. 2017;22(12):1056–68.
36. Martinez G, Köhler C. Role of small RNAs in epigenetic reprogramming during plant sexual reproduction. Curr Opin Plant Biol. 2017;36:22–8.
37. Tang J, Chu C. MicroRNAs in crop improvement: fine-tuners for complex traits. Nat Plants. 2017;3:17077.
38. Lafontaine DL. Noncoding RNAs in eukaryotic ribosome biogenesis and function. Nat Struct Mol Biol. 2015;22(1):11–9.
39. Chanfreau GF, Tamanoi F. The enzymes. USA: Academic; 2012.
40. Zhang L, Wu S, Chang X, Wang X, Zhao Y, Xia Y, et al. The ancient wave of polyploidization events in flowering plants and their facilitated adaptation to environmental stress. Plant Cell Environ. 2020;43(12):2847–56.
41. Shang J, Tian J, Cheng H, Yan Q, Li L, Jamal A, et al. The chromosome-level wintersweet (Chimonanthus praecox) genome provides insights into floral scent biosynthesis and flowering in winter. Genome Biol. 2020;21(1):200.
42. Zhang L, Li X, Ma B, Gao Q, Du H, Han Y, et al. The tartary buckwheat genome provides insights into rutin biosynthesis and abiotic stress tolerance. Mol Plant. 2017;10(9):1224–37.
43. Barakat A, Bagniewska-Zadworna A, Choi A, Plakkat U, DiLoreto DS, Yellanki P, et al. The cinnamyl alcohol dehydrogenase gene family in Populus: phylogeny, organization, and expression. BMC Plant Biol. 2009;9:26.
44. Gessler A. Sucrose synthase - an enzyme with a central role in the source-sink coordination and carbon flow in trees. New Phytol. 2021;229(1):8–10.
45. Dominguez PG, Donev E, Derba-Maceluch M, Bünder A, Hedenström M, Tomášková I, et al. Sucrose synthase determines carbon allocation in developing wood and alters carbon flow at the whole tree level in aspen. New Phytol. 2021;229(1):186–98.
46. Booker TR, Jackson BC, Keightley PD. Detecting positive selection in the genome. BMC Biol. 2017;15(1):98.
47. Zhang L, Wu S, Chang X, Wang X, Zhao Y, Xia Y, et al. The ancient wave of polyploidization events in flowering plants and their facilitated adaptation to environmental stress. Plant Cell Environ. 2020;43(12):2847–56.
48. Wang Z, Li L, Ouyang L. Efficient genetic transformation method for Eucalyptus genome editing. PLoS ONE. 2021;16(5):e0252011.
49. Shao L, Xing F, Xu C, Zhang Q, Che J, Wang X, et al. Patterns of genome-wide allele-specific expression in hybrid rice and the implications on the genetic basis of heterosis. Proc Natl Acad Sci U S A. 2019;116(12):5653–8.
50. Liu G, Xie Y, Shang X, Wu Z. Expression patterns and gene analysis of the cellulose synthase gene superfamily in Eucalyptus grandis. Forests. 2021;12(9):1254.
51. Carroll A, Mansoori N, Li S, Lei L, Vernhettes S, Visser RG, et al. Complexes with mixed primary and secondary cellulose synthases are functional in Arabidopsis plants. Plant Physiol. 2012;160(2):726–37.
52. Curry TM, Peña MJ, Urbanowicz BR. An update on xylan structure, biosynthesis, and potential commercial applications. Cell Surf. 2023;9:100101.
53. Miao YC, Liu CJ. ATP-binding cassette-like transporters are involved in the transport of lignin precursors across plasma and vacuolar membranes. Proc Natl Acad Sci U S A. 2010;107(52):22728–33.
54. Payseur BA, Rieseberg LH. A genomic perspective on hybridization and speciation. Mol Ecol. 2016;25(11):2337–60.
55. Souza DMSC, Avelar MLM, Fernandes SB, Silva EO, Duarte VP, Molinari LV, et al. Spectral quality and temporary immersion bioreactor for in vitro multiplication of Eucalyptus grandis × Eucalyptus urophylla. 3 Biotech. 2020;10(10):457.
56. Boutte J, Maillet L, Chaussepied T, Letort S, Aury JM, Belser C, et al. Genome size variation and comparative genomics reveal intraspecific diversity in Brassica rapa. Front Plant Sci. 2020;11:577536.
57. Ramakrishnan M, Satish L, Sharma A, Kurungara Vinod K, Emamverdian A, Zhou M, et al. Transposable elements in plants: recent advancements, tools and prospects. Plant Mol Biol Rep. 2022;40:628–45.
58. Cooper G, Adams K. The cell: A molecular approach. 2nd ed. Sunderland (MA): Sinauer Associates; 2000.
59. Baek S, Choi K, Kim GB, Yu HJ, Cho A, Jang H, et al. Draft genome sequence of wild Prunus yedoensis reveals massive inter-specific hybridization between sympatric flowering cherries. Genome Biol. 2018;19(1):127.
60. Guo L, Winzer T, Yang X, Li Y, Ning Z, He Z, et al. The opium poppy genome and morphinan production. Science. 2018;362(6412):343–7.
61. Suzuki S, Li L, Sun YH, Chiang VL. The cellulose synthase gene superfamily and biochemical functions of xylem-specific cellulose synthase-like genes in Populus trichocarpa. Plant Physiol. 2006;142(3):1233–45.
62. Burn JE, Hocart CH, Birch RJ, Cork AC, Williamson RE. Functional analysis of the cellulose synthase genes CesA1, CesA2, and CesA3 in Arabidopsis. Plant Physiol. 2002;129(2):797–807.
63. Madhu P, Nithiyesh Kumar C, Anojkumar L, Matheswaran M. Selection of biomass materials for bio-oil yield: a hybrid multi-criteria decision making approach. Clean Techn Environ Policy. 2018;20:1377–84.
64. Santos RB, Hart PW, Jameel H, Chang H. Wood based lignin reactions important to the biorefinery and pulp and paper industries. BioRes. 2013;8(1):1456–77.
65. Rojas CM, Senthil-Kumar M, Tzin V, Mysore KS. Regulation of primary plant metabolism during plant-pathogen interactions and its contribution to plant defense. Front Plant Sci. 2014;5:17.
66. Liu G, Xie Y, Shang X, Wu Z. Expression patterns and gene analysis of the cellulose synthase gene superfamily in Eucalyptus grandis. Forests. 2021;12(9):1254.
67. Lafontaine DL, Yang L, Dekker J, Gibcus JH. Hi-C 3.0: improved protocol for genome-wide chromosome conformation capture. Curr Protoc. 2021;1(7):e198.
68. Marçais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 2011;27(6):764–70.
69. Vurture GW, Sedlazeck FJ, Nattestad M, Underwood CJ, Fang H, Gurtowski J, et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics. 2017;33(14):2202–4.
70. Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience. 2012;1(1):18.
71. Xu M, Guo L, Gu S, Wang O, Zhang R, Peters BA, et al. TGS-GapCloser: A fast and accurate gap closer for large genomes with low coverage of error-prone long reads. Gigascience. 2020;9(9):giaa094.
72. Servant N, Varoquaux N, Lajoie BR, Viara E, Chen CJ, Vert JP, et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 2015;16:259.
73. Zhang Y, Xiong Y, Xiao Y. 3dDNA: a computational method of building DNA 3d structures. Molecules. 2022;27(18):5936.
74. Durand NC, Robinson JT, Shamim MS, Machol I, Mesirov JP, Lander ES, et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 2016;3(1):99–101.
75. Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol. 2021;38(10):4647–54.
76. Saha S, Bridges S, Magbanua ZV, Peterson DG. Computational approaches and tools used in identification of dispersed repetitive DNA sequences. Trop Plant Biol. 2008;1:85–96.
77. Bao W, Kojima KK, Kohany O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob DNA. 2015;6:11.
78. Xu Z, Wang H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 2007;35(Web Server issue):W265–8.
79. Price AL, Jones NC, Pevzner PA. De novo identification of repeat families in large genomes. Bioinformatics. 2005;21(Suppl 1):i351–8.
80. Benson G. Tandem repeats finder: A program to analyze DNA sequences. Nucleic Acids Res. 1999;27(2):573–80.
81. Keller O, Kollmar M, Stanke M, Waack S. A novel hybrid gene prediction method employing protein multiple sequence alignments. Bioinformatics. 2011;27(6):757–63.
82. Burge C, Karlin S. Prediction of complete gene structures in human genomic DNA. J Mol Biol. 1997;268(1):78–94.
83. Birney E, Clamp M, Durbin R. GeneWise and genomewise. Genome Res. 2004;14(5):988–95.
84. Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 2019;37(8):907–15.
85. Pertea M, Kim D, Pertea GM, Leek JT, Salzberg SL. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc. 2016;11(9):1650–67.
86. Chan KCK, Xu X, Wang X, Gu J, Loy CC. GLEAN: generative latent bank for image super-resolution and beyond. IEEE Trans Pattern Anal Mach Intell. 2023;45(3):3154–68.
87. Chan PP, Lin BY, Mak AJ, Lowe TM. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 2021;49(16):9077–96.
88. Nawrocki EP, Eddy SR. Computational identification of functional RNA homologs in metagenomic data. RNA Biol. 2013;10(7):1170–9.
89. Kalvari I, Nawrocki EP, Argasinska J, Quinones-Olvera N, Finn RD, Bateman A, et al. Non-coding RNA analysis using the Rfam database. Curr Protoc Bioinf. 2018;62(1):e51.
90. Li L, Stoeckert CJ Jr, Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003;13(9):2178–89.
91. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32(5):1792–7.
92. Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 2010;59(3):307–21.
93. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24(8):1586–91.
94. Mendes FK, Vanderpool D, Fulton B, Hahn MW. CAFE 5 models variation in evolutionary rates among gene families. Bioinformatics. 2021;36(22–23):5516–8.
95. Harris RS. Improved pairwise alignment of genomic DNA [Ph.D. Thesis]. Ann Arbor, MI: The Pennsylvania State University; 2007.
96. Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, et al. Circos: an information aesthetic for comparative genomics. Genome Res. 2009;19(9):1639–45.
97. Wang Y, Tang H, Debarry JD, Tan X, Li J, Wang X, et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 2012;40(7):e49.
98. Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat Methods. 2015;12(1):59–60.
99. Tang H, Bowers JE, Wang X, Ming R, Alam M, Paterson AH. Synteny and collinearity in plant genomes. Science. 2008;320(5875):486–8.
100. Zwaenepoel A, Van de Peer Y. wgd-simple command line tools for the analysis of ancient whole-genome duplications. Bioinformatics. 2019;35(12):2153–5.
101. Shumate A, Wong B, Pertea G, Pertea M. Improved transcriptome assembly using a hybrid of long and short reads with StringTie. PLoS Comput Biol. 2022;18(6):e1009730.
102. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550.

Additional Declarations
No competing interests reported.

Supplementary Files
Additionalfile1.docx
Additional file 1: Figs. S1–S5.
Fig. S1. Assembly results of the E. urograndis genome. (A) GenomeScope analysis of heterozygosity. (B) Scatter plot of GC content versus sequencing depth. (C) Hi-C interaction heatmap of E. urograndis; darker red indicates stronger interaction intensity. Interactions within chromosomes are markedly stronger than those between chromosomes, and chromosome boundaries are clearly defined, indicating good chromosome anchoring.
Fig. S2. Annotation quality comparison of protein-coding genes: messenger RNA (mRNA), CDS, exon, and intron lengths in five species (E. urograndis, E. grandis, P. granatum, P. trichocarpa, and H. brasiliensis).
Fig. S3. KEGG enrichment (A) and GO annotation (B) results for the E. urograndis genome.
Fig. S4. The 20 significantly enriched pathways, with p-values, for annotated and enriched genes in the E. urograndis genome.
Fig. S5. Contents of the three major wood chemical components in the immature xylem of E. urophylla, E. urograndis, and E. grandis. Values sharing the same letter do not differ significantly, whereas different letters indicate significant differences (LSD multiple comparison test). Error bars indicate the standard error of three replicates.

Additionalfile2.xlsx
Additional file 2: Tables S1–S12.
Table S1. Assembly statistics of the E. urograndis genome.
Table S2. Results of NT comparison.
Table S3. Results of BUSCO assessment of the genome assembly.
Table S4. Lengths of the 11 chromosomes in the E. urograndis genome.
Table S5. Statistics of repetitive sequences.
Table S6. Statistics of non-coding RNAs.
Table S7. Statistics of gene structure prediction.
Table S8. GO enrichment analysis of all E. urograndis genes.
Table S9. Clustering results of gene families in the E. urograndis genome.
Table S10. KEGG pathway analysis of the 131 expanded and contracted gene families.
Table S11. Positively selected genes in the E. urograndis genome.
Table S12. Key enzymes and paralogous genes identified in the cellulose, xylan, and lignin biosynthesis pathways in the E. urograndis genome.

Editorial history: first submitted to journal 17 Jun 2025; submission checks completed, editor invited, and editor assigned 27 Jun 2025; reviewers invited 30 Jun 2025; reviewers agreed 2, 3, 6, and 8 Jul 2025; reviews received 10, 13, 14, and 18 Jul 2025; editorial decision (revision requested) 24 Jul 2025; published in BMC Plant Biology 27 Oct 2025.
© Research Square 2026 | ISSN 2693-5015 (online) {"props":{"pageProps":{"initialData":{"identity":"rs-6912338","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":478365522,"identity":"8afa22e4-3ee1-4230-b134-483fae29f2ca","order_by":0,"name":"Guo Liu","email":"","orcid":"","institution":"Research Institute of Fast-growing Trees, Chinese Academy of Forestry","correspondingAuthor":false,"prefix":"","firstName":"Guo","middleName":"","lastName":"Liu","suffix":""},{"id":478365523,"identity":"65f6b66a-b5e6-43b9-8efd-737c86a0606c","order_by":1,"name":"Jianzhong Luo","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAxUlEQVRIiWNgGAWjYBACPiA2AGI5BgbGB8RpYYNqMWZgYDYgXgsIJDYQr0Ui+UDhjwq79P72wwwMP2oY5M0Ja0lLMJA4k5w740wyA2PPMQbDnQ0EteQYGBi2HcjdwJB/gIG3gSHB4AAxWhL/HUg34H/MwPiXaC0HGw4AXZfMwEycLTzPEgwbjiUbzrjxmOGwzDEJww2EtPCzJx8z/FFjJ8/fn8z48E2NjTxBW0AWweMDqFiCsHogYH5AlLJRMApGwSgYuQAAb/Q4RkKCtZcAAAAASUVORK5CYII=","orcid":"","institution":"Research Institute of Fast-growing Trees, Chinese Academy of Forestry","correspondingAuthor":true,"prefix":"","firstName":"Jianzhong","middleName":"","lastName":"Luo","suffix":""},{"id":478365524,"identity":"5d0d8ea6-782c-4ad6-8148-21787233fee9","order_by":2,"name":"Wanhong Lu","email":"","orcid":"","institution":"Research Institute of Fast-growing Trees, Chinese Academy of Forestry","correspondingAuthor":false,"prefix":"","firstName":"Wanhong","middleName":"","lastName":"Lu","suffix":""},{"id":478365525,"identity":"003141b5-7732-46c3-a9d5-0201ba0dca6c","order_by":3,"name":"Yan 
Lin","email":"","orcid":"","institution":"Research Institute of Fast-growing Trees, Chinese Academy of Forestry","correspondingAuthor":false,"prefix":"","firstName":"Yan","middleName":"","lastName":"Lin","suffix":""},{"id":478365526,"identity":"1beda670-9c2e-4494-a1ba-135cbfc5021c","order_by":4,"name":"Lei Zhang","email":"","orcid":"","institution":"Guangxi State-owned Dongmen Forest Farm","correspondingAuthor":false,"prefix":"","firstName":"Lei","middleName":"","lastName":"Zhang","suffix":""},{"id":478365527,"identity":"4459b9eb-6691-430d-9471-bd8f74e36102","order_by":5,"name":"Jingyi Pan","email":"","orcid":"","institution":"Research Institute of Fast-growing Trees, Chinese Academy of Forestry","correspondingAuthor":false,"prefix":"","firstName":"Jingyi","middleName":"","lastName":"Pan","suffix":""},{"id":478365528,"identity":"5c3348c1-43da-480e-91b1-f829a0989367","order_by":6,"name":"Jiangbo Zhai","email":"","orcid":"","institution":"Research Institute of Fast-growing Trees, Chinese Academy of Forestry","correspondingAuthor":false,"prefix":"","firstName":"Jiangbo","middleName":"","lastName":"Zhai","suffix":""},{"id":478365529,"identity":"e3144bf5-1afb-472a-9076-6e947b3d7c24","order_by":7,"name":"Anying Huang","email":"","orcid":"","institution":"Research Institute of Fast-growing Trees, Chinese Academy of Forestry","correspondingAuthor":false,"prefix":"","firstName":"Anying","middleName":"","lastName":"Huang","suffix":""}],"badges":[],"createdAt":"2025-06-17 08:53:26","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6912338/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6912338/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1186/s12870-025-07371-3","type":"published","date":"2025-10-27T15:58:08+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":85876546,"identity":"bf7cf3e4-f9d4-4795-8f6d-dfd01d1d0a08","added_by":"auto","created_at":"2025-07-02 15:16:01","extension":"jpg","order_by":1,"title":"Figure 
1","display":"","copyAsset":false,"role":"figure","size":3101796,"visible":true,"origin":"","legend":"\u003cp\u003eOverview of the \u003cem\u003eEucalyptus urograndis\u003c/em\u003e genome. Genome features are shown in 1-Mb windows across the 11 chromosomes; units on the circumference are megabases. (a) The outer ring of colored blocks represents the chromosomes, labeled Chr1–Chr11. (b) Gene density (3–125); (c) repeat coverage (14.00–81.45%); (d) guanine-cytosine (GC) content (37.57–44.82%); (e) density of DNA transposons (0.22–40.59%); (f) density of tandem repeat sequences (0.30–38.97%); (g) density of single nucleotide polymorphisms (SNPs) (7.7\u0026nbsp;×\u0026nbsp;10\u003csup\u003e–6\u003c/sup\u003e–9.24\u0026nbsp;×\u0026nbsp;10\u003csup\u003e–3\u003c/sup\u003e); and (h) syntenic relationships between blocks, indicated by connecting lines.\u003c/p\u003e","description":"","filename":"Figure1.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6912338/v1/86867483982f32051ed8f75b.jpg"},{"id":85875355,"identity":"073ed6d5-6a42-45f2-ab39-d0668f592605","added_by":"auto","created_at":"2025-07-02 15:08:01","extension":"jpg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":1672222,"visible":true,"origin":"","legend":"\u003cp\u003eGene families and evolution of the \u003cem\u003eE. urograndis\u003c/em\u003e genome. (A) Comparison of homologous gene numbers. The x-axis indicates the species, and the y-axis indicates the number of genes. Dark blue represents single-copy orthologs; light blue, multicopy orthologs; pink, unique paralogs; light yellow, other orthologs; and green, unclustered genes. (B) Venn diagram of shared and unique homologous gene families among the four eucalyptus species (\u003cem\u003eE. urograndis\u003c/em\u003e,\u003cem\u003e E. grandis\u003c/em\u003e,\u003cem\u003e E. 
melliodora\u003c/em\u003e, and \u003cem\u003eE. pauciflora\u003c/em\u003e). (C) Phylogenetic tree of 13 plant species. The blue numbers denote the divergence time of each node (Mya, million years ago); the green and red numbers on the branch represent the total number of expanded and contracted gene families, respectively.\u003c/p\u003e","description":"","filename":"Figure2.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6912338/v1/3e8f1ece248613b5679feb82.jpg"},{"id":85875357,"identity":"bfb259b3-b064-4e60-b776-dc81e6429ea7","added_by":"auto","created_at":"2025-07-02 15:08:01","extension":"jpg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":1430280,"visible":true,"origin":"","legend":"\u003cp\u003eComparative genomic analyses. (A) Collinearity patterns between the genomic regions of \u003cem\u003eE. urograndis\u003c/em\u003e and \u003cem\u003eE. grandis\u003c/em\u003e. (B) Bar graph illustration of collinearity blocks on chromosomes between genomes. The same color indicates the corresponding chromosomes between the two genomes. (C) Distribution of synonymous substitution levels (\u003cem\u003eKs\u003c/em\u003e) of syntenic orthologous (speciation) and paralogous genes (WGD) between \u003cem\u003eE. urograndis\u003c/em\u003e and \u003cem\u003eE. grandis\u003c/em\u003e or \u003cem\u003eV. vinifera\u003c/em\u003e.\u003c/p\u003e","description":"","filename":"Figure3.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6912338/v1/d045877ecb174be60b0facff.jpg"},{"id":85876549,"identity":"d8518bab-5f4a-496f-8e68-965ce29ef3bf","added_by":"auto","created_at":"2025-07-02 15:16:01","extension":"jpg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":3414246,"visible":true,"origin":"","legend":"\u003cp\u003eGenes involved in the cellulose and xylan biosynthesis in the immature xylem tissue of the \u003cem\u003eE. urograndis\u003c/em\u003e genome. 
Relative expression profiles (green-to-deep-pink scale) of secondary cell wall-related genes implicated in cellulose and xylan biosynthesis. Numbers in parentheses indicate the number of homologous genes in each gene family/the number of DEGs. Bold red or purple font denotes DEGs among significantly expanded genes or PSGs. Detailed protein names, annotation and RNA-seq expression data are provided in Supplementary Table S12. EU, \u003cem\u003eE. urophylla\u003c/em\u003e; EUG, \u003cem\u003eE. urograndis\u003c/em\u003e; EG, \u003cem\u003eE. grandis\u003c/em\u003e. CESA: cellulose synthase A; DUF: domain of unknown function; GATL: galacturonosyltransferase-like; GUX: xylan alpha-glucuronosyltransferase; GXM: glucuronoxylan 4-O-methyltransferase; HEX: hexokinase; INV: beta-fructofuranosidase; IRX: irregular xylem; PGM: phosphoglucomutase; RWA: reduced wall acetylation; SUSY: sucrose synthase; UGDH: UDP-glucose 6-dehydrogenase; UGP: UTP-glucose-1-phosphate uridylyltransferase; UXS: UDP-glucuronate decarboxylase; XYL: xylan 1,4-beta-xylosidase; XYS: 1,4-beta-D-xylan synthase.\u003c/p\u003e","description":"","filename":"Figure4.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6912338/v1/2ad634f125d4977b30aad7a8.jpg"},{"id":85877672,"identity":"a647b7a5-1a5f-45eb-8d63-706e9d2b2af3","added_by":"auto","created_at":"2025-07-02 15:24:01","extension":"jpg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":2229908,"visible":true,"origin":"","legend":"\u003cp\u003eGenes involved in lignin biosynthesis in the immature xylem tissue of the \u003cem\u003eE. urograndis\u003c/em\u003e genome. Relative expression profiles (green-to-deep-pink scale) of secondary cell wall-related genes implicated in lignin biosynthesis. Numbers in parentheses indicate the number of homologous genes in each gene family/the number of DEGs. Bold red or purple font denotes DEGs among significantly expanded genes or PSGs. 
Detailed protein names, annotation and RNA-seq expression data are provided in Supplementary Table S12. EU, \u003cem\u003eE. urophylla\u003c/em\u003e; EUG, \u003cem\u003eE. urograndis\u003c/em\u003e; EG, \u003cem\u003eE. grandis\u003c/em\u003e. 4CL: 4-coumarate-CoA ligase; C3'H: 5-O-(4-coumaroyl)-D-quinate 3'-monooxygenase; C4H: trans-cinnamate 4-monooxygenase; CAD: cinnamyl-alcohol dehydrogenase; CCoAOMT: caffeoyl-CoA O-methyltransferase; CCR: cinnamoyl-CoA reductase; COMT: caffeic acid 3-O-methyltransferase; F5H: ferulate-5-hydroxylase; HCT: shikimate O-hydroxycinnamoyltransferase; PAL: phenylalanine ammonia-lyase.\u003c/p\u003e","description":"","filename":"Figure5.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6912338/v1/15383e0994745fd1560aef5d.jpg"},{"id":95040012,"identity":"540c7139-9ba5-4317-8a0c-7d26420682b1","added_by":"auto","created_at":"2025-11-03 16:07:23","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":13081768,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6912338/v1/7ebcb142-fefd-4993-8405-7cfc1ed01faf.pdf"},{"id":85878065,"identity":"26b0f931-fe03-48d0-876a-79e4f43df33c","added_by":"auto","created_at":"2025-07-02 15:32:01","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":1565405,"visible":true,"origin":"","legend":"\u003cp\u003eAdditional file 1: Fig S1-S5. Fig. S1. Assembly results of the \u003cem\u003eE. urograndis\u003c/em\u003e genome. (A) Genomescope analysis of heterozygosity; (B) Scatter diagram of GC content and sequencing depth. (C) Hi-C interaction heatmap of \u003cem\u003eE. urograndis\u003c/em\u003e. The darker the red color, the stronger the interaction intensity. 
Interactions within chromosomes are markedly stronger than those between chromosomes, and chromosome boundaries are clearly defined, indicating good chromosome anchoring. Fig. S2. Annotation quality comparison of protein-coding genes: messenger RNA (mRNA), CDS, exon, and intron lengths in five species (\u003cem\u003eE. urograndis\u003c/em\u003e, \u003cem\u003eE. grandis\u003c/em\u003e, \u003cem\u003eP. granatum\u003c/em\u003e, \u003cem\u003eP. trichocarpa\u003c/em\u003e, and \u003cem\u003eH. brasiliensis\u003c/em\u003e). Fig. S3. KEGG enrichment (A) and GO annotation (B) results for the \u003cem\u003eE. urograndis\u003c/em\u003e genome. Fig. S4. The 20 significantly enriched pathways, with \u003cem\u003ep\u003c/em\u003e-values, for annotated and enriched genes in the \u003cem\u003eE. urograndis\u003c/em\u003e genome. Fig. S5. Contents of the three major wood chemical components in the immature xylem of \u003cem\u003eE. urophylla\u003c/em\u003e, \u003cem\u003eE. urograndis\u003c/em\u003e, and \u003cem\u003eE. grandis\u003c/em\u003e. Values sharing the same letter do not differ significantly, whereas different letters indicate significant differences (LSD multiple comparison test). Error bars indicate the standard error of three replicates.\u003c/p\u003e","description":"","filename":"Additionalfile1.docx","url":"https://assets-eu.researchsquare.com/files/rs-6912338/v1/562ede5ce1ede197346c4016.docx"},{"id":85875360,"identity":"ca0c9e1f-b4db-4d33-aea7-97c267f8e6e7","added_by":"auto","created_at":"2025-07-02 15:08:01","extension":"xlsx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":270775,"visible":true,"origin":"","legend":"\u003cp\u003eAdditional file 2: Tables S1-S12. Table S1. Assembly statistics of the \u003cem\u003eE. urograndis\u003c/em\u003e genome. Table S2. Results of NT comparison. Table S3. 
Results of BUSCO assessment of the genome assembly. Table S4. Lengths of the 11 chromosomes in the \u003cem\u003eE. urograndis\u003c/em\u003e genome. Table S5. Statistics of repetitive sequences. Table S6. Statistics of non-coding RNAs. Table S7. Statistics of gene structure prediction. Table S8. GO enrichment analysis of all \u003cem\u003eE. urograndis\u003c/em\u003e genes. Table S9. Clustering results of gene families in the \u003cem\u003eE. urograndis\u003c/em\u003e genome. Table S10. KEGG pathway analysis of the 131 significantly expanded gene families. Table S11. Positively selected genes in the \u003cem\u003eE. urograndis\u003c/em\u003e genome. Table S12. Key enzymes and paralogous genes identified in the cellulose, xylan, and lignin biosynthesis pathways in the \u003cem\u003eE. urograndis\u003c/em\u003e genome.\u003c/p\u003e","description":"","filename":"Additionalfile2.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6912338/v1/2e7e9779532e8b49a51a147f.xlsx"}],"financialInterests":"No competing interests reported.","formattedTitle":"From Genome to Gene Expression: The Genomic Landscape of a Hybrid Species of Eucalyptus urophylla × Eucalyptus grandis and Its Divergence from Parental Species Hybrid","fulltext":[{"header":"Background","content":"\u003cp\u003e\u003cem\u003eEucalyptus\u003c/em\u003e is one of the most important fast-growing timber genera for the fiber, energy, and paper industries worldwide, and its genetic improvement largely determines the competitiveness of these industries [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. Eucalyptus is also widely used for reforestation owing to its fast growth and high adaptability to diverse environments [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. Hybrid breeding, the most important means of genetic improvement, has been widely used to achieve substantial gains in eucalyptus. 
Currently, \u003cem\u003eEucalyptus\u003c/em\u003e spp. and their hybrids are among the world\u0026rsquo;s leading sources of woody biomass and are the main hardwoods used for pulpwood and timber [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. In particular, almost all the main varieties planted in major eucalyptus plantations worldwide are hybrids [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. In general, hybrids outperform their parents in growth rate, yield, disease resistance, and viability [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. Among the various \u003cem\u003eEucalyptus\u003c/em\u003e taxa, the hybrid \u003cem\u003eEucalyptus urophylla\u003c/em\u003e \u0026times; \u003cem\u003eEucalyptus grandis\u003c/em\u003e (\u003cem\u003eE. urograndis\u003c/em\u003e) is the most popular and grows rapidly [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. It shows marked heterosis, high yield, straight stems, uniform stands, and wide adaptability; its standing volume reaches 213.8 m\u003csup\u003e3\u003c/sup\u003e/hm\u003csup\u003e2\u003c/sup\u003e at 5.5 years, and its annual growth reaches 40.95 m\u003csup\u003e3\u003c/sup\u003e/hm\u003csup\u003e2\u003c/sup\u003e. \u003cem\u003eE. urograndis\u003c/em\u003e timber is straight-grained, the basic density of the finished material is 500\u0026thinsp;\u0026plusmn;\u0026thinsp;20 kg/m\u003csup\u003e3\u003c/sup\u003e [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e], and the mechanical strength is about 80.82% [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. 
Furthermore, \u003cem\u003eE. urograndis\u003c/em\u003e has a wide range of uses, including building materials, furniture, agricultural tools, fuelwood, communication poles, and pillars, serving the construction, furniture, agriculture, communication, and manufacturing industries [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]. A high-quality chromosomal-level genome of \u003cem\u003eE. urograndis\u003c/em\u003e is therefore of great commercial importance and essential for understanding the genetic basis of its superior properties, so that these attributes can be extended to other hybrids.\u003c/p\u003e \u003cp\u003eHybridization is highly important for the environmental adaptability of species and is an evolutionary phenomenon that has fascinated biologists for centuries [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. Even before the advent of whole-genome sequencing, hybridization was recognized as having played a clear role in the evolutionary history of many extant taxa, particularly plants [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. A high-quality genome assembly would be a valuable resource for elucidating the genetic basis of the fast growth and adaptive evolution of \u003cem\u003eE. urograndis\u003c/em\u003e.\u003c/p\u003e \u003cp\u003eThe classical genetic models of heterosis (dominance, overdominance, and epistasis) were developed before DNA had been identified as the carrier of genetic information and therefore cannot be directly linked to molecular mechanisms [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. Differences in gene expression constitute a critical source of phenotypic diversity, and the distinctive features of diploid hybrids arise from the co-expression of parental alleles at specific loci. 
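The additive versus non-additive classification described in the rest of this paragraph is usually made by comparing the hybrid with the mid-parent value (the mean of the two parental expression levels). The Python sketch below is a simplified illustration, not the method used in this study: the relative tolerance `tol` is a hypothetical stand-in for the statistical tests a real analysis would apply, and hybrids that fall between the parents but away from the mid-parent value are assigned to the dominance class of the nearer parent. All names are hypothetical.

```python
from statistics import mean


def classify_expression(parent1, parent2, hybrid, tol=0.1):
    """Classify hybrid expression relative to its two parents.

    Each argument is a list of replicate expression values (e.g., TPM).
    tol is a hypothetical relative tolerance used instead of a statistical test.
    """
    p1, p2, h = mean(parent1), mean(parent2), mean(hybrid)
    mid_parent = (p1 + p2) / 2          # expectation under additivity
    high, low = max(p1, p2), min(p1, p2)
    margin = tol * mid_parent

    if abs(h - mid_parent) <= margin:
        return "additive"
    if h > high + margin:
        return "overdominance"
    if h < low - margin:
        return "underdominance"
    # Between the parents but away from the mid-parent value:
    # assign to the dominance class of the nearer parent (a simplification).
    if abs(h - high) <= abs(h - low):
        return "high-parent dominance"
    return "low-parent dominance"


print(classify_expression([10, 10, 10], [30, 30, 30], [29, 31, 30]))  # high-parent dominance
```

A fixed tolerance keeps the example short; with replicate data, a test of the hybrid against the mid-parent value and against each parent would replace the `margin` comparisons.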
The expression of alleles at specific loci may differ between hybrids and their parents, and five primary patterns have been identified: (1) alleles expressed in both parents but silenced in hybrids (parental co-silencing); (2) alleles expressed exclusively in one parent but not in the hybrid (parent-specific expression); (3) alleles expressed only in hybrids and absent in both parents (hybrid-specific expression); (4) alleles expressed in hybrids and one parent (single-parent expression, SPE); and (5) alleles expressed in both parents and hybrids [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]. The first four patterns reflect qualitative differences in gene expression, namely presence/absence variation (PAV), whereas the fifth reflects quantitative differences. Relative to the parental genotypes, gene expression in F1 hybrids can be classified as additive or non-additive. Additive expression means that the expression level of the hybrid is intermediate between those of the two parents (i.e., close to the mid-parent value), whereas non-additive expression encompasses four patterns: overdominance, high-parent dominance, low-parent dominance, and underdominance. In recent years, evidence has increasingly supported the notion that differences in transcriptional regulation between hybrids and their parental inbred lines are related to the superior performance of hybrids [\u003cspan additionalcitationids=\"CR18\" citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eIn this study, we report a chromosomal-level genome assembly of \u003cem\u003eE. urograndis\u003c/em\u003e (DH32-29) generated from PacBio single-molecule real-time (SMRT) long reads and high-throughput chromosome conformation capture (Hi-C) data. 
We then used the assembly to investigate the genomic basis of hybrid superiority (heterosis) through genome annotation and gene family analyses. Furthermore, the evolutionary trajectories of \u003cem\u003eE. urograndis\u003c/em\u003e and 12 other representative plant species, including \u003cem\u003eE. grandis\u003c/em\u003e [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e, \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e], \u003cem\u003eE. melliodora\u003c/em\u003e (GCA_004368105.3), \u003cem\u003eE. pauciflora\u003c/em\u003e (GCA_007663325.1), \u003cem\u003eSolanum tuberosum\u003c/em\u003e [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e], \u003cem\u003eZea mays\u003c/em\u003e [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e], \u003cem\u003eOryza sativa\u003c/em\u003e [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e], \u003cem\u003ePhoenix dactylifera\u003c/em\u003e [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e], \u003cem\u003ePopulus trichocarpa\u003c/em\u003e [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e], \u003cem\u003ePhyscomitrella patens\u003c/em\u003e [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e], \u003cem\u003eArabidopsis thaliana\u003c/em\u003e [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e], \u003cem\u003eVitis vinifera\u003c/em\u003e [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e], and \u003cem\u003eTheobroma cacao\u003c/em\u003e [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e], were explored using comparative genomics. Additionally, collinearity between the \u003cem\u003eE. grandis\u003c/em\u003e and \u003cem\u003eE. urograndis\u003c/em\u003e genomes was analyzed; the \u003cem\u003eE. grandis\u003c/em\u003e and \u003cem\u003eV. 
vinifera\u003c/em\u003e genomes were combined to analyze the whole-genome duplication (WGD) history of the \u003cem\u003eE. urograndis\u003c/em\u003e genome. On the basis of the comparative genomic results, we performed functional annotation and identified expanded gene families and positively selected genes (PSGs). Unlike the \u003cem\u003eE. urograndis\u003c/em\u003e genome reported by Shen et al. [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e], the improved \u003cem\u003eE. urograndis\u003c/em\u003e variety sequenced in this study was derived from known paternal (G46) and maternal (U16) parents. Furthermore, using second-generation high-throughput transcriptome sequencing, we investigated the cellulose, xylan, and lignin biosynthetic pathways in depth, because these pathways are closely associated with heterosis in the wood properties of \u003cem\u003eE. urograndis\u003c/em\u003e. Overall, the \u003cem\u003eE. urograndis\u003c/em\u003e genome presented here provides valuable genomic resources for further research on and utilization of this excellent hybrid eucalyptus. Moreover, it could expand our understanding of hybridization in woody plants, provide a powerful tool to accelerate crossbreeding, and advance comparative biology and biotechnology.\u003c/p\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eComplete Genome Survey\u003c/h2\u003e \u003cp\u003eA total of 93.82 Gb of clean single-tube long fragment read (stLFR) data were obtained on the BGISEQ-500 platform (BGI, Shenzhen, China) after filtering. On the basis of the clean data, the genome size, heterozygosity, and repeat content were estimated to be 566.72 Mb, 2.71%, and 59%, respectively, by 17-mer analysis at a k-mer depth of 135. 
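The survey estimate rests on the standard k-mer relation: genome size is approximately the total number of genomic k-mers (excluding low-depth error k-mers) divided by the depth of the main k-mer peak. The section does not say how the 566.72 Mb figure was derived beyond the 17-mer analysis, and GenomeScope (used for Supplementary Fig. S1A) fits a fuller model that also accounts for heterozygosity, so the Python sketch below only illustrates the basic relation on a toy histogram. The function name, its parameters, and the histogram values are hypothetical.

```python
def estimate_genome_size(histogram, peak_depth, min_depth):
    """Estimate haploid genome size from a k-mer depth histogram.

    histogram  -- dict: k-mer depth -> number of distinct k-mers at that depth
    peak_depth -- depth of the main (homozygous) k-mer peak
    min_depth  -- depths below this are treated as sequencing errors and skipped
    """
    # Total k-mer occurrences contributed by presumed genuine genomic k-mers.
    total_kmers = sum(depth * count
                      for depth, count in histogram.items()
                      if depth >= min_depth)
    # Each genomic position is covered by about peak_depth k-mers on average.
    return total_kmers / peak_depth


# Toy histogram (hypothetical): error k-mers at depth 1-2, a heterozygous
# peak at depth 10, and the homozygous peak at depth 20.
toy = {1: 3000, 2: 400, 10: 500, 20: 1000}
print(estimate_genome_size(toy, peak_depth=20, min_depth=5))  # 1250.0
```

Heterozygous k-mers sit at about half the main depth and are counted at their own depth rather than discarded, which is why a highly heterozygous genome such as this one shows a distinct heterozygous peak in the profile.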
The GenomeScope profile showed a repeat peak and a heterozygous peak, reflecting repetitive sequences and heterozygous loci in the \u003cem\u003eE. urograndis\u003c/em\u003e genome, respectively (Supplementary Fig. \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003eA), indicating that \u003cem\u003eE. urograndis\u003c/em\u003e is highly heterozygous.\u003c/p\u003e \u003cp\u003eSOAPdenovo was used for a preliminary assembly of the high-throughput sequencing data (Supplementary Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e). This assembly was 903.56 Mb, with contig and scaffold N50 lengths of 331 bp and 1.63 kb, respectively, and a GC content of 39.19%. Using contigs (length\u0026thinsp;\u0026ge;\u0026thinsp;500 bp) from the filtered data as windows, we calculated and plotted the average GC content and average depth of non-duplicated fragments, which showed no evidence of contamination by other species (Supplementary Fig. \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003eB). Additionally, BLAST searches of the filtered data against the NT database showed that the top six matched species were all eucalyptus (Supplementary Table \u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003eS2\u003c/span\u003e). Therefore, the \u003cem\u003eE. urograndis\u003c/em\u003e samples used for genome sequencing were relatively pure and suitable for subsequent analysis.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eChromosomal-level Genome Assembly\u003c/h3\u003e\n\u003cp\u003eSMRT sequencing on the PacBio platform generated 8,179,761 reads (124.40 Gb), and the resulting contig assembly of \u003cem\u003eE. urograndis\u003c/em\u003e totaled 591.88 Mb, with a contig N50 of 3.73 Mb and a GC content of 39.44%. 
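Contig and scaffold N50 and GC content are standard assembly summary statistics. As a hedged illustration (not the authors' pipeline; the function names are hypothetical), N50 is the length L such that sequences of length at least L together contain at least half of all assembled bases, and GC content is conventionally reported over unambiguous A/C/G/T bases:

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() requires at least one sequence length")


def gc_content(seq):
    """Percentage of G+C among A/C/G/T bases; N and other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else 0.0


contig_lengths = [10, 8, 5, 3, 2]          # toy contig lengths
print(n50(contig_lengths))                 # 8, because 10 + 8 = 18 >= 28 / 2
print(round(gc_content("ATGCNNGC"), 2))    # 66.67
```

The same `n50` function applies to scaffolds; only the input lengths change.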
The shortest contig was 1,380 bp, and the longest reached 12.70 Mb (Supplementary Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eTo assess the accuracy and completeness of the assembly, we examined the GC content and sequencing depth distributions and performed BUSCO analysis. The GC content and sequencing depth were tightly distributed, with an average GC content of 39.44%, and the depth distribution in the scatter plot approximated a Poisson distribution (Supplementary Fig. \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003eB). About 96.36% of the short reads aligned to the genome, covering about 99.80% of the assembly. BUSCO analysis showed that the assembly captured 94.9% of the conserved single-copy orthologs, comprising 92.9% complete and 2.0% fragmented BUSCOs (Supplementary Table S3). These results indicate high consistency and completeness, confirming the good quality of the \u003cem\u003eE. urograndis\u003c/em\u003e assembly.\u003c/p\u003e \u003cp\u003eHi-C sequencing yielded 652,905,028 raw read pairs (130.58 Gb). After the Hi-C reads were mapped to the \u003cem\u003eE. urograndis\u003c/em\u003e assembly, 154.51 M valid read pairs, accounting for 70.88% of the mapped read pairs, were used for Hi-C analysis. The contigs in the draft assembly were then anchored and oriented into a chromosomal-scale assembly by Hi-C scaffolding with Juicer and 3D-DNA (Supplementary Fig. \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003eC); the final \u003cem\u003eE. urograndis\u003c/em\u003e assembly was 592.09 Mb, with a scaffold N50 of 58.62 Mb. 
In total, 11 chromosomes ranging from 39.56 to 72.11 Mb were constructed, accounting for 99.91% of the assembly (Supplementary Table S4).\u003c/p\u003e \u003cp\u003e \u003cb\u003eAnnotation of the\u003c/b\u003e \u003cb\u003eE. urograndis\u003c/b\u003e \u003cb\u003eGenome\u003c/b\u003e\u003c/p\u003e \u003cp\u003eIn this study, consensus, non-redundant repetitive sequences were identified by combining known, novel, and tandem repeats, yielding a total of 281.02 Mb of repetitive sequences, or 47.48% of the genome assembly after redundancy was removed (Supplementary Table S5). Long terminal repeat (LTR) retrotransposons were the most abundant, amounting to 205.45 Mb (34.68% of the genome). The second most abundant repeat class was DNA transposons (43.55 Mb, 7.36% of the genome), followed by long interspersed nuclear elements (LINEs) (17.73 Mb, 3.00%) and unknown repeats (11.79 Mb, 1.99%). Additionally, short interspersed nuclear elements (SINEs) accounted for 0.32 Mb (0.05%) of the \u003cem\u003eE. urograndis\u003c/em\u003e genome. The density of these repetitive sequences varied across the 11 chromosomes, with high-density regions located at different positions on different chromosomes (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eRecent studies have revealed that about 90% of the eukaryotic genome is transcribed, yet only 1\u0026ndash;2% encodes proteins, and most transcripts are ncRNAs [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e, \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e]. Although ncRNAs are not translated into proteins, they have important biological functions, for example in plant immunity [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e]. 
Annotation of ncRNA genes in the \u003cem\u003eE. urograndis\u003c/em\u003e genome identified 538 tRNAs, 144 snRNAs, 297 snoRNAs, 994 miRNAs, and 747 rRNAs, with total lengths of 40,196, 20,226, 51,140, 93,638, and 162,195 bp, respectively (Supplementary Table S6). miRNAs were the most abundant (994 copies); they play vital roles in diverse biological processes, such as plant growth and development and hormone and stress responses [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e, \u003cspan additionalcitationids=\"CR36\" citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e]. By contrast, rRNAs had the greatest total length among the five ncRNA types. Eukaryotic rRNAs are generally classified into four types (5S, 5.8S, 18S, and 28S) [\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e]. Among these, 5S rRNA genes were the most abundant (365 copies, with a total length of 43,303 bp, or 0.0073% of the genome), whereas 18S rRNA genes had the greatest total length (89,142 bp, or 0.0151% of the genome). C/D box snoRNAs, which are associated with methylation, outnumbered H/ACA box snoRNAs, which are associated with pseudouridylation [\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eCombining homology-based, \u003cem\u003ede novo\u003c/em\u003e, and RNA-seq-based prediction, we identified a total of 32,151 protein-coding genes in the \u003cem\u003eE. urograndis\u003c/em\u003e genome. The average gene and CDS lengths were 4,776.80 bp and 1,322.38 bp, respectively, and the average number of exons per gene, average exon length, and average intron length were 4.99, 265.25 bp, and 866.77 bp, respectively (Supplementary Fig. 
\u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003eS2\u003c/span\u003e, Supplementary Table S7). As shown in Supplementary Table S7, the gene length, exon length, and intron length were greater in the \u003cem\u003eE. urograndis\u003c/em\u003e genome than in the genomes of the other four species (\u003cem\u003eE. grandis\u003c/em\u003e, \u003cem\u003eP. granatum\u003c/em\u003e, \u003cem\u003eP. trichocarpa\u003c/em\u003e, and \u003cem\u003eH. brasiliensis\u003c/em\u003e). BUSCO analysis revealed that 93.50% of the single-copy orthologous genes retrieved from the assembly were fully annotated, including 88.80% of the complete single-copy genes and 4.70% of the complete duplicated BUSCOs (Supplementary Table S3). These results showed that the genome annotation accurately represented the completeness of the gene set of the \u003cem\u003eE. urograndis\u003c/em\u003e genome.\u003c/p\u003e \u003cp\u003eThe protein sequences predicted by gene structure were compared with those in five protein databases; a total of 31,019 genes (96.48% of all the predicted genes) were functionally annotated using the SwissProt, KEGG, TrEMBL, InterPro, and GO databases (E-values\u0026thinsp;\u0026le;\u0026thinsp;10\u003csup\u003e\u0026ndash;5\u003c/sup\u003e), and 18,099 genes were annotated in all five databases. KEGG pathway-based analysis assigned 25,010 genes to a total of 136 pathways, with the majority of these genes involved in metabolic pathways (ko01100, 4089 genes) and the biosynthesis of secondary metabolites (ko01110, 2633 genes) (Supplementary Fig. S3A). The phenylpropanoid biosynthesis pathway (ko00940), which is closely related to lignin biosynthesis, was enriched with 338 genes. In particular, 690 genes were enriched in the plant-pathogen interaction pathway (ko04626), which was significantly related to disease resistance in \u003cem\u003eE. urograndis\u003c/em\u003e. GO analyses were performed to classify the functions of the \u003cem\u003eE. 
urograndis\u003c/em\u003e genes: 19,906 genes were assigned to 3,478 GO terms in 45 subcategories (Supplementary Fig. S3B, Supplementary Table S8). In the biological process (BP) category, which comprised 1,920 GO terms, metabolic process (GO:0008152, 8,473 genes), cellular process (GO:0009987, 6,385 genes), and single-organism process (GO:0044699, 4,915 genes) contained the most genes. The cellular component (CC) category included membrane (GO:0016020, 2,319 genes), cell part (GO:0044464, 2,085 genes), cell (GO:0005623, 2,085 genes), and 422 other GO terms. The molecular function (MF) category included binding (GO:0005488, 11,898 genes), catalytic activity (GO:0003824, 9,509 genes), transporter activity (GO:0005215, 1,032 genes), and 1,130 other GO terms.\u003c/p\u003e\n\u003ch3\u003eGene Family Analysis\u003c/h3\u003e\n\u003cp\u003eGene families were identified with the OrthoMCL method using the protein sequences of 13 species (\u003cem\u003eP. patens, P. trichocarpa, A. thaliana, T. cacao, E. grandis, E. urograndis, E. melliodora, E. pauciflora, V. vinifera, S. tuberosum, Z. mays, O. sativa\u003c/em\u003e, and \u003cem\u003eP. dactylifera\u003c/em\u003e). Of the 32,151 \u003cem\u003eE. urograndis\u003c/em\u003e genes, 28,078 were clustered into gene families, including 142 families unique to \u003cem\u003eE. urograndis\u003c/em\u003e, whereas 4,073 \u003cem\u003eE. urograndis\u003c/em\u003e-specific genes remained unclustered (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA, Supplementary Table S9). In addition, the analysis revealed 1,454 gene families specific to the four \u003cem\u003eEucalyptus\u003c/em\u003e species, 428 gene families specific to \u003cem\u003eE. grandis\u003c/em\u003e and \u003cem\u003eE. 
urograndis\u003c/em\u003e, and 577 gene families shared by all 13 species.\u003c/p\u003e \u003cp\u003eWe also identified 13,098 gene families shared by the four eucalyptus species (\u003cem\u003eE. urograndis, E. grandis, E. melliodora\u003c/em\u003e, and \u003cem\u003eE. pauciflora\u003c/em\u003e), and 236 gene families unique to \u003cem\u003eE. urograndis\u003c/em\u003e among them. Moreover, \u003cem\u003eE. urograndis\u003c/em\u003e shared 14,738, 14,997, and 15,249 gene families with \u003cem\u003eE. grandis, E. pauciflora\u003c/em\u003e, and \u003cem\u003eE. melliodora\u003c/em\u003e, respectively (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eB).\u003c/p\u003e\n\u003ch3\u003ePhylogenetic Analysis and Divergence Time Estimation\u003c/h3\u003e\n\u003cp\u003eTo determine the phylogenetic position of \u003cem\u003eE. urograndis\u003c/em\u003e, a phylogenetic tree was constructed from 577 single-copy gene families in the 13 sequenced plant genomes. \u003cem\u003eE. urograndis\u003c/em\u003e and \u003cem\u003eE. grandis\u003c/em\u003e were more closely related to \u003cem\u003eE. melliodora\u003c/em\u003e than to \u003cem\u003eE. pauciflora\u003c/em\u003e, consistent with the taxonomic classification of these species based on morphological characteristics. The Myrtales lineage, represented by the four \u003cem\u003eEucalyptus\u003c/em\u003e species, formed a sister clade to the malvids, with Vitales as the basal rosid lineage, whereas \u003cem\u003eP. trichocarpa\u003c/em\u003e grouped with the malvids (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eC). These results are consistent with the phylogenomic analysis of the \u003cem\u003eE. grandis\u003c/em\u003e genome [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. 
On the basis of the gene family clustering results and the phylogenetic relationships among species, we then estimated divergence times, treating \u003cem\u003eE. urograndis\u003c/em\u003e as a naturally occurring hybrid. \u003cem\u003eE. urograndis\u003c/em\u003e and \u003cem\u003eE. grandis\u003c/em\u003e were estimated to have diverged about 2.9\u0026nbsp;million years ago (Mya), suggesting that, without human intervention, the hybrid \u003cem\u003eE. urograndis\u003c/em\u003e might have formed about 2.9 Mya.\u003c/p\u003e\n\u003ch3\u003eExpansion and Contraction of Gene Families\u003c/h3\u003e\n\u003cp\u003eGene families expand and contract in plants under selection pressure during evolution [\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e], and these processes play major roles in the phenotypic diversification of plants [\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e, \u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e]. In this study, we analyzed the expansion and contraction of gene families in \u003cem\u003eE. urograndis\u003c/em\u003e by comparison with the other 12 representative species (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eC). Phylogenetic analysis of 6,899 gene families revealed 242 expanded gene families encompassing 344 genes and 483 contracted gene families encompassing 618 genes in \u003cem\u003eE. urograndis\u003c/em\u003e. Of the 242 expanded gene families, 131 were significantly expanded (\u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05). The \u003cem\u003eE. 
urograndis\u003c/em\u003e genome showed the fewest gene family expansions and the most contractions compared with the other three eucalyptus species.\u003c/p\u003e \u003cp\u003eWe further annotated the 131 significantly expanded gene families (828 genes) using the KEGG and GO databases. KEGG pathway analysis assigned these 131 gene families to 79 pathways (Supplementary Table S10). In particular, 20 genes were assigned to the phenylpropanoid biosynthesis pathway (ko00940), 16 of which encoded cinnamyl alcohol dehydrogenase (CAD; EC:1.1.1.195). CAD catalyzes the final step of monolignol biosynthesis, producing the lignin monomers, and is closely related to plant growth and development and to resistance against pathogen invasion [\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e]. The remaining four genes encoded a cytochrome P450 (C3\u0026rsquo;H; EC:1.14.14.96), which plays a key role in plant development and defense. Additionally, 18 significantly expanded genes encoding sucrose synthase (SUSY; EC:2.4.1.13) were assigned to the starch and sucrose metabolism pathway (ko00500). SUSY plays a central role in source-sink coordination and carbon flow in trees: Gessler et al. [\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e] and Dominguez et al. [\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e] demonstrated that SUSY activity influences carbon (C) allocation to developing woody tissues and maintains the C balance of the whole tree. Moreover, seven significantly expanded genes were assigned to the plant-pathogen interaction pathway (ko04626), including three genes encoding PTI1-like tyrosine-protein kinase 3 (PTK3; EC:2.7.11.1) and four genes encoding calmodulin (CALM); these expansions may contribute to the wide adaptability and high disease resistance of \u003cem\u003eE. 
urograndis\u003c/em\u003e.\u003c/p\u003e \u003cp\u003e \u003cb\u003ePositive Selection of Genes in the\u003c/b\u003e \u003cb\u003eE. urograndis\u003c/b\u003e \u003cb\u003eGenome\u003c/b\u003e\u003c/p\u003e \u003cp\u003eIdentifying PSGs in a genome can help reveal the evolutionary adaptations specific to a species [\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e]. A total of 475 PSGs were identified in the \u003cem\u003eE. urograndis\u003c/em\u003e genome. KEGG pathway annotation showed that PSGs were most frequently assigned to metabolic pathways (64 genes, ko01100), biosynthesis of secondary metabolites (35 genes, ko01110), and biosynthesis of amino acids (10 genes, ko01230) (Supplementary Table S11). The 20 significantly enriched pathways and their \u003cem\u003ep\u003c/em\u003e values are shown in Supplementary Fig. S4. Four PSGs were assigned to the phenylpropanoid biosynthesis pathway (ko00940). The gene weijuan_GLEAN_10017689 encodes 4-coumarate-CoA ligase (4CL; EC:6.2.1.12), a key enzyme of the phenylpropanoid pathway that participates in monolignol biosynthesis by producing \u003cem\u003ep\u003c/em\u003e-coumaroyl-CoA. The gene weijuan_GLEAN_10016490 encodes caffeic acid 3-O-methyltransferase (COMT; EC:2.1.1.68), which catalyzes the methylation of hydroxylated monolignol precursors at multiple steps and is believed to occupy a pivotal position in the lignin biosynthetic pathway. The gene weijuan_GLEAN_10029076 encodes cinnamoyl-CoA reductase (CCR; EC:1.2.1.44), the first enzyme of the monolignol-specific branch of the lignin biosynthetic pathway, which converts feruloyl-CoA to coniferaldehyde [\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e]. 
In addition, six PSGs were assigned to the plant-pathogen interaction pathway (ko04626): three encoding calmodulin-like calcium-binding proteins (CMLs), and weijuan_GLEAN_10010339, weijuan_GLEAN_10032007, and weijuan_GLEAN_10022765, encoding WRKY41, an LRR receptor-like serine/threonine-protein kinase, and nitric oxide synthase (NOS; EC:1.14.13.39), respectively. Positive selection in this pathway may contribute to the strong environmental adaptability and disease resistance of \u003cem\u003eE. urograndis\u003c/em\u003e.\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eCollinearity and Whole Genome Duplication (WGD) Analysis\u003c/h2\u003e \u003cp\u003eTo investigate the natural evolutionary history of \u003cem\u003eE. urograndis\u003c/em\u003e, we first performed a collinearity analysis between the \u003cem\u003eE. urograndis\u003c/em\u003e and \u003cem\u003eE. grandis\u003c/em\u003e genomes (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eA), which identified 39,042 collinear gene pairs (59.60%) in 636 syntenic regions, indicating high synteny between the two genomes.\u003c/p\u003e \u003cp\u003eChromosomes 1, 2, 5, 9, and 11 of \u003cem\u003eE. urograndis\u003c/em\u003e showed high collinearity with chromosomes 2, 3, 4, 8, and 9 of \u003cem\u003eE. grandis\u003c/em\u003e, respectively (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eB). However, the other six \u003cem\u003eE. urograndis\u003c/em\u003e chromosomes showed patterns suggestive of fusion events relative to the \u003cem\u003eE. grandis\u003c/em\u003e chromosomes. For example, chromosome 8 of \u003cem\u003eE. 
urograndis\u003c/em\u003e shares many collinear gene pairs with chromosomes 11 and 1 of \u003cem\u003eE. grandis\u003c/em\u003e (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eB). These patterns indicate that chromosomal recombination occurred during the evolution of these tree species and the formation of the hybrid. Additionally, more translocation events were detected in chromosomes 3, 4, 6, 7, 8, and 10 of \u003cem\u003eE. urograndis\u003c/em\u003e relative to the \u003cem\u003eE. grandis\u003c/em\u003e chromosomes. Thus, \u003cem\u003eE. urograndis\u003c/em\u003e may have experienced chromosomal fusion, inversion, or other rearrangement events.\u003c/p\u003e \u003cp\u003eTo reconstruct the WGD history of \u003cem\u003eE. urograndis\u003c/em\u003e and of the \u003cem\u003eEucalyptus\u003c/em\u003e genus, we also analyzed the \u003cem\u003eE. grandis\u003c/em\u003e and \u003cem\u003eV. vinifera\u003c/em\u003e genomes (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eC). The distributions of the synonymous substitution rate (\u003cem\u003eKs\u003c/em\u003e) of syntenic ortholog pairs between genomes showed divergence peaks at about 0.03 for \u003cem\u003eE. urograndis\u003c/em\u003e versus \u003cem\u003eE. grandis\u003c/em\u003e, about 1.21 for \u003cem\u003eE. grandis\u003c/em\u003e versus \u003cem\u003eV. vinifera\u003c/em\u003e, and about 1.23 for \u003cem\u003eE. urograndis\u003c/em\u003e versus \u003cem\u003eV. vinifera\u003c/em\u003e. The \u003cem\u003eKs\u003c/em\u003e distributions of paralogous pairs within each of the \u003cem\u003eE. urograndis\u003c/em\u003e, \u003cem\u003eE. grandis\u003c/em\u003e, and \u003cem\u003eV. vinifera\u003c/em\u003e genomes presented two clear peaks (recent WGD and ancient WGD). 
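Locating the recent and ancient peaks in a Ks distribution, as described above, amounts to finding local maxima in a histogram of paralog Ks values. The study does not state which tool or peak-calling method it used, so the following is only a minimal Python sketch under stated assumptions: the bin width (0.05), the saturation cutoff (Ks below 3), and the helper names ks_histogram and local_peaks are hypothetical, and published analyses usually fit mixture models or kernel density estimates rather than taking raw histogram maxima.

```python
from collections import Counter


def ks_histogram(ks_values, bin_width=0.05, ks_max=3.0):
    # Bin Ks values into fixed-width bins; values at or above ks_max are
    # treated as saturated and ignored (hypothetical cutoff).
    counts = Counter()
    for ks in ks_values:
        if 0 <= ks < ks_max:
            counts[int(ks / bin_width)] += 1
    n_bins = int(round(ks_max / bin_width))
    return [counts.get(i, 0) for i in range(n_bins)]


def local_peaks(hist, bin_width=0.05, min_count=1):
    # Return the centres of bins that are higher than the bin on their left
    # and at least as high as the bin on their right (simple local maxima).
    peaks = []
    for i in range(1, len(hist) - 1):
        if hist[i] >= min_count and hist[i] > hist[i - 1] and hist[i] >= hist[i + 1]:
            peaks.append(round((i + 0.5) * bin_width, 3))
    return peaks


# Toy paralog Ks values with a small recent cluster and a large ancient one.
toy_ks = [0.07] * 2 + [0.12] * 5 + [0.17] * 2 + [1.07] * 8 + [1.12] * 20 + [1.17] * 8
print(local_peaks(ks_histogram(toy_ks)))  # -> [0.125, 1.125]
```

A histogram maximum depends on the bin width chosen, which is one reason smoothed density fits are generally preferred for real Ks data.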
Moreover, the \u003cem\u003eKs\u003c/em\u003e peak values of the three genomes for the ancient WGD events were very close, indicating that all three genomes experienced the gamma (γ) event, the hexaploidization event shared by core eudicots [\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e, \u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e]. Using the γ age (117\u0026thinsp;\u0026plusmn;\u0026thinsp;1 Mya) [\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e] and the γ peak \u003cem\u003eKs\u003c/em\u003e value of \u003cem\u003eV. vinifera\u003c/em\u003e (1.11), the synonymous substitution rate for \u003cem\u003eEucalyptus\u003c/em\u003e was estimated at 4.75\u0026thinsp;\u0026times;\u0026thinsp;10\u003csup\u003e\u0026minus;9\u003c/sup\u003e substitutions per site per year. On this basis, the speciation time was estimated at about 2.68 Mya (million years ago), which is close to the divergence time (2.9 Mya) between \u003cem\u003eE. urograndis\u003c/em\u003e and \u003cem\u003eE. grandis\u003c/em\u003e. Furthermore, the ancient WGD of \u003cem\u003eE. urograndis\u003c/em\u003e was estimated to have occurred about 127.79 Mya, and that of \u003cem\u003eE. grandis\u003c/em\u003e about 114.87 Mya, which is close to the value (105.9\u0026ndash;113.9 Mya) reported by Myburg et al. [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. However, the recent \u003cem\u003eKs\u003c/em\u003e peaks had relatively low densities in all three genomes, indicating small-scale duplication events.\u003c/p\u003e \u003cp\u003e \u003cb\u003eAllele-Specific Expression Associated with Cellulose, Xylan, and Lignin Biosynthesis in\u003c/b\u003e \u003cb\u003eE. urograndis\u003c/b\u003e \u003cb\u003eand Its Parental Species\u003c/b\u003e\u003c/p\u003e \u003cp\u003eShao et al. 
[\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e] noted that allele-specific expression (ASE), an imbalance between the expression levels of the two parental alleles in a hybrid, has been suggested as a mechanism of heterosis. Cellulose and lignin are the major components of forest wood and non-food biomass [\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e], and lignin is the second most abundant polymer in nature. As key components of plant cell walls, cellulose and lignin give wood its remarkable strength. Understanding their biosynthesis in \u003cem\u003eE. urograndis\u003c/em\u003e, one of the most widely planted hybrids, is therefore important. In this study, analysis of wood chemical components in \u003cem\u003eE. urograndis\u003c/em\u003e and its parental species revealed that the levels of the three main chemical components in \u003cem\u003eE. urograndis\u003c/em\u003e were closer to those of the maternal parent (\u003cem\u003eE. urophylla\u003c/em\u003e) and differed significantly from those of the paternal parent (\u003cem\u003eE. grandis\u003c/em\u003e) (Supplementary Fig. S5). These findings are consistent with Shen et al. [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e], who reported greater maternal than paternal heritability of wood chemical components in a genetic analysis of the F1 generation of \u003cem\u003eE. urograndis\u003c/em\u003e. Based on a comparative genomic analysis of \u003cem\u003eE. urograndis\u003c/em\u003e and 12 other species, we identified putative functional homologs of genes encoding six enzymatic steps in cellulose biosynthesis from sucrose (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e and Supplementary Table S12). 
The \u003cem\u003eCesA\u003c/em\u003e gene family comprises 29 homologous genes representing nine \u003cem\u003eCesA\u003c/em\u003e types (\u003cem\u003eCesA1\u003c/em\u003e and \u003cem\u003eCesA3\u003c/em\u003e\u0026ndash;\u003cem\u003e10\u003c/em\u003e), including 10 genes homologous to \u003cem\u003eCesA1\u003c/em\u003e and four homologous to \u003cem\u003eCesA5\u003c/em\u003e. The \u003cem\u003eCesA1\u003c/em\u003e and \u003cem\u003eCesA5\u003c/em\u003e genes are associated with primary wall formation in \u003cem\u003eArabidopsis\u003c/em\u003e [\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e]. Notably, the first key enzyme in the direct pathway of UDP-glucose production is sucrose synthase (SUSY); 23 homologous genes were identified in the \u003cem\u003eSUSY\u003c/em\u003e gene family, 18 of which were significantly expanded compared with those in the other 12 typical plant species. We also analyzed DEGs involved in the cellulose biosynthesis pathway of \u003cem\u003eE. urograndis\u003c/em\u003e and its parental species. Among the 14 DEGs of the \u003cem\u003eSUSY\u003c/em\u003e gene family, seven exhibited overdominance and four exhibited high-parent dominance. These differential expression patterns may be associated with the differences in cellulose content observed among the three taxa. The indirect pathway of UDP-glucose production comprises four key enzymes: INV (beta-fructofuranosidase; EC:3.2.1.26), HEX (hexokinase; EC:2.7.1.1), PGM (phosphoglucomutase; EC:5.4.2.2), and UGP (UTP-glucose-1-phosphate uridylyltransferase; EC:2.7.7.9), with 35, 7, 4, and 5 homologous genes, respectively, in the \u003cem\u003eE. urograndis\u003c/em\u003e genome. On the basis of the transcriptomic analysis of immature xylem in \u003cem\u003eE.
urograndis\u003c/em\u003e and its parental species, a total of 50 DEGs in the cellulose biosynthesis pathway were identified. Among these, 25 DEGs exhibited overdominance, six displayed high-parent dominance, eight displayed low-parent dominance, and 11 were underdominant. Notably, among the 19 DEGs of the \u003cem\u003eCesA\u003c/em\u003e gene family, 14 exhibited overdominance, and seven were silenced in \u003cem\u003eE. grandis\u003c/em\u003e, suggesting that these seven DEGs conform to the single-parent expression (SPE) model. These findings may be associated with the higher cellulose content of hybrid \u003cem\u003eE. urograndis\u003c/em\u003e compared with its parental species.\u003c/p\u003e \u003cp\u003eXylan, a hemicellulose, is a plant cell wall polysaccharide widely considered the second most abundant plant biopolymer on Earth after cellulose [\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e]. Heteroxylan is synthesized in the Golgi apparatus through the coordinated action of several enzyme classes (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e). In this study, one \u003cem\u003eIRX7\u003c/em\u003e (irregular xylem 7) gene, weijuan_GLEAN_10024396, was identified as a PSG. Furthermore, as in the cellulose biosynthesis pathway, 44 genes involved in xylan biosynthesis were differentially expressed between \u003cem\u003eE. urograndis\u003c/em\u003e and its parents. Among these DEGs, 24 exhibited overdominance, 14 showed high-parent dominance, five showed low-parent dominance, and one showed underdominance. 
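The expression categories used above (overdominance, high-parent dominance, low-parent dominance, and underdominance) compare the hybrid's expression level with those of its two parents. The study does not give its exact thresholds, so the sketch below is a hypothetical Python illustration: expression values are assumed to be normalized means (for example, FPKM), two levels are treated as equal when their log2 difference is within an assumed tolerance of 0.5, and genes matching none of the four patterns are labelled additive or ambiguous. In practice these categories would be assigned only to genes already called as significantly differentially expressed.

```python
from collections import Counter
from math import log2


def classify_hybrid_expression(hybrid, parent_a, parent_b, tol=0.5, pseudo=1.0):
    # Compare log2-transformed expression of the hybrid with the higher and
    # lower parent; tol and pseudo (pseudocount) are hypothetical settings.
    h = log2(hybrid + pseudo)
    high = log2(max(parent_a, parent_b) + pseudo)
    low = log2(min(parent_a, parent_b) + pseudo)
    if h - high > tol:
        return 'overdominance'          # above the higher parent
    if low - h > tol:
        return 'underdominance'         # below the lower parent
    near_high = abs(h - high) <= tol
    near_low = abs(h - low) <= tol
    if near_high and not near_low:
        return 'high-parent dominance'  # matches the higher parent only
    if near_low and not near_high:
        return 'low-parent dominance'   # matches the lower parent only
    return 'additive/ambiguous'


# Hypothetical (hybrid, maternal, paternal) mean expression values.
genes = {
    'gene_1': (100.0, 10.0, 20.0),
    'gene_2': (2.0, 10.0, 20.0),
    'gene_3': (20.0, 5.0, 20.0),
    'gene_4': (5.0, 5.0, 20.0),
    'gene_5': (10.0, 5.0, 20.0),
}
tally = Counter(classify_hybrid_expression(*v) for v in genes.values())
print(tally)
```

Working on a log2 scale with a pseudocount keeps the comparison symmetric for up- and down-shifts and avoids taking the logarithm of zero for silenced genes.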
Notably, only one DEG from the \u003cem\u003eGATL\u003c/em\u003e (galacturonosyl transferase-like) gene family (weijuan_GLEAN_10029009) and only one DEG from the \u003cem\u003eXYL4\u003c/em\u003e (xylan 1,4-beta-xylosidase 4) gene family (weijuan_GLEAN_10017065) showed relatively high expression in \u003cem\u003eE. grandis\u003c/em\u003e. Furthermore, three DEGs were silenced in \u003cem\u003eE. grandis\u003c/em\u003e, suggesting that they conform to the SPE model. These results align with the trend in hemicellulose content observed among \u003cem\u003eE. urograndis\u003c/em\u003e and its parents.\u003c/p\u003e \u003cp\u003eLignin is synthesized from phenylalanine through a complex process involving 12 enzymes and a series of deamination, hydroxylation, methylation, and reduction steps (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e and Supplementary Table S12). Lignin monomers (the precursors of H, G, and S lignin) are produced in the cytoplasm and transported across the cell membrane to be polymerized in the cell wall [\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e]. CCR and CAD catalyze the last two steps of the monolignol synthesis pathway. One gene of the \u003cem\u003eCCR\u003c/em\u003e gene family (30 homologous genes) was identified as a PSG, whereas 14 genes of the \u003cem\u003eCAD\u003c/em\u003e gene family (36 homologous genes) and four homologous genes in the \u003cem\u003eC3\u0026rsquo;H\u003c/em\u003e gene family were identified as significantly expanded genes. In addition, one gene each from the \u003cem\u003e4CL\u003c/em\u003e (20 homologous genes; weijuan_GLEAN_10020758) and \u003cem\u003eCOMT\u003c/em\u003e (28 homologous genes; weijuan_GLEAN_10022151) gene families was identified as a PSG. In the lignin biosynthesis pathway of immature xylem in \u003cem\u003eE.
urograndis\u003c/em\u003e and its parental species, 17 DEGs exhibited underdominance, 16 showed low-parent dominance, nine showed overdominance, and seven showed high-parent dominance. Notably, \u003cem\u003eE. grandis\u003c/em\u003e had the largest number of upregulated genes, which may contribute to its significantly higher lignin content than those of \u003cem\u003eE. urophylla\u003c/em\u003e and \u003cem\u003eE. urograndis\u003c/em\u003e. Specifically, one \u003cem\u003eCAD\u003c/em\u003e DEG (NewGene_9791) and one \u003cem\u003eCCR\u003c/em\u003e DEG (weijuan_GLEAN_10023993) were silenced in \u003cem\u003eE. urograndis\u003c/em\u003e, suggesting that these two DEGs conform to the parental co-silencing model (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e). These findings may be associated with the lower lignin content of \u003cem\u003eE. urograndis\u003c/em\u003e and the higher lignin content of \u003cem\u003eE. grandis\u003c/em\u003e.\u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eGenomic data help characterize the history of hybridization and the genetic basis of speciation [\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e]. Eucalypt species and their interspecific hybrids are important in forestry programs owing to their wood quality and ability to adapt to diverse environmental conditions [\u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e]. In this study, a chromosome-scale genome assembly of \u003cem\u003eE. urograndis\u003c/em\u003e was generated using the BGISEQ-500 platform, PacBio SMRT sequencing, and Hi-C-assisted scaffolding. A high-quality reference genome for \u003cem\u003eE.
urograndis\u003c/em\u003e was obtained, with a scaffold N50 of 58.62 Mb and 99.91% of the sequences anchored to 11 pseudochromosomes. The genome survey revealed high heterozygosity in the \u003cem\u003eE. urograndis\u003c/em\u003e genome, implying high genetic variability. The quality assessment confirmed the high quality of the assembly: 96.36% of the short reads mapped to the genome, covering 99.80% of it. Assembly completeness was also evaluated with BUSCO, which searches for conserved single-copy orthologous genes. The assembled genome can serve as a high-quality reference for \u003cem\u003eE. urograndis\u003c/em\u003e to support studies of molecular breeding, genetics, and evolution in hybrids of \u003cem\u003eE. urophylla\u003c/em\u003e and \u003cem\u003eE. grandis\u003c/em\u003e, as well as in the \u003cem\u003eEucalyptus\u003c/em\u003e genus.\u003c/p\u003e \u003cp\u003eThe genome size of hybrid offspring is generally determined by the genome sizes of both parents [\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e]. The assembly size of \u003cem\u003eE. urograndis\u003c/em\u003e was 592.09 Mb, close to the mid-parent value (588.03 Mb) of \u003cem\u003eE. urophylla\u003c/em\u003e (559.53 Mb, provided by the Guangdong Academy of Forestry) and \u003cem\u003eE. grandis\u003c/em\u003e (616.53 Mb, published on April 2, 2021). This is consistent with good hybridization affinity between \u003cem\u003eE. grandis\u003c/em\u003e and \u003cem\u003eE. urophylla\u003c/em\u003e. Compared with the draft assembly of \u003cem\u003eE. grandis\u003c/em\u003e, the \u003cem\u003eE. urograndis\u003c/em\u003e assembly showed better continuity and greater coverage. The \u003cem\u003eE.
urograndis\u003c/em\u003e genome had a contig N50 of 3.73 Mb and a scaffold N50 of 58.62 Mb, both greater than those of \u003cem\u003eE. grandis\u003c/em\u003e (contig N50, 0.61 Mb; scaffold N50, 58.49 Mb). The percentage of embryophyte BUSCO genes recovered in \u003cem\u003eE. urograndis\u003c/em\u003e was 94.90%, slightly greater than that in \u003cem\u003eE. grandis\u003c/em\u003e (92.3%). The number of protein-coding genes in \u003cem\u003eE. urograndis\u003c/em\u003e was 32,151, lower than that in \u003cem\u003eE. grandis\u003c/em\u003e v2.0 (36,349), which might be due to the smaller genome size of \u003cem\u003eE. urograndis\u003c/em\u003e.\u003c/p\u003e \u003cp\u003eTransposable elements, major components of plant genomes, move around the genome by either \u0026ldquo;cut-and-paste\u0026rdquo; (DNA transposons) or \u0026ldquo;copy-and-paste\u0026rdquo; (RNA transposons) mechanisms [\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e]. Transposable elements accounted for 45.01% of the \u003cem\u003eE. urograndis\u003c/em\u003e genome, slightly more than in \u003cem\u003eE. grandis\u003c/em\u003e (44.5%). A substantial portion of eukaryotic genomes consists of highly repeated non-coding DNA sequences [\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e]. Repeat sequences made up 47.48% of the \u003cem\u003eE. urograndis\u003c/em\u003e genome, similar to the proportions reported for flowering cherry (\u003cem\u003ePrunus yedoensis\u003c/em\u003e, 47.2%) [\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e], \u003cem\u003eE. melliodora\u003c/em\u003e (47.97%), and \u003cem\u003eE. pauciflora\u003c/em\u003e (46.66%).\u003c/p\u003e \u003cp\u003eInterspecific hybridization is one of the main mechanisms of plant speciation.
The merging of two genomes from different subspecies, species, or even genera is frequently accompanied by WGD [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. In this study, we analyzed collinearity and WGD events in \u003cem\u003eE. urograndis\u003c/em\u003e and \u003cem\u003eE. grandis\u003c/em\u003e. The collinearity analysis revealed a high degree of collinearity between the two genomes, suggesting that the \u003cem\u003eE. urograndis\u003c/em\u003e genome inherited many genes from its male parent (\u003cem\u003eE. grandis\u003c/em\u003e). Although the \u003cem\u003eE. urograndis\u003c/em\u003e used in this study was generated via artificial crossing, the WGD event analysis indicated that the hybrid might originally have formed through natural crossing. The speciation of \u003cem\u003eE. urograndis\u003c/em\u003e from \u003cem\u003eE. grandis\u003c/em\u003e was estimated to have occurred near the end of the Pliocene epoch (~\u0026thinsp;2.9 Mya). During the Pliocene (5.3\u0026ndash;2.59 Mya), the biological world resembled the modern one: the flora was broadly similar to that of today, and the first human-like animals appeared near the end of the epoch. This suggests that a natural \u003cem\u003eE. urograndis\u003c/em\u003e hybrid may have formed near the end of the Pliocene. 
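The dating reported in the Results follows the usual molecular-clock relation Ks = 2rT, where r is the synonymous substitution rate per site per year and T is the time since divergence or duplication (the factor 2 reflects substitutions accumulating along both lineages). The Python sketch below, with hypothetical function names, reproduces the calibration from the grape γ peak (Ks = 1.11 at 117 Mya) and then converts between Ks and time using the rounded rate of 4.75e-9 quoted in the text. Because the study reports times rather than the exact Ks peaks behind them, the implied Ks values printed here are back-calculated for illustration, not reported data.

```python
def substitution_rate(ks_peak, age_years):
    # Calibrate r from a dated Ks peak using Ks = 2 * r * T.
    return ks_peak / (2.0 * age_years)


def time_mya(ks, rate):
    # Convert a Ks value to time in million years: T = Ks / (2 * r).
    return ks / (2.0 * rate) / 1e6


def implied_ks(t_mya, rate):
    # Inverse of time_mya: the Ks value that corresponds to a given age.
    return 2.0 * rate * t_mya * 1e6


r_grape = substitution_rate(1.11, 117e6)
print(f'calibrated rate: {r_grape:.3e}')  # about 4.74e-09; the text rounds to 4.75e-9

RATE = 4.75e-9
for label, t in [('speciation', 2.68), ('E. urograndis ancient WGD', 127.79),
                 ('E. grandis ancient WGD', 114.87)]:
    ks = implied_ks(t, RATE)
    print(f'{label}: implied Ks = {ks:.4f}, back to {time_mya(ks, RATE):.2f} Mya')
```

For example, the implied Ks for 2.68 Mya is about 0.025, which is consistent with the rounded value of about 0.03 given for the E. urograndis versus E. grandis peak.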
Additionally, the synonymous substitution rate for \u003cem\u003eEucalyptus\u003c/em\u003e was estimated at 4.75\u0026thinsp;\u0026times;\u0026thinsp;10\u003csup\u003e\u0026minus;9\u003c/sup\u003e substitutions per site per year, slightly higher than that for Laurales (4.21\u0026thinsp;\u0026times;\u0026thinsp;10\u003csup\u003e\u0026minus;9\u003c/sup\u003e) [\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e] and lower than that for Ranunculales (6.98\u0026thinsp;\u0026times;\u0026thinsp;10\u003csup\u003e\u0026minus;9\u003c/sup\u003e) [\u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e]. The ancient γ WGD event might have been related mainly to environmental adaptation in the Cretaceous [\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e], whereas the recent small-scale duplication events may have been related mainly to artificial breeding and environmental changes.\u003c/p\u003e \u003cp\u003e \u003cem\u003eE. urograndis\u003c/em\u003e is widely used in the forestry industry because of its value as timber, short rotation period, high basic density, high lignin and cellulose contents, and high mechanical strength. In general, the wood of angiosperm trees contains 42\u0026ndash;55% cellulose and 20\u0026ndash;25% lignin [\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e], whereas \u003cem\u003eE. urograndis\u003c/em\u003e wood contains 68.41% cellulose and 29.94% lignin [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. It is therefore worthwhile to characterize the cellulose and lignin biosynthetic pathways of \u003cem\u003eE. urograndis\u003c/em\u003e from the perspective of genome evolution. In this study, we identified expanded and contracted gene families and PSGs in the \u003cem\u003eE. urograndis\u003c/em\u003e genome, together with the genes of the cellulose and lignin biosynthetic pathways. 
Six enzymatic steps in cellulose biosynthesis from sucrose, involving 126 genes, were identified. Genomic analysis revealed that \u003cem\u003eE. urograndis\u003c/em\u003e has 29 \u003cem\u003eCesA\u003c/em\u003e genes, whereas the \u003cem\u003eE. grandis\u003c/em\u003e genome has 16 [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. The 29 \u003cem\u003eCesA\u003c/em\u003e genes of \u003cem\u003eE. urograndis\u003c/em\u003e represent nine types (\u003cem\u003eCesA1\u003c/em\u003e and \u003cem\u003eCesA3\u003c/em\u003e\u0026ndash;\u003cem\u003e10\u003c/em\u003e), comparable to the 10 \u003cem\u003eCesA\u003c/em\u003e genes in the \u003cem\u003eA. thaliana\u003c/em\u003e genome [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e] except that \u003cem\u003eCesA2\u003c/em\u003e is absent. Studies have shown that \u003cem\u003eCesA2\u003c/em\u003e, \u003cem\u003e5\u003c/em\u003e, \u003cem\u003e6\u003c/em\u003e, and \u003cem\u003e9\u003c/em\u003e are functionally redundant and involved in primary wall cellulose synthesis. Burn et al. [\u003cspan citationid=\"CR62\" class=\"CitationRef\"\u003e62\u003c/span\u003e] reported that the function of \u003cem\u003eCesA2\u003c/em\u003e is less prominent in \u003cem\u003eArabidopsis\u003c/em\u003e, a finding that may be relevant to the absence of \u003cem\u003eCesA2\u003c/em\u003e from the \u003cem\u003eE. urograndis\u003c/em\u003e genome. Lignin biosynthesis from phenylalanine involves 12 enzymes and 317 genes in \u003cem\u003eE. urograndis\u003c/em\u003e. Among these, four gene families were contracted, and two gene families were under positive selection. The evolutionary changes in these candidate genes might be related to the high cellulose and lignin contents of \u003cem\u003eE. urograndis\u003c/em\u003e.\u003c/p\u003e \u003cp\u003eHigher cellulose content generally means higher biomass energy yield [\u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e]. 
However, lignin is an important factor affecting pulp yield and quality when wood is used in the papermaking industry [\u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e64\u003c/span\u003e]. Modifying lignin-related genes and/or lignin composition in wood would improve our understanding of lignification and facilitate chemical pulping and bleaching, thereby lowering their energy demand and environmental impact. The analysis of the cellulose and lignin biosynthetic pathways in this study provides a theoretical basis for future genetic engineering of the chemical composition and structure of \u003cem\u003eE. urograndis\u003c/em\u003e wood, with the aim of producing new varieties that save energy and reduce pollution.\u003c/p\u003e \u003cp\u003eCompared with other eucalyptus hybrids, \u003cem\u003eE. urograndis\u003c/em\u003e has greater environmental adaptability. This might reflect the evolution of intricate mechanisms for recognizing and defending against potential pathogens. In the evolutionary analysis of the plant-pathogen interaction pathway in \u003cem\u003eE. urograndis\u003c/em\u003e, 6 of the 43 enzymes were encoded by significantly expanded genes or PSGs; all six belong to PAMP-triggered immunity (PTI), the primary plant response to pathogen invasion. Rojas et al. [\u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e65\u003c/span\u003e] showed that primary metabolism is involved in regulating plant defense against pathogens. In this study, analysis of the expanded gene families and PSGs revealed that carbohydrate, amino acid, and lipid metabolism pathways were more enriched in \u003cem\u003eE. urograndis\u003c/em\u003e than in the other 13 plants. These results may help explain the greater adaptability of \u003cem\u003eE.
urograndis\u003c/em\u003e and provide useful directions for genetic improvement and breeding using molecular techniques.\u003c/p\u003e \u003cp\u003eAs one of the world\u0026rsquo;s key pulpwood species, eucalyptus is characterized by low production costs, high pulping yields, and excellent papermaking properties. The primary chemical components of plant fiber raw materials are cellulose, hemicellulose, and lignin. In this study, gene families related to the biosynthesis of cellulose, xylan, and lignin were systematically analyzed by integrating gene annotation with mRNA-seq expression profiling. Cellulose is the primary component of plant raw materials and the most valuable fraction in pulping and papermaking [\u003cspan citationid=\"CR66\" class=\"CitationRef\"\u003e66\u003c/span\u003e], and the cellulose content of fibrous raw materials is the key indicator of their suitability for pulping and papermaking. Lignin, a major constituent of the plant cell wall, reinforces the cell wall and binds fibers together; however, for pulping, a lower lignin content generally correlates with better pulping and papermaking performance. Hemicellulose is the collective term for the non-cellulosic polysaccharides of plant cell walls, and retaining it during pulping is generally desirable to increase pulp yield. The results of this study showed that the hybrid (\u003cem\u003eE. urograndis\u003c/em\u003e) had higher cellulose and hemicellulose contents and a lower lignin content than its parental species. In addition, the gene expression profiles revealed that DEGs associated with the cellulose and hemicellulose biosynthesis pathways predominantly displayed overdominance and high-parent dominance. 
In contrast, most DEGs in the lignin biosynthesis pathway showed underdominance or low-parent dominance. These findings are largely consistent with the hypothesis of \"direction-shifting ASEGs (genes showing ASE)\" proposed by Shao et al. [\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e], which suggests that under specific spatiotemporal conditions, hybrids can selectively express advantageous alleles at particular loci, thereby conferring heterosis.\u003c/p\u003e"},{"header":"Conclusions","content":"\u003cp\u003eIn this study, we constructed a high-quality chromosome-level genome assembly of \u003cem\u003eE. urograndis\u003c/em\u003e (DH32-29) using the BGISEQ-500 platform, PacBio SMRT sequencing, and Hi-C-assisted scaffolding. The assembly spans 592.09 Mb, with 99.91% of the sequence anchored onto 11 pseudochromosomes, and has a contig N50 of 3.73 Mb and a scaffold N50 of 58.62 Mb. Gene annotation and evaluation indicated that the \u003cem\u003eE. urograndis\u003c/em\u003e genome contains 32,151 genes, and BUSCO evaluation of the gene set indicated 93.5% completeness. Additionally, 47.48% of the genome consists of repeat sequences, and the functions of 96.48% of the genes could be predicted. Based on evolutionary analysis, \u003cem\u003eE. grandis\u003c/em\u003e and \u003cem\u003eE. urograndis\u003c/em\u003e are estimated to have diverged approximately 2.9 Mya. Furthermore, 131 gene families were significantly expanded, and 475 PSGs were identified in the \u003cem\u003eE. urograndis\u003c/em\u003e genome. Among these, 48 significantly expanded genes and 15 PSGs were identified as candidates potentially associated with the fast growth and high disease resistance of \u003cem\u003eE. urograndis\u003c/em\u003e. Moreover, comparative genomic analysis revealed highly conserved synteny between the genomes of \u003cem\u003eE. urograndis\u003c/em\u003e and \u003cem\u003eE.
grandis\u003c/em\u003e, while the \u003cem\u003eE. urograndis\u003c/em\u003e genome showed evidence of chromosomal fusion, inversion, and other rearrangement events. Additionally, RNA-seq was used to analyze the allele-specific expression patterns of genes encoding key enzymes involved in cellulose, xylan, and lignin biosynthesis. Several genes displaying allele-specific expression were identified, which may contribute to heterosis in \u003cem\u003eE. urograndis\u003c/em\u003e. Collectively, this study provides a foundational resource and offers insights into the genetic and molecular mechanisms underlying \u003cem\u003eEucalyptus\u003c/em\u003e heterosis, which should help accelerate selective breeding in \u003cem\u003eEucalyptus\u003c/em\u003e species.\u003c/p\u003e"},{"header":"Materials and Methods","content":"\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003ePlant Material\u003c/h2\u003e \u003cp\u003eThe improved \u003cem\u003eE. urograndis\u003c/em\u003e variety (No. Gui S-SC-EUG-001\u0026ndash;2009), namely the well-known elite \u003cem\u003eEucalyptus\u003c/em\u003e hybrid clone DH32-29 provided by the Guangxi State-owned Dongmen Forest Farm, was used for whole-genome sequencing in this study. Young leaves were collected in autumn from approximately 3-year-old trees for genome sequencing, and young plantlets were used for Hi-C library construction and sequencing. Three fresh tissues (leaves, xylem, and cambium) of Gui S-SC-EUG-001-2009 were collected for RNA-seq to support genome annotation. In addition, for the expression analysis, immature xylem was collected at breast height (the DBH position) from each tree of \u003cem\u003eE. urograndis\u003c/em\u003e and its identified paternal (G46) and maternal (U16) parents. 
Three biological replicates were obtained for each sample.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eSequencing, Assembly, and Assessment\u003c/h2\u003e \u003cp\u003eGenomic DNA was isolated from fresh leaves using a QIAamp DNA purification kit (Qiagen, Germany) according to the manufacturer\u0026rsquo;s instructions. DNA integrity and quality were evaluated by 1% agarose gel electrophoresis, and DNA concentration was measured with a Pultton DNA/Protein Analyzer (Plextech, USA). DNA samples with a total amount\u0026thinsp;\u0026ge;\u0026thinsp;20 \u0026micro;g, 1.8\u0026thinsp;\u0026lt;\u0026thinsp;OD\u003csub\u003e260/280\u003c/sub\u003e\u0026thinsp;\u0026lt;\u0026thinsp;2.0, and a concentration\u0026thinsp;\u0026gt;\u0026thinsp;12.5 ng/\u0026micro;L were used to construct the sequencing libraries.\u003c/p\u003e \u003cp\u003eFor long-read sequencing, a 20-kb SMRTbell library was constructed using the SMRTbell Template Prep Kit 1.0 (PacBio, USA). The PacBio Sequel II continuous long read (CLR) library was sequenced on a PacBio Sequel system (Pacific Biosciences, Menlo Park, CA, USA) with version 3.0 chemistry, generating data from one SMRT cell.\u003c/p\u003e \u003cp\u003eTo obtain a chromosome-scale genome assembly, we constructed a Hi-C library for sequencing. Leaf samples were cross-linked with 1% formaldehyde, and the cross-linking reaction was quenched with 0.2 M glycine. 
A Hi-C library was prepared following a published Hi-C protocol [\u003cspan citationid=\"CR67\" class=\"CitationRef\"\u003e67\u003c/span\u003e] and sequenced on a BGISEQ-500 platform (BGI, Shenzhen, China).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003eTissue Collection and RNA Sequencing\u003c/h2\u003e \u003cp\u003eTo facilitate the prediction of protein-coding genes, total RNA was extracted from three tissues of \u003cem\u003eE. urograndis\u003c/em\u003e (leaves, xylem, and cambium) using TRIzol reagent (Invitrogen, CA, USA). RNA integrity and quantity were evaluated using an Agilent 2100 Bioanalyzer (Agilent, USA). Three RNA-Seq libraries were prepared using the NEBNext Ultra RNA Library Prep Kit for Illumina (New England Biolabs, USA) following the manufacturer\u0026rsquo;s protocol and sequenced on a BGISEQ-500 platform (BGI, Shenzhen, China), producing 97.65 Gb of raw data. Raw reads were quality-controlled with RNA-QC-Chain, a QC pipeline for RNA-Seq data, yielding 93.82 Gb of clean data for further analyses.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eGenome Assembly and Quality Assessment\u003c/h2\u003e \u003cp\u003eOn the basis of the quality-filtered reads, the genome size, heterozygosity, and repeat content of the \u003cem\u003eE. urograndis\u003c/em\u003e genome were estimated by k-mer analysis. The k-mer frequencies were computed using Jellyfish (v2.2.10) [\u003cspan citationid=\"CR68\" class=\"CitationRef\"\u003e68\u003c/span\u003e] with k\u0026thinsp;=\u0026thinsp;17 and a maximum k-mer count of 10,000, and the k-mer distribution was modeled and plotted using GenomeScope [\u003cspan citationid=\"CR69\" class=\"CitationRef\"\u003e69\u003c/span\u003e]. 
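The k-mer survey estimates genome size as the total number of k-mers divided by the k-mer depth at the main peak of the frequency histogram (G = N/D, with k = 17 in this study). Jellyfish and GenomeScope do this at scale, and GenomeScope additionally models heterozygosity and sequencing errors; the pure-Python sketch below only illustrates the basic calculation on a toy simulated genome. The helper names, the canonical k-mer handling, and the min_depth cutoff used to skip the low-depth error peak are assumptions for illustration, not the settings used in the study.

```python
import random
from collections import Counter

COMPLEMENT = str.maketrans('ACGT', 'TGCA')


def canonical(kmer):
    # Use the lexicographically smaller of a k-mer and its reverse complement,
    # so both strands of the same locus are counted together.
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def kmer_histogram(reads, k=17):
    # Count canonical k-mers, then return {depth: number of distinct k-mers}.
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if 'N' not in kmer:
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


def estimate_genome_size(hist, min_depth=2):
    # N: total k-mers observed; D: the most frequent depth class,
    # ignoring depths below min_depth (assumed to be sequencing errors).
    total = sum(depth * n for depth, n in hist.items())
    peak = max((d for d in hist if d >= min_depth), key=lambda d: hist[d])
    return total / peak


# Toy check: a random 2-kb genome tiled by 100-bp reads every 10 bp.
rng = random.Random(1)
genome = ''.join(rng.choice('ACGT') for _ in range(2000))
reads = [genome[s:s + 100] for s in range(0, 1901, 10)]
print(round(estimate_genome_size(kmer_histogram(reads))))  # close to 2000
```

Counting canonical k-mers matters because reads come from both strands; without it, each locus would be split across two depth classes and the peak depth would be roughly halved.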
The genome size was calculated as G\u0026thinsp;=\u0026thinsp;N\u003csub\u003e17-mer\u003c/sub\u003e/D\u003csub\u003e17-mer\u003c/sub\u003e, where N\u003csub\u003e17-mer\u003c/sub\u003e is the total number of 17-mers and D\u003csub\u003e17-mer\u003c/sub\u003e is the 17-mer depth at the main peak of the frequency distribution. The short reads were assembled \u003cem\u003ede novo\u003c/em\u003e into contigs and scaffolds using SOAPdenovo2 [\u003cspan citationid=\"CR70\" class=\"CitationRef\"\u003e70\u003c/span\u003e]. Gaps in the initial assembly were filled using TGS-GapCloser (v1.12) [\u003cspan citationid=\"CR71\" class=\"CitationRef\"\u003e71\u003c/span\u003e] with the parameters \u0026ldquo;avg_ins\u0026thinsp;=\u0026thinsp;364, max_ins\u0026thinsp;=\u0026thinsp;500, and min_ins\u0026thinsp;=\u0026thinsp;260\u0026rdquo;. The draft assembly was then anchored and oriented into a chromosome-scale assembly by Hi-C scaffolding. First, raw Hi-C reads were filtered using HiC-Pro (v2.8.0) [\u003cspan citationid=\"CR72\" class=\"CitationRef\"\u003e72\u003c/span\u003e]. Then, 3D-DNA (v170123) [\u003cspan citationid=\"CR73\" class=\"CitationRef\"\u003e73\u003c/span\u003e] was run with the parameters \u0026ldquo;-m haploid -s 0 -c 24\u0026rdquo; to anchor the primary contigs and scaffolds onto chromosomes. Inter- and intrachromosomal contact maps were built and visualized using Juicebox [\u003cspan citationid=\"CR74\" class=\"CitationRef\"\u003e74\u003c/span\u003e]. To further improve the completeness and accuracy of the assembly, we used TGS-GapCloser, which closes gaps in a draft assembly using low-depth (\u0026ge;\u0026thinsp;10\u0026times;) single-molecule long reads without prior error correction. 
The long reads were split into three groups: all reads (options --min_idy 0.2 --min_match 200 --r_round 1), reads\u0026thinsp;\u0026ge;\u0026thinsp;20 kb (options --min_idy 0 --min_match 0 --r_round 3), and reads of 2\u0026ndash;20 kb (options --min_idy 0 --min_match 0 --r_round 3). Each group was used to fill the gaps to which its reads aligned.\u003c/p\u003e \u003cp\u003eThe completeness of the genome assembly was assessed using Benchmarking Universal Single-Copy Orthologs (BUSCO v5.4.3) and GC content analyses. The single-copy orthologs of Embryophyta_odb9 (BUSCO, v2.0) [\u003cspan citationid=\"CR75\" class=\"CitationRef\"\u003e75\u003c/span\u003e] were searched against the assembled genome using the BUSCO tool. The GC content and average sequencing depth across the genome were also calculated in 10 kb non-overlapping windows, excluding windows in which more than 50% of bases were N. No external contamination was detected in the genome.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003eRepetitive Sequence Annotation\u003c/h2\u003e \u003cp\u003eRepetitive sequences in the \u003cem\u003eE. urograndis\u003c/em\u003e genome were annotated by combining homology searches against known repeat databases with \u003cem\u003ede novo\u003c/em\u003e prediction. Known repeats were identified using RepeatMasker (v3.3.0) [\u003cspan citationid=\"CR76\" class=\"CitationRef\"\u003e76\u003c/span\u003e] with the RepBase TE library (v14.06) [\u003cspan citationid=\"CR77\" class=\"CitationRef\"\u003e77\u003c/span\u003e], and RepeatProteinMask (v3.2.2), part of the RepeatMasker package, was used to detect TE-related proteins. 
Novel repeats were predicted using RepeatModeler (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://www.repeatmasker.org\u003c/span\u003e\u003cspan address=\"http://www.repeatmasker.org\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) on the basis of a \u003cem\u003ede novo\u003c/em\u003e repeat library constructed with LTR_Finder [\u003cspan citationid=\"CR78\" class=\"CitationRef\"\u003e78\u003c/span\u003e] and RepeatScout (v1.0.6) [\u003cspan citationid=\"CR79\" class=\"CitationRef\"\u003e79\u003c/span\u003e]. In addition, Tandem Repeats Finder (TRF, v4.09) [\u003cspan citationid=\"CR80\" class=\"CitationRef\"\u003e80\u003c/span\u003e] was used to identify tandem repeats with the parameters \u0026ldquo;Match\u0026thinsp;=\u0026thinsp;2, Mismatch\u0026thinsp;=\u0026thinsp;7, Delta\u0026thinsp;=\u0026thinsp;7, PM\u0026thinsp;=\u0026thinsp;80, PI\u0026thinsp;=\u0026thinsp;10, Minscore\u0026thinsp;=\u0026thinsp;50, and MaxPeriod\u0026thinsp;=\u0026thinsp;2000\u0026rdquo;.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003eGene Prediction and Annotation\u003c/h2\u003e \u003cp\u003eProtein-coding genes were predicted on the repeat-masked genome by combining \u003cem\u003ede novo\u003c/em\u003e, homology-based, and transcriptome-assisted approaches. \u003cem\u003eDe novo\u003c/em\u003e gene prediction was performed using Augustus (v2.7) [\u003cspan citationid=\"CR81\" class=\"CitationRef\"\u003e81\u003c/span\u003e] and Genscan [\u003cspan citationid=\"CR82\" class=\"CitationRef\"\u003e82\u003c/span\u003e] with default settings. For homology-based prediction, protein sequences of \u003cem\u003eE.
grandis, Citrus sinensis, Populus trichocarpa, Prunus persica, Punica granatum, Solanum lycopersicum, Theobroma cacao\u003c/em\u003e, and \u003cem\u003eHevea brasiliensis\u003c/em\u003e were downloaded from NCBI and aligned to the \u003cem\u003eE. urograndis\u003c/em\u003e genome using tBLASTn (\u003cem\u003eE\u003c/em\u003e value\u0026thinsp;\u0026le;\u0026thinsp;1e-5). The matching genomic regions were then aligned to the corresponding proteins using GeneWise (v2.4.0) [\u003cspan citationid=\"CR83\" class=\"CitationRef\"\u003e83\u003c/span\u003e] to obtain accurate spliced alignments. For transcriptome-assisted prediction, RNA-Seq data from the three tissue libraries (leaves, xylem, and cambium) were used. A total of 36.45 Gb of clean data were aligned to the assembled genome using HISAT2 (v2.0.10) [\u003cspan citationid=\"CR84\" class=\"CitationRef\"\u003e84\u003c/span\u003e], and putative transcript structures were assembled using StringTie (v2.1.1) [\u003cspan citationid=\"CR85\" class=\"CitationRef\"\u003e85\u003c/span\u003e]. Candidate protein-coding regions within the transcripts were then predicted using TransDecoder (v5.5.0) (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/TransDecoder/TransDecoder/\u003c/span\u003e\u003cspan address=\"https://github.com/TransDecoder/TransDecoder/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). Finally, the gene models from all three approaches were merged into a consensus gene set using GLEAN [\u003cspan citationid=\"CR86\" class=\"CitationRef\"\u003e86\u003c/span\u003e]. The completeness of the gene set was evaluated using BUSCO v5.4.3 with the Embryophyta (land plant) dataset as the reference. For functional annotation, we aligned all the genes of \u003cem\u003eE.
urograndis\u003c/em\u003e against the TrEMBL, Swiss-Prot, Kyoto Encyclopedia of Genes and Genomes (KEGG), Gene Ontology (GO), and InterPro (via InterProScan) databases. For non-coding RNA annotation, 500 tRNAs were identified using tRNAscan-SE 2.0 [\u003cspan citationid=\"CR87\" class=\"CitationRef\"\u003e87\u003c/span\u003e], and 438 snRNAs and 994 miRNAs were annotated with Infernal [\u003cspan citationid=\"CR88\" class=\"CitationRef\"\u003e88\u003c/span\u003e] against the Rfam database [\u003cspan citationid=\"CR89\" class=\"CitationRef\"\u003e89\u003c/span\u003e]. In addition, 747 rRNAs were identified by BLASTN homology searches against closely related species.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003eComparative and Evolutionary Genomic Analysis\u003c/h2\u003e \u003cp\u003eTo construct the phylogenetic tree, we identified orthologous gene families by comparing the protein and cDNA sequences of \u003cem\u003eE. urograndis\u003c/em\u003e with those of 12 plant species with sequenced genomes: \u003cem\u003eE. grandis, S. tuberosum, Z. mays, O. sativa, P. dactylifera, Populus trichocarpa, P. patens, A. thaliana, E. melliodora, E. pauciflora, V. vinifera\u003c/em\u003e, and \u003cem\u003eTheobroma cacao\u003c/em\u003e. Gene sets were downloaded from Ensembl or NCBI. Genes with frameshifts or encoding fewer than 50 amino acids were removed, and for genes with alternative splicing isoforms, only the longest protein was retained as the representative. First, the protein sequences of the 13 species were compared using BLASTP (\u003cem\u003eE\u003c/em\u003e value\u0026thinsp;\u0026le;\u0026thinsp;1e-5), and gene families were clustered using OrthoMCL [\u003cspan citationid=\"CR90\" class=\"CitationRef\"\u003e90\u003c/span\u003e]. Additionally, the genome assembly of \u003cem\u003eE.
urograndis\u003c/em\u003e was compared with the published genomes of other \u003cem\u003eEucalyptus\u003c/em\u003e species, including \u003cem\u003eE. grandis, E. melliodora\u003c/em\u003e, and \u003cem\u003eE. pauciflora\u003c/em\u003e. Second, the protein sequences of single-copy gene families were aligned using MUSCLE [\u003cspan citationid=\"CR91\" class=\"CitationRef\"\u003e91\u003c/span\u003e], and the protein alignments were converted into the corresponding coding sequence (CDS) alignments. The phase 1 loci were then extracted and concatenated into a supergene. MrBayes 3.2.4 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://sourceforge.net/projects/mrbayes/files/mrbayes/3.2.4/\u003c/span\u003e\u003cspan address=\"https://sourceforge.net/projects/mrbayes/files/mrbayes/3.2.4/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) was used for Bayesian phylogenetic inference among all the organisms, and a maximum-likelihood tree based on the single-copy orthologous genes was also constructed using PhyML (v3.0) [\u003cspan citationid=\"CR92\" class=\"CitationRef\"\u003e92\u003c/span\u003e] with 500 bootstrap replicates. Divergence times were estimated using the MCMCTree program of the PAML package [\u003cspan citationid=\"CR93\" class=\"CitationRef\"\u003e93\u003c/span\u003e] with the approximate likelihood method and calibration times from the TimeTree database (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://www.timetree.org/\u003c/span\u003e\u003cspan address=\"http://www.timetree.org/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). 
Third, orthologous groups among the 13 species were identified using all-against-all BLAST (\u003cem\u003eE\u003c/em\u003e value\u0026thinsp;\u0026le;\u0026thinsp;1e-5, identity\u0026thinsp;\u0026ge;\u0026thinsp;80%), and expanded and contracted gene families were identified using CAF\u0026Eacute; 5 [\u003cspan citationid=\"CR94\" class=\"CitationRef\"\u003e94\u003c/span\u003e]. GO and KEGG enrichment analyses (Fisher\u0026rsquo;s exact test, adjusted \u003cem\u003ep\u003c/em\u003e value\u0026thinsp;\u0026lt;\u0026thinsp;0.05) were performed to explore the functions of genes in the expanded and contracted families. Positive selection was tested with the branch-site model in CodeML (PAML), with \u003cem\u003eE. urograndis\u003c/em\u003e and \u003cem\u003eE. grandis\u003c/em\u003e as foreground branches and \u003cem\u003eE. melliodora\u003c/em\u003e, \u003cem\u003eE. pauciflora\u003c/em\u003e, and \u003cem\u003eP. trichocarpa\u003c/em\u003e as background branches.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003eCollinearity and Whole-Genome Duplication Analysis\u003c/h2\u003e \u003cp\u003eTo examine collinearity between \u003cem\u003eE. urograndis\u003c/em\u003e and \u003cem\u003eE. grandis\u003c/em\u003e, we aligned the chromosomes of the two genomes using LASTZ (v1.04.22) [\u003cspan citationid=\"CR95\" class=\"CitationRef\"\u003e95\u003c/span\u003e] with default options. Chromosomal collinearity was inferred from aligned regions longer than 2 kb and visualized using Circos (v0.69) [\u003cspan citationid=\"CR96\" class=\"CitationRef\"\u003e96\u003c/span\u003e]. Additionally, syntenic blocks within the \u003cem\u003eE. urograndis\u003c/em\u003e and \u003cem\u003eE.
grandis\u003c/em\u003e genomes were identified using MCScanX [\u003cspan citationid=\"CR97\" class=\"CitationRef\"\u003e97\u003c/span\u003e] with default parameters. Similar gene pairs between the two species were first identified using DIAMOND v0.9.29.130 [\u003cspan citationid=\"CR98\" class=\"CitationRef\"\u003e98\u003c/span\u003e] with an \u003cem\u003eE\u003c/em\u003e value cutoff of 1e-5. They were then filtered to retain pairs with a C score\u0026thinsp;\u0026ge;\u0026thinsp;0.5 using JCVI v0.9.13 [\u003cspan citationid=\"CR99\" class=\"CitationRef\"\u003e99\u003c/span\u003e]. \u003cem\u003eKs\u003c/em\u003e values (synonymous substitutions per synonymous site) between collinear gene pairs were estimated using CodeML in the PAML package. Both \u003cem\u003eKs\u003c/em\u003e and 4DTv (transversion rate at fourfold degenerate third-codon sites) distributions were used to identify WGD events. These events were detected with WGD v1.1.1 [\u003cspan citationid=\"CR100\" class=\"CitationRef\"\u003e100\u003c/span\u003e] and a custom script (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/JinfengChen/Scripts\u003c/span\u003e\u003cspan address=\"https://github.com/JinfengChen/Scripts\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) in \u003cem\u003eE. urograndis\u003c/em\u003e, \u003cem\u003eE. grandis\u003c/em\u003e, \u003cem\u003eE. pauciflora\u003c/em\u003e, \u003cem\u003eC. citriodora\u003c/em\u003e, and \u003cem\u003eV.
vinifera\u003c/em\u003e.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec20\" class=\"Section2\"\u003e \u003ch2\u003eRNA Sequencing and Differentially Expressed Gene Analysis\u003c/h2\u003e \u003cp\u003eTotal RNA was isolated from nine immature xylem samples (three biological replicates from each of the three tree species) using an RNAprep Pure Plant Kit (TIANGEN Biotech, Beijing, China), with DNase I treatment to remove genomic DNA. Qualified RNA was then used for library construction. Library quality was checked with a Qubit 2.0 fluorometer (cDNA concentration) and an Agilent 2100 Bioanalyzer (insert size). Qualified libraries were sequenced on a high-throughput sequencing platform in paired-end 150 bp (PE150) mode.\u003c/p\u003e \u003cp\u003eGene expression levels were quantified as FPKM (fragments per kilobase of transcript per million mapped fragments) using StringTie, which assembles transcripts with a maximum flow algorithm [\u003cspan citationid=\"CR101\" class=\"CitationRef\"\u003e101\u003c/span\u003e]. Differential expression analysis was performed using DESeq2 (v1.31.16) [\u003cspan citationid=\"CR102\" class=\"CitationRef\"\u003e102\u003c/span\u003e], and differentially expressed genes (DEGs) were defined as those with a fold change (FC)\u0026thinsp;\u0026ge;\u0026thinsp;2 and a false discovery rate (FDR)\u0026thinsp;\u0026lt;\u0026thinsp;0.01. 
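The expression measure and DEG thresholds above can be made concrete with a small Python sketch. It assumes that per-gene fragment counts, gene lengths (bp), log2 fold changes, and raw p-values are already available, for example from a DESeq2-style results table. FPKM is computed as fragments × 10^9 / (gene length × total mapped fragments). P-values are adjusted with the Benjamini-Hochberg procedure, which is DESeq2's default for its adjusted p-values. Genes are kept when |log2 FC| ≥ log2(2) = 1 and FDR < 0.01. Two choices here are assumptions rather than details from the text: applying "FC ≥ 2" to both up- and down-regulation, and the helper names. Note also that DESeq2 itself tests normalized counts rather than FPKM values.

```python
import math
from typing import Dict, List, Sequence


def fpkm(fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(log2fc: Dict[str, float], pvalues: Dict[str, float],
              min_fc: float = 2.0, max_fdr: float = 0.01) -> List[str]:
    """Genes with |log2 FC| >= log2(min_fc) and BH-adjusted p-value < max_fdr."""
    genes = list(log2fc)
    fdr = benjamini_hochberg([pvalues[g] for g in genes])
    threshold = math.log2(min_fc)
    return [g for g, q in zip(genes, fdr)
            if abs(log2fc[g]) >= threshold and q < max_fdr]


if __name__ == "__main__":
    # Hypothetical toy values, not data from this study.
    print(fpkm(500, 2000, 10_000_000))
    print(call_degs({"g1": 2.5, "g2": -1.2, "g3": 0.5, "g4": 3.0},
                    {"g1": 1e-6, "g2": 1e-5, "g3": 1e-8, "g4": 0.2}))
```

With these toy inputs, `g3` is dropped despite its tiny p-value because its fold change is below 2, and `g4` is dropped because its FDR is 0.2.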
GO and KEGG enrichment analyses were then performed on the identified DEGs.\u003c/p\u003e \u003c/div\u003e"},{"header":"Abbreviations","content":"\u003cp\u003e4CL: 4-coumarate-CoA ligase; ASE: allele-specific expression; ASEGs: genes showing allele-specific expression; BP: biological process; BUSCOs: benchmarking universal single-copy orthologs; C3\u0026apos;H: 5-O-(4-coumaroyl)-D-quinate 3\u0026apos;-monooxygenase; C4H: trans-cinnamate 4-monooxygenase; CAD: cinnamyl-alcohol dehydrogenase; CALM: calmodulin; CC: cellular component; CCoAOMT: caffeoyl-CoA O-methyltransferase; CCR: cinnamoyl-CoA reductase; CesA: cellulose synthase A; COMT: caffeic acid 3-O-methyltransferase; DEGs: differentially expressed genes; DUF: domain of unknown function; \u003cem\u003eE. urograndis\u003c/em\u003e: \u003cem\u003eEucalyptus urophylla\u003c/em\u003e \u0026times; \u003cem\u003eEucalyptus grandis\u003c/em\u003e; F5H: ferulate-5-hydroxylase; GATL: galacturonosyltransferase-like; GUX: xylan alpha-glucuronosyltransferase; GXM: glucuronoxylan 4-O-methyltransferase; HCT: shikimate O-hydroxycinnamoyltransferase; HEX: hexokinase; Hi-C: high-throughput chromosome conformation capture; INV: beta-fructofuranosidase; IRX: irregular xylem; LINEs: long interspersed nuclear elements; LTRs: long terminal repeats; MF: molecular function; Mya: million years ago; PAL: phenylalanine ammonia-lyase; PAV: presence/absence variation; PGM: phosphoglucomutase; PSGs: positively selected genes; PTK3: PTI1-like tyrosine-protein kinase 3; RWA: reduced wall acetylation; SINEs: short interspersed nuclear elements; SMRT: single-molecule real-time; SNPs: single nucleotide polymorphisms; SPE: single-parent expression; SUSY: sucrose synthase; UGDH: UDP-glucose 6-dehydrogenase; UGP: UTP-glucose-1-phosphate uridylyltransferase; UXS: UDP-glucuronate decarboxylase; WGD: whole-genome duplication; XYL: xylan 1,4-beta-xylosidase; XYS: 
1,4-beta-D-xylan synthase.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAcknowledgement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe appreciate the comments from Prof. Jihua Ding (Huazhong Agricultural University, College of Horticulture and Forestry Sciences) and the information about the \u003cem\u003eE. urophylla\u003c/em\u003e genome provided by Prof. Weihua Zhang (Guangdong Academy of Forestry). We gratefully acknowledge the valuable comments of four anonymous reviewers, which helped to improve our manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eG.L. and J.L. designed and managed the project. W.L., L.Z., Y.L., J.P., J.B., and A.H. participated in material collection and processing. W.L. and Y.L. performed bioinformatics analyses. G.L. and J.Z. wrote the manuscript. J.P. and A.H. contributed to validation work. J.L., G.L., and W.L. revised the manuscript. All authors read and approved the final manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis work was financially supported by the Fundamental Research Funds of CAF (CAFYBB2023MB034) and the National Key R\u0026amp;D Program of China (Grant Nos. 2022YFD2200203 and 2023YFD2201001).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe genomic and transcriptomic data generated in this study have been deposited in the Genome Sequence Archive (GSA) of the China National Center for Bioinformation (CNCB) under accession numbers CRA017352 and CRA024878, respectively. 
All data generated or analyzed during this study are included in this manuscript and its supplementary file.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eClinical trial number\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no competing interests.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eVilasboa J, Da Costa CT, Fett-Neto AG. Rooting of eucalypt cuttings as a problem-solving oriented model in plant biology. Prog Biophys Mol Biol. 2019;146:85\u0026ndash;97.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDu K, Xia Y, Zhan D, Xu T, Lu T, Yang J, et al. Genome-wide identification of the \u003cem\u003eEucalyptus urophylla GATA\u003c/em\u003e gene family and its diverse roles in chlorophyll biosynthesis. Int J Mol Sci. 2022;23(9):5251.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePaiva JA, Prat E, Vautrin S, Santos MD, San-Clemente H, Brommonschenkel S, et al. Advancing \u003cem\u003eEucalyptus\u003c/em\u003e genomics: identification and sequencing of lignin biosynthesis genes from deep-coverage BAC libraries. BMC Genomics. 2011;12:137.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGrattapaglia D, Silva-Junior OB, Kirst M, de Lima BM, Faria DA, Pappas GJ Jr. High-throughput SNP genotyping in the highly heterozygous genome of \u003cem\u003eEucalyptus\u003c/em\u003e: assay success, polymorphism and transferability across species. BMC Plant Biol.
2011;11:65.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRetief ECL, Stanger TK. Genetic parameters of pure and hybrid populations of \u003cem\u003eEucalyptus grandis\u003c/em\u003e and \u003cem\u003eE. urophylla\u003c/em\u003e and implications for hybrid breeding strategy. South Forests. 2009;71(2):133\u0026ndash;40.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHochholdinger F, Hoecker N. Towards the molecular basis of heterosis. Trends Plant Sci. 2007;12(9):427\u0026ndash;32.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWu X, Liu Y, Zhang Y, Gu R. Advances in research on the mechanism of heterosis in plants. Front Plant Sci. 2021;12:745726.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShinya T, Iwata E, Nakahama K, Fukuda Y, Hayashi K, Nanto K, et al. Transcriptional profiles of hybrid \u003cem\u003eEucalyptus\u003c/em\u003e genotypes with contrasting lignin content reveal that monolignol biosynthesis-related genes regulate wood composition. Front Plant Sci. 2016;7:443.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShi D. Analysis of characteristics of several \u003cem\u003eEucalyptus urophylla\u003c/em\u003e \u0026times; \u003cem\u003eE. grandis\u003c/em\u003e species and their direction of production. Forestry Sci Technol. 2019;44(6):41\u0026ndash;3.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGon\u0026ccedil;alves FG, Oliveira JTS, Lucia RMD, Sart\u0026oacute;rio RC. Estudo de algumas propriedades mec\u0026acirc;nicas da madeira de um h\u0026iacute;brido clonal de \u003cem\u003eEucalyptus urophylla\u003c/em\u003e x \u003cem\u003eEucalyptus grandis\u003c/em\u003e. Rev \u0026Aacute;rvore. 2009;33(3):501\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRocha MEL, Ristau ACP, Cruz MSFV, Oliveira Neto CFd M, Malavasi MM. 
Growth dynamics of container seedlings of \u003cem\u003eEucalyptus grandis\u003c/em\u003e x \u003cem\u003eEucalyptus urophylla\u003c/em\u003e and \u003cem\u003eHymenaea courbaril\u003c/em\u003e L. Rev Ceres. 2022;69(4):425\u0026ndash;35.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang Y, Chen F, Ma Y, Zhang T, Sun P, Lan M, et al. An ancient whole-genome duplication event and its contribution to flavor compounds in the tea plant (\u003cem\u003eCamellia sinensis\u003c/em\u003e). Hortic Res. 2021;8(1):176.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGlombik M, Bačovsk\u0026yacute; V, Hobza R, Kopeck\u0026yacute; D. Competition of parental genomes in plant hybrids. Front Plant Sci. 2020;11:200.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTaylor SA, Larson EL. Insights from genomes into the evolutionary importance and prevalence of hybridization in nature. Nat Ecol Evol. 2019;3(2):170\u0026ndash;7.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGu Z, Gong J, Zhu Z, Li Z, Feng Q, Wang C, et al. Structure and function of rice hybrid genomes reveal genetic basis and optimal performance of heterosis. Nat Genet. 2023;55(10):1745\u0026ndash;56.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXu CL, Sun XM, Zhang SG. Mechanism on differential gene expression and heterosis formation. Hereditas (Beijing). 2013;35(6):714\u0026ndash;26.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHochholdinger F, Yu P. Molecular concepts to explain heterosis in crops. Trends Plant Sci. 2025;30(1):95\u0026ndash;104.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXie J, Wang W, Yang T, Zhang Q, Zhang Z, Zhu X, et al. Large-scale genomic and transcriptomic profiles of rice hybrids reveal a core mechanism underlying heterosis. Genome Biol. 2022;23(1):264.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBotet R, Keurentjes JJB. 
The role of transcriptional regulation in hybrid vigor. Front Plant Sci. 2020;11:410.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBartholom\u0026eacute; J, Mandrou E, Mabiala A, Jenkins J, Nabihoudine I, Klopp C, et al. High-resolution genetic maps of \u003cem\u003eEucalyptus\u003c/em\u003e improve \u003cem\u003eEucalyptus grandis\u003c/em\u003e genome assembly. New Phytol. 2015;206(4):1283\u0026ndash;96.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMyburg AA, Grattapaglia D, Tuskan GA, Hellsten U, Hayes RD, Grimwood J, et al. The genome of \u003cem\u003eEucalyptus grandis\u003c/em\u003e. Nature. 2014;510:356\u0026ndash;62.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKyriakidou M, Achakkagari SR, G\u0026aacute;lvez L\u0026oacute;pez JH, Zhu X, Tang CY, Tai HH, et al. Structural genome analysis in cultivated potato taxa. Theor Appl Genet. 2020;133(3):951\u0026ndash;66.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNie S, Wang B, Ding H, Lin H, Zhang L, Li Q, et al. Genome assembly of the Chinese maize elite inbred line RP125 and its EMS mutant collection provide new resources for maize genetics research and crop improvement. Plant J. 2021;108(1):40\u0026ndash;54.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang L, Zhao L, Zhang X, Zhang Q, Jia Y, Wang G, et al. Large-scale identification and functional analysis of \u003cem\u003eNLR\u003c/em\u003e genes in blast resistance in the Tetep rice genome sequence. Proc Natl Acad Sci U S A. 2019;116(37):18479\u0026ndash;87.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFang Y, Wu H, Zhang T, Yang M, Yin Y, Pan L, et al. A complete sequence and transcriptomic analyses of date palm (\u003cem\u003ePhoenix dactylifera\u003c/em\u003e L.) mitochondrial genome. PLoS ONE. 2012;7(5):e37164.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, et al. 
The genome of black cottonwood, \u003cem\u003ePopulus trichocarpa\u003c/em\u003e (Torr. \u0026amp; Gray). Science. 2006;313(5793):1596\u0026ndash;604.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLang D, Ullrich KK, Murat F, Fuchs J, Jenkins J, Haas FB, et al. The \u003cem\u003ePhyscomitrella patens\u003c/em\u003e chromosome-scale assembly reveals moss genome structure and evolution. Plant J. 2018;93(3):515\u0026ndash;33.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHou X, Wang D, Cheng Z, Wang Y, Jiao Y. A near-complete assembly of an \u003cem\u003eArabidopsis thaliana\u003c/em\u003e genome. Mol Plant. 2022;15(8):1247\u0026ndash;50.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMagris G, Jurman I, Fornasiero A, Paparelli E, Schwope R, Marroni F, et al. The genomes of 204 \u003cem\u003eVitis vinifera\u003c/em\u003e accessions reveal the origin of European wine grapes. Nat Commun. 2021;12(1):7240.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMotamayor JC, Mockaitis K, Schmutz J, Haiminen N, Livingstone D 3rd, Cornejo O, et al. The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color. Genome Biol. 2013;14(6):r53.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShen C, Li L, Ouyang L, Su M, Guo K. \u003cem\u003eE. urophylla\u003c/em\u003e \u0026times; \u003cem\u003eE. grandis\u003c/em\u003e high-quality genome and comparative genomics provide insights on evolution and diversification of eucalyptus. BMC Genomics. 2023;24(1):223.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRai MI, Alam M, Lightfoot DA, Gurha P, Afzal AJ. Classification and experimental identification of plant long non-coding RNAs. Genomics. 2019;111(5):997\u0026ndash;1005.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYu Y, Zhang Y, Chen X, Chen Y. Plant noncoding RNAs: hidden players in development and stress responses. Annu Rev Cell Dev Biol.
2019;35:407\u0026ndash;31.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNadarajah KK, Abdul Rahman NSN. The role of non-coding RNA in rice immunity. Agronomy. 2021;12(1):39.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eD'Ario M, Griffiths-Jones S, Kim M. Small RNAs: big impact on plant development. Trends Plant Sci. 2017;22(12):1056\u0026ndash;68.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMartinez G, K\u0026ouml;hler C. Role of small RNAs in epigenetic reprogramming during plant sexual reproduction. Curr Opin Plant Biol. 2017;36:22\u0026ndash;8.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTang J, Chu C. MicroRNAs in crop improvement: fine-tuners for complex traits. Nat Plants. 2017;3:17077.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLafontaine DL. Noncoding RNAs in eukaryotic ribosome biogenesis and function. Nat Struct Mol Biol. 2015;22(1):11\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChanfreau GF, Tamanoi F. The enzymes. USA: Academic; 2012.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang L, Wu S, Chang X, Wang X, Zhao Y, Xia Y, et al. The ancient wave of polyploidization events in flowering plants and their facilitated adaptation to environmental stress. Plant Cell Environ. 2020;43(12):2847\u0026ndash;56.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShang J, Tian J, Cheng H, Yan Q, Li L, Jamal A, et al. The chromosome-level wintersweet (\u003cem\u003eChimonanthus praecox\u003c/em\u003e) genome provides insights into floral scent biosynthesis and flowering in winter. Genome Biol. 2020;21(1):200.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang L, Li X, Ma B, Gao Q, Du H, Han Y, et al. The tartary buckwheat genome provides insights into rutin biosynthesis and abiotic stress tolerance. Mol Plant.
2017;10(9):1224\u0026ndash;37.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBarakat A, Bagniewska-Zadworna A, Choi A, Plakkat U, DiLoreto DS, Yellanki P, et al. The cinnamyl alcohol dehydrogenase gene family in \u003cem\u003ePopulus\u003c/em\u003e: phylogeny, organization, and expression. BMC Plant Biol. 2009;9:26.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGessler A. Sucrose synthase - an enzyme with a central role in the source-sink coordination and carbon flow in trees. New Phytol. 2021;229(1):8\u0026ndash;10.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDominguez PG, Donev E, Derba-Maceluch M, B\u0026uuml;nder A, Hedenstr\u0026ouml;m M, Tom\u0026aacute;škov\u0026aacute; I, et al. Sucrose synthase determines carbon allocation in developing wood and alters carbon flow at the whole tree level in aspen. New Phytol. 2021;229(1):186\u0026ndash;98.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBooker TR, Jackson BC, Keightley PD. Detecting positive selection in the genome. BMC Biol. 2017;15(1):98.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang L, Wu S, Chang X, Wang X, Zhao Y, Xia Y, et al. The ancient wave of polyploidization events in flowering plants and their facilitated adaptation to environmental stress. Plant Cell Environ. 2020;43(12):2847\u0026ndash;56.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang Z, Li L, Ouyang L. Efficient genetic transformation method for \u003cem\u003eEucalyptus\u003c/em\u003e genome editing. PLoS ONE. 2021;16(5):e0252011.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShao L, Xing F, Xu C, Zhang Q, Che J, Wang X, et al. Patterns of genome-wide allele-specific expression in hybrid rice and the implications on the genetic basis of heterosis. Proc Natl Acad Sci U S A. 2019;116(12):5653\u0026ndash;8.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu G, Xie Y, Shang X, Wu Z. 
Expression patterns and gene analysis of the cellulose synthase gene superfamily in \u003cem\u003eEucalyptus grandis\u003c/em\u003e. Forests. 2021;12(9):1254.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCarroll A, Mansoori N, Li S, Lei L, Vernhettes S, Visser RG, et al. Complexes with mixed primary and secondary cellulose synthases are functional in \u003cem\u003eArabidopsis\u003c/em\u003e plants. Plant Physiol. 2012;160(2):726\u0026ndash;37.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCurry TM, Pe\u0026ntilde;a MJ, Urbanowicz BR. An update on xylan structure, biosynthesis, and potential commercial applications. Cell Surf. 2023;9:100101.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMiao YC, Liu CJ. ATP-binding cassette-like transporters are involved in the transport of lignin precursors across plasma and vacuolar membranes. Proc Natl Acad Sci U S A. 2010;107(52):22728\u0026ndash;33.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePayseur BA, Rieseberg LH. A genomic perspective on hybridization and speciation. Mol Ecol. 2016;25(11):2337\u0026ndash;60.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSouza DMSC, Avelar MLM, Fernandes SB, Silva EO, Duarte VP, Molinari LV, et al. Spectral quality and temporary immersion bioreactor for in vitro multiplication of \u003cem\u003eEucalytpus grandis\u003c/em\u003e \u0026times; \u003cem\u003eEucalyptus urophylla\u003c/em\u003e. 3 Biotech. 2020;10(10):457.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBoutte J, Maillet L, Chaussepied T, Letort S, Aury JM, Belser C, et al. Genome size variation and comparative genomics reveal intraspecific diversity in \u003cem\u003eBrassica rapa\u003c/em\u003e. Front Plant Sci. 2020;11:577536.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRamakrishnan M, Satish L, Sharma A, Kurungara Vinod K, Emamverdian A, Zhou M, et al. 
Transposable elements in plants: recent advancements, tools and prospects. Plant Mol Biol Rep. 2022;40:628\u0026ndash;45.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCooper G, Adams K. The cell: A molecular approach. 2nd ed. Sunderland (MA): Sinauer Associates; 2000.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBaek S, Choi K, Kim GB, Yu HJ, Cho A, Jang H, et al. Draft genome sequence of wild \u003cem\u003ePrunus yedoensis\u003c/em\u003e reveals massive inter-specific hybridization between sympatric flowering cherries. Genome Biol. 2018;19(1):127.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGuo L, Winzer T, Yang X, Li Y, Ning Z, He Z, et al. The opium poppy genome and morphinan production. Science. 2018;362(6412):343\u0026ndash;7.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSuzuki S, Li L, Sun YH, Chiang VL. The cellulose synthase gene superfamily and biochemical functions of xylem-specific cellulose synthase-like genes in \u003cem\u003ePopulus trichocarpa\u003c/em\u003e. Plant Physiol. 2006;142(3):1233\u0026ndash;45.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBurn JE, Hocart CH, Birch RJ, Cork AC, Williamson RE. Functional analysis of the cellulose synthase genes \u003cem\u003eCesA1\u003c/em\u003e, \u003cem\u003eCesA2\u003c/em\u003e, and \u003cem\u003eCesA3\u003c/em\u003e in \u003cem\u003eArabidopsis\u003c/em\u003e. Plant Physiol. 2002;129(2):797\u0026ndash;807.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMadhu P, Nithiyesh Kumar C, Anojkumar L, Matheswaran M. Selection of biomass materials for bio-oil yield: a hybrid multi-criteria decision making approach. Clean Techn Environ Policy. 2018;20:1377\u0026ndash;84.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSantos RB, Hart PW, Jameel H, Chang H. Wood based lignin reactions important to the biorefinery and pulp and paper industries. BioRes. 
2013;8(1):1456\u0026ndash;77.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRojas CM, Senthil-Kumar M, Tzin V, Mysore KS. Regulation of primary plant metabolism during plant-pathogen interactions and its contribution to plant defense. Front Plant Sci. 2014;5:17.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu G, Xie Y, Shang X, Wu Z. Expression patterns and gene analysis of the cellulose synthase gene superfamily in \u003cem\u003eEucalyptus grandis\u003c/em\u003e. Forests. 2021;12(9):1254.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLafontaine DL, Yang L, Dekker J, Gibcus JH, Hi. -C 3.0: improved protocol for genome-wide chromosome conformation capture. Curr Protoc. 2021;1(7):e198.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMar\u0026ccedil;ais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 2011;27(6):764\u0026ndash;70.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVurture GW, Sedlazeck FJ, Nattestad M, Underwood CJ, Fang H, Gurtowski J, et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics. 2017;33(14):2202\u0026ndash;4.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLuo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience. 2012;1(1):18.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXu M, Guo L, Gu S, Wang O, Zhang R, Peters BA, et al. TGS-GapCloser: A fast and accurate gap closer for large genomes with low coverage of error-prone long reads. Gigascience. 2020;9(9):giaa094.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eServant N, Varoquaux N, Lajoie BR, Viara E, Chen CJ, Vert JP, et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 
2015;16:259.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang Y, Xiong Y, Xiao Y. 3dDNA: a computational method of building DNA 3d structures. Molecules. 2022;27(18):5936.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDurand NC, Robinson JT, Shamim MS, Machol I, Mesirov JP, Lander ES, et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 2016;3(1):99\u0026ndash;101.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eManni M, Berkeley MR, Seppey M, Sim\u0026atilde;o FA, Zdobnov EM. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol. 2021;38(10):4647\u0026ndash;54.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSaha S, Bridges S, Magbanua ZV, Peterson DG. Computational approaches and tools used in identification of dispersed repetitive DNA sequences. Trop Plant Biol. 2008;1:85\u0026ndash;96.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBao W, Kojima KK, Kohany O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob DNA. 2015;6:11.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXu Z, Wang H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 2007;35(Web Server issue):W265\u0026ndash;8.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePrice AL, Jones NC, Pevzner PA. De novo identification of repeat families in large genomes. Bioinformatics. 2005;21(Suppl 1):i351\u0026ndash;8.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBenson G. Tandem repeats finder: A program to analyze DNA sequences. Nucleic Acids Res. 1999;27(2):573\u0026ndash;80.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKeller O, Kollmar M, Stanke M, Waack S. 
A novel hybrid gene prediction method employing protein multiple sequence alignments. Bioinformatics. 2011;27(6):757\u0026ndash;63.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBurge C, Karlin S. Prediction of complete gene structures in human genomic DNA. J Mol Biol. 1997;268(1):78\u0026ndash;94.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBirney E, Clamp M, Durbin R. GeneWise and genomewise. Genome Res. 2004;14(5):988\u0026ndash;95.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 2019;37(8):907\u0026ndash;15.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePertea M, Kim D, Pertea GM, Leek JT, Salzberg SL. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc. 2016;11(9):1650\u0026ndash;67.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChan KCK, Xu X, Wang X, Gu J, Loy CC. GLEAN: generative latent bank for image super-resolution and beyond. IEEE Trans Pattern Anal Mach Intell. 2023;45(3):3154\u0026ndash;68.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChan PP, Lin BY, Mak AJ, Lowe TM. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 2021;49(16):9077\u0026ndash;96.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNawrocki EP, Eddy SR. Computational identification of functional RNA homologs in metagenomic data. RNA Biol. 2013;10(7):1170\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKalvari I, Nawrocki EP, Argasinska J, Quinones-Olvera N, Finn RD, Bateman A, et al. Non-coding RNA analysis using the Rfam database. Curr Protoc Bioinf. 2018;62(1):e51.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi L, Stoeckert CJ Jr., Roos DS. 
OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003;13(9):2178\u0026ndash;89.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEdgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32(5):1792\u0026ndash;7.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGuindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 2010;59(3):307\u0026ndash;21.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24(8):1586\u0026ndash;91.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMendes FK, Vanderpool D, Fulton B, Hahn MW. CAFE 5 models variation in evolutionary rates among gene families. Bioinformatics. 2021;36(22\u0026ndash;23):5516\u0026ndash;8.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHarris RS. Improved pairwise alignment of genomic DNA [Ph.D. Thesis]. Ann Arbor, MI: The Pennsylvania State University; 2007.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKrzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, et al. Circos: an information aesthetic for comparative genomics. Genome Res. 2009;19(9):1639\u0026ndash;45.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang Y, Tang H, Debarry JD, Tan X, Li J, Wang X, et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 2012;40(7):e49.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBuchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat Methods. 2015;12(1):59\u0026ndash;60.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTang H, Bowers JE, Wang X, Ming R, Alam M, Paterson AH. 
Synteny and collinearity in plant genomes. Science. 2008;320(5875):486\u0026ndash;8.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZwaenepoel A, Van de Peer Y. wgd-simple command line tools for the analysis of ancient whole-genome duplications. Bioinformatics. 2019;35(12):2153\u0026ndash;5.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShumate A, Wong B, Pertea G, Pertea M. Improved transcriptome assembly using a hybrid of long and short reads with StringTie. PLoS Comput Biol. 2022;18(6):e1009730.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLove MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"bmc-plant-biology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"pbio","sideBox":"Learn more about [BMC Plant Biology](http://bmcplantbiol.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/pbio/default.aspx","title":"BMC Plant Biology","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Genome, Chromosome-scale assembly, Eucalyptus urophylla × Eucalyptus grandis, Cellulose and lignin biosynthesis, Allele-specific 
expression","lastPublishedDoi":"10.21203/rs.3.rs-6912338/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6912338/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBackground\u003c/h2\u003e \u003cp\u003e \u003cem\u003eEucalyptus urophylla\u003c/em\u003e \u0026times; \u003cem\u003eEucalyptus grandis\u003c/em\u003e (\u003cem\u003eE. urograndis\u003c/em\u003e) is a globally significant forest tree species renowned for its rapid growth, high yield, and exceptional wood production efficiency. A comparative analysis of its parental genomes, coupled with an in-depth investigation of the expression patterns of wood-related genes, will provide critical genomic resources to enhance research and utilization of this superior hybrid eucalyptus species.\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e \u003cp\u003eIn this study, we present a draft genome assembly consisting of 592.09 Mb of data, with 99.91% anchored to 11 pseudochromosomes. The assembly achieved a contig N50 of up to 3.73 Mb and a scaffold N50 of up to 58.62 Mb. Gene annotation and evaluation revealed that the \u003cem\u003eE. urograndis\u003c/em\u003e genome contains 32,151 genes, of which 93.5% were fully annotated using Benchmarking Universal Single-Copy Orthologs (BUSCOs). Based on evolutionary analysis, \u003cem\u003eE. grandis\u003c/em\u003e and \u003cem\u003eE. urograndis\u003c/em\u003e are estimated to have diverged approximately 2.9\u0026nbsp;million years ago (Mya). Additionally, 131 gene families were found to be significantly expanded, and 475 positively selected genes (PSGs) were identified in the \u003cem\u003eE. urograndis\u003c/em\u003e genome. Furthermore, RNA sequencing (RNA-seq) technology was employed to analyze allele-specific expression patterns of key enzymes involved in cellulose, xylan, and lignin biosynthesis. 
Several allele-specific expression genes (ASEGs) were identified, potentially associated with heterosis in \u003cem\u003eE. urograndis\u003c/em\u003e.\u003c/p\u003e\u003ch2\u003eConclusions\u003c/h2\u003e \u003cp\u003eThe chromosomal-level genome assembly of \u003cem\u003eE. urograndis\u003c/em\u003e presented in this study serves as a valuable genomic resource for eucalyptus molecular breeding, provides novel insights into its evolution, wood formation improvement, and adaptability, and enhances our understanding of the genetic and molecular mechanisms underlying heterosis in \u003cem\u003eEucalyptus\u003c/em\u003e hybrids.\u003c/p\u003e","manuscriptTitle":"From Genome to Gene Expression: The Genomic Landscape of a Hybrid Species of Eucalyptus urophylla × Eucalyptus grandis and Its Divergence from Parental Species Hybrid","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-07-02 15:07:56","doi":"10.21203/rs.3.rs-6912338/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision 
requested","date":"2025-07-24T05:34:46+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-07-18T17:43:41+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-07-14T12:18:52+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-07-13T07:48:04+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-07-10T23:37:19+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"100256350466977811018281871488838881620","date":"2025-07-08T13:45:31+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"229591559482382710817093741274044969730","date":"2025-07-07T01:46:06+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"233074637749518537509935963356190515757","date":"2025-07-04T00:09:09+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"243147413993612528372491089837207173903","date":"2025-07-02T21:02:49+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-06-30T05:46:36+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2025-06-27T09:38:16+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-06-27T09:26:38+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-06-27T09:23:38+00:00","index":"","fulltext":""},{"type":"submitted","content":"BMC Plant Biology","date":"2025-06-17T08:42:56+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"bmc-plant-biology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"pbio","sideBox":"Learn more about [BMC Plant Biology](http://bmcplantbiol.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/pbio/default.aspx","title":"BMC Plant 
Biology","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"246e3b01-c484-4d5d-ad02-4ad0dc02a49e","owner":[],"postedDate":"July 2nd, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[],"tags":[],"updatedAt":"2025-11-03T16:02:11+00:00","versionOfRecord":{"articleIdentity":"rs-6912338","link":"https://doi.org/10.1186/s12870-025-07371-3","journal":{"identity":"bmc-plant-biology","isVorOnly":false,"title":"BMC Plant Biology"},"publishedOn":"2025-10-27 15:58:08","publishedOnDateReadable":"October 27th, 2025"},"versionCreatedAt":"2025-07-02 15:07:56","video":"","vorDoi":"10.1186/s12870-025-07371-3","vorDoiUrl":"https://doi.org/10.1186/s12870-025-07371-3","workflowStages":[]},"version":"v1","identity":"rs-6912338","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6912338","identity":"rs-6912338","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
