A Chromosome-Level Genome Assembly Reveals triterpenoid biosynthesis in Clinopodium chinense

Research Square | Research Article

Guohui Li, Mengda Wang, Muhammad Aamir Manzoor, Haiyu Wang, Junyi Cheng, and 4 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7299302/v1
This work is licensed under a CC BY 4.0 License
Status: Published. Journal publication published 29 Nov, 2025 in BMC Plant Biology.
Version 1 posted 11

Abstract

Background
Clinopodium chinense is an important medicinal plant belonging to the Lamiaceae. The desiccated roots of C. chinense exhibit a variety of pharmacological properties and are utilized in traditional Chinese medicine.

Results
We present the first chromosome-level genome assembly of C. chinense, comprising 20 pseudochromosomes with an aggregate size of 0.61 Gb and 45,466 protein-coding genes.
The analysis of genome evolution indicated that two recent bursts of long terminal repeats (LTRs) significantly increased the size of the C. chinense genome. Additionally, numerous large-scale chromosomal rearrangements were identified between the C. chinense and Thymus quinquecostatus genomes. Comparative genomics showed that a recent whole-genome duplication event unique to Labiatae plants has resulted in a notable expansion of gene families related to the biosynthesis of triterpenoids and flavonoids in C. chinense. We then identified several putative key genes responsible for triterpenoid biosynthesis.

Conclusions
Our results offer new perspectives on the biosynthesis of triterpenoids and flavonoids and may aid subsequent studies on the genetic properties and medicinal uses of C. chinense.

Keywords: Comparative genomics, Genome assembly, C. chinense, Triterpenoids

Background

Clinopodium chinense (Benth.) O. Kuntze, also known by its Chinese name “Feng Lun Cai”, is classified within the genus Clinopodium of the Lamiaceae family. This perennial herb is prevalent across the eastern, northeastern, and southwestern regions of China [1]. In China, the aerial parts of C. chinense, commonly known as “duanxueliu”, have long been used in traditional folk medicine to treat various ailments, including hematuria, influenza, and allergic dermatitis [2]. Owing to its long growth cycle, the polyploid nature of its genome, and its ambiguous phenotype, the breeding of C. chinense is still in its early stages. The germplasm resources used in agricultural applications are also unstable.
A reference genome offers insights into functional genes, evolutionary processes, and breeding practices. However, at present only a small number of medicinal plants have been sequenced and assembled into high-quality genomes [3–5]. Advances in sequencing technology make it possible to assemble higher-quality medicinal plant genomes [6, 7]. However, because many medicinal plants are polyploid, their genomic research is more difficult than that of conventional crops and horticultural plants. In earlier research, fluorescence in situ hybridization (FISH) analyses of ginseng indicated that it is an allotetraploid [4, 8]. This type of polyploid is typically formed through the hybridization of two or more distinct diploid species followed by chromosome doubling. Genome doubling events contribute positively to species diversity and environmental adaptability, particularly in the initial phases of allopolyploid formation. For the genomes of the diverged diploid ancestors to coexist within a single nucleus, polyploid genomes may undergo gene loss, differentiation, or even chromosome reduction, thereby acquiring a more diploid-like condition. Previous studies have indicated that allopolyploids exhibit subgenome dominance, wherein one subgenome typically has more genes, higher gene expression, and lower differentiation [9–11]. For example, allelic recombination occurring in the subgenomes of polyploid cotton (Gossypium hirsutum) plays a role in its ecological adaptation and the evolution of fibers [12]. Rapid genomic changes and homologous inhibition leading to diploidization have also been observed in the subgenomes of wheat (Triticum aestivum) [13]. In addition, the differences between the subgenomes of ginseng (Panax ginseng) may lead to the functional divergence and evolution of ginsenoside biosynthetic genes [4]. Therefore, elucidating the differences between the subgenomes of C.
chinense will also contribute to functional and evolutionary research.

Like other herbs, C. chinense has complex metabolites that are considered effective compounds, among which triterpenoid saponins are recognized as the most important type. Previous studies have found that C. chinense contains abundant flavonoids, diterpenes, and triterpenes, which have anti-inflammatory, antibacterial, and procoagulant activities [14–16]. It is generally believed that triterpenoid saponins, as secondary metabolites, give C. chinense its significant pharmacological effects, including hemostatic, anti-tumor, and hypoglycemic activities [17]. However, it is difficult to extract sufficient amounts of triterpenoid saponins directly from C. chinense. Moreover, the biosynthesis pathway of triterpenoid saponins has not been well characterized in many medicinal plants. Thus, genome sequencing of species like C. chinense could greatly enhance our comprehension of these pathways.

Triterpenoid saponins constitute a category of specialized metabolites with diverse structures, found in plants [18] as well as marine invertebrates such as sea cucumbers [19] and sponges [20]. Their biosynthesis involves the isoprenoid pathway, in which isopentenyl pyrophosphate (IPP) serves as the precursor for all isoprenoids [21]. IPP can be produced through either the mevalonate (MVA) pathway or the 2-C-methyl-D-erythritol-4-phosphate (MEP) pathway [10]. The biosynthesis of triterpenoid saponins consists of three primary stages. Initially, IPP is transformed into farnesyl pyrophosphate (FPP) through the action of geranyl-diphosphate synthase (GPPS) and farnesyl-diphosphate synthase (FPPS) [22].
In the second stage, 2,3-oxidosqualene undergoes cyclization catalyzed by 2,3-oxidosqualene cyclases, such as beta-amyrin synthase (β-AS) and lupeol synthase (LS), resulting in the formation of various compounds like beta-amyrin and lupeol [23]. The modification stage involves a series of modifications such as oxidation, substitution, and glycosylation of the terpenoid skeletons, carried out by specific cytochrome P450 monooxygenases (CYPs) and UDP-glycosyltransferases (UGTs) to generate triterpenoid saponins [24].

With the continuous development of genome sequencing technology, advances in long-read sequencing (e.g., HiFi sequencing) and high-throughput chromosome conformation capture (Hi-C) have made it possible to generate chromosome-scale genome assemblies in various plants. In the present work, we assembled a chromosome-level genome of the traditional Chinese medicinal plant C. chinense, in which it is fully represented by two haplotypes. We investigated the differences in triterpenoid saponin synthesis genes across subgenomes and identified gene clusters linked to triterpenoid saponin synthesis. In addition, several high-quality genome sequences of Labiatae plants published in recent years have further enhanced our understanding of the evolution and biology of this family. The genomic resources and analyses presented in this article establish a crucial foundation for comprehending the evolutionary dynamics of Labiatae plants, while also advancing genetic research and breeding efforts for C. chinense.

Results

Genome sequencing, assembly, and annotation

In this study, C. chinense was found to be a diploid (2n = 2x = 38; Fig. 1A, 1B, 1C; Table 1), with an estimated genome size of 0.61 Gb. K-mer analysis generated a total of 47,441,502,660 k-mers with a length of 19 and a peak depth of 77.96. The estimated genome size of C. chinense is 608.5 Mb, with a heterozygosity of 0.89% and a duplication ratio of 54.64% (Table S1; Fig. S1).
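The k-mer figures above can be cross-checked with a short calculation. The sketch below assumes the simplest estimator, genome size ≈ total k-mers / peak k-mer depth; the function name is hypothetical, and the software actually used to produce Table S1 may apply additional corrections.

```python
def estimate_genome_size(total_kmers: int, peak_depth: float) -> float:
    """Estimate genome size in bp as total k-mers divided by peak k-mer depth.

    Assumption: the plain ratio estimator, with no correction for
    sequencing errors or heterozygous peaks.
    """
    if peak_depth <= 0:
        raise ValueError("peak_depth must be positive")
    return total_kmers / peak_depth


# Values reported in the text for the 19-mer analysis.
size_bp = estimate_genome_size(47_441_502_660, 77.96)
print(f"Estimated genome size: {size_bp / 1e6:.1f} Mb")  # 608.5 Mb
```

Under this assumption the ratio reproduces the 608.5 Mb estimate quoted in the text.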
After trimming and quality control, we obtained 22.73 Gb of HiFi reads and 113.3 Gb of Hi-C paired reads (Table S2; Fig. S2). The assembled genome comprised 24 contigs with a contig N50 of 37.15 Mb and a GC content of 38.61%. The size of the assembled genome aligns closely with the genome size estimated from k-mer analysis and flow cytometry (Table S3). The contiguity of the C. chinense genome assembly in this study (N50 = 37.15 Mb) was considerably higher than that of recently published genome assemblies of other herbal plants, such as Camptotheca acuminata (N50 = 1.47 Mb) [25], Morinda officinalis (N50 = 4.21 Mb) [26], and Senna tora (N50 = 4.03 Mb) [25]. In addition, using the Hi-C data, these contigs were anchored to 20 pseudochromosomes with only one gap, covering approximately 99.12% of the assembled sequence. Genes are not evenly distributed across chromosomes; instead, a greater concentration is found at the chromosomal ends (Fig. 1D, 1E; Table S4). The pseudochromosomes range in length from 7.21 to 44.32 Mb, with an average length of 30.16 Mb (Table S4). Furthermore, we used BUSCO (version 5.2.2, parameter: -evalue 1e−05) [27] to evaluate the C. chinense genome annotation. The results showed that approximately 96.8% of the BUSCO genes in the embryophyte (land plant) dataset were identified as complete in the assembled genome (Table 1). A BUSCO completeness above 90% is generally taken to indicate robust genome annotation, underscoring the quality of the assembled genome.

Table 1 Statistics of C. chinense genome assembly and annotation.
Feature | Statistic
Assembled genome size (Gb) | 0.61
GC content (%) | 38.61
Contig number | 128
Contig N50 (Mb) | 34.33
Scaffold number | 124
Scaffold N50 (Mb) | 37.15
N90 (Mb) | 20.13
Minimum length (Mb) | 0.02
Maximum length (Mb) | 44.32
Chromosome number | 20
Average coding sequence length (bp) | 1250.95
Average exons per gene | 5.51
Average exon length (bp) | 337.69
Repeats in genome (%) | 63.85
BUSCO (complete) (%) | 96.84

The transposable element (TE) annotation of the reference genome indicated that C. chinense contained 61.45% repetitive sequences (Table S5), which is lower than the repeat content of some other medicinal plant genomes: the ginseng genome comprises 83.17% repeat sequences [4], while the Dendrobium huoshanense genome has a repeat content of 74.92% [5]. Long terminal repeat (LTR) sequences are the most abundant class of repeats, accounting for 42.42% of the C. chinense genome. Among them, LTR/Gypsy and LTR/Copia accounted for 19.23% and 18.86% of the genome, respectively (Table S5).

To predict protein-coding genes, we integrated ab initio, homology-based, and transcriptome-based prediction approaches (Fig. S3). Across the C. chinense genome, a total of 45,466 protein-coding genes were identified, with an average gene length of 4097.38 bp and an average of 5.51 exons per gene (Fig. S4; Table S6). In addition, we identified a set of non-coding RNAs (Table S7).

Phylogenomics and evolution of C. chinense

We performed a comparative genomic analysis of C. chinense with 15 other plant genomes to investigate its evolutionary history. Gene family identification revealed a total of 68,482 orthologous gene families across all species, comprising 517,286 genes.
The total count of single-copy gene families was 48, while the gene families common to all species amounted to 7,132, comprising 226,662 genes. C. chinense has 3,451 species-specific gene families comprising 5,212 genes (Table S8; Fig. S5). To elucidate the origin of C. chinense, we constructed a phylogenetic tree of 15 species. Phylogenetic analysis of the 48 single-copy gene families indicates that C. chinense is most closely related to Thymus quinquecostatus within the Labiatae family, with a divergence time of approximately 14.0 million years ago (Fig. 2A).

Whole-genome duplication (WGD), or ancient polyploidization, refers to the duplication of an entire genome. To investigate the WGD events that occurred during the evolution of C. chinense, we studied the distribution of Ks among the selected species (Fig. 2B). The Ks value for C. chinense and T. quinquecostatus (CcTq) was 1.60. Members of the Labiatae family, including C. chinense and S. baicalensis, exhibit notable ancient peaks around Ks = 1.0. When J. sambac was included in the analysis, we observed that this peak emerged before the divergence peak between C. chinense and J. sambac, indicating that C. chinense underwent the same gamma WGD event as J. sambac and S. baicalensis. Furthermore, evolutionary analysis of gene families showed that 6,353 gene families have expanded, accounting for 35.51% of all gene families, while 619 gene families have contracted (3.51%). Notably, 21,736 expanded gene families and 697 contracted gene families showed statistical significance (P < 0.05) in C. chinense (Table S9). To improve our understanding of the evolutionary history of C. chinense, we conducted a genomic collinearity analysis involving C. chinense, T. quinquecostatus, and S. baicalensis (Fig. 2C).
Further analysis of different types of duplicated genes revealed that the recent WGD event specific to Labiatae contributed significantly to the expansion of these gene families.

Gene family enrichment analysis

To understand the biological functions of the expanded and contracted gene families, we conducted KEGG and GO analyses (Fig. 3). GO analysis showed that significantly expanded gene families are enriched in multiple terms, such as UDP-glycosyltransferase activity, regulation of terpenoid biosynthetic process, structural constituent of cytoskeleton, regulation of coumarin biosynthetic process, and response to high light intensity (Fig. 3A; Table S10). KEGG analysis revealed that the majority of expanded genes were concentrated in pathways such as monoterpenoid biosynthesis, ubiquitin-mediated proteolysis, the biosynthesis of diverse plant secondary metabolites, ABC transporters, and plant hormone signal transduction (Fig. 3B; Table S11). In contrast, the contracted gene families were associated with GO terms linked to polysaccharide binding, oligopeptide transmembrane transporter activity, lignin catabolic process, response to auxin, and lipid transport, among others (Fig. 3C; Table S12). Furthermore, KEGG pathway analysis of the contracted genes indicated their involvement in photosynthesis, oxidative phosphorylation, protein processing in the endoplasmic reticulum, and glutathione metabolism (Fig. 3D; Table S13). Based on this gene family enrichment analysis, we propose that the expansion and contraction of these gene families may strengthen the adaptability of C. chinense in complex environments, enhance its stress tolerance and growth regulation capabilities, and ultimately support its survival under abiotic stress.

Co-analysis of transcriptomics and metabolomics

The use of single-omics data analysis to study complex biological processes and biological network regulation has certain limitations.
Integrated multi-omics analysis can offset data-related issues in single-omics analysis, such as those arising from data loss and noise. Therefore, a combined analysis of the transcriptome and metabolome was conducted to study the biological functions of differentially expressed genes and metabolites in three tissues of C. chinense: roots, stems, and leaves. The joint analysis indicated that 1271, 2834, and 2013 differential substances were annotated in 136, 141, and 138 KEGG pathways in the three comparison groups RvsS (root vs. stem), RvsL (root vs. leaf), and SvsL (stem vs. leaf), respectively (Fig. 4). Among them, RvsL had the most differential substances, indicating more pronounced differences between root and leaf tissues. Among the top 20 pathways with the lowest P-values in the KEGG enrichment results, the category with the highest number of annotated pathways was metabolism, indicating that the differentially expressed genes and metabolites in C. chinense had a significant impact on plant metabolism. There were 13, 9, and 13 metabolic pathways in the three comparison groups, respectively. In the RvsS group, differential substances were most significantly enriched in the ribosome pathway; in the RvsL and SvsL groups, they were most significantly enriched in the biosynthesis of secondary metabolites pathway, indicating that the differences between tissues mainly involve the regulation of plant secondary metabolic pathways in C. chinense.
Moreover, the differentially expressed genes across the three groups showed enrichment in two terpenoid biosynthesis-related pathways: terpenoid backbone biosynthesis, and sesquiterpenoid and triterpenoid biosynthesis. This suggests that genes differentially expressed across tissues exert a regulatory influence on terpenoid production in C. chinense. Considering that triterpenoid components are important active constituents in C. chinense, we examine the metabolic pathways and genes related to terpenoid synthesis in more detail below.

Identification of key genes for triterpenoid saponin biosynthesis in C. chinense

Earlier research on the chemical components of C. chinense has shown that its abundant triterpenoid saponins, which have been verified in clinical settings, are the primary bioactive substances [28]. Combining transcriptomic and metabolomic analyses of three different tissues of C. chinense, we inferred and proposed the biosynthetic pathways of triterpenoid saponins and their derivatives in C. chinense. Terpenes are mainly synthesized through two pathways: the MVA pathway and the MEP pathway. Multiple key enzymes are involved in this process and play an important regulatory role in terpenoid synthesis (Fig. 5). Drawing on the KEGG enrichment outcomes, our analysis centered on differentially expressed genes (DEGs) within two terpenoid biosynthesis-related pathways: Terpenoid backbone biosynthesis and Sesquiterpenoid and triterpenoid biosynthesis. We also analyzed the enzyme genes involved in terpenoid biosynthesis in the three tissues of C. chinense. In the RvsS group, 13 DEGs were annotated to the terpenoid backbone biosynthesis pathway, while 7 DEGs were annotated to the sesquiterpenoid and triterpenoid biosynthesis pathway.
Among these DEGs, we identified key enzyme genes involved in terpenoid synthesis pathways, including 3 SQE, 2 GPPS, 2 HMGS, 2 HMGR, 1 MVD, 1 DXS, 1 IDI, and 1 SS gene. In the RvsL group, a total of 45 DEGs were annotated to the terpenoid backbone biosynthesis pathway and 19 DEGs to the sesquiterpenoid and triterpenoid biosynthesis pathway. Among these DEGs, we identified key enzyme genes involved in terpenoid synthesis pathways, including 7 SQE, 6 GPPS, 5 β-AS, 4 HMGR, 4 PMK, 3 MVD, 3 DXS, 3 SS, 2 HMGS, 2 AACT, 2 HDR, 2 MVK, 2 IDI, 1 CMK, 1 DXR, 1 MCS, and 1 FPPS gene. In the SvsL group, a total of 35 DEGs were annotated to the terpenoid backbone biosynthesis pathway, while 14 DEGs were assigned to the sesquiterpenoid and triterpenoid biosynthesis pathway. Among these DEGs, we identified key enzyme genes involved in terpenoid synthesis pathways, including 6 SQE, 5 HMGR, 4 GPPS, 3 MVD, 3 SS, 3 β-AS, 2 MVK, 2 AACT, 2 HMGS, 2 DXR, 2 IDI, 1 FPPS, 1 PMK, 1 CMK, 1 DXS, and 1 HDR gene (Table 2).

Table 2 Statistics on the number of genes for key enzymes of the terpenoid synthesis pathway.
Gene name | Abbreviation | RvsS | RvsL | SvsL
Acetyl-CoA C-acetyltransferase | AACT | 0 | 2 | 2
Hydroxymethylglutaryl-CoA synthase | HMGS | 2 | 2 | 2
Hydroxymethylglutaryl-CoA reductase | HMGR | 2 | 4 | 5
Mevalonate kinase | MVK | 0 | 2 | 2
Phosphomevalonate kinase | PMK | 0 | 4 | 1
Mevalonate pyrophosphate decarboxylase | MVD | 1 | 3 | 3
1-deoxy-D-xylulose 5-phosphate synthase | DXS | 1 | 3 | 1
1-deoxy-D-xylulose 5-phosphate reductoisomerase | DXR | 0 | 1 | 2
4-diphosphocytidyl-2-C-methyl-D-erythritol kinase | CMK | 0 | 1 | 1
2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase | MCS | 0 | 1 |
4-Hydroxy-3-methylbut-2-en-1-yl diphosphate reductase | HDR | 0 | 2 | 1
Isopentenyl diphosphate isomerase | IDI | 1 | 2 | 2
Farnesyl diphosphate synthase | FPPS | 0 | 1 | 1
Squalene synthase | SS | 1 | 3 | 3
Squalene monooxygenase | SQE | 3 | 7 | 6
β-Amyrin synthase | β-AS | 0 | 5 | 3
Geranylgeranyl pyrophosphate synthase | GPPS | 2 | 6 | 4

The biosynthesis of triterpenoid saponins involves over 20 catalytic steps, featuring essential enzymes such as squalene synthase (SS) for scaffold formation and 2,3-oxidosqualene cyclase (OSC), alongside modifying enzymes including cytochrome P450s (CYP450) and glycosyltransferases (UGT) [29]. Furthermore, the functional enrichment analysis revealed that these DEGs were mainly enriched in "cytochrome P450" and "transporters", which may be involved in the synthesis, transport, and storage of active ingredients and in stress responses in C. chinense. To elucidate which members of the CYP and UGT gene families may be involved in the synthesis of triterpenoid saponins in C. chinense, we identified 105 CYP and 21 UGT genes in the C. chinense genome. Based on prior functional predictions, we evaluated 14 CYP members and 10 UGT members potentially implicated in triterpenoid saponin biosynthesis. We additionally performed an evolutionary analysis of them alongside CYP and UGT members with established roles.
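Returning to Table 2, the per-group gene counts can be tallied with a few lines of code. This is a minimal sketch: the dictionary layout and function name are hypothetical, `beta-AS` stands in for β-AS, and the one cell that is missing in the table (MCS, SvsL) is treated as 0, which is an assumption rather than a reported value.

```python
# Counts copied from Table 2 as (RvsS, RvsL, SvsL).
# None marks the single cell missing in the source table (MCS, SvsL).
TABLE2 = {
    "AACT": (0, 2, 2),
    "HMGS": (2, 2, 2),
    "HMGR": (2, 4, 5),
    "MVK": (0, 2, 2),
    "PMK": (0, 4, 1),
    "MVD": (1, 3, 3),
    "DXS": (1, 3, 1),
    "DXR": (0, 1, 2),
    "CMK": (0, 1, 1),
    "MCS": (0, 1, None),
    "HDR": (0, 2, 1),
    "IDI": (1, 2, 2),
    "FPPS": (0, 1, 1),
    "SS": (1, 3, 3),
    "SQE": (3, 7, 6),
    "beta-AS": (0, 5, 3),
    "GPPS": (2, 6, 4),
}
GROUPS = ("RvsS", "RvsL", "SvsL")


def group_totals(table):
    """Sum the key-enzyme gene counts for each comparison group."""
    totals = dict.fromkeys(GROUPS, 0)
    for counts in table.values():
        for group, count in zip(GROUPS, counts):
            # Assumption: a missing cell contributes 0 to the total.
            totals[group] += count or 0
    return totals


totals = group_totals(TABLE2)
print(totals)  # {'RvsS': 13, 'RvsL': 49, 'SvsL': 39}
```

Under that assumption the column totals (13, 49, and 39) agree with the sums of the key enzyme genes listed in the text for the RvsS, RvsL, and SvsL groups.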
Based on 36 CYP enzymes known to be implicated in triterpenoid biosynthesis, we constructed a phylogenetic tree of these enzymes together with the CcCYPs from C. chinense. We additionally used CYP101A1 as an outgroup. Because CYP101A1 is a prokaryotic P450, it efficiently anchors the base of the phylogenetic tree, facilitating a more precise examination of the branching order and evolutionary distance among the various CYP subfamilies. The phylogenetic tree indicated that the CcCYPs related to triterpenoid biosynthesis primarily cluster within the CYP85 and CYP71 subfamilies (Fig. 6A). Specifically, CcCYP707A1-7, belonging to the CYP85 family, is on the same branch as GuCYP88D6, which has C-11 oxidase function. We speculate that it may catalyze the oxidative modification of the C-11 carbon of triterpenoid molecules, an essential step in the formation of triterpenoid structures. The clustering of CcCYP26A1 with the C-16α oxidase MlCYP87D16 suggests that it may catalyze oxidation at the C-16α position (for example, introducing oxygen-containing groups such as hydroxyl and carbonyl groups). Meanwhile, the C-23 oxidase AtCYP71A16 is adjacent to CcCYP26A1, while CcCYP75B1A1-2 and CcCYP84 are similar to LjCYP71D353, which is responsible for C-20 and C-28 oxidation. CcCYP93B1-2 is also speculated to function as a C-24 oxidase.

Based on 36 glycosyltransferases known to be involved in triterpenoid biosynthesis, we constructed a phylogenetic tree of these enzymes together with the CcUGTs from C. chinense. As shown in Fig. 6B, except for CcUGT72E1-2, the functions of CcUGT73C1-3 may be similar to those of oleanolic acid 3-O-glucuronosyltransferase.

Discussion

In traditional Chinese medicine, C.
chinense is regarded as a significant medicinal plant, with a wealth of recognized bioactive components such as flavonoids, triterpenoids, polysaccharides, and amino acids [15, 30]. Whole-genome sequence analysis provides a foundation for studying the gene functions, phylogenetics, and gene evolution of C. chinense and many other medicinal plants. The genome of C. chinense has high heterozygosity (0.89%) and a large proportion of repetitive sequences (61.45%), making its assembly challenging. Moreover, medicinal plants have a relatively short domestication history, which often leads to disrupted genetic backgrounds and complicates genome mining efforts. To address this, we integrated Illumina and PacBio sequencing with high-throughput Hi-C technology to produce a high-quality C. chinense genome, featuring a scaffold N50 of 37.15 Mb and a contig N50 of 34.33 Mb (Table 1). Additionally, we constructed a 0.61 Gb telomere-to-telomere reference genome with a BUSCO completeness of 96.84% (Table 1). CEGMA analysis further confirmed that the assembled genome covered 99.97% of the core eukaryotic genes. These results indicate that, compared with recently sequenced medicinal plant genomes, the C. chinense genome has higher integrity and quality. This reference genome aids in clarifying genetic variations within the saponin biosynthesis pathway throughout the evolution of the C. chinense genome, offering guidance for the further domestication of C. chinense to enhance its medicinal effectiveness.

Through phylogenomic analysis, we put forward a framework for the evolutionary process of C. chinense. Ks analyses showed that C. chinense has not experienced a WGD event after its divergence. The proliferation of different long terminal repeat retrotransposon (LTR-RT) lineages is an important factor that causes differences in plant genome size.
For example, the Tat family is a key driver behind genome expansion in Camellia sinensis [31], while the Ogre family plays a major role in Pisum sativum [32]. Additionally, due to insertions, translocations, and deletions of LTR-RTs occurring during this process, incomplete autonomous LTRs now make up a substantial part of the genome. Despite the significant role LTR-RTs play in shaping genome size diversity, many aspects of them, including their origin, expression patterns, insertion specificity, evolutionary trajectories, and potential impacts on the genetic and epigenetic regulation of genes, remain largely uninvestigated [33, 34].

Triterpenoids and flavonoids are the two primary bioactive components in C. chinense. Among them, triterpenoids exhibit a range of beneficial properties, including anti-inflammatory, hypotensive, and anti-aging effects, while also providing positive impacts on myocardial ischemia injury and chronic kidney disease [35]. Through comparative genomic analysis, we have tentatively verified the substantial expansion of gene families associated with the production of flavonoids and triterpenoids in C. chinense. In this study, based on the combined analysis of transcriptomics and metabolomics, we identified 53 and 72 candidate genes in the biosynthesis pathways of triterpenoids and flavonoids, respectively, through homology search and functional annotation in C. chinense. Further analysis revealed that tandem duplication may play a key role in the amplification of genes in the flavonoid and triterpenoid saponin synthesis pathways, such as CYP450 and UGT genes, which are critical nodes controlling metabolic flux [36]. We hypothesize that an elevation in the copy numbers of these genes could facilitate the accumulation of active substances in C. chinense. Triterpenoids are a group of natural compounds with diverse structures.
Their saponins are formed when triterpenoid aglycones condense with one or more sugars and other chemical groups, with tetracyclic and pentacyclic triterpenoids being the most common types [37]. In our results, across the tissues of C. chinense, differentially expressed genes (DEGs) associated with terpenoid biosynthesis were most abundant in the RvsL group, followed by the SvsL group, and least prevalent in the RvsS group, highlighting notable disparities in gene expression between root and leaf tissues. DEGs predominantly localize within the terpenoid backbone biosynthesis pathway, indicating their significant regulatory function in terpenoid synthesis in C. chinense and establishing a basis for the investigation of terpenoid synthesis in this species.

Conclusion

In summary, we have constructed a chromosome-level genome of C. chinense, establishing a basis for further research on C. chinense and serving as a significant resource for the detailed study of other medicinal plants. This study identifies potential genes associated with flavonoid and triterpene production in C. chinense, establishing a basis for future genetic enhancement, while genomic evolutionary analysis offers novel insights into the evolutionary history of Lamiaceae species. The investigation of the C. chinense genome is anticipated to elucidate its lineage selection and enhance the breeding of therapeutic plants for future generations.

Materials and Methods

Plant materials

The C. chinense samples were collected in August 2021 from an orchard in Taipingfan Township, Huoshan County, Lu'an, Anhui Province, China, for sequencing analysis. High-quality DNA extracted from young roots was used to construct multiple genomic sequencing libraries. Short-insert fragments were used for Illumina sequencing, while long fragments were used for SMRT sequencing.
Genome sequencing

In this study, genome sequencing was conducted via Nanopore ultra-long read, PacBio HiFi, and Hi-C sequencing technologies [38], with the work carried out by Wuhan Benegen Technology Co., Ltd. Specifically, ultra-long sequencing was performed using the Nanopore PromethION platform, and raw data were filtered to exclude failed reads with an average quality score below 7. Reads were filtered with Filtlong v0.2.4 ( https://github.com/rrwick/Filtlong ), and adapter sequences were trimmed with Porechop v0.2.4 ( https://github.com/rrwick/Porechop ) [39]. Subsequently, a final filter was applied to the resulting reads (length < 30 kb, average read quality score ≤ 90%), and the retained reads were used for genome assembly. For the HiFi raw data generated by the PacBio Revio sequencing platform, filtering was implemented using CCS v6.0.0 ( https://github.com/PacificBiosciences/ccs ). The resulting CCS reads were used for subsequent analysis [40]. Hi-C data were used to assist assembly and annotation. The Hi-C raw data were trimmed using fastp v0.21.0, which removed low-quality, short, high-N-content, and adapter-contaminated reads, and were then aligned to the reference genome using HiCUP v0.8.0 ( http://www.bioinformatics.babraham.ac.uk/projects/hicup ). Duplicate reads generated by PCR amplification were also eliminated.

Genome assembly

The de novo assembly of PacBio long reads was performed with Falcon v0.3.0 (github/PacificBiosciences/falcon). First, all overlaps among the raw reads were identified, and the reads were error-corrected using the overlap information. Overlaps between the corrected reads were then detected to build contigs. Subsequently, the contigs were polished using Quiver (a component of the SMRT Analysis suite) to generate the primary assemblies. For the Bionano raw data, de novo assembly was performed in IrysView v.
to produce consensus physical maps. The initial version of the reference genome assembly, referred to as hybrid scaffolds, was constructed by integrating the de novo primary assembly with the Bionano genome imaging data via Bionano Solve V. The Hi-C sequencing data were mapped to the assembled scaffolds using BWA v (available at github.com/lh3/bwa) [41] to obtain the order and orientation of the scaffolds. The scaffolds were then anchored to pseudochromosomes using LACHESIS v (github.com/shendurelab/LACHESIS) [42].

Phylogenetic analyses

OrthoFinder 2.5.5 was used to identify paralogs and orthologs and to infer the species tree for 15 plant species, including Camptotheca acuminata, Salvia miltiorrhiza, and Thymus quinquecostatus [43]. The phylogenetic tree was constructed from the identified single-copy genes. Functional annotation of the common and species-specific gene families was performed by searching the KEGG and GO databases [44]. Multiple sequence alignment of the single-copy gene families was performed using Muscle v3.8.31 [45]. Next, the alignments were trimmed using trimAl v1.2rev59 with the parameter -gt set to 0.2. The trimmed alignments were then concatenated and passed to RAxML v8.2.10 to construct a maximum likelihood (ML) tree [46]. Lastly, CAFE v3.1 was used to assess the expansion and contraction of gene families on each evolutionary branch. Gene families with significant expansion or contraction were identified using a threshold of p ≤ 0.05 [47].

Syntenic analyses

Homologous genes were identified both within and between species using BLASTP, and homologous blocks were then defined from these results using MCScanX [44]. Subsequently, WGDI was used to calculate the Ks density of orthologous and paralogous gene pairs [48] and to generate syntenic dot plots. Finally, SyRI was used to detect structural variations between subgenomes [49].
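As an illustration of what a Ks density analysis reports, the following minimal Python sketch bins the Ks values of syntenic gene pairs and returns the centre of the most populated bin as a rough peak estimate. It is not the WGDI procedure used in this study; the function name `ks_peak`, the bin width, and the upper Ks cut-off are assumptions chosen only for this example.

```python
from collections import Counter


def ks_peak(ks_values, bin_width=0.05, ks_max=3.0):
    """Return the centre of the most populated Ks bin (a crude peak estimate).

    ks_values : iterable of Ks estimates, one per homologous gene pair.
    bin_width : histogram bin width (hypothetical default).
    ks_max    : Ks values above this are treated as saturated and ignored
                (hypothetical default).
    """
    counts = Counter()
    for ks in ks_values:
        # Ignore non-positive and saturated estimates.
        if 0.0 < ks <= ks_max:
            counts[int(ks / bin_width)] += 1
    if not counts:
        raise ValueError("no Ks values within (0, ks_max]")
    # Highest count wins; ties are resolved in favour of the smaller Ks bin.
    best_bin = max(counts, key=lambda b: (counts[b], -b))
    return (best_bin + 0.5) * bin_width


if __name__ == "__main__":
    example = [0.11, 0.12, 0.13, 0.14, 0.52, 0.90]
    print(f"approximate Ks peak: {ks_peak(example):.3f}")
```

A histogram mode is sensitive to the chosen bin width; dedicated tools typically fit a smoothed density or Gaussian components to the Ks distribution instead.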
Gene duplication identification

To categorize the duplicated genes of C. chinense, we used DupGen_finder [50] v1.12, which classified them into five groups: whole-genome (WGD), tandem (TD), proximal (PD), transposed (TRD), and dispersed (DSD) duplicates. Subsequently, GO and KEGG enrichment analyses were performed on the genes in each duplication category using the R package clusterProfiler [44] v4.0. KaKs_Calculator [51] was used to calculate Ka (the number of nonsynonymous substitutions per nonsynonymous site) and Ks (the number of synonymous substitutions per synonymous site).

Abbreviations

IPP: Isopentenyl pyrophosphate
MVA: Mevalonate
MEP: 2-C-methyl-D-erythritol-4-phosphate
FPP: Farnesyl pyrophosphate
GPPS: Geranyl-diphosphate synthase
β-AS: Beta-amyrin synthase
LS: Lupeol synthase
CYP450: Cytochrome P450 monooxygenase
UGT: UDP-glycosyltransferase
SS: Squalene synthase

Declarations

Acknowledgements

We would like to thank the reviewers and editors for their careful reading and helpful comments on this manuscript.

Authors' contributions

Conceptualization: GHL, CS, MAM and JD; Software: GHL, MDW, JYC, MA and HYW; Writing-review & editing: GHL, MAM, CS, YPC and JD; Data curation: GHL, MDW, JYC and HYW; Funding acquisition: GHL, JD and CS.

Data availability

The raw data generated in this study are available in the NCBI database under BioProject accession number PRJNA1062362.
Funding

This study was supported by the Anhui Provincial University Research Projects (2023AH052637), the Startup Fund for High-level Talents of West Anhui University (WGKQ2021079), the Quality Engineering Project of West Anhui University (wxxy2024011), the Quality Engineering Project of Anhui Province (2024zybj032), the Development of Big Data Integration and Analysis Platform for Traditional Chinese Medicine Genomics (0045025050), the Anhui Innovation and Entrepreneurship Training Program for College Students (S202510376030), the Key Research and Development Project of Hainan Province (ZDYF2024SHFZ076), the Haikou Science and Technology Planning Project (2022-008), and the Guangxi “Bagui Young Talents” special fund.

Ethics approval and consent to participate

The experiments did not involve endangered or protected species. No specific permits were required for these locations or activities because the C. chinense plants used in this study were obtained from an orchard in Taipingfan Township, Huoshan County, Lu'an, Anhui Province, China. All methods were carried out in accordance with relevant guidelines and regulations.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Clinical trial number

Not applicable.

References

1. Li HY, Liu XC, Chen XB, Liu QZ, Liu ZL. Chemical composition and insecticidal activities of the essential oil of Clinopodium chinense (Benth.) Kuntze aerial parts against Liposcelis bostrychophila Badonnel. J Food Prot. 2015; 78(10): 1870-1874.
2. Wang S, Ma G, Zhong M, Yu S, Xu X, Hu Y, Zhang Y, Wei H, Yang J. Triterpene saponins from Tabellae clinopodii. Fitoterapia. 2013; 90: 14-19.
3. Sun YQ, Shang LU, Zhu QH, Fan LJ, Guo LB. Twenty years of plant genome sequencing: achievements and challenges. Trends Plant Sci. 2022; 27: 391-401.
4. Song YT, Zhang YT, Wang X, Yu XK, Liao Y, Zhang H, Li LF, Wang YP, Liu B, Li W.
Telomere-to-telomere reference genome for Panax ginseng highlights the evolution of saponin biosynthesis. Hortic Res. 2024; 11: uhae107.
5. Han BX, Jing Y, Dai J, Zheng T, Gu FL, Zhao Q, Zhu FC, Chen CW, Yue Z, Chen NF. A chromosome-level genome assembly of Dendrobium huoshanense using long reads and Hi-C data. Genome Biol Evol. 2020; 12(12): 2486-2490.
6. Younessi-Hamzekhanlu M, Ozturk M, Jafarpour P, Mahna N. Exploitation of next generation sequencing technologies for unraveling metabolic pathways in medicinal plants: a concise review. Ind Crop Prod. 2022; 178: 114669.
7. Bielecka M, Pencakowski B, Nicoletti R. Using next-generation sequencing technology to explore genetic pathways in endophytic fungi in the syntheses of plant bioactive metabolites. Agriculture. 2022; 12: 187.
8. Choi HI, Waminal NE, Park HM, Kim NH, Choi BS, Park MY, Choi DI, Lim YP, Kwon SJ, Park BS, Kim HH, Yang TJ. Major repeat components covering one-third of the ginseng (Panax ginseng CA Meyer) genome and evidence for allotetraploidy. Plant J. 2014; 77: 906-16.
9. Nie SA, Zhao SW, Shi TL, Zhao W, Zhang RG, Tian XC, Guo JF, Yan XM, Bao YT, Li ZC, Kong L, Ma HY, Chen ZY, Liu H, El-Kassaby YA, Porth IG, Yang FS, Mao JF. Gapless genome assembly of azalea and multi-omics investigation into divergence between two species with distinct flower color. Hortic Res. 2023; 10: uhac241.
10. Cheng F, Wu J, Cai X, Liang JLI, Freeling MA, Wang XW. Gene retention, fractionation and subgenome differences in polyploid plants. Nat Plants. 2018; 4: 258-68.
11. Wang MJ, Tu LL, Lin M, Lin ZX, Wang PC, Yang QY, Ye ZX, Shen C, Li JY, Zhang L, Zhou XL, Nie XH, Li ZH, Guo K, Ma YZ, Huang C, Jin SX, Zhu LF, Yang XY, Min L, Yuan DJ, Zhang QH, Lindsey K, Zhang XL. Asymmetric subgenome selection and cis-regulatory divergence during cotton domestication. Nat Genet. 2017; 49: 579-87.
12. Paterson AH, Wendel JF, Gundlach H, Guo H, Jenkins J, Jin DC, Llewellyn D, Showmaker KC, Shu SQ, Udall J, Yoo MJ, Byers R, Chen W, Faigenboim AD, Duke MV, Gong L, Grimwood J, Grover C, Grupp K, Hu GJ, Lee TH, Li JP, Lin LF, Liu T, Schmutz J. Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. Nature. 2012; 492: 423-427.
13. Levy AA, Feldman M. Evolution and origin of bread wheat. Plant Cell. 2022; 34: 2549-67.
14. Lili Li, Qi Huang, Xianchun Duan, Lan Han, Daiyin Peng. Protective effect of Clinopodium chinense (Benth.) O. Kuntze against abnormal uterine bleeding in female rats. J Pharma Sci. 2020; 02(004): 1347-8613.
15. Zhu YD, Hong JY, Bao F, Xing N, Wang LT, Sun ZH, Luo Y, Jiang H, Xu XD, Zhu NL, Wu HF, Sun GB, Yang JS. Triterpenoid saponins from Clinopodium chinense (Benth.) O. Kuntze and their biological activity. Arch Pharm Res. 2018; 41(12): 1117-1130.
16. Zhang HJ, Chen RC, Sun GB, Yang LP, Zhu YD, Xu XD, Sun XB. Protective effects of total flavonoids from Clinopodium chinense (Benth.) O. Kuntze on myocardial injury in vivo and in vitro via regulation of Akt/Nrf2/HO-1 pathway. Phytomedicine. 2018; 40: 88-97.
17. Zhu YD, Hong JY, Bao FD, Xing N, Wang LT, Sun ZH, Luo Y, Jiang H, Xu XD, Zhu NL, Wu HF, Sun GB, Yang JS. Triterpenoid saponins from Clinopodium chinense (Benth.) O. Kuntze and their biological activity. Arch Pharm Res. 2018; 41: 1117-1130.
18. Augustin JM, Kuzina V, Andersen SB, Bak S. Molecular activities, biosynthesis and evolution of triterpenoid saponins. Phytochemistry. 2011; 72: 435-457.
19. Mondol MAM, Shin HJ, Rahman MA, Islam MT. Sea cucumber glycosides: chemical structures, producing species and important biological properties. Mar Drugs. 2017; 15: 317.
20. Kalinin VI, Ivanchina NV, Krasokhin VB, Makarieva TN, Stonik VA. Glycosides from Marine Sponges (Porifera, Demospongiae): Structures, Taxonomical Distribution, Biological Activities and Biological Roles. Mar Drugs. 2012; 10: 1671-1710.
21. Zhao CL, Cui XM, Chen YP, Liang Q.
Key enzymes of triterpenoid saponin biosynthesis and the induction of their activities and gene expressions in plants. Nat Prod Commun. 2010; 5: 1147-1158.
22. Haralampidis K, Trojanowska M, Osbourn AE. Biosynthesis of Triterpenoid Saponins in Plants. Adv Biochem Eng Biotechnol. 2002; 75: 31.
23. Vincken JP, Heng L, de Groot A, Gruppen H. Saponins, classification and occurrence in the plant kingdom. Phytochemistry. 2007; 68: 275-297.
24. Sawai S, Saito K. Triterpenoid biosynthesis and engineering in plants. Front Plant Sci. 2011; 2: 25.
25. Kang SH, Pandey RP, Lee CM, Sim JS, Jeong JT, Choi BS, Jung M, Ginzburg D, Zhao K, Won SY, Oh TJ, Yu Y, Kim NH, Lee OR, Lee TH, Bashyal P, Kim TS, Lee WH, Hawkins C, Kim CK, Kim JS, Ahn BO, Rhee SY, Sohng JK. Genome enabled discovery of anthraquinone biosynthesis in Senna tora. Nat Commun. 2021; 11: 5875.
26. Wang J, Xu S, Mei Y, Cai S, Gu Y, Sun M, Liang Z, Xiao Y, Zhang M, Yang S. A high quality genome assembly of Morinda officinalis, a famous native southern herb in the lingnan region of southern China. Hortic Res. 2021; 8: 135.
27. Manni M, Berkeley MR, Seppey M, Simao FA, Zdobnov EM. BUSCO update: Novel and streamlined workflows along with broader and deeper phylogenetic coverage of eukaryotic, prokaryotic, and viral genomes. arXiv preprint arXiv:2106.11799. 2021.
28. Shi YY, Zhang SX, Peng DY, Wang CK, Zhao D, Ma KL, Wu JW, Huang LQ. Transcriptome Analysis of Clinopodium chinense (Benth.) O. Kuntze and Identification of Genes Involved in Triterpenoid Saponin Biosynthesis. Int J Mol Sci. 2019; 20: 2643.
29. Hou MQ, Wang RF, Zhao SJ, Wang ZT. Ginsenosides in Panax genus and their biosynthesis. Acta Pharm Sin B. 2021; 11: 1813-34.
30. Gao YL, Wang YZ, Wang KZ, Zhu J, Li GS, Tian JW, Li CM, Wang ZH, Li J, Leed AW, Guo CH. Acute and a 28-day repeated-dose toxicity study of total flavonoids from Clinopodium chinense (Benth.) O. Ktze in mice and rats. Regul Toxicol Pharm. 2017; 91: 117-123.
31. Zhang QJ, Li W, Li K, Nan H, Shi C, Zhang Y, Dai ZY, Lin YL, Yang XL, Tong Y, et al. The chromosome-level reference genome of tea tree unveils recent bursts of non-autonomous LTR retrotransposons in driving genome size evolution. Mol Plant. 2022; 13: 935-938.
32. Kreplak J, Madoui MA, Čápal P, Novák P, Labadie K, Aubert G, Bayer PE, Gali KK, Syme RA, Main D. A reference genome for pea provides insight into legume genome evolution. Nat Genet. 2019; 51: 1411-1422.
33. Zhao M, Ma J. Co-evolution of plant LTR-retrotransposons and their host genomes. Protein Cell. 2013; 4: 493-501.
34. Yi C, Fang T, Su H, Duan SF, Ma RR, Wang P, Wu L, Sun WB, Hu QC, Zhao MX, Sun LJ, Dong XH. A reference-grade genome assembly for Astragalus mongholicus and insights into the biosynthesis and high accumulation of triterpenoids and flavonoids in its roots. Plant Communications. 2022; 4: 100469.
35. Zhang CH, Yang X, Wei JR, Chen NMH, Xu JP, Bi YQ, Yang M, Gong X, Li ZY, Ren K. Ethnopharmacology, phytochemistry, pharmacology, toxicology and clinical applications of Radix Astragali. Chin J Integr Med. 2021; 27: 229-240.
36. Seki H, Tamura K, Muranaka T. P450s and UGTs: key players in the structural diversity of triterpenoid saponins. Plant Cell Physiol. 2015; 56: 1463-1471.
37. Thimmappa R, Geisler K, Louveau T, O’Maille P, Osbourn A. Triterpene biosynthesis in plants. Annu Rev Plant Biol. 2014; 65: 225-257.
38. Gong L, Wong CH, Idol J, Ngan CY, Wei CL. Ultra-long Read Sequencing for Whole Genomic DNA Analysis. Journal of Visualized Experiments: JoVE. 2019.
39. Bonenfant Q, Noé L, Touzet H. Porechop_ABI: discovering unknown adapters in Oxford Nanopore Technology sequencing reads for downstream trimming. Bioinformatics Advances. 2023; 3: vbac085.
40. Tamura KT, Teranishi YG, Ueda SY, Suzuki HY, Kawano NK, Yoshimatsu K, Saito KK, Kawahara NB, Muranaka TY, Seki H.
Cytochrome P450 Monooxygenase CYP716A141 is a Unique β-Amyrin C-16β Oxidase Involved in Triterpenoid Saponin Biosynthesis in Platycodon grandiflorus. Plant Cell Physiol. 2017; 58: 874-884.
41. Li HX, Wu S, Lin RX, Xiao YR, Morotti ALM, Wang Y, Galilee MT, Qin HW, Huang T, Zhao Y, Zhou X, Yang J, Zhao Q, Kanellis AK, Martin C, Tatsis EC. The genomes of medicinal skullcaps reveal the polyphyletic origins of clerodane diterpene biosynthesis in the family Lamiaceae. Mol Plant. 16: 549-570.
42. Wolff J, Rabbani L, Gilsbach R, Richard G, Manke T, Backofen R, Gruning BA. Galaxy HiCExplorer 3: a web server for reproducible Hi-C, capture Hi-C and single-cell Hi-C data analysis, quality control and visualization. Nucleic Acids Res. 2020; 48: W177-184.
43. Emms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 2019; 20: 1-14.
44. Wu T, Hu E, Xu S, Chen M, Guo P, Dai Z, Feng T. ClusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation. 2021; 2: 100141.
45. Yu HW, Wang HX, Liang X, Liu J, Jiang C, Chi XL, Zhi NN, PSu P, Zha LP, Gui SY. Telomere-to-telomere gap-free genome assembly provides genetic insight into the triterpenoid saponins biosynthesis in Platycodon grandiflorus. Hortic Res. 2025; 12: uhaf030.
46. Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics (Oxford, England). 2009; 25: 1972-3.
47. Mamta, Shikha S, Sangeeta K, Poonam P, Aasim M, Gopal S, Ram KSa. High-quality haplotype-resolved chromosome assembly provides evolutionary insights and targeted steviol glycosides (SGs) biosynthesis in Stevia rebaudiana Bertoni. Plant Biotechnol J. 2024; 22: 3262-3277.
48. Sun PC, Jiao BB, Yang YZ, Shan LX, Li T, Li XN, Xi ZX, Wang XY, Liu JQ. WGDI: a user-friendly toolkit for evolutionary analyses of whole-genome duplications and ancestral karyotypes. Mol Plant. 2022; 15: 1841-51.
49. Goel M, Sun HQ, Jiao WB, Schneeberger KB.
SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 2019; 20: 1-13.
50. Qiao X, Li QH, Yin H, Qin KJ, Li LT, Wang RZ, Zhang SL, Paterson AH. Gene duplication and evolution in recurring polyploidization-diploidization cycles in plants. Genome Biol. 2019; 20: 1-23.
51. Zhang Z. KaKs_Calculator 3.0: calculating selective pressure on coding and non-coding sequences. Genom Proteom Bioinf. 2022; 20: 536-40.

Supplementary Files

SupplementaryInformationlegends.docx
Supplementarymateria.rar

Editorial history

Editorial decision: Revision requested, 28 Oct, 2025
Reviews received at journal, 27 Oct, 2025
Reviewers agreed at journal, 27 Oct, 2025
Reviews received at journal, 25 Oct, 2025
Reviewers agreed at journal, 09 Oct, 2025
Reviewers agreed at journal, 24 Sep, 2025
Reviewers invited by journal, 07 Sep, 2025
Editor invited by journal, 02 Sep, 2025
Editor assigned by journal, 08 Aug, 2025
Submission checks completed at journal, 07 Aug, 2025
First submitted to journal, 07 Aug, 2025
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-7299302","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":512878001,"identity":"ace0f343-cf91-4ab2-8322-3349bb50c8fe","order_by":0,"name":"Guohui Li","email":"","orcid":"","institution":"West Anhui University","correspondingAuthor":false,"prefix":"","firstName":"Guohui","middleName":"","lastName":"Li","suffix":""},{"id":512878004,"identity":"ebc31edf-7ff4-4b11-a42b-681ad437717d","order_by":1,"name":"Mengda Wang","email":"","orcid":"","institution":"West Anhui University","correspondingAuthor":false,"prefix":"","firstName":"Mengda","middleName":"","lastName":"Wang","suffix":""},{"id":512878005,"identity":"adb949ad-54a0-4758-a08a-cec8412429f2","order_by":2,"name":"Muhammad Aamir Manzoor","email":"","orcid":"","institution":"Hainan University","correspondingAuthor":false,"prefix":"","firstName":"Muhammad","middleName":"Aamir","lastName":"Manzoor","suffix":""},{"id":512878006,"identity":"faaf3db8-57bb-4f46-84f1-68b02de33dc7","order_by":3,"name":"Haiyu Wang","email":"","orcid":"","institution":"West Anhui University","correspondingAuthor":false,"prefix":"","firstName":"Haiyu","middleName":"","lastName":"Wang","suffix":""},{"id":512878008,"identity":"34d682f9-3572-490e-999f-91ec2efa5871","order_by":4,"name":"Junyi Cheng","email":"","orcid":"","institution":"Sichuan University","correspondingAuthor":false,"prefix":"","firstName":"Junyi","middleName":"","lastName":"Cheng","suffix":""},{"id":512878010,"identity":"45b67c8b-8a76-465b-852d-35cf7cddccc4","order_by":5,"name":"Muhammad 
Arif","email":"","orcid":"","institution":"Sakarya University of Applied Sciences","correspondingAuthor":false,"prefix":"","firstName":"Muhammad","middleName":"","lastName":"Arif","suffix":""},{"id":512878012,"identity":"1df52a2c-fc98-4634-9f6d-adc48234d638","order_by":6,"name":"Yunpeng Cao","email":"","orcid":"","institution":"Chinese Academy of Sciences","correspondingAuthor":false,"prefix":"","firstName":"Yunpeng","middleName":"","lastName":"Cao","suffix":""},{"id":512878013,"identity":"af87db2e-59e2-4d0e-966c-cb1455f8ac8b","order_by":7,"name":"Jun Dai","email":"","orcid":"","institution":"West Anhui University","correspondingAuthor":false,"prefix":"","firstName":"Jun","middleName":"","lastName":"Dai","suffix":""},{"id":512878014,"identity":"c395f859-795c-425d-b817-e4065b159836","order_by":8,"name":"Cheng Song","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABC0lEQVRIie3QsWrDMBCA4TMGZVHs9UJA7SPICEwHv0mXEwF3STtnMMUl4GzpmqG0rxDoCzgI3CWlr6CtS4Z0KZlCVehUkEO3DvqGm/RzkgCC4B9KvgcBCDZYbOx+hiKNY2P7EvaTqIRvJ9lqe6FGC1bKk4mjH5Hy8bCZ6fUbP8feZPDaWVsVigHRmDNUynCQUBWX3oTfXEnqSsFg02YPHEVuhq2FrryuvReb5kjMKBbVRDtElZuEZFQbf5LuXHI0uolBtlyifp672Zug26IblzDI7jihXscnE7dFL0vFOJ9EqxYVGvfJ1POWNJ3mo8NnIc6e3l8OH8dbkd4bY/dV4U086G/HgyAIgl++ADMcVIfdiBagAAAAAElFTkSuQmCC","orcid":"","institution":"West Anhui University","correspondingAuthor":true,"prefix":"","firstName":"Cheng","middleName":"","lastName":"Song","suffix":""}],"badges":[],"createdAt":"2025-08-05 10:09:24","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-7299302/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7299302/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1186/s12870-025-07841-8","type":"published","date":"2025-11-29T15:58:10+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":91201851,"identity":"c8797483-7705-4fde-ad01-3f764a8803ed","added_by":"auto","created_at":"2025-09-12 
15:48:37","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":475806,"visible":true,"origin":"","legend":"\u003cp\u003eThe habitat, phenotypes, karyotypes, and genomic features of \u003cem\u003eC. chinense\u003c/em\u003e.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eA:\u003c/strong\u003eCultivation and growth of \u003cem\u003eC. chinense\u003c/em\u003e in the field. \u003cstrong\u003eB:\u003c/strong\u003e Morphology of the leaf and stem of \u003cem\u003eC. chinense\u003c/em\u003e. \u003cstrong\u003eC:\u003c/strong\u003e Fluorescence staining of \u003cem\u003eC. chinense\u003c/em\u003e chromosome with DAPI (2n=2x=38). \u003cstrong\u003eD:\u003c/strong\u003e Hi-C map of \u003cem\u003eC. chinense \u003c/em\u003eof the draft genome assembly. \u003cstrong\u003eE: \u003c/strong\u003eGenomic features of the genome assembly. (I) The 20 assembled \u003cem\u003eC. chinense\u003c/em\u003e chromosomes. (II) The gene count along the genome. (III) The repetiti ve sequence density along the genome. (IV) The GC content along the genome. (IV) Syntenic relationships among different chr omosomes of \u003cem\u003eC. chinense.\u003c/em\u003e\u003c/p\u003e","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-7299302/v1/d7daebfe832edb8da55b3758.png"},{"id":91201854,"identity":"9f45b44e-ff7b-4f78-a1c5-f56f86de7d4a","added_by":"auto","created_at":"2025-09-12 15:48:37","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":536530,"visible":true,"origin":"","legend":"\u003cp\u003eComparative genomic and evolutionary analysis of \u003cem\u003eC. chinense.\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eA:\u003c/strong\u003ePhylogenetic tree of 15 plant species and the expansion/contraction of gene families. The numbers on the nodes indicate estimated divergence times (million years ago [Mya]), and the blue bars show the error range. 
The numbers in green and red indicate the expanded and contracted gene families in the lineage, respectively. The divergence time of speciation events are superimposed on the tree below. The background color represents the tribe to which the species belongs. All the nodes have 100% bootstrap support. MRCA, most recent common ancestor; WGD, whole genome duplication. \u003cstrong\u003eB: \u003c/strong\u003eKs distribution between \u003cem\u003eC. chinense\u003c/em\u003e, \u003cem\u003eT. quinquecostatus\u003c/em\u003e,\u003cem\u003e \u003c/em\u003eand\u003cem\u003e S. baicalensis\u003c/em\u003e. Astragalus membranaceus var. mongholicus is abbreviated as A. mongholicus. The peak location is denoted by dotted lines. Ks, synonymous substitution. \u003cstrong\u003eC:\u003c/strong\u003eSynteny between \u003cem\u003eC. chinense\u003c/em\u003e (20 chromosomes), \u003cem\u003eT. quinquecostatus\u003c/em\u003e (13 chromosomes), and \u003cem\u003eS. baicalensis\u003c/em\u003e (9 chromosomes).\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-7299302/v1/f71f2a78fecaa44011a28a87.png"},{"id":91202731,"identity":"58a0924a-72a9-4dbe-a469-150eb4196369","added_by":"auto","created_at":"2025-09-12 15:56:37","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":368982,"visible":true,"origin":"","legend":"\u003cp\u003eGene family GO and KEGG enrichment analysis.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eA:\u003c/strong\u003e Gene ontology (GO) enrichment analysis conducted on the genes with significant expansion.\u003cbr\u003e\n \u003cstrong\u003eB:\u003c/strong\u003e KEGG enrichment analysis performed on the genes that are significantly expanded. \u003cstrong\u003eC:\u003c/strong\u003e GO enrichment analysis of the significantly expanded genes. 
\u003cstrong\u003eD:\u003c/strong\u003e KEGG enrichment analysis of the contracted genes.\u003c/p\u003e","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-7299302/v1/ee034fc57e2a27205833daae.png"},{"id":91201855,"identity":"e9abefe0-9191-4994-a8df-35978f0cc567","added_by":"auto","created_at":"2025-09-12 15:48:37","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":388333,"visible":true,"origin":"","legend":"\u003cp\u003eCo-analysis of transcriptional metabolism of KEGG enrichment maps.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eA: \u003c/strong\u003eKEGG enrichment map of RvsS. \u003cstrong\u003eB:\u003c/strong\u003e KEGG enrichment map of RvsL. \u003cstrong\u003eC:\u003c/strong\u003e KEGG enrichment map of SvsL.\u003c/p\u003e","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-7299302/v1/95647a3841c352cb80f9e50b.png"},{"id":91203939,"identity":"eac3d1f0-5f34-4a6e-b959-4b1f9d863739","added_by":"auto","created_at":"2025-09-12 16:12:37","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":232389,"visible":true,"origin":"","legend":"\u003cp\u003eTerpenoid biosynthetic pathways.\u003c/p\u003e","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-7299302/v1/3ad71ea8dd0e692cda997b88.png"},{"id":91201865,"identity":"c593f70b-941f-40a3-9fec-4bd9c51a020d","added_by":"auto","created_at":"2025-09-12 15:48:37","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":362987,"visible":true,"origin":"","legend":"\u003cp\u003ePhylogeny of \u003cem\u003eCYP\u003c/em\u003e and \u003cem\u003eUGT\u003c/em\u003e gene family families identified in the \u003cem\u003eC. 
chinense\u003c/em\u003e genome.\u003c/p\u003e","description":"","filename":"floatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-7299302/v1/e8161a5f79774af1f3c7d4c0.png"},{"id":97178798,"identity":"15341f0d-9d32-47e1-9ede-3c5ab08f20f9","added_by":"auto","created_at":"2025-12-01 16:13:44","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":3271582,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7299302/v1/a3c46afa-d2ed-4949-b651-2be9efcc35e3.pdf"},{"id":91202729,"identity":"120b0c4d-7460-4142-839e-a7bd18891169","added_by":"auto","created_at":"2025-09-12 15:56:37","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":14777,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryInformationlegends.docx","url":"https://assets-eu.researchsquare.com/files/rs-7299302/v1/61ae01bcfbf01f722bf91e95.docx"},{"id":91202732,"identity":"a4a74acf-ac36-444f-a024-294f19b94c2b","added_by":"auto","created_at":"2025-09-12 15:56:37","extension":"rar","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":3132791,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementarymateria.rar","url":"https://assets-eu.researchsquare.com/files/rs-7299302/v1/7ba8d97fe6171b2b5d3188d1.rar"}],"financialInterests":"No competing interests reported.","formattedTitle":"A Chromosome-Level Genome Assembly Reveals triterpenoid biosynthesis in Clinopodium chinense","fulltext":[{"header":"Background","content":"\u003cp\u003e\u003cem\u003eClinopodium chinense\u003c/em\u003e (Benth.) O. Kuntze, also known by its Chinese name \u0026ldquo;Feng Lun Cai\u0026rdquo;, is classified within the genus Clinopodium of the Lamiaceae family. This perennial herb is prevalent across the eastern, northeastern, and southwestern regions of China [1]. 
In China, the aerial parts of \u003cem\u003eC. chinense-\u003c/em\u003ecommonly known as \u0026ldquo;duanxueliu\u0026rdquo;-have long been used in traditional folk medicine, and they are employed to treat various ailments, including hematuria, influenza, and allergic dermatitis [2]. Due to the long growth cycle, polyploid nature of the genome, and ambiguity of the phenotype, the breeding of \u003cem\u003eC. chinense\u003c/em\u003e is still in its early stages. There is also a phenomenon of unstable germplasm resources in agricultural applications.\u003c/p\u003e\u003cp\u003eThe reference genome offers insights into functional genes, evolutionary processes, and breeding practices. However, at present, only a small number of medicinal plants have been sequenced assembled into high-quality genomes [3\u0026ndash;5]. The innovation of sequencing technology makes it possible to assemble higher quality medicinal plant genomes [6, 7]. However, due to the polyploidy of many medicinal plants, current genomic research is more difficult compared to conventional crops and horticultural plants. In earlier research, fluorescence in situ hybridization (FISH) analyses on ginseng have indicated that it is an allotetraploid [4, 8]. This type of polyploid is typically formed through the hybridization of two or more distinct diploid species followed by chromosome doubling. While genome doubling events contribute positively to boosting species diversity and environmental adaptability, this conclusion holds particularly in the initial phases of allopolyploid formation. 
To coordinate and coexist with the genomes of various evolved diploid ancestral species within a single nucleus, polyploid genomes may undergo gene loss, differentiation, or even chromosome reduction, thereby acquiring a more diploid-like condition.\u003c/p\u003e\u003cp\u003ePrevious studies have indicated that allopolyploids exhibit a subgenome advantage, wherein one subgenome typically has more genes, higher gene expression, and lower differentiation [9\u0026ndash;11]. For example, allelic recombination occurring in the subgenome of polyploid cotton (\u003cem\u003eGossypium hirsutum\u003c/em\u003e) plays a role in its ecological adaptation and the evolution of fibers [12]. Rapid genomic changes and homologous inhibition leading to diploidization have also been observed in the subgenome of wheat (\u003cem\u003eTriticum aestivum\u003c/em\u003e) [13]. In addition, the differences between the subgenomes of ginseng (\u003cem\u003ePanax ginseng\u003c/em\u003e) may lead to the functional divergency and evolution of ginsenoside biosynthetic genes [4]. Therefore, elucidating the differences between the subgenomes of \u003cem\u003eC. chinense\u003c/em\u003e will also contribute to functional and evolutionary research.\u003c/p\u003e\u003cp\u003eLike other herbs, \u003cem\u003eC. chinense\u003c/em\u003e also has complex metabolites that are considered effective compounds, among which triterpenoid saponins are recognized as the most important type. Previous studies have found that \u003cem\u003eC. chinense\u003c/em\u003e contain abundant flavonoids, diterpenes, and triterpenes with have anti-inflammatory, antibacterial, and procoagulant activities [14\u0026ndash;16]. It is generally believed that the presence of secondary metabolites triterpenoid saponins in \u003cem\u003eC. chinense\u003c/em\u003e gives them significant pharmacological effects, including hemostatic, anti-tumor, and hypoglycemic activities [17]. 
However, it is difficult to directly extract sufficient amounts of triterpenoid saponins directly from \u003cem\u003eC. chinense\u003c/em\u003e. Nevertheless, the biosynthesis pathway of triterpenoid saponins has not been well characterized in various medicinal plants. Thus, conducting genome sequencing research on species like \u003cem\u003eC. chinense\u003c/em\u003e could greatly enhance our comprehension of these pathways.\u003c/p\u003e\u003cp\u003eTriterpenoid saponins constitute a category of specialized metabolites with diverse structures, found in plants [18] as well as marine invertebrates such as sea cucumbers [19] and sponges [20]. Their biosynthetic processes involve the isoprenoid pathway, where isopentenyl pyrophosphate (IPP) serves as the precursor for all isoprenoids [21]. IPP can be produced through either the mevalonate (MVA) pathway or the 2-C-methyl-D-erythritol-4-phosphate (MEP) pathway [10]. The biosynthesis of triterpenoid saponins consists of three primary stages. Initially, isopentenyl pyrophosphate (IPP) is transformed into farnesyl pyrophosphate (FPP) through the action of geranyl-diphosphate synthase (GPPS) and farnesyl-diphosphate synthase (FPPS) [22]. In the second stage, 2,3-oxidosqualene undergoes cyclization catalyzed by 2,3-oxidosqualene cyclases\u0026mdash;such as beta-amyrin synthase (β-AS) and lupeol synthase (LS)\u0026mdash;resulting in the formation of various compounds like beta-amyrin and lupeol [23]. 
The modification stage involves a series of modifications such as oxidation, displacement, and glycosylation of terpenoid skeletons, which are carried out by specific cytochrome P450 monooxygenases (CYPs) and UDP-glycosyltransferases (UGTs) to generate triterpenoid saponins [24].\u003c/p\u003e\u003cp\u003eAdvances in long-read sequencing techniques (e.g., HiFi sequencing) and high-throughput, high-resolution chromosome conformation capture (Hi-C) methods have made it possible to generate chromosome-scale genome assemblies for various plants. In the present work, we assembled a chromosome-level genome of \u003cem\u003eC. chinense\u003c/em\u003e, a plant used in traditional Chinese medicine, that is fully represented by two haplotypes. We conducted a deeper investigation into the differences in triterpenoid saponin synthesis genes across subgenomes and identified gene clusters linked to triterpenoid saponin synthesis. In addition, several high-quality genome sequences of Labiatae plants published in recent years have further enhanced our understanding of the evolution and biology of Labiatae plants. The genomic resources and analyses presented in this article establish a crucial foundation for comprehending the evolutionary dynamics of Labiatae plants, while also advancing genetic research and breeding efforts for \u003cem\u003eC. chinense\u003c/em\u003e.\u003c/p\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\n \u003ch2\u003eGenome sequencing, assembly, and annotation\u003c/h2\u003e\n \u003cp\u003eThe \u003cem\u003eC. chinense\u003c/em\u003e plant sequenced in this study is a diploid (2n\u0026thinsp;=\u0026thinsp;2x\u0026thinsp;=\u0026thinsp;38; Fig. 
\u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003eA, \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003eB, \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003eC; Table \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e), with an estimated genome size of 0.61 Gb. K-mer analysis generated a total of 47,441,502,660 k-mers of length 19, with a peak depth of 77.96. Dividing the k-mer count by the peak depth gives an estimated genome size of 608.5 Mb for \u003cem\u003eC. chinense\u003c/em\u003e, with a heterozygosity of 0.89% and a duplication ratio of 54.64% (Table \u003cspan class=\"InternalRef\"\u003eS1\u003c/span\u003e; Fig. \u003cspan class=\"InternalRef\"\u003eS1\u003c/span\u003e). After trimming and quality control, we obtained 22.73 Gb of HiFi reads and 113.3 Gb of Hi-C paired reads (Table S2; Fig. S2). The assembled genome comprised 24 contigs with a contig N50 of 37.15 Mb and a GC content of 38.61%. The size of the assembled genome aligns closely with the genome size estimated from k-mer analysis and flow cytometry (Table S3). The contiguity of the \u003cem\u003eC. chinense\u003c/em\u003e genome assembly in this study (N50\u0026thinsp;=\u0026thinsp;37.15 Mb) was considerably higher than that of recently published genome assemblies of other herbal plants, such as \u003cem\u003eCamptotheca acuminata\u003c/em\u003e (N50\u0026thinsp;=\u0026thinsp;1.47 Mb) [25], \u003cem\u003eMorinda officinalis\u003c/em\u003e (N50\u0026thinsp;=\u0026thinsp;4.21 Mb) [26], and \u003cem\u003eSenna tora\u003c/em\u003e (N50\u0026thinsp;=\u0026thinsp;4.03 Mb) [25]. In addition, using the Hi-C data, we anchored these contigs to 20 pseudochromosomes with only one gap, covering approximately 99.12% of the assembled sequence. Genes are not evenly distributed across chromosomes; instead, a greater concentration is found at the chromosomal ends (Fig. 
\u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003eD, \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003eE; Table S4). The pseudochromosomes range in length from 7.21 to 44.32 Mb, with an average length of 30.16 Mb (Table S4). Furthermore, we used BUSCO software (version 5.2.2, parameter: -evalue 1e\u0026minus;05) [27] to evaluate the \u003cem\u003eC. chinense\u003c/em\u003e genome annotation. The results showed that approximately 96.8% of the embryophyte BUSCO genes were identified as complete in the assembled genome (Table \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e). A BUSCO completeness above 90% is generally taken to indicate a robust genome annotation, which supports the high quality of the assembled genome.\u003c/p\u003e\n \u003cp\u003e\u003c/p\u003e\n \u003ctable id=\"Tab1\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eStatistics of \u003cem\u003eC. 
chinense\u003c/em\u003e genome assembly and annotation.\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eFeature\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eStatistic\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAssembled genome size (Gb)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.61\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eGC content (%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e38.61\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eContig number\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e128\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eContig N50 (Mb)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e34.33\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eScaffold number\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e124\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eN50 (Mb)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e37.15\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eN90 (Mb)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e20.13\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMinimum len (Mb)\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd align=\"left\"\u003e\n \u003cp\u003e0.02\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMaximum len (Mb)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e44.32\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eChromosome numbers\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e20\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAverage coding sequence length (bp)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1250.95\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAverage exons per gene\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e5.51\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAverage exon length (bp)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e337.69\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eRepeats in genome (%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e63.85\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eBUSCO (complete) (%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e96.84\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003cp\u003e\u003c/p\u003e\n \u003cp\u003eThe transposable element (TE) annotation of the reference genome indicated that \u003cem\u003eC. 
chinense\u003c/em\u003e contained 61.45% repetitive sequences (Table S5), which was lower than the repeat content reported for some other medicinal plant genomes. For example, the ginseng genome comprises 83.17% repeat sequences [4], while the \u003cem\u003eDendrobium huoshanense\u003c/em\u003e genome has a repeat content of 74.92% [5]. Long terminal repeat (LTR) sequences were the most abundant class of repeats, accounting for 42.42% of the \u003cem\u003eC. chinense\u003c/em\u003e genome. Among them, LTR/Gypsy and LTR/Copia accounted for 19.23% and 18.86% of the genome, respectively (Table S5).\u003c/p\u003e\n \u003cp\u003eTo predict protein-coding genes, we integrated ab initio, homology-based, and transcriptome-based prediction approaches (Fig. S3). Across the \u003cem\u003eC. chinense\u003c/em\u003e genome, a total of 45,466 protein-coding genes were identified, with an average gene length of 4097.38 bp and 5.51 exons per gene on average (Fig. S4; Table S6). In addition, we annotated a set of non-coding RNAs (Table S7).\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003ePhylogenomics and evolution of\u003c/strong\u003e \u003cstrong\u003eC. chinense\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003eWe performed a comparative genomic analysis of \u003cem\u003eC. chinense\u003c/em\u003e with 15 other plant genomes to investigate its evolutionary history. Gene family identification revealed a total of 68,482 orthologous gene families across all species, comprising 517,286 genes. The total count of single-copy gene families was 48, while the gene families common to all species amounted to 7,132, comprising 226,662 genes. \u003cem\u003eC. chinense\u003c/em\u003e had 3,451 species-specific gene families, encompassing 5,212 genes (Table S8; Fig. S5). 
To elucidate the origin of \u003cem\u003eC. chinense\u003c/em\u003e, we constructed a phylogenetic tree involving 15 species. Phylogenetic analysis of 48 single-copy gene families indicates that \u003cem\u003eC. chinense\u003c/em\u003e is most closely related to \u003cem\u003eThymus quinquecostatus\u003c/em\u003e within the Labiatae family, with a divergence time of approximately 14.0 million years ago (Fig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003eA).\u003c/p\u003e\n \u003cp\u003eWhole-genome duplication (WGD), or polyploidization, refers to the duplication of an entire genome. To investigate the WGD events that occurred during the evolution of \u003cem\u003eC. chinense\u003c/em\u003e, we studied the distribution of synonymous substitutions per synonymous site (Ks) among the selected species (Fig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003eB). The Ks value for \u003cem\u003eC. chinense\u003c/em\u003e-\u003cem\u003eT. quinquecostatus\u003c/em\u003e (CcTq) was 1.60. Our analysis indicated that members of the Labiatae family, including \u003cem\u003eC. chinense\u003c/em\u003e and \u003cem\u003eS. baicalensis\u003c/em\u003e, exhibit notable ancient peaks around Ks\u0026thinsp;=\u0026thinsp;1.0. Furthermore, when incorporating \u003cem\u003eJ. sambac\u003c/em\u003e into the study, we observed that this peak emerged before the divergence peak between \u003cem\u003eC. chinense\u003c/em\u003e and \u003cem\u003eJ. sambac\u003c/em\u003e, indicating that \u003cem\u003eC. chinense\u003c/em\u003e underwent the same gamma WGD event as \u003cem\u003eJ. sambac\u003c/em\u003e and \u003cem\u003eS. baicalensis.\u003c/em\u003e\u003c/p\u003e\n \u003cp\u003eFurthermore, evolutionary analysis of gene families showed that 6,353 gene families have expanded, accounting for 35.51% of all gene families, while 619 gene families have contracted (accounting for 3.51%). 
Notably, 21,736 expanded gene families and 697 contracted gene families showed statistical significance (P\u0026thinsp;\u0026lt;\u0026thinsp;0.05) in \u003cem\u003eC. chinense\u003c/em\u003e (Table S9). To improve our understanding of the evolutionary history of \u003cem\u003eC. chinense\u003c/em\u003e, we conducted a genomic collinearity analysis involving \u003cem\u003eC. chinense\u003c/em\u003e, \u003cem\u003eT. quinquecostatus\u003c/em\u003e, and \u003cem\u003eS. baicalensis\u003c/em\u003e (Fig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003eC). Further analysis of different types of duplicated genes revealed that the recent WGD event specific to Labiatae contributed significantly to the expansion of these gene families.\u003c/p\u003e\n\u003c/div\u003e\n\u003ch3\u003eGene family enrichment analysis\u003c/h3\u003e\n\u003cp\u003eTo understand the biological functions of the expanded and contracted gene families, we conducted KEGG and GO analyses (Fig. \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e). GO analysis showed that significantly expanded gene families are enriched in multiple terms, such as UDP-glycosyltransferase activity, regulation of terpenoid biosynthetic process, structural constituent of cytoskeleton, regulation of coumarin biosynthetic process, and response to high light intensity (Fig. \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003eA; Table S10). KEGG analysis revealed that the majority of expanded genes were concentrated in pathways such as monoterpenoid biosynthesis, ubiquitin-mediated proteolysis, the biosynthesis of diverse plant secondary metabolites, ABC transporters, and plant hormone signal transduction (Fig. \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003eB; Table S11). 
In contrast, the contracted gene families were associated with GO terms linked to polysaccharide binding, oligopeptide transmembrane transporter activity, lignin catabolic process, response to auxin, lipid transport, \u003cem\u003eetc\u003c/em\u003e. (Fig. \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003eC; Table S12). Furthermore, our KEGG pathway analysis of contracted genes indicated their involvement in photosynthesis, oxidative phosphorylation, protein processing in the endoplasmic reticulum, and glutathione metabolism (Fig. \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003eD; Table S13). Based on this gene family enrichment analysis, we propose that the expansion and contraction of these gene families may strengthen the adaptability of \u003cem\u003eC. chinense\u003c/em\u003e in complex environments, enhance its stress tolerance and growth regulation capabilities, and ultimately support its survival under abiotic stress.\u003c/p\u003e\n\u003ch3\u003eCo-analysis of transcriptomics and metabolomics\u003c/h3\u003e\n\u003cp\u003eThe use of single-omics data analysis to study complex biological processes and biological network regulation has certain limitations. Integrated multi-omics analysis can offset data-related issues in single-omics analysis, such as those arising from data loss and noise. Therefore, a combined analysis of the transcriptome and metabolome was conducted to study the biological functions of differentially expressed genes and metabolites in three different tissues of \u003cem\u003eC. chinense\u003c/em\u003e: roots, stems, and leaves. 
This approach is expected to yield more comprehensive and accurate results than either data set alone.\u003c/p\u003e\n\u003cp\u003eThe joint analysis indicated that 1271, 2834, and 2013 differential substances were annotated in 136, 141, and 138 KEGG pathways in the three comparison groups RvsS (root vs. stem), RvsL (root vs. leaf), and SvsL (stem vs. leaf), respectively (Fig. \u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e). Among them, RvsL had the most differential substances, indicating more pronounced differences between root and leaf tissues. Among the 20 pathways with the lowest P-values in the KEGG enrichment results, the most common category was metabolism, indicating that differentially expressed genes and metabolites in \u003cem\u003eC. chinense\u003c/em\u003e had a significant impact on plant metabolism. There were 13, 9, and 13 metabolic pathways in the three comparison groups, respectively. In the RvsS group, differential substances were most significantly enriched in the ribosome pathway; the differential substances in the RvsL and SvsL groups were most significantly enriched in the biosynthesis of secondary metabolites pathway, indicating that the differences between tissues of \u003cem\u003eC. chinense\u003c/em\u003e mainly involve the regulation of secondary metabolic pathways.\u003c/p\u003e\n\u003cp\u003eMoreover, the differentially expressed genes across the three groups showed enrichment in two terpenoid biosynthesis-related pathways, namely terpenoid backbone biosynthesis and sesquiterpenoid and triterpenoid biosynthesis. This suggests that the differentially expressed genes across various tissues of \u003cem\u003eC. 
chinense\u003c/em\u003e exert a regulatory influence on terpenoid production. Considering that triterpenoid components are important active constituents in \u003cem\u003eC. chinense\u003c/em\u003e, we examine the metabolic pathways and genes related to terpenoid synthesis in more detail below.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eIdentification of key genes for triterpenoid saponin biosynthesis in\u003c/strong\u003e \u003cstrong\u003eC. chinense\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eEarlier research on the chemical components of \u003cem\u003eC. chinense\u003c/em\u003e has shown that its abundant triterpenoid saponins, which have been verified in clinical settings, are the primary bioactive substances [28]. By combining transcriptomic and metabolomic analyses of three different tissues of \u003cem\u003eC. chinense\u003c/em\u003e, we inferred and proposed the biosynthetic pathways of triterpenoid saponins and their derivatives in \u003cem\u003eC. chinense\u003c/em\u003e. Terpenes are mainly synthesized through two pathways: the MVA pathway and the MEP pathway. Multiple key enzymes are involved in this process and play an important regulatory role in terpenoid synthesis (Fig. \u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003e).\u003c/p\u003e\n\u003cp\u003eDrawing on the KEGG enrichment outcomes, our analysis centered on differentially expressed genes (DEGs) within two terpenoid biosynthesis-related pathways: terpenoid backbone biosynthesis, and sesquiterpenoid and triterpenoid biosynthesis. We also analyzed the enzyme genes involved in terpenoid biosynthesis in the three tissues of \u003cem\u003eC. chinense\u003c/em\u003e. In the RvsS group, 13 DEGs were annotated to the terpenoid backbone biosynthesis pathway, while 7 DEGs were annotated to the sesquiterpenoid and triterpenoid biosynthesis pathway. 
Among these DEGs, we identified key enzyme genes involved in terpenoid synthesis pathways, including 3 \u003cem\u003eSQE\u003c/em\u003e, 2 \u003cem\u003eGPPS\u003c/em\u003e, 2 \u003cem\u003eHMGS\u003c/em\u003e, 2 \u003cem\u003eHMGR\u003c/em\u003e, 1 \u003cem\u003eMVD\u003c/em\u003e, 1 \u003cem\u003eDXS\u003c/em\u003e, 1 \u003cem\u003eIDI\u003c/em\u003e, and 1 \u003cem\u003eSS\u003c/em\u003e gene. In the RvsL group, a total of 45 DEGs were annotated to the terpenoid backbone biosynthesis pathway and 19 DEGs to the sesquiterpenoid and triterpenoid biosynthesis pathway. Among these DEGs, we identified key enzyme genes involved in terpenoid synthesis pathways, including 7 \u003cem\u003eSQE\u003c/em\u003e, 6 \u003cem\u003eGPPS\u003c/em\u003e, 5 \u003cem\u003e\u0026beta;-AS\u003c/em\u003e, 4 \u003cem\u003eHMGR\u003c/em\u003e, 4 \u003cem\u003ePMK\u003c/em\u003e, 3 \u003cem\u003eMVD\u003c/em\u003e, 3 \u003cem\u003eDXS\u003c/em\u003e, 3 \u003cem\u003eSS\u003c/em\u003e, 2 \u003cem\u003eHMGS\u003c/em\u003e, 2 \u003cem\u003eAACT\u003c/em\u003e, 2 \u003cem\u003eHDR\u003c/em\u003e, 2 \u003cem\u003eMVK\u003c/em\u003e, 2 \u003cem\u003eIDI\u003c/em\u003e, 1 \u003cem\u003eCMK\u003c/em\u003e, 1 \u003cem\u003eDXR\u003c/em\u003e, 1 \u003cem\u003eMCS\u003c/em\u003e, and 1 \u003cem\u003eFPPS\u003c/em\u003e gene. In the SvsL group, a total of 35 DEGs were annotated to the terpenoid backbone biosynthesis pathway, while 14 DEGs were assigned to the sesquiterpenoid and triterpenoid biosynthesis pathway. 
Among these DEGs, we identified key enzyme genes involved in terpenoid synthesis pathways, including 6 \u003cem\u003eSQE\u003c/em\u003e, 5 \u003cem\u003eHMGR\u003c/em\u003e, 4 \u003cem\u003eGPPS\u003c/em\u003e, 3 \u003cem\u003eMVD\u003c/em\u003e, 3 \u003cem\u003eSS\u003c/em\u003e, 3 \u003cem\u003e\u0026beta;-AS\u003c/em\u003e, 2 \u003cem\u003eMVK\u003c/em\u003e, 2 \u003cem\u003eAACT\u003c/em\u003e, 2 \u003cem\u003eHMGS\u003c/em\u003e, 2 \u003cem\u003eDXR\u003c/em\u003e, 2 \u003cem\u003eIDI\u003c/em\u003e, 1 \u003cem\u003eFPPS\u003c/em\u003e, 1 \u003cem\u003ePMK\u003c/em\u003e, 1 \u003cem\u003eCMK\u003c/em\u003e, 1 \u003cem\u003eDXS\u003c/em\u003e, and 1 \u003cem\u003eHDR\u003c/em\u003e gene (Table 2).\u003c/p\u003e\n\u003ctable id=\"Tab2\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eNumber of differentially expressed genes encoding key enzymes of the terpenoid synthesis pathway in each comparison group.\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\" rowspan=\"2\"\u003e\n \u003cp\u003eEnzyme name\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" rowspan=\"2\"\u003e\n \u003cp\u003eAbbreviation\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colspan=\"3\"\u003e\n \u003cp\u003eComparison group\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eRvsS\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eRvsL\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSvsL\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAcetyl-CoA C-acetyltransferase\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAACT\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
align=\"char\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eHydroxymethylglutaryl-CoA synthase\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eHMGS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eHydroxymethylglutaryl-CoA reductase\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eHMGR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e5\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMevalonate Kinase\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMVK\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePhosphomevalonate Kinase\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePMK\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n 
\u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMevalonate pyrophosphate decarboxylase\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMVD\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1-deoxy-D-xylulose 5-phosphate synthase\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDXS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1-deoxy-D-xylulose 5-phosphate reductoisomerase\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDXR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e4-diphosphocytidyl\u0026minus;2-C-methyl-D-erythritol kinase\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCMK\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd align=\"char\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2-C-methyl-d-erythritol 2,4-cyclodiphosphate synthase\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMCS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e4-Hydroxy\u0026minus;3-methylbut\u0026minus;2-en\u0026minus;1-yl diphosphate reductase\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eHDR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eIsopentenyl diphosphate isomerase\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eIDI\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFarnesyl diphosphate synthase\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFPPS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n 
\u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSqualene synthase\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSqualene monooxygenase\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSQE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e7\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e6\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u0026beta;-Amyrin synthase\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u0026beta;-AS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eGeranylgeranyl pyrophosphate synthase\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eGPPS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n 
\u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u003c/p\u003e\n\u003cp\u003eThe biosynthesis of triterpenoid saponins involves over 20 catalytic steps, featuring essential enzymes such as squalene synthase (SS) for scaffold formation and 2,3-oxidosqualene cyclase (OSC), alongside modifying enzymes including cytochrome P450s (CYP450) and glycosyltransferases (UGT) [29]. Furthermore, the functional enrichment analysis revealed that these DEGs were mainly enriched in \u0026quot;cytochrome P450\u0026quot; and \u0026quot;transporters\u0026quot;, which may be involved in the synthesis, transport, and storage of active ingredients and in stress responses in \u003cem\u003eC. chinense\u003c/em\u003e. To elucidate which members of the CYP and UGT gene families may be involved in the synthesis of triterpenoid saponins in \u003cem\u003eC. chinense\u003c/em\u003e, we identified 105 \u003cem\u003eCYP\u003c/em\u003e and 21 \u003cem\u003eUGT\u003c/em\u003e genes in the \u003cem\u003eC. chinense\u003c/em\u003e genome. Based on prior functional predictions, we selected 14 CYP members and 10 UGT members potentially implicated in triterpenoid saponin biosynthesis. We additionally performed an evolutionary analysis of them alongside CYP and UGT members with established roles. Using 36 CYP enzymes known to be implicated in triterpenoid biosynthesis, we constructed a phylogenetic tree of these enzymes together with the CcCYPs from \u003cem\u003eC. chinense\u003c/em\u003e. We additionally used CYP101A1, a prokaryotic homolog of plant CYPs, as an outgroup. This outgroup anchors the base of the phylogenetic tree, facilitating a more precise examination of the branching order and evolutionary distance among the various CYP subfamilies. The phylogenetic tree indicated that the CcCYPs related to triterpenoid biosynthesis primarily cluster within the CYP85 and CYP71 subfamilies (Fig. 
\u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003eA). Specifically, CcCYP707A1-7, which belongs to the CYP85 family, is on the same branch as GuCYP88D6, which has C-11 oxidase function. We therefore speculate that it may catalyze the oxidative modification of C-11 of triterpenoid molecules, an essential step in the formation of triterpenoid structures. The clustering of CcCYP26A1 with the C-16\u0026alpha; oxidase MlCYP87D16 suggests that its function may be related to oxidation at the C-16\u0026alpha; position (for example, the introduction of oxygen-containing groups such as hydroxyl and carbonyl groups). Meanwhile, the C-23 oxidase AtCYP71A16 is adjacent to CcCYP26A1, while CcCYP75B1A1-2 and CcCYP84 are similar to LjCYP71D353, which is responsible for C-20 and C-28 oxidation. CcCYP93B1-2 is also speculated to function as a C-24 oxidase.\u003c/p\u003e\n\u003cp\u003eUsing 36 glycosyltransferases known to be involved in triterpenoid biosynthesis, we constructed a phylogenetic tree of these enzymes together with the UGTs from \u003cem\u003eC. chinense\u003c/em\u003e (CcUGT). As shown in Fig. \u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003eB, except for CcUGT72E1-2, the functions of CcUGT73C1-3 may be similar to those of oleanolic acid 3-O-glucuronosyltransferase.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eIn traditional Chinese medicine, \u003cem\u003eC. chinense\u003c/em\u003e is regarded as a significant medicinal plant, rich in recognized bioactive components such as flavonoids, triterpenoids, polysaccharides, and amino acids [15, 30]. The analysis of whole genome sequences provides a foundation for studying the gene functions, phylogenetics, and gene evolution of \u003cem\u003eC. chinense\u003c/em\u003e and many other medicinal plants. The genome of \u003cem\u003eC. 
chinense\u003c/em\u003e is characterized by high heterozygosity (0.89%) and a large proportion of repetitive sequences (61.45%), making its assembly challenging. Moreover, medicinal plants have a relatively short domestication history, which often leads to disrupted genetic backgrounds and complicates genome mining efforts. To address this, we integrated Illumina and PacBio sequencing with high-throughput Hi-C technology to produce a high-quality \u003cem\u003eC. chinense\u003c/em\u003e genome, featuring a scaffold N50 of 37.15 Mb and a contig N50 of 34.33 Mb (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Additionally, we constructed a 0.61 Gb telomere-to-telomere reference genome with a BUSCO completeness of 96.84% (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). CEGMA analysis further confirmed that the assembled genome covered 99.97% of the core eukaryotic genes. These results indicate that the integrity and quality of the \u003cem\u003eC. chinense\u003c/em\u003e genome are higher than those of recently sequenced medicinal plant genomes. This reference genome aids in clarifying genetic variations within the saponin biosynthesis pathway throughout the evolution of the \u003cem\u003eC. chinense\u003c/em\u003e genome, offering guidance for the further domestication of this species to enhance its medicinal effectiveness.\u003c/p\u003e\u003cp\u003eThrough phylogenomic analysis, we put forward a framework for the evolutionary process of \u003cem\u003eC. chinense\u003c/em\u003e. Ks analyses showed that \u003cem\u003eC. chinense\u003c/em\u003e has not experienced a WGD event after differentiation. The proliferation of different LTR-RT lineages is an important factor that causes differences in plant genome size. 
For example, the \u003cem\u003eTat\u003c/em\u003e family is a key driver behind genome expansion in \u003cem\u003eCamellia sinensis\u003c/em\u003e [31], while the \u003cem\u003eOgre\u003c/em\u003e family plays a major role in \u003cem\u003ePisum sativum\u003c/em\u003e [32]. Additionally, due to insertions, translocations, and deletions of LTR-RTs occurring during this process, incomplete autonomous LTRs now make up a substantial part of the genome. Despite the significant role LTR-RTs play in shaping genome size diversity, many aspects of them, including their origin, expression patterns, insertion specificity, evolutionary trajectories, and potential impacts on genetic and epigenetic regulation of genes, remain largely uninvestigated [33, 34].\u003c/p\u003e\u003cp\u003eTriterpenoids and flavonoids are the two primary bioactive components in \u003cem\u003eC. chinense\u003c/em\u003e. Among them, triterpenoids exhibit a range of beneficial properties, including anti-inflammatory, hypotensive, and anti-aging effects, while also providing positive impacts on myocardial ischemia injury and chronic kidney disease [35]. We have tentatively verified the substantial expansion of gene families associated with the production of flavonoids and triterpenoids in \u003cem\u003eC. chinense\u003c/em\u003e via comparative genomic analysis. In this study, based on the combined analysis of transcriptomic and metabolomic data, we identified 53 and 72 candidate genes in the biosynthesis pathways of triterpenoids and flavonoids, respectively, through homology search and functional annotation in \u003cem\u003eC. chinense\u003c/em\u003e. Further analysis revealed that tandem duplication may play a key role in the expansion of genes in the synthesis pathways of flavonoids and triterpenoid saponins, such as CYP450 and UGT genes, which are critical nodes controlling metabolic flux [36]. 
We hypothesized that an increase in the copy numbers of these genes could facilitate the accumulation of active substances in \u003cem\u003eC. chinense\u003c/em\u003e.\u003c/p\u003e\u003cp\u003eTriterpenoids are a group of natural compounds with diverse structures. Triterpenoid saponins are formed when triterpenoid sapogenins condense with one or more sugars and other chemical groups, with tetracyclic triterpenoids and pentacyclic triterpenoids being the most common types [37]. Across various tissues of \u003cem\u003eC. chinense\u003c/em\u003e, differentially expressed genes (DEGs) associated with terpenoid biosynthesis were most abundant in the RvsL group, followed by the SvsL group, and least prevalent in the RvsS group, highlighting notable disparities in gene expression between root and leaf tissues. These DEGs predominantly localize within the terpenoid backbone biosynthesis pathway, indicating their significant regulatory function in terpenoid synthesis in \u003cem\u003eC. chinense\u003c/em\u003e and establishing a basis for the investigation of terpenoid synthesis in this species.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eIn summary, we have constructed a chromosome-level genome of \u003cem\u003eC. chinense\u003c/em\u003e, establishing a basis for further research on \u003cem\u003eC. chinense\u003c/em\u003e and serving as a significant resource for the detailed study of several other medicinal plants. This study identifies potential genes associated with flavonoid and triterpene production in \u003cem\u003eC. chinense\u003c/em\u003e, establishing a basis for future genetic enhancement, while genomic evolutionary analysis offers novel insights into the evolutionary history of Lamiaceae species. The investigation of the \u003cem\u003eC. 
chinense\u003c/em\u003e genome is anticipated to elucidate its lineage selection and enhance the breeding of medicinal plants for future generations.\u003c/p\u003e"},{"header":"Materials and Methods","content":"\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e\u003ch2\u003ePlant materials\u003c/h2\u003e\u003cp\u003eThe \u003cem\u003eC. chinense\u003c/em\u003e samples were collected in August 2021 from an orchard in Taipingfan Township, Huoshan County, Lu'an, Anhui Province, China, for the purpose of sequencing analysis. High-quality DNA extracted from young roots was used to construct multiple genomic sequencing libraries. Short-insert fragments were used for Illumina sequencing, while long fragments were used for SMRT sequencing.\u003c/p\u003e\u003c/div\u003e\n\u003ch3\u003eGenome sequencing\u003c/h3\u003e\n\u003cp\u003eIn this study, genome sequencing was conducted via Nanopore ultra-long read, PacBio HiFi, and Hi-C sequencing technologies [38], with the work carried out by Wuhan Benegen Technology Co., Ltd. Specifically, ultra-long sequencing was performed using the Nanopore PromethION platform, and raw data were filtered to exclude failed reads with an average quality score below 7. Reads were processed with Filtlong v0.2.4 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/rrwick/Filtlong\u003c/span\u003e\u003cspan address=\"https://github.com/rrwick/Filtlong\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e), and adapter sequences were trimmed with Porechop v0.2.4 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/rrwick/Porechop\u003c/span\u003e\u003cspan address=\"https://github.com/rrwick/Porechop\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) [39]. 
Subsequently, we carried out final filtering on the resulting reads (length\u0026thinsp;\u0026lt;\u0026thinsp;30 kb, average read quality score\u0026thinsp;\u0026le;\u0026thinsp;90%) and used them for genome assembly. For the HiFi raw data generated by the PacBio Revio sequencing platform, filtering was implemented using CCS v6.0.0 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/PacificBiosciences/ccs\u003c/span\u003e\u003cspan address=\"https://github.com/PacificBiosciences/ccs\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). The resulting CCS reads were used for subsequent analysis [40]. Hi-C data were used to assist assembly and annotation. The Hi-C raw data were trimmed using fastp v0.21.0 and then aligned to the reference genome using HiCUP v0.8.0 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://www.bioinformatics.babraham.ac.uk/projects/hicup\u003c/span\u003e\u003cspan address=\"http://www.bioinformatics.babraham.ac.uk/projects/hicup\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). We used fastp v0.21.0 to filter the raw sequencing reads, removing low-quality, short, high N-content, and adapter-contaminated reads. Duplicate reads generated by PCR amplification were also removed.\u003c/p\u003e\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\u003ch2\u003eGenome assembly\u003c/h2\u003e\u003cp\u003eThe \u003cem\u003ede novo\u003c/em\u003e assembly of PacBio long reads was performed with Falcon v0.3.0 (github/PacificBiosciences/falcon). First, all overlaps among the raw reads were identified, and the reads were error-corrected using the overlap information. Overlaps between the corrected reads were then detected to build contigs. 
Subsequently, the contigs were polished using Quiver (a component of the SMRT Analysis suite) to generate the primary assemblies. For the Bionano raw data, de novo assembly was performed in IrysView v. to produce consensus physical maps. The initial version of the reference genome assembly, referred to as hybrid scaffolds, was constructed by integrating the de novo primary assembly with Bionano genome imaging data via Bionano Solve V. The Hi-C sequencing data were mapped to the assembled scaffolds using BWA v (available at github.com/lh3/bwa) [41] to obtain information regarding the order and orientation of the scaffolds. The scaffolds were anchored to pseudochromosomes by LACHESIS v (github.com/shendurelab/LACHESIS) [42].\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e\u003ch2\u003ePhylogenetic analyses\u003c/h2\u003e\u003cp\u003eTo identify paralogs and orthologs, as well as to infer the species tree for 15 plant species, OrthoFinder 2.5.5 was employed [43]. The species included \u003cem\u003eCamptotheca acuminata\u003c/em\u003e, \u003cem\u003eSalvia miltiorrhiza\u003c/em\u003e, and \u003cem\u003eThymus quinquecostatus\u003c/em\u003e, among others. A phylogenetic tree was constructed from the identified single-copy genes. Functional annotation was performed on common and specific gene families by searching the KEGG and GO databases [44]. We performed multiple sequence alignment on the single-copy gene families using MUSCLE v3.8.31 [45]. Next, the alignments were trimmed using trimAl v1.2rev59 with the parameter -gt set to 0.2. The trimmed alignments were then merged and fed into RAxML v8.2.10 for the construction of a maximum likelihood (ML) tree [46]. Lastly, CAFE v3.1 was employed to assess the expansion and contraction of gene families across each evolutionary branch. 
Gene families with significant expansion or contraction were identified based on a threshold of p\u0026thinsp;\u0026le;\u0026thinsp;0.05 [47].\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\u003ch2\u003eSyntenic analyses\u003c/h2\u003e\u003cp\u003eHomologous genes were identified both within and between species via BLASTP, and based on these results, homologous blocks were defined using MCScanX [44]. Subsequently, WGDI was used to calculate the Ks density of paralogous and orthologous gene pairs [48] and to generate syntenic dot plots. Finally, SyRI was used to detect structural variations between subgenomes [49].\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\u003ch2\u003eGene duplication identification\u003c/h2\u003e\u003cp\u003eTo categorize the duplicated genes in \u003cem\u003eC. chinense\u003c/em\u003e, we employed DupGen_finder [50] v1.12, which enabled us to classify them into five distinct groups: whole-genome duplication (WGD), tandem duplication (TD), proximal duplication (PD), transposed duplication (TRD), and dispersed duplication (DSD). Subsequently, further GO and KEGG analyses were performed on the genes in these duplicate categories using the R package clusterProfiler [44] v4.0. 
KaKs_Calculator [51] was used to calculate the values of Ka (non-synonymous substitutions per non-synonymous site) and Ks (synonymous substitutions per synonymous site).\u003c/p\u003e\u003c/div\u003e"},{"header":"Abbreviations","content":"\u003cp\u003eIPP Isopentenyl pyrophosphate\u003c/p\u003e\n\u003cp\u003eMVA Mevalonate\u003c/p\u003e\n\u003cp\u003eMEP 2-C-methyl-D-erythritol-4-phosphate\u003c/p\u003e\n\u003cp\u003eFPP Farnesyl pyrophosphate\u003c/p\u003e\n\u003cp\u003eGPPS Geranyl-diphosphate synthase\u003c/p\u003e\n\u003cp\u003e\u0026beta;-AS \u0026beta;-Amyrin synthase\u003c/p\u003e\n\u003cp\u003eLS Lupeol synthase\u003c/p\u003e\n\u003cp\u003eCYP450 Cytochrome P450 monooxygenase\u003c/p\u003e\n\u003cp\u003eUGT UDP-glycosyltransferase\u003c/p\u003e\n\u003cp\u003eSS Squalene synthase\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAcknowledgements\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe would like to thank the reviewers and editors for their careful reading and helpful comments on this manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthors\u0026rsquo; contributions\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eConceptualization: GHL, CS, MAM and JD; Software: GHL, MDW, JYC, MA and HYW; Writing-review \u0026amp; editing: GHL, MAM, CS, YPC and JD; Data curation: GHL, MDW, JYC and HYW; Funding acquisition: GHL, JD and CS.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe raw data generated in this study are available in the NCBI database under BioProject number PRJNA1062362.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study was supported by the Anhui Provincial University Research Projects (2023AH052637), 
Startup fund for high-level talents of West Anhui University (WGKQ2021079), Quality Engineering Project of West Anhui University (wxxy2024011), Quality Engineering Project of Anhui Province (2024zybj032), Development of Big Data Integration and Analysis Platform for Traditional Chinese Medicine Genomics (0045025050), Anhui Innovation and Entrepreneurship Training Program for College Students (S202510376030), Key research and development project of Hainan Province (ZDYF2024SHFZ076) and Haikou Science and technology planning project (2022-008), and Guangxi \u0026ldquo;Bagui Young Talents\u0026rdquo; special fund.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe experiments did not involve endangered or protected species. No specific permits were required for these locations/activities because \u003cem\u003eC. chinense\u003c/em\u003e used in this study were obtained from an orchard in Taipingfan Township, Huoshan County, Lu\u0026apos;an, Anhui Province, China. All methods were carried out in accordance with relevant guidelines and regulations, under ethical approval and consent to participate.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no competing interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eClinical trial number\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis experiment is not applicable to clinical trials.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eLi HY, Liu XC, Chen XB, Liu QZ, Liu ZL. Chemical composition and insecticidal activities of the essential oil of \u003cem\u003eClinopodium chinense\u003c/em\u003e (Benth.) 
Kuntze aerial parts against liposcelis bostrychophila badonnel, J Food Prot. 2015; 78 (10): 1870-1874.\u003c/li\u003e\n\u003cli\u003eWang S, Ma G, Zhong M, Yu S, Xu X, Hu Y, Zhang Y, Wei H, Yang J. Triterpene saponins from \u003cem\u003eTabellae clinopodii\u003c/em\u003e. Fitoterapia. 2013; 90: 14-19.\u003c/li\u003e\n\u003cli\u003eSun YQ, Shang LU, Zhu QH, Fan LJ, Guo LB. Twenty years of plant genome sequencing: achievements and challenges. Trends Plant Sci. 2022; 27: 391-401.\u003c/li\u003e\n\u003cli\u003eSong YT, Zhang YT, Wang X, Yu XK, Liao Y, Zhang H, Li LF, Wang YP, Liu B, Li W. Telomere-to-telomere reference genome for \u003cem\u003ePanax ginseng\u003c/em\u003e highlights the evolution of saponin biosynthesis. Horticulture Research. 2024; 11: uhae107.\u003c/li\u003e\n\u003cli\u003eHan BX, Jing Y, Dai J, Zheng T, Gu FL, Zhao Q, Zhu FC, Chen CW, Yue Z, Chen NF A chromosome-level genome assembly of \u003cem\u003eDendrobium Huoshanense\u003c/em\u003e using long reads and Hi-C data. Genome Biol Evol. 2020; 12(12): 2486-2490.\u003c/li\u003e\n\u003cli\u003eYounessi-Hamzekhanlu M, Ozturk M, Jafarpour P, Mahna N. Exploitation of next generation sequencing technologies for unraveling metabolic pathways in medicinal plants: a concise review. Ind Crop Prod. 2022; 178: 114669. \u003c/li\u003e\n\u003cli\u003eBielecka M, Pencakowski B, Nicoletti R. Using next-generation sequencing technology to explore genetic pathways in endophytic fungi in the syntheses of plant bioactive metabolites. Agriculture. 2022; 12: 187.\u003c/li\u003e\n\u003cli\u003eChoi HI, Waminal NE, Park HM, Kim NH, Choi BS, Park MY, Choi DI, Lim YP, Kwon SJ, Park BS, Kim HH, Yang TJ. Major repeat components covering one-third of the ginseng (\u003cem\u003ePanax ginseng\u003c/em\u003e CA Meyer) genome and evidence for allotetraploidy. Plant J. 
2014; 77: 906-16.\u003c/li\u003e\n\u003cli\u003eNie SA, Zhao SW, Shi TL, Zhao W, Zhang RG, Tian XC, Guo JF, Yan XM, Bao YT, Li ZC, Kong L, Ma HY, Chen ZY, Liu H, El-Kassaby YA, Porth IG, Yang FS, Mao JF. Gapless genome assembly of azalea and multi-omics investigation into divergence between two species with distinct flower color. Hortic Res. 2023; 10: uhac241.\u003c/li\u003e\n\u003cli\u003eCheng F, Wu J, Cai X, Liang JLI, Freeling MA, Wang XW. Gene retention, fractionation and subgenome differences in polyploid plants. Nat Plants. 2018; 4: 258-68.\u003c/li\u003e\n\u003cli\u003eWang MJ, Tu LL, Lin M, Lin ZX, Wang PC, Yang QY, Ye ZX, Shen C, Li JY, Zhang L, Zhou XL, Nie XH, Li ZH, Guo K, Ma YZ, Huang C, Jin SX, Zhu LF, Yang XY, Min L, Yuan DJ, Zhang QH, Lindsey K, Zhang XL. Asymmetric subgenome selection and cis-regulatory divergence during cotton domestication. Nat Genet. 2017; 49: 579-87.\u003c/li\u003e\n\u003cli\u003ePaterson AH, Wendel JF, Gundlach H, Guo H, Jenkins J, Jin DC, Llewellyn D, Showmaker KC, Shu SQ, Udall J, Yoo MJ, Byers R, Chen W, Faigenboim AD, Duke MV, Gong L, Grimwood J, Grover C, Grupp K, Hu GJ, Lee TH, Li JP, Lin LF, Liu T, Schmutz J. Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. Nature. 2012; 492: 423-427.\u003c/li\u003e\n\u003cli\u003eLevy AA, Feldman M. Evolution and origin of bread wheat. Plant Cell. 2022; 34: 2549-67.\u003c/li\u003e\n\u003cli\u003eLili Li, Qi Huang, Xianchun Duan, Lan Han, Daiyin Peng. Protective effect of \u003cem\u003eClinopodium chinense\u003c/em\u003e (Benth.) O. Kuntze against abnormal uterine bleeding in female rats. J Pharma Sci. 2020; 02(004): 1347-8613.\u003c/li\u003e\n\u003cli\u003eZhu YD, Hong JY, Bao F, Xing N, Wang LT, Sun ZH, Luo Y, Jiang H, Xu XD, Zhu NL, Wu HF, Sun GB, Yang JS. Triterpenoid saponins from \u003cem\u003eClinopodium chinense\u003c/em\u003e (Benth.) O. Kuntze and their biological activity. Arch Pharm Res. 2018; 41(12): 1117-1130. 
\u003c/li\u003e\n\u003cli\u003eZhang HJ, Chen RC, Sun GB, Yang LP, Zhu YD, Xu XD, Sun XB. Protective effects of total flavonoids from \u003cem\u003eClinopodium chinense\u003c/em\u003e (Benth.) O. Kuntze on myocardial injury in vivo and in vitro via regulation of Akt/Nrf2/HO-1 pathway. Phytomedicine. 2018; 40: 88-97.\u003c/li\u003e\n\u003cli\u003eZhu YD, Hong JY, Bao FD, Xing N, Wang LT, Sun ZH, Luo Y, Jiang H, Xu XD, Zhu NL, Wu HF, Sun GB, Yang JS. Triterpenoid saponins from \u003cem\u003eClinopodium chinense\u003c/em\u003e (Benth.) O. Kuntze and their biological activity. Arch Pharm Res. 2018; 41: 1117-1130.\u003c/li\u003e\n\u003cli\u003eAugustin JM, Kuzina V, Andersen SB, Bak S. Molecular activities, biosynthesis and evolution of triterpenoid saponins. Phytochemistry. 2011; 72: 435-457.\u003c/li\u003e\n\u003cli\u003eMondol MAM, Shin HJ, Rahman MA, Islam MT. Sea cucumber glycosides: chemical structures, producing species and important biological properties. Mar Drugs. 2017; 15: 317.\u003c/li\u003e\n\u003cli\u003eKalinin VI, Ivanchina NV, Krasokhin VB, Makarieva TN, Stonik VA. Glycosides from Marine Sponges (Porifera, Demospongiae): Structures, Taxonomical Distribution, Biological Activities and Biological Roles. Mar Drugs. 2012; 10: 1671-1710.\u003c/li\u003e\n\u003cli\u003eZhao CL, Cui XM, Chen YP, Liang Q. Key enzymes of triterpenoid saponin biosynthesis and the induction of their activities and gene expressions in plants. Nat Prod Commun. 2010; 5: 1147-1158.\u003c/li\u003e\n\u003cli\u003eHaralampidis K, Trojanowska M, Osbourn AE. Biosynthesis of Triterpenoid Saponins in Plants. Adv Biochem Eng Biotechnol. 2002; 75: 31.\u003c/li\u003e\n\u003cli\u003eVincken JP, Heng L, de Groot A, Gruppen H. Saponins, classification and occurrence in the plant kingdom. Phytochemistry. 2007; 68: 275-297.\u003c/li\u003e\n\u003cli\u003eSawai S, Saito K. Triterpenoid biosynthesis and engineering in plants. Front Plant Sci. 
2011; 2: 25.\u003c/li\u003e\n\u003cli\u003eKang SH, Pandey RP, Lee CM, Sim JS, Jeong JT, Choi BS, Jung M, Ginzburg D, Zhao K, Won SY, Oh TJ, Yu Y, Kim NH, Lee OR, Lee TH, Bashyal P, Kim TS, Lee WH, Hawkins C, Kim CK, Kim JS, Ahn BO, Rhee SY, Sohng JK. Genome enabled discovery of anthraquinone biosynthesis in \u003cem\u003eSenna tora\u003c/em\u003e. Nat Commun. 2021; 11: 5875.\u003c/li\u003e\n\u003cli\u003eWang J, Xu S, Mei Y, Cai S, Gu Y, Sun M, Liang Z, Xiao Y, Zhang M, Yang S. A high quality genome assembly of \u003cem\u003eMorinda officinalis\u003c/em\u003e, a famous native southern herb in the lingnan region of southern China. Hortic Res. 2021; 8: 135.\u003c/li\u003e\n\u003cli\u003eManni M, Berkeley MR, Seppey M, Simao FA, Zdobnov EM. BUSCO update: Novel and streamlined work flows along with broader and deeper phylogenetic coverage of eukaryotic, prokaryotic, and viral genomes. arXiv preprint. 2021; arXiv:2106.11799.\u003c/li\u003e\n\u003cli\u003eShi YY, Zhang SX, Peng DY, Wang CK, Zhao D, Ma KL, Wu JW, Huang LQ. Transcriptome Analysis of \u003cem\u003eClinopodium chinense\u003c/em\u003e (Benth.) O. Kuntze and Identification of Genes Involved in Triterpenoid Saponin Biosynthesis. Int J Mol Sci. 2019; 20: 2643.\u003c/li\u003e\n\u003cli\u003eHou MQ, Wang RF, Zhao SJ, Wang ZT. Ginsenosides in Panax genus and their biosynthesis. Acta Pharm Sin B. 2021; 11: 1813-34.\u003c/li\u003e\n\u003cli\u003eGao YL, Wang YZ, Wang KZ, Zhu J, Li GS, Tian JW, Li CM, Wang ZH, Li J, Leed AW, Guo CH. Acute and a 28-day repeated-dose toxicity study of total flavonoids from \u003cem\u003eClinopodium chinense\u003c/em\u003e (Benth.) O. Ktze in mice and rats. Regul Toxicol Pharm. 2017; 91: 117-123.\u003c/li\u003e\n\u003cli\u003eZhang QJ, Li W, Li K, Nan H, Shi C, Zhang Y, Dai ZY, Lin YL, Yang XL, Tong Y, et al. The chromosome-level reference genome of tea tree unveils recent bursts of non-autonomous LTR retrotransposons in driving genome size evolution. 
Mol Plant. 2022; 13: 935-938.\u003c/li\u003e\n\u003cli\u003eKreplak J, Madoui MA, C\u0026aacute;pal P, Nov\u0026aacute;k P, Labadie K, Aubert G, Bayer PE, Gali KK, Syme RA, Main D. A reference genome for pea provides insight into legume genome evolution. Nat Genet. 2019; 51: 1411-1422.\u003c/li\u003e\n\u003cli\u003eZhao M, Ma J. Co-evolution of plant LTR-retrotransposons and their host genomes. Protein Cell. 2013; 4: 493-501.\u003c/li\u003e\n\u003cli\u003eYi C, Fang T, Su H, Duan SF, Ma RR, Wang P, Wu L, Sun WB, Hu QC, Zhao MX, Sun LJ, Dong XH. A reference-grade genome assembly for Astragalus mongholicus and insights into the biosynthesis and high accumulation of triterpenoids and flavonoids in its roots. Plant Communications. 2022; 4: 100469.\u003c/li\u003e\n\u003cli\u003eZhang CH, Yang X, Wei JR, Chen NMH, Xu JP, Bi YQ, Yang M, Gong X, Li ZY, Ren K. Ethnopharmacology, phytochemistry, pharmacology, toxicology and clinical applications of Radix Astragali. Chin J Integr Med. 2021; 27: 229-240.\u003c/li\u003e\n\u003cli\u003eSeki H, Tamura K, Muranaka T. P450s and UGTs: key players in the structural diversity of triterpenoid saponins. Plant Cell Physiol. 2015; 56: 1463-1471.\u003c/li\u003e\n\u003cli\u003eThimmappa R, Geisler K, Louveau T, O\u0026rsquo;Maille P, Osbourn A. Triterpene biosynthesis in plants. Annu Rev Plant Biol. 2014; 65: 225-257.\u003c/li\u003e\n\u003cli\u003eGong L, Wong CH, Idol J, Ngan CY, Wei CL. Ultra-long Read Sequencing for Whole Genomic DNA Analysis. Journal of visualized experiments: JoVE. 2019.\u003c/li\u003e\n\u003cli\u003eBonenfant Q, No\u0026eacute; L, Touzet H. Porechop_ABI: discovering unknown adapters in Oxford Nanopore Technology sequencing reads for downstream trimming. Bioinformatics advances. 2023; 3: vbac085.\u003c/li\u003e\n\u003cli\u003eTamura KT, Teranishi YG, Ueda SY, Suzuki HY, Kawano NK, Yoshimatsu K, Saito KK, Kawahara NB, Muranaka TY, Seki H. 
Cytochrome P450 Monooxygenase CYP716A141 is a Unique \u0026beta;-Amyrin C-16\u0026beta; Oxidase Involved in Triterpenoid Saponin Biosynthesis in Platycodon grandiflorus. Plant cell physiol. 2017; 58: 874-884.\u003c/li\u003e\n\u003cli\u003eLi HX, Wu S, Lin RX, Xiao YR, Morotti ALM, Wang Y, Galilee MT, Qin HW, Huang T, Zhao Y, Zhou X, Yang J, Zhao Q, Kanellis AK, Martin C, Tatsis EC. The genomes of medicinal skullcaps reveal the polyphyletic origins of clerodane diterpene biosynthesis in the family Lamiaceae. Mol Plant. 16: 549-570.\u003c/li\u003e\n\u003cli\u003eWolff J, Rabbani L, Gilsbach R, Richard G, Manke T, Backofen R, Gruning BA. Galaxy HiCExplorer 3: a web server for reproducible Hi-C, capture Hi-C and single-cell Hi-C data analysis, quality control and visualization. Nucleic Acids Res. 2020; 48: W177-184.\u003c/li\u003e\n\u003cli\u003eEmms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 2019; 20: 1-14.\u003c/li\u003e\n\u003cli\u003eWu T, Hu E, Xu S, Chen M, Guo P, Dai Z, Feng T. ClusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovations. 2021; 2: 100141.\u003c/li\u003e\n\u003cli\u003eYu HW, Wang HX, Liang X, Liu J, Jiang C, Chi XL, Zhi NN, PSu P, Zha LP, Gui SY. Telomere-to-telomere gap-free genome assembly provides genetic insight into the triterpenoid saponins biosynthesis in \u003cem\u003ePlatycodon grandiflorus\u003c/em\u003e. Hortic Res. 2025; 12: uhaf030.\u003c/li\u003e\n\u003cli\u003eCapella-Guti\u0026eacute;rrez S, Silla-Mart\u0026iacute;nez JM, Gabald\u0026oacute;n T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics (Oxford, England). 2009; 25: 1972-3.\u003c/li\u003e\n\u003cli\u003eMamta, Shikha S, Sangeeta K, Poonam P, Aasim M, Gopal S, Ram KSa. 
High-quality haplotype-resolved chromosome assembly provides evolutionary insights and targeted steviol glycosides (SGs) biosynthesis in \u003cem\u003eStevia rebaudiana\u003c/em\u003e Bertoni. Plant Biotechnol J. 2024; 22: 3262-3277.\u003c/li\u003e\n\u003cli\u003eSun PC, Jiao BB, Yang YZ, Shan LX, Li T, Li XN, Xi ZX, Wang XY, Liu JQ. WGDI: a user-friendly toolkit for evolutionary analyses of whole-genome duplications and ancestral karyotypes. Mol Plant. 2022; 15: 1841-51.\u003c/li\u003e\n\u003cli\u003eGoel M, Sun HQ, Jiao WB, Schneeberger KB. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 2019; 20: 1-13.\u003c/li\u003e\n\u003cli\u003eQiao X, Li QH, Yin H, Qin KJ, Li LT, Wang RZ, Zhang SL, Paterson AH. Gene duplication and evolution in recurring polyploidization-diploidization cycles in plants. Genome Biol. 2019; 20: 1-23.\u003c/li\u003e\n\u003cli\u003eZhang Z. KaKs_calculator 3.0: calculating selective pressure on coding and non-coding sequences. Genom Proteom Bioinf. 2022; 20: 536-40.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"bmc-plant-biology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"pbio","sideBox":"Learn more about [BMC Plant Biology](http://bmcplantbiol.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/pbio/default.aspx","title":"BMC Plant Biology","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Comparative genomics, Genome assembly, C. chinense, Triterpenoids","lastPublishedDoi":"10.21203/rs.3.rs-7299302/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7299302/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003e\u003cstrong\u003eBackground \u003c/strong\u003e\u003cem\u003eClinopodium chinense \u003c/em\u003eis an important medicinal plant belonging to the Lamiaceae. The desiccated roots of \u003cem\u003eC. chinense\u003c/em\u003e exhibit a variety of pharmacological properties and are utilized in traditional Chinese medicine.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eResults \u003c/strong\u003eWe present the first chromosome-level genome assembly of \u003cem\u003eC. chinense\u003c/em\u003e, comprising 20 pseudochromosomes with an aggregate size of 0.61 Gb and 45,466 protein-coding genes. The analysis of genome evolution indicated that two recent bursts of long terminal repeats (LTRs) significantly increased the size of the \u003cem\u003eC. chinense\u003c/em\u003egenome. Additionally, numerous large-scale chromosomal rearrangements have been identified between the genomes of \u003cem\u003eC. chinense\u003c/em\u003e and \u003cem\u003eThymus quinquecostatu \u003c/em\u003egenomes. 
Through comparative genomics studies, it was found that a recent whole-genome duplication event unique to Labiatae plants has resulted in a notable expansion of gene families related to the biosynthesis of triterpenoids and flavonoids in \u003cem\u003eC. chinense\u003c/em\u003e. Subsequently, we identified several putative key genes responsible for triterpenoid biosynthesis. The results of our study offer new perspectives on the biosynthesis of triterpenoids and flavonoids, potentially advancing future investigations into the genetic and medicinal properties of \u003cem\u003eC. chinense\u003c/em\u003e.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConclusions \u003c/strong\u003eOur research outcomes offer new perspectives on the biosynthesis of triterpenoids and flavonoids, and may aid subsequent studies on the genetic properties and medicinal uses of \u003cem\u003eC. chinense.\u003c/em\u003e\u003c/p\u003e","manuscriptTitle":"A Chromosome-Level Genome Assembly Reveals triterpenoid biosynthesis in Clinopodium chinense","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-09-12 15:48:32","doi":"10.21203/rs.3.rs-7299302/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision 
requested","date":"2025-10-28T08:03:59+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-10-28T01:04:06+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"38193104732564827350590636266767960362","date":"2025-10-27T08:17:39+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-10-25T18:40:17+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"268033918703181653464364986040002228523","date":"2025-10-09T13:31:13+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"158201394254961067605022369421424215354","date":"2025-09-25T01:51:22+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-09-08T03:53:06+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2025-09-02T23:26:55+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-08-08T11:14:56+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-08-08T03:57:52+00:00","index":"","fulltext":""},{"type":"submitted","content":"BMC Plant Biology","date":"2025-08-08T03:54:45+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"bmc-plant-biology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"pbio","sideBox":"Learn more about [BMC Plant Biology](http://bmcplantbiol.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/pbio/default.aspx","title":"BMC Plant Biology","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"0f384a1a-3898-489a-8be9-1c8cb09dfb78","owner":[],"postedDate":"September 12th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[],"tags":[],"updatedAt":"2025-12-01T16:07:22+00:00","versionOfRecord":{"articleIdentity":"rs-7299302","link":"https://doi.org/10.1186/s12870-025-07841-8","journal":{"identity":"bmc-plant-biology","isVorOnly":false,"title":"BMC Plant Biology"},"publishedOn":"2025-11-29 15:58:10","publishedOnDateReadable":"November 29th, 2025"},"versionCreatedAt":"2025-09-12 15:48:32","video":"","vorDoi":"10.1186/s12870-025-07841-8","vorDoiUrl":"https://doi.org/10.1186/s12870-025-07841-8","workflowStages":[]},"version":"v1","identity":"rs-7299302","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7299302","identity":"rs-7299302","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}