Genome Survey Indicated Complex Evolutionary History of Garuga Roxb. Species

Dongbo Zhu, Rui Rao, Yu Du, Chunmin Mao, Rong Chen, Sun Hang, and 1 more

Research Article. This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-3905007/v1 This work is licensed under a CC BY 4.0 License. Status: Published. Journal publication published 23 Oct, 2024 in BMC Genomics. Version 1 posted 12

Abstract

Background
Garuga Roxb. is a genus endemic to southwest China and other tropical regions of Southeast Asia, and it faces a risk of extinction due to the loss of tropical forests and changes in land use. A genome survey of G. forrestii contributes to a deeper understanding and better conservation of the genus.

Results
The genome survey of G. forrestii generated approximately 54.56 Gb of sequence data, corresponding to approximately 112× coverage.
K-mer analysis indicated a genome size of approximately 0.48 Gb, smaller than the 0.52 Gb estimated by flow cytometry. Heterozygosity was about 0.54% and the repeat rate was around 51.54%. All the shotgun data were assembled into 339,729 scaffolds with an N50 of 17,344 bp. The average guanine and cytosine (GC) content was approximately 35.16%. A total of 330,999 SSRs were detected, with mononucleotide repeats being the most abundant (70.16%), followed by dinucleotide repeats (20.40%). A pseudo-chromosome assembly of G. forrestii and a genome of Boswellia sacra were used as reference genomes for a preliminary population resequencing analysis of three Garuga species. PCA indicated three distinct groups, but genome-wide phylogenetic analyses revealed conflicts both between the datasets built on different reference genomes and between the maternal (plastid) and nuclear genomes.

Conclusion
In summary, the genome of G. forrestii is small, and the phylogenetic relationships within the genus Garuga are complex. The genetic data presented in this study are valuable for comprehensive whole-genome analyses, the evaluation of population genetic diversity, investigations into adaptive evolution, the advancement of artificial breeding efforts, and the support of species conservation and restoration initiatives. Ultimately, this research contributes to reinforcing the conservation and management of natural ecosystems, promoting biodiversity conservation, and advancing sustainable development.

Keywords: Genome survey, K-mer, Flow cytometry, SNP, Phylogenetic, Garuga forrestii, Nuclear-cytoplasmic conflict

1 Background

Garuga Roxb. is a genus endemic to southwest China and other tropical regions of Southeast Asia. These deciduous trees bloom between March and April, prior to the onset of the rainy season, and produce fruits that mature from May to November, with the main maturation phase occurring in July and August.
There are approximately five species or varieties within this genus: G. forrestii W. W. Smith, G. floribunda var. gamblei (King ex Smith) Kalkm., G. floribunda var. floribunda Decne., G. pierrei Guill., and G. pinnata Roxb. All of these except G. floribunda var. floribunda are found in China. G. forrestii is endemic to China and is distributed in the arid, warm river valleys of Yunnan, S.W. China, such as those of the Jinshajiang River (JR), Lancangjiang River (LR), Red River (RR), Nanpanjiang River (NPR) and their tributaries. It is also the only species of the genus distributed north of the Tropic of Cancer, along the JR, beyond the tropical regions. It extends to Leibo, Sichuan, which is the highest-latitude distribution within the Burseraceae family. It reaches altitudes of nearly 1600 meters at many points along the JR and demonstrates an extraordinary ability to flourish under arid conditions. G. floribunda var. gamblei and G. pinnata grow in dense tropical forests of South Yunnan, extending to Guangxi, Guangdong and Hainan. Our survey along the middle LR revealed the coexistence of three species: G. forrestii, G. floribunda var. gamblei and G. pinnata. However, all members of this genus are in danger of extinction due to the loss of tropical forests and changes in land use.

The taxonomy of Garuga has undergone substantial reorganization. G. pierrei, previously considered a variety of G. pinnata, has leaf characteristics similar to those of G. pinnata. However, the axis and leaflets of G. pierrei bear short, soft hairs, in contrast to the shorter hairs found on G. pinnata. G. pierrei bears spherical fruits, whereas the fruits of G. pinnata are nearly spherical and occasionally bear soft hairs. Nonetheless, we did not find typical G. pierrei specimens during our fieldwork.
The fruiting attributes of G. floribunda var. gamblei are similar to those of G. forrestii, yet the two differ notably in hair coverage, pedicel, and fruit dimensions and shape. Within the Lancang-Mekong River region, these three species coexist and display a continuum of variation in tree structure and leaf morphology. Consequently, the current taxonomic classification needs further testing with phylogenetic and phylogenomic evidence.

The Sino-Himalaya region lies on the southeast margin of the Qinghai-Tibetan Plateau (QTP) and hosts several Asian rivers, including the Mekong, RRB and JRB, along with their tributaries. Presently, these rivers flow separately to the ocean [1, 2]. However, millions of years ago they flowed southward into the Paleo-Red River (PRR), forming an extensive drainage network that ultimately emptied into the South China Sea [3, 4]. The PRR system was subsequently disassembled by the uplift of the QTP and by river capture events since the late Miocene [4]. This reorganization disrupted the previously continuous distribution pattern, resulting in distinctive genetic and biogeographical attributes and facilitating genetic differentiation that gave rise to new taxa [5, 6]. River captures also brought previously separate ranges together, promoting lineage fusion and the accumulation of diversity [7].

Genome sequencing technology enables the thorough investigation of species evolution [8]. Next-generation sequencing (NGS) has greatly reduced the cost of DNA sequencing [9] and is widely used in most fields of genetics and genomics, providing researchers with a more convenient, refined, and comprehensive investigation method [10]. A genome survey is the most convenient strategy for providing a rough reference genome for species without whole-genome data [11–13]. This study employed a genome survey to explore the genome size and characteristics of G. forrestii.
The genomic information gathered in this research will serve to 1) deepen our understanding of the G. forrestii genome; and 2) construct preliminary long contigs or even draft scaffolds, offering a foundation for future evolutionary surveys of this genus.

2 Materials and methods

2.1 Sample collection and DNA extractions

Fresh leaves from a mature G. forrestii individual were collected for the genome survey during the summer of 2023 near the Gangou Bridge in the Red River valley, Yunnan, China. Fresh leaves were also collected from another 25 individuals of three Garuga species: 6 individuals of G. floribunda var. gamblei, 8 of G. pinnata, and 11 of G. forrestii (Fig. 1; Table 1). No specific permissions were required for the collection of specimens for this study, which were neither privately owned nor protected, and the field study did not involve endangered or protected species. We complied with the IUCN Policy Statement on Research Involving Species at Risk of Extinction and the Convention on the Trade in Endangered Species of Wild Fauna and Flora. After formal identification of the plant material by Liangliang Yue, voucher specimens were prepared and deposited at the herbarium of the College of Wetlands, Southwest Forestry University (Yue, accession number Yue2023051032-1). A total of 26 samples were sequenced. Genomic DNA was isolated using the CTAB method [14]. DNA integrity was checked by 0.8% agarose gel electrophoresis, and DNA was quantified with a UV spectrophotometer.

Table 1 Sampling Information

Species                     Individuals  Population  Province  Locality (small scale)  Longitude  Latitude
G. pinnata                  3            Pi_BB       Yunnan    Bubeng                  101.585    21.60944
                            3            Pi_YXG      Yunnan    Yexianggu               100.8648   22.18184
                            1            Pi_SHZWY    Yunnan    Shanghaizhiwuyuan       121.4429   31.14649
                            1            Pi_DTS      Yunnan    Datianshan              99.87337   24.7832
G. floribunda var. gamblei  2            Fl_SMC      Yunnan    Sanmaicun               100.6193   22.00044
                            2            Fl_YXG      Yunnan    Yexianggu               100.8648   22.18184
                            1            Fl_ML       Yunnan    Menglong                100.7445   21.7359
                            1            Fl_DTBC     Yunnan    Datianbacun             99.88069   24.73952
G. forrestii                1            Fo_XP       Yunnan    Xinping                 101.5044   24.14715
                            2            Fo_DC       Yunnan    Dacun                   100.4178   25.01894
                            1            Fo_LJZ      Yunnan    Luzhijiang              101.9574   24.6514
                            2            Fo_HN       Yunnan    Huaning                 102.9734   24.0748
                            2            Fo_KYC      Yunnan    Kaiyuancheng            103.2985   23.89117
                            1            Fo_NXH      Yunnan    Nanxihe                 101.8507   23.64461
                            1            Fo_YJZ      Sichuan   Yinjiangzhen            101.7854   26.59825
                            1            Fo_FF       Sichuan   Fenfang                 101.9786   26.76728

2.2 Genome size estimation by flow cytometry

Tender leaves from tomato plants (genome size: 900 Mb), one month after seed germination, were used as the internal reference standard. The samples were placed in 0.8 mL of pre-chilled MGb dissociation solution, composed of 45 mM MgCl2·6H2O, 20 mM MOPS, 30 mM sodium citrate, 1% (w/v) PVP 40, 0.2% (v/v) Triton X-100, 10 mM Na2EDTA and 20 µL/mL β-mercaptoethanol, adjusted to pH 7.5. The tissues were swiftly chopped vertically with a sharp blade and left in the dissociation solution on ice for 10 minutes. The mixture was then filtered through a 40-micron mesh to obtain a suspension of nuclei. This suspension was combined with suitable volumes of pre-chilled propidium iodide (PI) solution (stock concentration 1 mg/mL) and RNase solution (stock concentration 1 mg/mL), and stained on ice in the dark for 0.5-1 hour. The working concentration of both the PI staining solution and the RNase solution was 50 µg/mL [15, 16]. The stained suspension of nuclei was analyzed on a BD FACSCalibur flow cytometer, using 488 nm blue light excitation to measure the fluorescence intensity emitted by propidium iodide.
Each detection cycle involved the collection of 10,000 particles. The coefficient of variation (CV%) was kept below 5%. ModFit 3.0 software was used for graphing and analysis. The genome size was calculated using the following formula [17]:

$$\text{Sample genome size} = \text{standard genome size} \times \frac{\text{sample } G_{0}/G_{1} \text{ peak mean}}{\text{standard } G_{0}/G_{1} \text{ peak mean}}$$

2.3 Genome Sequencing and Data Quality Control

Dried leaves from the mature G. forrestii tree were sent to the Personalbio company for paired-end sequencing. We followed the standard protocol using Illumina's TruSeq DNA PCR-free prep kit reagents for sequencing library preparation. The extracted DNA sample was randomly sheared by ultrasonication, followed by end repair to eliminate overhanging bases at the 5' end and to add a phosphate group while filling in missing bases at the 3' end. To prevent self-ligation of DNA fragments and ensure compatibility with sequencing adaptors, an A base was added to the 3' end of the DNA sequence. Sequencing adaptors with library-specific tags were ligated to the 5' end of the DNA sequence, facilitating the immobilization of DNA molecules onto the flow cell. We used AMPure XP beads (Beckman Coulter, Brea, CA) for selective removal of adaptor-ligated fragments and purification of the resulting library. Subsequently, PCR amplification was performed on the adaptor-ligated DNA fragments to enrich the sequencing library templates. A second purification step with AMPure XP beads was carried out to purify the enriched library products. Finally, we used 2% agarose gel electrophoresis to select and purify the final library fragments. The resulting library insert fragments were approximately 400 bp in size. We performed paired-end sequencing with 2 × 150 bp reads on an Illumina NovaSeq instrument (Table 2).
The remaining 25 samples were also sequenced, including chloroplast genome sequencing.

Table 2 Sequencing Overview

Sample        Insert Size  Sequencing platform  Sequencing Mode
G. forrestii  400 bp       Illumina NovaSeq     Paired-end, 2×150 bp

2.4 High-quality data acquisition

Raw sequences were filtered with fastp [18] using the sliding window method to generate high-quality (HQ) data. The window size was set to 5 bp. The window was slid from the 3' end toward the 5' end and the average Q value of the bases in the window was calculated; if the average was < 20, the bases in the window were deleted, and if it was ≥ 20, sliding stopped. For length filtering, a read pair was removed if either read was ≤ 50 bp long. For ambiguous-base (N) filtering, a read pair was removed if the number of N bases was ≥ 5.

2.5 Ploidy analysis and k-mer analysis

Smudgeplot [19] was used to analyze the genome structure and infer ploidy: it counts heterozygous k-mer pairs and compares the total coverage of each pair (CovA + CovB) with its relative coverage (CovB / (CovA + CovB)). The k-mer frequency distribution was computed with Jellyfish [20] using 19-mers, and it was used to estimate the heterozygosity of G. forrestii, the proportion of repetitive sequences, and the genome size.

2.6 De novo assembly and GC content analysis

The paired-end sequencing data were assembled de novo with MEGAHIT [21] using standard parameters. Contigs were further joined into scaffolds using SOAPdenovo [22]. Sliding window calculations were performed using a window size of 10 kb, and the average depth and GC content of each window were computed.
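The 10 kb window calculation of GC content described above could be sketched roughly as follows. This is a minimal illustration in Python, not the pipeline used in the study: the function name `window_gc` is hypothetical, the choice to ignore N bases and to skip windows made only of N is an assumption, and the per-window depth (which requires read alignments) is left out.

```python
def window_gc(seq, window=10_000):
    """Return the GC fraction of consecutive, non-overlapping windows.

    N (and any other non-ACGT) characters are ignored when computing the
    fraction; windows without any A/C/G/T base are skipped (assumption).
    """
    seq = seq.upper()
    fractions = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        at = chunk.count("A") + chunk.count("T")
        if gc + at == 0:
            continue  # window contains only gaps or ambiguous bases
        fractions.append(gc / (gc + at))
    return fractions


# Toy usage with a tiny window so the result is easy to follow.
toy_fractions = window_gc("GGCCAATTGCATNNNN", window=4)
```

With a real scaffold, `window=10_000` would match the window size stated in the text; the last, shorter window is kept here, which is another simplifying choice.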
The quality of the genome assembly was assessed using the Assembly-stats utility, which calculated metrics such as N50, scaffold count, scaffold size, scaffold length, and genome length. Before chloroplast assembly, low-quality sequences were filtered using SOAPnuke [23], and de novo assembly was performed following the GetOrganelle pipeline [24].

2.7 Identification of microsatellite motifs

The MIcroSAtellite identification tool (MISA) [25] was used to detect microsatellite patterns within the scaffolds generated above. The minimum number of repeats required to identify a motif was set as follows: at least 10 repetitions for mononucleotide repeats, at least 6 repetitions for dinucleotide repeats, and at least 5 repetitions for trinucleotide, tetranucleotide, pentanucleotide, and hexanucleotide repeats [26].

2.8 Population resequencing variation detection and filtering of SNP data

Picard [27], BWA [28], and Samtools [29] were used to build the genome index and to perform mapping, sorting, and deduplication in preparation for variant detection. We used G. forrestii (which we sampled, sequenced, and assembled ourselves to ensure its usability as a reference genome) and Boswellia sacra Flück. (downloaded from NCBI, accession: SNVD00000000) as reference genomes for mapping the 25 individuals. GATK [30] was used to perform variant calling with HaplotypeCaller for each sample. This study involved the assembly of the genome of one G. forrestii individual at the scaffold level, as well as the genomes of the remaining 25 individuals, which included 6 G. floribunda var. gamblei, 8 G. pinnata, and 11 G. forrestii. Each of these genomes was divided into 10 pseudo-chromosomes. GATK was then applied to filter and extract SNPs and INDELs.
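To make the repeat-count thresholds of Section 2.7 concrete, the following Python sketch scans a sequence for perfect SSRs using those minimums (10 for mononucleotide, 6 for dinucleotide, 5 for tri- to hexanucleotide motifs). It is not MISA and does not reproduce its behaviour: the names `MIN_REPEATS`, `is_primitive` and `find_ssrs` are hypothetical, compound SSRs are not handled, and motifs that are themselves repeats of a shorter unit (for example "AA") are simply discarded, which is an assumption of this sketch.

```python
import re

# Minimum number of repeats per motif length, as stated in Section 2.7.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % p == 0 and motif == motif[:p] * (n // p) for p in range(1, n)
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # One copy of the motif followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)


# Toy usage: a run of ten A followed later by six copies of AT.
toy_hits = find_ssrs("GG" + "A" * 10 + "CC" + "AT" * 6 + "G")
```

Because each motif length is scanned independently, a production tool would additionally need rules for overlapping hits and for compound repeats.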
2.9 Phylogenetic reconstruction and PCA

We used Plink [31] to perform linkage disequilibrium (LD) filtering on the SNP data, resulting in a refined VCF file from which an alignment was generated for phylogenomic reconstruction and PCA. The dataset encompassed nucleotide polymorphisms from the whole genomes of 25 individuals of the three species. We used iqtree [32] to construct two Maximum Likelihood (ML) trees for the nuclear genome, using B. sacra Flück. and Garuga forrestii, respectively, as reference genomes. Phylogenomic trees were visualized using Figtree [33]. The PCA based on the SNP data extracted from the 25 samples was conducted using Plink [31] and GCTA [34], and the results were visualized with an R script. We also constructed ML trees using iqtree [32] for the chloroplast genomes of the 25 individuals.

3 Results

3.1 Genome size estimation by flow cytometry

Flow cytometry analysis produced a high-resolution histogram (Fig. 2; Table 3), with a fluorescence intensity of 17.04 for the sample and 29.67 for the internal reference. The ratio between these values is 0.57, leading to an estimated genome size of approximately 0.52 Gb.

Table 3 Flow Cytometry Information

Sample        Reference  Reference fluorescence intensity  Sample fluorescence intensity  Ratio  Genome size (Gb)
G. forrestii  tomato     29.67                             17.04                          0.57   0.52

3.2 Sequence filtering and data quality control

Sequencing of the paired-end library with an insert size of 400 bp (Table 2) yielded approximately 54.56 Gb of raw sequence (Table 4). The percentage of bases with a quality score ≥ Q20 was 97.46% and with ≥ Q30 was 92.86% (Table 4). Approximately 1.1% of the data was filtered out as low quality, and the remaining ~53.97 Gb was used for downstream analysis (Table 5).

Table 4 Sequencing Data Statistics

Sample        Read Num   Total base   N_rate  GC_Content (%)  Q20_rate  Q30_rate
G. forrestii  361299934  54556290034  0       35.1            97.46     92.86

Table 5 High-Quality Data Statistics

Sample        HQ Reads   HQ Reads (%)  HQ Data (bp)  HQ Data (%)
G. forrestii  358261400  99.16         53967212246   98.92

The GC content is 35.10% (Table 4). The base distribution showed no separation of AT and GC at either the R1 end or the R2 end (Fig. 3), which laid the foundation for the subsequent quantitative analysis. The average sequencing error rates for single-base loci were all less than 0.1% (Fig. 4).

3.3 K-mer analysis and ploidy estimation

K-mer analysis provides an estimate of the genome size based on the substrings of length k contained in the sequencing reads. The sequencing data correspond to approximately 112× coverage of the genome. K-mer analysis can also indicate low-quality or contaminated sequences. The 19-mer frequency analysis identified the dominant peak at a depth of 90× (Fig. 5). Dividing the total k-mer number by 90 gave a predicted genome size of ~483 Mb (483,508,243 bp). The peak at half the depth of the main peak (45×) is most likely due to heterozygosity (~0.54%), while the peak at twice the depth of the main peak (180×) is caused by repetitive sequences (~51.54%). The ploidy assessment yielded 83% AB, 12% AAB, and 5% AABB (Fig. 6), from which G. forrestii was predicted to be diploid; this ploidy assessment is for reference only.

3.4 De-novo assembly and gene prediction

A total of about 360 million paired-end reads were used to build the initial genome assembly of G. forrestii. The genome, assembled using SOAPdenovo, comprises 339,729 scaffolds (509.9 Mb); only scaffolds exceeding 200 bp were retained, to exclude low-quality sequences. The genome size, excluding Ns, is 499,986,768 bp, and the final assembly has a low N content of roughly 0.01%. The largest scaffold spans 345,990 bp, with N50 = 17,344 bp and N90 = 293 bp (Table 6).
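For readers unfamiliar with the N50 and N90 statistics quoted above, the following Python sketch shows the usual definition: sequences are sorted from longest to shortest, and Nx is the length of the sequence at which the running total first reaches x% of the assembly size. The function name `nxx` and the toy lengths are illustrative assumptions; this is not the Assembly-stats implementation.

```python
def nxx(lengths, fraction=0.5):
    """Return the Nx statistic (N50 when fraction=0.5) of a list of lengths."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    threshold = sum(lengths) * fraction
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    return min(lengths)  # only reachable through floating-point rounding


# Toy scaffold lengths (hypothetical), total = 32.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = nxx(toy_lengths, 0.5)  # 8 + 8 = 16 reaches half of 32
toy_n90 = nxx(toy_lengths, 0.9)  # running total reaches 28.8 at a length-2 scaffold
```

A large gap between N50 and N90, as in Table 6, indicates that most of the sequence count is made up of very short scaffolds.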
Table 6 Statistics of Contigs and Scaffolds

Assembly Level  Number  Sum (bp)   Longest Seq  N50    N90  N count  Gaps    Software
contig          514298  516660543  179938       6972   280  -        -       MEGAHIT
scaffold        339729  509934743  345990       17344  293  6439439  146457  SOAPdenovo

3.5 Analysis of GC content

Scaffolds longer than 200 bases were selected. Non-overlapping 10 kb windows were moved along the sequence, and the mean depth and GC content of every window were calculated to generate a GC-depth plot. Most windows displayed a GC content ranging from 25–50%, resulting in an estimated GC ratio of ~35.16% (Fig. 7). We did not identify any significant regions with abnormal accumulation. This suggests that the DNA samples used for sequencing were not mixed with DNA from other species.

3.6 Identification of SSRs

The assembled scaffolds were searched for SSR markers using MISA, which yielded 330,999 SSR motifs (Table 7). Among these, mononucleotides were the most numerous (232,225; 70.16%), followed by dinucleotides (67,518; 20.40%), trinucleotides (23,506; 7.10%), tetranucleotides (5,103; 1.54%), pentanucleotides (1,472; 0.44%), and hexanucleotides (1,175; 0.35%) (Fig. 8a).

Table 7 SSR Information

Total number of sequences examined:              339729
Total size of examined sequences (bp):           506426207
Total number of identified SSRs:                 330999
Number of SSR containing sequences:              6914
Number of sequences containing more than 1 SSR:  34647
Number of SSRs present in compound formation:    51666

Within mononucleotide repeats, A/T is by far the most common motif (230,708; 99.35%) (Fig. 8b). Among dinucleotide repeats, the predominant motif is AT/AT (48,604; 71.99%), followed by AG/CT (12,133; 17.97%), AC/GT (6,723; 9.96%), and CG/CG (58; 0.09%) (Fig. 8c).
Among trinucleotide repeats, the principal motifs are AAT/ATT, AAG/CTT, ATC/ATG, and AAC/GTT, with 16,427 (69.88%), 3,005 (12.78%), 1,337 (5.69%), and 1,062 (4.52%) occurrences, respectively (Fig. 8d).

3.7 Phylogenetic reconstruction and PCA

Both nuclear genomic reconstructions placed G. forrestii at the basal position (Fig. 9a, b). However, the plastid phylogenomic reconstruction showed a conflicting structure, with G. floribunda var. gamblei and G. pinnata clustered at the basal position (Fig. 9c).

The Maximum Likelihood (ML) reconstruction of the nuclear genome with G. forrestii as the reference genome reveals that G. forrestii constitutes a monophyletic lineage, Clade-A. G. pinnata and G. floribunda var. gamblei individuals clustered into mixed clades. In Clade-B, 2 individuals of G. floribunda var. gamblei (from populations Fl_DTBC and Fl_YXG) were mixed with 6 individuals of G. pinnata (from Pi_YXG, Pi_SHZWY and Pi_DTS). In Clade-C, 2 individuals of G. pinnata (from Pi_YXG) were mixed with 4 individuals of G. floribunda var. gamblei (from Fl_YXG, Fl_ML and Fl_SMC) (Fig. 9a). The phylogenomic reconstruction based on the dataset using B. sacra as the reference genome revealed patterns similar to those described above, with differences in the internal topology of the branches (Fig. 9a, b). However, in this tree G. forrestii does not form a monophyletic lineage (Clade-F, Clade-G and Clade-H) and mixes freely with the other two species (Fig. 9b). The plastid ML tree shows that G. forrestii and G. pinnata are mixed with each other in Clade-D, while G. pinnata and G. floribunda var. gamblei are mixed in Clade-E (Fig. 9c).

The 25 individuals clustered into three distinct groups in the PCA (Fig. 10). In the first group, two individuals of G. floribunda var. gamblei (from Fl_DTBC and Fl_YXG) were mixed within the G. pinnata group, whereas within the predominantly G. floribunda var. gamblei group, two individuals of G. pinnata (from Pi_YXG) were also mixed.

4 Discussion

4.1 Genome Characterization

We observed a difference in genome size between the k-mer and flow cytometry estimates for the same individual. In this study, we conducted genome survey sequencing on G. forrestii for the first time and obtained ~53.97 Gb of clean data. The 19-mer analysis showed that the G. forrestii genome was approximately 483.51 Mb, slightly smaller than the flow cytometry estimate (~520.00 Mb) (Fig. 2, 5). We found similar discrepancies in other published studies and statistically analyzed the differences among 15 species (Table 8). The genome size differences between the two methods ranged from 16.00 Mb to 138.09 Mb; the difference of 37 Mb for G. forrestii falls within this range and is lower than the average difference of 82.93 Mb.

The differences arise from the different principles of the two methods. Flow cytometry is based on staining intact nuclei with a fluorescent dye that binds quantitatively to DNA, so that the amount of DNA can be calculated. Sample preparation and staining strategies vary among laboratories, and random drift of the instruments may lead to significant deviation in genome size estimation [16, 35]. The k-mer approach is a computational technique that generates a k-mer frequency distribution (similar to a Poisson distribution) by plotting the coverage distribution of all k-mers in the sequencing reads, where the peak of the distribution is centered on the average sequencing depth of the genome. The genome size may be better inferred by directly sequencing reads and analyzing the frequency of k-mers [36].
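The two estimates compared above follow from simple arithmetic, which can be sketched in Python. The flow cytometry function applies the formula from Section 2.2 to the values reported in Table 3 (900 Mb tomato standard, peak means 17.04 and 29.67). The k-mer function divides the total number of k-mers by the depth of the dominant peak, as described in Section 3.3; its input histogram below is a made-up toy example, and the `min_depth` cutoff used to discard low-depth (likely erroneous) k-mers is an assumption of this sketch rather than a documented step of the study. Function names are hypothetical.

```python
def flow_cytometry_size(standard_size, sample_peak, standard_peak):
    """Genome size from the ratio of G0/G1 peak means (same unit as the standard)."""
    return standard_size * sample_peak / standard_peak


def kmer_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer histogram {depth: distinct k-mers}.

    Returns (estimated_size, peak_depth). Depths below `min_depth` are
    ignored as probable sequencing errors (assumption).
    """
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)           # dominant peak
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth, peak_depth


# Values reported in Table 3; the result is in Mb.
flow_mb = flow_cytometry_size(900, 17.04, 29.67)
print(round(flow_mb, 1))  # 516.9

# Hypothetical toy histogram with peaks at 45x, 90x and 180x.
toy_hist = {1: 1000, 2: 200, 45: 10, 90: 100, 180: 5}
toy_size, toy_peak = kmer_genome_size(toy_hist)
```

The flow cytometry value of about 517 Mb rounds to the 0.52 Gb given in Table 3; real k-mer estimators fit a statistical model to the histogram instead of relying on a fixed cutoff.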
Table 8 Genome Size Difference between Flow Cytometry and K-mer (sizes in Mb unless noted; absolute differences are given for the low and the high end of the flow cytometry range where a range was reported)

Species                K-mer    Flow cytometry     |Flow cytometry - K-mer| (low / high)  Average  Source
Carex cristatella      317.3    255.2 ± 13.1       75.2 / 49                              62.1     [a]
Carex scoparia         294.5    268.8 ± 9.7        35.4 / 16                              25.7     [a]
Juncus effusus         225.5    198.9 ± 4.6        31.2 / 22                              26.6     [a]
Juncus inflexus        196      286.4 ± 4.7        85.7 / 95.1                            90.4     [a]
Raddia distichophylla  608      589                19                                     19       [b]
Psammochloa villosa    1564     1503.27 ± 3.41     64.14 / 57.32                          60.73    [c]
Reseda lutea           934      867.7              66.3                                   66.3     [d]
R. pentagyna           1022     896                126                                    126      [d]
Aspalathus linearis    1070     1.24 ± 0.01 (Gb)   180 / 160                              170      [e]
Acer henryi            561.72   691.12 ± 8.69      120.71 / 138.09                        129.4    [f]
A. buergerianum        743      863.90 ± 8.69      112.21 / 129.59                        120.9    [f]
A. elegantulum         777.87   896.50 ± 4.35      114.28 / 122.98                        118.63   [f]
A. griseum             771.51   893.24 ± 8.69      113.04 / 130.42                        121.73   [f]
A. pentaphyllum        650.64   766.10 ± 8.69      106.77 / 124.15                        115.46   [f]
A. tegmentosum         1103.46  1154.04 ± 13.04    37.9 / 63.98                           50.94    [f]

Source articles as listed in the table:
[a] Genome Survey Sequencing for the Characterization of the Genetic Background of Rosa roxburghii Tratt and Leaf Ascorbate Metabolism Genes
[b] The draft genome sequence of herbaceous diploid bamboo Raddia distichophylla
[c] Estimation of genome size for Psammochloa villosa by flow cytometry and K-mer analysis
[d] Estimation of Genome Size in the Endemic Species Reseda pentagyna and the Locally Rare Species Reseda lutea Using comparative Analyses of Flow Cytometry and K-Mer Approaches
[e] Rooibos (Aspalathus linearis) Genome Size Estimation Using Flow Cytometry and K-Mer Analyses
[f] Estimation of genome sizes of six Acer species by flow cytometry and K-mer analysis

The GC content of G. forrestii is ~35.16% (Fig. 7). According to plant genetic data summarized by previous authors, the GC content of most plants ranges from 30–47% [37]. This value can serve as an indicator of genome stability, as DNA sequences with higher GC content tend to exhibit tolerance to extreme temperatures and drought [38, 39].
We assessed the genomic GC content of the three species at shallow sequencing depth, with assemblies at the contig level. G. forrestii has the lowest GC content (~33.60%), G. floribunda var. gamblei has the highest (~33.94%), and G. pinnata falls in between (~33.81%) (Table 9). These differences in GC content may reflect the species' adaptation to environmental stress [40]. A comparable pattern has been reported in gramineous plants, which show the highest GC content [41]: during the global cooling of the Tertiary period, Poaceae plants underwent differentiation and adaptation to this stress, which established their dominance under today's extreme climatic conditions [42–44]. G. floribunda var. gamblei and G. pinnata do face stronger interspecific competition from taller rainforest tree species, as well as longer, hotter and drier conditions in the South Yunnan area.

Table 9 Three Species GC Content

Species                     Individual  GC Content (%)
G. pinnata                  Pi_BB       33.7
                            Pi_BB       33.67
                            Pi_BB       33.87
                            Pi_YXG      33.59
                            Pi_YXG      33.63
                            Pi_YXG      34.29
                            Pi_SHZWY    33.97
                            Pi_DTS      33.5
                            Average     33.7775
G. floribunda var. gamblei  Fl_SMC      34.52
                            Fl_SMC      34.51
                            Fl_YXG      33.63
                            Fl_ML       33.56
                            Fl_DTBC     33.64
                            Average     33.972
G. forrestii                Fo_XP       33.59
                            Fo_DC       33.69
                            Fo_DC       33.59
                            Fo_LJZ      33.62
                            Fo_HN       33.81
                            Fo_HN       33.6
                            Fo_KYC      33.91
                            Fo_KYC      33.7
                            Fo_NXH      33.7
                            Fo_YJZ      33.52
                            Fo_FF       33.57
                            Average     33.66363636

One individual of G. floribunda var. gamblei from Fl_YXG was not included in the calculation.

Plant genomes are among the most challenging to sequence and assemble because of their high levels of heterozygosity, complex polyploidy, and abundant repeat content [45]. A genome with heterozygosity exceeding 1% is highly challenging for de novo assembly [46]. However, the k-mer analysis in this study indicated a relatively low heterozygosity of approximately 0.54% in the G. forrestii genome (Fig. 6), which makes de novo assembly more accurate in this respect.
The repeat rate of the genome is 51.54%. This high repeat rate is likely due to gene family expansion [47], transposition activity [48] and genome recombination [49]. A chromosome-level assembly of G. forrestii is needed to gain deeper insights into its genome and to support a comprehensive understanding of the speciation and population dynamics of this species.

SSRs have consistently been among the most preferred molecular markers for plant genotyping because of their high level of polymorphism, wide distribution in the majority of plant genomes, ease of use, and anticipated value as a tool for numerous species in the future [50–52]. Mononucleotide repeats are the predominant type, in accordance with previous reports [53]. The high abundance of AT/AT motifs (48,604; 71.99%) (Fig. 8c) in our study is consistent with some earlier genomic surveys. For example, Akebia trifoliata has an AT/AT content of 50.21% [54], Cunninghamia lanceolata (Lamb.) Hook. has 59% [55], and Acer truncatum Bunge has a considerably higher AT/AT content of 71.31% [56], confirming that AT/AT is the most typical dinucleotide motif in higher plants.

For a comparative analysis, we looked into several species that also occur in warm and hot valleys: Terminalia franchetii Gagnep. [57], Osteomeles schwerinae Schneid. [58], Nouelia insignis Franch. [59], Buddleja crispa Benth. [60] and Excoecaria acerifolia Didr. [61]. No raw genome survey reads could be found in GenBank for four of the species mentioned. Our data therefore provide a first view of the evolutionary pattern of this area.

4.2 Nuclear-Cytoplasmic Conflict

In this study, the topology of the basal clades differs among the phylogenomic trees. The tree based on the G. forrestii reference indicates that G. forrestii is monophyletic, while the tree based on the B. sacra reference suggests that it is polyphyletic.
Using a reference genome from another genus could have overestimated the more ancient characters in the dataset. Consequently, alignment to the G. forrestii reference generated a considerably higher number of SNP loci, about three times as many as alignment to the B. sacra reference (Table 10). The disparity in the number of SNP loci results in the observed differences in topology between the two nuclear gene phylogenetic trees [62].

Table 10 SNP Calling Information
Reference       Number of Individuals   Number of SNP Sites   VCF Size (Mb)
G. forrestii    26                      244936                43
B. sacra        26                      79691                 15.5

The topology clearly shows a marked increase in the branch length of Fo_HN relative to the other taxonomic units, placing it as the earliest-diverging lineage. This could be a result of long-branch attraction, which would explain its anomalous position in the phylogenetic tree [63, 64].

The basal clade of the chloroplast genome phylogenetic tree is composed of G. pinnata and G. floribunda var. gamblei, showing a nuclear-cytoplasmic conflict with the two nuclear gene trees. Hybridization could explain this conflict [65, 66]. The rapid uplift of the QTP at the beginning of the Paleogene, together with the formation and intensification of the monsoon climate zone, brought abundant precipitation that led to river erosion. Rapid downcutting and uptake events then changed the spatial pattern of the PRR drainage system, which in turn affected the geographic distribution pattern of plants [1, 2, 67, 68]. The sympatric distribution of the species in river valleys may lead to interspecific hybridization, because the linking of otherwise disconnected river systems makes it easier for species to migrate and spread along the valleys. A previous study of three river valley species of the genus Ostryopsis Decne indicated that the inconsistent interspecific relationships of Ostryopsis intermedia B. Tian & J. Q. Liu with the other two species (Ostryopsis davidiana Decaisne and Ostryopsis nobilis I. B. Balfour et W. W. Smith), recovered from two different sets of molecular markers, strongly indicate its origin through hybrid speciation in the southeast Qinghai-Tibet Plateau [69]. To address this issue in Garuga, we plan to conduct further Hi-C or HiFi sequencing in the future, working at the chromosomal level to acquire additional genetic information. By employing sophisticated models, we aim to consider this situation comprehensively and reconstruct the evolutionary history among the species more accurately.

5 Conclusion
This study is the first preliminary investigation of the genome size and characteristics of G. forrestii, and it reconstructed the phylogenetic relationships of three taxa within Garuga, namely G. floribunda var. gamblei, G. pinnata, and G. forrestii. It provides valuable genomic resources for the further exploration and utilization of G. forrestii and offers initial insights into the phylogenetic relationships within the genus. However, further work is needed: the two nuclear gene phylogenetic trees for these three species differ in topology at the basal clade (G. forrestii), with one being monophyletic and the other polyphyletic. Additionally, conflicts arose between the two nuclear gene trees and the chloroplast genome tree (the basal clade in the nuclear gene trees is G. forrestii, while in the chloroplast genome tree it includes G. floribunda var. gamblei, G. pinnata, and G. forrestii). To explain clearly the differences in topology between the nuclear trees and the nuclear-cytoplasmic conflict with the chloroplast tree, we plan to supplement the study with Hi-C or HiFi sequencing at the chromosomal level in future whole-genome sequencing.
This will lead to better assembly results, and we will use complex models to study the aforementioned issues. The genetic data presented in this study hold significant value for comprehensive whole-genome analyses, the evaluation of population genetic diversity, investigations into adaptive evolution, the advancement of artificial breeding efforts, and the support of species conservation and restoration initiatives. Ultimately, this research contributes to reinforcing the conservation and management of natural ecosystems, promoting biodiversity conservation, and advancing sustainable development.

Declarations

Ethics approval and consent to participate
Not applicable

Consent for publication
Not applicable

Availability of data and materials
The raw sequence data of Garuga floribunda var. gamblei were deposited in NCBI under BioProject ID PRJNA783803. The other raw sequence data reported in this paper have been deposited in the Genome Sequence Archive (Genomics, Proteomics & Bioinformatics 2021) in the National Genomics Data Center (Nucleic Acids Res 2022), China National Center for Bioinformation / Beijing Institute of Genomics, Chinese Academy of Sciences (GSA: CRA014668, CRA014684, CRA014694, CRA014677) and are publicly accessible at https://ngdc.cncb.ac.cn/gsa. The whole genome sequence data reported in this paper have been deposited in the Genome Warehouse in the National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences / China National Center for Bioinformation, under accession number GWHERCJ00000000 and are publicly accessible at https://ngdc.cncb.ac.cn/gwh. The chloroplast assembly and annotation data for this study are deposited in NCBI under accession numbers PP337695-PP337718.

Competing interests
The authors declare that they have no competing interests.

Funding
This research was funded by the National Natural Science Foundation of China (grant No. 42161015).

Authors’ contribution
LLY and HS conceived the study. CMM collected plant materials and drew the figure. RC organized the plant materials. RR performed data analysis for SNP calling and generated plots. DBZ carried out genome assembly, data analysis, figure generation, and contributed to the paper writing. DBZ and RR contributed equally to this study. All authors read and approved the final manuscript.

Authors’ information
(1) Authors and Affiliations
Dongbo Zhu: Southwest Forestry University, Kunming, 650224, PR China
Rui Rao: Southwest Forestry University, Kunming, 650224, PR China
Yu Du: Technology Center of Kunming Customs, Kunming, 650228, PR China
Chunmin Mao: Southwest Forestry University, Kunming, 650224, PR China
Rong Chen: Southwest Forestry University, Kunming, 650224, PR China
Hang Sun: Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, PR China
Liang-Liang Yue: Yunnan Key Laboratory of Plateau Wetland Conservation, Restoration and Ecological Services, Southwest Forestry University, Kunming, 650224, PR China; National Plateau Wetland Research Center, Kunming, 650224, PR China
(2) Corresponding authors
Correspondence to Liangliang Yue and Hang Sun
(3) Acknowledgements
We appreciate the assistance of Yanchun Liu from the Shanghai Botanical Garden and Jinlong Dong from the Xishuangbanna Tropical Botanical Garden for their support in sample collection for this study.
References
1. Spicer RA, Farnsworth A, Su T: Cenozoic topography, monsoons and biodiversity conservation within the Tibetan Region: An evolving story. Plant Diversity 2020, 42(4):229-254.
2. Nie J, Ruetenik G, Gallagher K, Hoke G, Garzione CN, Wang W, Stockli D, Hu X, Wang Z, Wang Y et al: Rapid incision of the Mekong River in the middle Miocene linked to monsoonal precipitation. Nature Geoscience 2018, 11(12):944-948.
3. Ming Q, Shi Z, Zhang H: The evolution of the landform and environment in the region of the three parallel rivers. Tropical geography 2006, 26(2):122.
4. Clark MK, Schoenbohm LM, Royden LH, Whipple KX, Burchfiel BC, Zhang X, Tang W, Wang E, Chen L: Surface uplift, tectonics, and erosion of eastern Tibet from large-scale drainage patterns. Tectonics 2004, 23(1).
5. Brookfield ME: The evolution of the great river systems of southern Asia during the Cenozoic India-Asia collision: rivers draining southwards. Geomorphology 1998, 22(3-4):285-312.
6. Sun H, Li Z, Landis JB, Qian L, Zhang T, Deng T: Effects of drainage reorganization on phytogeographic pattern in Sino-Himalaya. Alpine Botany 2021, 132(1):141-151.
7. Sun H, Zhang J, Deng T, Boufford DE: Origins and evolution of plant diversity in the Hengduan Mountains, China. Plant Diversity 2017, 39(4):161-166.
8. Yu T, Hu Y, Zhang Y, Zhao R, Yan X, Dayananda B, Wang J, Jiao Y, Li J, Yi X et al: Whole-Genome Sequencing of Acer catalpifolium Reveals Evolutionary History of Endangered Species. Genome Biology and Evolution 2021, 13(12).
9. Hert DG, Fredlake CP, Barron AE: Advantages and limitations of next-generation sequencing technologies: A comparison of electrophoresis and non-electrophoresis methods. Electrophoresis 2008, 29(23):4618-4626.
10. Aird D, Ross MG, Chen W, Danielsson M, Fennell T, Russ C, Jaffe DB, Nusbaum C, Gnirke A: Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome Biology 2011, 12(2):1-14.
11. Bi Q, Zhao Y, Cui Y, Wang L: Genome survey sequencing and genetic background characterization of yellow horn based on next-generation sequencing. Molecular Biology Reports 2019, 46(4):4303-4312.
12. Huang G, Cao J, Chen C, Wang M, Liu Z, Gao F, Yi M, Chen G, Lu M: Genome Survey of Misgurnus Anguillicaudatus to Identify Genomic Information, Simple Sequence Repeat (SSR) Markers and Mitochondrial Genome. Research Square 2021.
13. Liang X, Bai T, Wang J, Jiang W: Genome survey and development of 13 SSR markers in Eucalyptus cloeziana by NGS. Journal of Genetics 2022, 101(2).
14. Doyle JJ: A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem Bull 1987, 19:11-15.
15. Dolezel J: Plant DNA Flow Cytometry and Estimation of Nuclear Genome Size. Annals of Botany 2005, 95(1):99-110.
16. Doležel J, Greilhuber J, Suda J: Estimation of nuclear DNA content in plants using flow cytometry. Nature Protocols 2007, 2(9):2233-2244.
17. Xinming T, Xiangyan Z, Na G: Applications of Flow Cytometry in Plant Research: Analysis of Nuclear DNA Content and Ploidy Level in Plant Cells. Chinese Agricultural Science Bulletin 2011, 27(9):21-27.
18. Chen S, Zhou Y, Chen Y, Gu J: fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 2018, 34(17):i884-i890.
19. Ranallo-Benavidez TR, Jaron KS, Schatz MC: GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nature Communications 2020, 11(1):1432.
20. Marçais G, Kingsford C: A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 2011, 27(6):764-770.
21. Li D, Liu C, Luo R, Sadakane K, Lam T: MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 2015, 31(10):1674-1676.
22. Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, He G, Chen Y, Pan Q, Liu Y et al: SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience 2012, 1(1):2047-217X-1-18.
23. Chen Y, Chen Y, Shi C, Huang Z, Zhang Y, Li S, Li Y, Ye J, Yu C, Li Z et al: SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. GigaScience 2018, 7(1):gix120.
24. Jin J, Yu W, Yang J, Song Y, dePamphilis CW, Yi T, Li D: GetOrganelle: a fast and versatile toolkit for accurate de novo assembly of organelle genomes. Genome Biology 2020, 21(1):241.
25. Thiel T, Michalek W, Varshney R, Graner A: Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theoretical and Applied Genetics 2003, 106(3):411-422.
26. Beier S, Thiel T, Münch T, Scholz U, Mascher M, Valencia A: MISA-web: a web server for microsatellite prediction. Bioinformatics 2017, 33(16):2583-2585.
27. Wysoker A, Tibbetts K, McCown M, Homer N, Fennell T: Picard: A set of tools for working with next generation sequencing data in BAM format. Retrieved Aug 2014.
28. Li H, Durbin R: Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 2009, 25(14):1754-1760.
29. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009, 25(16):2078-2079.
30. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M et al: The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 2010, 20(9):1297-1303.
31. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, De Bakker PI, Daly M: PLINK: a tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics 2007, 81(3):559-575.
32. Nguyen L, Schmidt HA, von Haeseler A, Minh BQ: IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies. Molecular Biology and Evolution 2015, 32(1):268-274.
33. Rambaut A: FigTree v1.4.2, a graphical viewer of phylogenetic trees. 2014.
34. Yang J, Lee SH, Goddard ME, Visscher PM: GCTA: A Tool for Genome-wide Complex Trait Analysis. The American Journal of Human Genetics 2011, 88(1):76-82.
35. Doležel J, Greilhuber J, Lucretti S, Meister A, Lysák MA, Nardi L, Obermayer R: Plant Genome Size Estimation by Flow Cytometry: Inter-laboratory Comparison. Annals of Botany 1998, 82(suppl_1):17-26.
36. Li X, Waterman MS: Estimating the repeat structure and length of DNA sequences using ℓ-tuples. Genome Res 2003, 13(8):1916-1922.
37. Pellegrini M, Shangguan L, Han J, Kayesh E, Sun X, Zhang C, Pervaiz T, Wen X, Fang J: Evaluation of Genome Sequencing Quality in Selected Plant Species Using Expressed Sequence Tags. PLOS ONE 2013, 8(7):e69890.
38. Šmarda P, Knápek O, Březinová A, Horová L, Grulich V, Danihelka J, Veselý P, Šmerda J, Rotreklová O, Bureš P: Genome sizes and genomic guanine+cytosine (GC) contents of the Czech vascular flora with new estimates for 1700 species. Preslia 2019, 91(2):117-142.
39. Šmarda P, Bureš P, Horová L, Leitch IJ, Mucina L, Pacini E, Tichý L, Grulich V, Rotreklová O: Ecological and evolutionary significance of genomic GC content diversity in monocots. Proceedings of the National Academy of Sciences 2014, 111(39):E4096-E4102.
40. Zanne AE, Tank DC, Cornwell WK, Eastman JM, Smith SA, FitzJohn RG, McGlinn DJ, O’Meara BC, Moles AT, Reich PB: Three keys to the radiation of angiosperms into freezing environments. Nature 2014, 506(7486):89-92.
41. Singh R, Ming R, Yu Q: Comparative Analysis of GC Content Variations in Plant Genomes. Tropical Plant Biology 2016, 9(3):136-149.
42. Strömberg CA: Evolution of grasses and grassland ecosystems. Annual Review of Earth and Planetary Sciences 2011, 39:517-544.
43. Edwards EJ, Osborne CP, Strömberg CA, Smith SA, Consortium CG, Bond WJ, Christin P-A, Cousins AB, Duvall MR, Fox DL: The origins of C4 grasslands: integrating evolutionary and ecosystem science. Science 2010, 328(5978):587-591.
44. Zachos J, Pagani M, Sloan L, Thomas E, Billups K: Trends, rhythms, and aberrations in global climate 65 Ma to present. Science 2001, 292(5517):686-693.
45. Michael TP, VanBuren R: Building near-complete plant genomes. Current Opinion in Plant Biology 2020, 54:26-33.
46. Zhou P, Zhang Q, Li J, Li F, Huang J, Zhang M: A first insight into the genomic background of Ilex pubescens (Aquifoliaceae) by flow cytometry and genome survey sequencing. BMC Genomics 2023, 24(1).
47. Han Y, Luthe D: Identification and evolution analysis of the JAZ gene family in maize. BMC Genomics 2021, 22(1).
48. Zhao D, Ferguson AA, Jiang N: What makes up plant genomes: The vanishing line between transposable elements and genes. Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms 2016, 1859(2):366-380.
49. Hufton AL, Panopoulou G: Polyploidy and genome restructuring: a variety of outcomes. Current Opinion in Genetics & Development 2009, 19(6):600-606.
50. Hayden MJ, Nguyen TM, Waterman A, McMichael GL, Chalmers KJ: Application of multiplex-ready PCR for fluorescence-based SSR genotyping in barley and wheat. Molecular Breeding 2007, 21(3):271-281.
51. Gramazio P, Plesa IM, Truta AM, Sestras AF, Vilanova S, Plazas M, Vicente O, Boscaiu M, Prohens J, Sestras RE: Highly informative SSR genotyping reveals large genetic diversity and limited differentiation in European larch (Larix decidua) populations from Romania. Turkish Journal of Agriculture and Forestry 2018, 42(3):165-175.
52. Liu XB, Feng B, Li J, Yan C, Yang ZL: Genetic diversity and breeding history of Winter Mushroom (Flammulina velutipes) in China uncovered by genomic SSR markers. Gene 2016, 591(1):227-235.
53. Manee MM, Al-Shomrani BM, Al-Fageeh MB: Genome-wide characterization of simple sequence repeats in Palmae genomes. Genes & Genomics 2020, 42(5):597-608.
54. Zhang Z, Zhang J, Yang Q, Li B, Zhou W, Wang Z: Genome survey sequencing and genetic diversity of cultivated Akebia trifoliata assessed via phenotypes and SSR markers. Molecular Biology Reports 2021, 48(1):241-250.
55. Lin E, Zhuang H, Yu J, Liu X, Huang H, Zhu M, Tong Z: Genome survey of Chinese fir (Cunninghamia lanceolata): Identification of genomic SSRs and demonstration of their utility in genetic diversity analysis. Scientific Reports 2020, 10(1).
56. Wang R, Fan J, Chang P, Zhu L, Zhao M, Li L: Genome Survey Sequencing of Acer truncatum Bunge to Identify Genomic Information, Simple Sequence Repeat (SSR) Markers and Complete Chloroplast Genome. Forests 2019, 10(2).
57. Zhang T, Comes HP, Sun H: Chloroplast phylogeography of Terminalia franchetii (Combretaceae) from the eastern Sino-Himalayan region and its correlation with historical river capture events. Molecular Phylogenetics and Evolution 2011, 60(1):1-12.
58. Gomory D, Wang Z, Chen S, Nie Z, Zhang J, Zhou Z, Deng T, Sun H: Climatic Factors Drive Population Divergence and Demography: Insights Based on the Phylogeography of a Riparian Plant Species Endemic to the Hengduan Mountains and Adjacent Regions. PLOS ONE 2015, 10(12).
59. Zhao Y, Gong X: Genetic divergence and phylogeographic history of two closely related species (Leucomeris decora and Nouelia insignis) across the 'Tanaka Line' in Southwest China. BMC Evolutionary Biology 2015, 15(1):134.
60. Yue L, Chen G, Sun W, Sun H: Phylogeography of Buddleja crispa (Buddlejaceae) and its correlation with drainage system evolution in southwestern China. American Journal of Botany 2012, 99(10):1726-1735.
61. Wang Z, Zhang T, Luo D, Sun W, Sun H: Phylogeography of Excoecaria acerifolia (Euphorbiaceae) suggests combined effects of historical drainage reorganization events and climatic changes on riparian plants in the Sino–Himalayan region. Botanical Journal of the Linnean Society 2019, 192(2):350-368.
62. Leaché AD, Banbury BL, Felsenstein J, De Oca AN-M, Stamatakis A: Short tree, long tree, right tree, wrong tree: new acquisition bias corrections for inferring SNP phylogenies. Systematic Biology 2015, 64(6):1032-1047.
63. Philippe H: Opinion: long branch attraction and protist phylogeny. Protist 2000, 151(4):307-316.
64. Bergsten J: A review of long-branch attraction. Cladistics 2005, 21(2):163-193.
65. Degnan JH, Rosenberg NA: Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends in Ecology & Evolution 2009, 24(6):332-340.
66. Pelser PB, Kennedy AH, Tepe EJ, Shidler JB, Nordenstam B, Kadereit JW, Watson LE: Patterns and causes of incongruence between plastid and nuclear Senecioneae (Asteraceae) phylogenies. American Journal of Botany 2010, 97(5):856-873.
67. Spicer RA: Tibet, the Himalaya, Asian monsoons and biodiversity – In what ways are they related? Plant Diversity 2017, 39(5):233-244.
68. Tada R, Zheng H, Clift PD: Evolution and variability of the Asian monsoon and its potential linkage with uplift of the Himalaya and Tibetan Plateau. Progress in Earth and Planetary Science 2016, 3(1):4.
69. Lu Z, Tian B, Liu B, Yang C, Liu J: Origin of Ostryopsis intermedia (Betulaceae) in the southeast Qinghai–Tibet Plateau through hybrid speciation. Journal of Systematics and Evolution 2014, 52(3):250-259.

Additional Declarations
No competing interests reported.
Editorial history at journal
Editorial decision: Revision requested, 20 Mar, 2024
Reviews received at journal, 20 Mar, 2024
Reviews received at journal, 12 Mar, 2024
Reviews received at journal, 07 Mar, 2024
Reviewers agreed at journal, 05 Mar, 2024
Reviewers agreed at journal, 29 Feb, 2024
Reviewers agreed at journal, 29 Feb, 2024
Reviewers invited by journal, 28 Feb, 2024
Editor assigned by journal, 28 Feb, 2024
Editor invited by journal, 28 Feb, 2024
Submission checks completed at journal, 28 Feb, 2024
First submitted to journal, 28 Jan, 2024
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-3905007","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":275507301,"identity":"2db3a20e-0c33-44da-acbe-cdfbafa7cf58","order_by":0,"name":"Dongbo Zhu","email":"","orcid":"","institution":"Southwest Forestry University","correspondingAuthor":false,"prefix":"","firstName":"Dongbo","middleName":"","lastName":"Zhu","suffix":""},{"id":275507302,"identity":"e8d25416-2031-4af3-a9fe-530e328b9c58","order_by":1,"name":"Rui Rao","email":"","orcid":"","institution":"Southwest Forestry University","correspondingAuthor":false,"prefix":"","firstName":"Rui","middleName":"","lastName":"Rao","suffix":""},{"id":275507303,"identity":"364d5059-f625-432c-b390-77ee7f03104c","order_by":2,"name":"Yu Du","email":"","orcid":"","institution":"Technology Center of Kunming Customs","correspondingAuthor":false,"prefix":"","firstName":"Yu","middleName":"","lastName":"Du","suffix":""},{"id":275507304,"identity":"e160702c-666d-4a52-b493-f5479c8c7cdd","order_by":3,"name":"Chunmin Mao","email":"","orcid":"","institution":"Southwest Forestry University","correspondingAuthor":false,"prefix":"","firstName":"Chunmin","middleName":"","lastName":"Mao","suffix":""},{"id":275507305,"identity":"c4ef4f12-64f8-4239-b5e8-b38df6ebefe3","order_by":4,"name":"Rong Chen","email":"","orcid":"","institution":"Southwest Forestry University","correspondingAuthor":false,"prefix":"","firstName":"Rong","middleName":"","lastName":"Chen","suffix":""},{"id":275507306,"identity":"d2507b45-303d-4100-9ada-bd909c4dfbb0","order_by":5,"name":"Sun 
Hang","email":"","orcid":"","institution":"Kunming Institute of Botony, Chinese Academy of Science","correspondingAuthor":false,"prefix":"","firstName":"Sun","middleName":"","lastName":"Hang","suffix":""},{"id":275507307,"identity":"17ac0c03-8113-45f5-a263-32139cf7dba2","order_by":6,"name":"Liangliang Yue","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABG0lEQVRIiWNgGAWjYHACxgMJQJKPGcSuYGBgbwDSPAT0gLWwgbWcAao+QIwWEMEGtrCNCC0GN5IPHHhQc8eujZ352cOv8+oSeyQSGB+8bWOQN8epJS3hQMKxZ8ltzGzmxrLbDoO0MBvObWMw3NmAS0uOwYEEtsPJQL+YSUtuO5C7XyKBTZq3jSHB4AAuLfkfDiT8A2lh/yYtOacuF2gL+2/8WnIYDiS2HbZjY+Yxk/zYwAzSwsaMT4vkmWcGBxL7DgOV8ZRJMxw7XN/D87BZcs45CcMNOLTwHU9++PDHt8P2/PzHt0n+qKkz5mFPPvjhTZmNPC5bFKDiiQ1AghkSHYwgtgR29UAg3wCh7cFqf+BUNwpGwSgYBSMZAACQEF9rDx3t0AAAAABJRU5ErkJggg==","orcid":"","institution":"Yunnan Key Laboratory of Plateau Wetland Conservation, Restoration and Ecological Services, Southwest Forestry University","correspondingAuthor":true,"prefix":"","firstName":"Liangliang","middleName":"","lastName":"Yue","suffix":""}],"badges":[],"createdAt":"2024-01-28 07:29:16","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-3905007/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-3905007/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1186/s12864-024-10917-8","type":"published","date":"2024-10-23T15:57:02+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":51833020,"identity":"bc2097bd-cc2a-4a91-8953-538a62ee92d3","added_by":"auto","created_at":"2024-02-29 19:17:04","extension":"jpg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":1044145,"visible":true,"origin":"","legend":"\u003cp\u003eSample location. Green triangles represent \u003cem\u003eG. pinnata\u003c/em\u003e, blue triangles represent \u003cem\u003eG. forrestii\u003c/em\u003e, and red triangles represent \u003cem\u003eG. floribunda\u003c/em\u003e var. 
\u003cem\u003egamblei\u003c/em\u003e. One sample of \u003cem\u003eG. pinnata\u003c/em\u003e was collected from Chenshan Botanical Garden, Shanghai.\u003c/p\u003e","description":"","filename":"floatimage1.jpg","url":"https://assets-eu.researchsquare.com/files/rs-3905007/v1/042e26a13588d82ad7f569c1.jpg"},{"id":51833018,"identity":"1d938c8b-ee04-4314-b8d7-e1a59e2398b0","added_by":"auto","created_at":"2024-02-29 19:17:04","extension":"jpg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":159114,"visible":true,"origin":"","legend":"\u003cp\u003eThe flow cytometry (FCM) analysis results for the \u003cem\u003eG. forrestii\u003c/em\u003esample are as follows: (a) Represents the histogram of relative fluorescence intensity. Peak 1 corresponds to the G1 phase, Peak 2 corresponds to the S phase, and Peak 3 corresponds to the G2/M phase; (b) Represents the scatter plot of side scatter (SSC) versus PI fluorescence. The particle clusters exhibit a pronounced, lucid, and concentrated appearance, underscoring the effective differentiation of samples under the specific experimental conditions.\u003c/p\u003e","description":"","filename":"floatimage2.jpg","url":"https://assets-eu.researchsquare.com/files/rs-3905007/v1/65ffb8214b82ffd6d83bb2fb.jpg"},{"id":51833065,"identity":"cded645f-378b-4a9f-8b07-f3dcd6484742","added_by":"auto","created_at":"2024-02-29 19:25:04","extension":"jpg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":95112,"visible":true,"origin":"","legend":"\u003cp\u003eReads sequencing base content distribution. The x-axis represents the position of bases within the reads, and the y-axis indicates the average percentage of base content at that position. Different colored lines represent different types of bases. 
The dashed line on the left shows the base content distribution for the R1 end of paired-end sequencing reads, while the dashed line on the right shows the base content distribution for R2 end sequencing reads.\u003c/p\u003e","description":"","filename":"floatimage3.jpg","url":"https://assets-eu.researchsquare.com/files/rs-3905007/v1/323c293f577c91a064e23a05.jpg"},{"id":51833026,"identity":"6c4eabe6-cc14-46fc-99d1-589989ab9814","added_by":"auto","created_at":"2024-02-29 19:17:05","extension":"jpg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":88084,"visible":true,"origin":"","legend":"\u003cp\u003eAverage error rate distribution of reads. The x-axis represents the position of bases within the reads, and the y-axis indicates the average percentage of error rate at that position. The dashed line on the left shows the error rate distribution for the R1 end of paired-end sequencing reads, while the dashed line on the right shows the error rate distribution for R2 end sequencing reads.\u003c/p\u003e","description":"","filename":"floatimage4.jpg","url":"https://assets-eu.researchsquare.com/files/rs-3905007/v1/252bf20ab28da6d073066495.jpg"},{"id":51833021,"identity":"5519394d-b8d5-4490-b374-0b6c3e074820","added_by":"auto","created_at":"2024-02-29 19:17:04","extension":"jpg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":374123,"visible":true,"origin":"","legend":"\u003cp\u003eK-mer analysis. 
The x-axis represents the K-mer depth, and the y-axis indicates the number of K-mer types at that depth.\u003c/p\u003e","description":"","filename":"floatimage5.jpg","url":"https://assets-eu.researchsquare.com/files/rs-3905007/v1/8e2f644686760e323bf68d7a.jpg"},{"id":51833024,"identity":"0b5138e8-27b9-4daf-acc8-bbd5fdd29e36","added_by":"auto","created_at":"2024-02-29 19:17:05","extension":"jpg","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":99352,"visible":true,"origin":"","legend":"\u003cp\u003ePolyploidy analysis of \u003cem\u003eG. forrestii\u003c/em\u003e. The x-axis represents the relative coverage depth, the y-axis represents the total coverage depth, and the color intensity indicates the frequency of k-mer pairs. The single ploidy structure with the highest frequency is considered as the predicted species ploidy result.\u003c/p\u003e","description":"","filename":"floatimage6.jpg","url":"https://assets-eu.researchsquare.com/files/rs-3905007/v1/0a47e2203cfbb3b39cee94fe.jpg"},{"id":51833023,"identity":"cc3388fd-de1d-48d9-b4b6-ffb3e8282f4c","added_by":"auto","created_at":"2024-02-29 19:17:04","extension":"jpg","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":704973,"visible":true,"origin":"","legend":"\u003cp\u003eAnalysis of the correlation between Guanine-Cytosine (GC) content and sequencing depth of the scaffold of the \u003cem\u003eG. forrestii\u003c/em\u003e genome is obtained after assembly. The x-axis represents GC content, and the y-axis represents sequence depth. 
Estimation of GC% was performed utilizing a sliding window of 10 kb size with a 5 kb step.\u003c/p\u003e","description":"","filename":"floatimage7.jpg","url":"https://assets-eu.researchsquare.com/files/rs-3905007/v1/07a8b5aa15cfdd0bf3d1be65.jpg"},{"id":51833027,"identity":"2aa78a4b-acd1-47be-8f1b-60a0ba7ebc86","added_by":"auto","created_at":"2024-02-29 19:17:05","extension":"jpg","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":245121,"visible":true,"origin":"","legend":"\u003cp\u003eCharacteristics of microsatellite motifs of \u003cem\u003eG. forrestii\u003c/em\u003eafter assembly. (a) Frequency of different microsatellite patterns; (b) Frequency of different mononucleotide patterns; (c) Frequency of different dinucleotide patterns; (d) Frequency of different trinucleotide patterns.\u003c/p\u003e","description":"","filename":"floatimage8.jpg","url":"https://assets-eu.researchsquare.com/files/rs-3905007/v1/42fb59e57b98697a4e82ae7a.jpg"},{"id":51833022,"identity":"0e05a022-a66b-4c28-98c2-637aca309d0f","added_by":"auto","created_at":"2024-02-29 19:17:04","extension":"jpg","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":363543,"visible":true,"origin":"","legend":"\u003cp\u003eThe phylogenetic trees of the nuclear-cytoplasmic gene systems for three species with a total of 25 individuals. (a) Nuclear gene phylogenetic tree constructed based on \u003cem\u003eGaruga forrestii\u003c/em\u003e as the reference genome; (b) Nuclear gene phylogenetic tree constructed based on \u003cem\u003eBoswellia scara\u003c/em\u003e as the reference genome; (c) Chloroplast gene phylogenetic tree. Orange lines represent \u003cem\u003eGaruga floribunda\u003c/em\u003e var. \u003cem\u003egamblei\u003c/em\u003e, green lines represent \u003cem\u003eGaruga forrestii\u003c/em\u003e, and blue lines represent \u003cem\u003eGaruga pinnata\u003c/em\u003e. 
The text at the branch tips gives the species and population. In (a) and (c), \u003cem\u003eGaruga forrestii\u003c/em\u003e clustered in derived clades, whereas in (b) it clustered at the basal position.\u003c/p\u003e","description":"","filename":"floatimage9.jpg","url":"https://assets-eu.researchsquare.com/files/rs-3905007/v1/c8d45b460f0510a9a0afde23.jpg"},{"id":51833025,"identity":"3215981c-5b2d-42c5-bc35-b3043f747b26","added_by":"auto","created_at":"2024-02-29 19:17:05","extension":"jpg","order_by":10,"title":"Figure 10","display":"","copyAsset":false,"role":"figure","size":247765,"visible":true,"origin":"","legend":"\u003cp\u003ePrincipal component analysis (PCA) based on the SNP similarity of 25 individuals from 16 populations of the three species, using \u003cem\u003eG. forrestii\u003c/em\u003e as the reference genome. All the individuals clustered into three distinct groups.\u003c/p\u003e","description":"","filename":"floatimage10.jpg","url":"https://assets-eu.researchsquare.com/files/rs-3905007/v1/dc787c8e46c34ef933aac1f6.jpg"},{"id":67681721,"identity":"d4502cd7-5b3e-4dee-a6e0-649e3beb63d4","added_by":"auto","created_at":"2024-10-28 16:08:28","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":6250681,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-3905007/v1/71f8e66d-f614-48bd-873d-9ff39aff3f57.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Genome Survey Indicated Complex Evolutionary History of Garuga Roxb. Species","fulltext":[{"header":"1 Background","content":"\u003cp\u003e \u003cem\u003eGaruga\u003c/em\u003e Roxb. is a genus endemic to southwest China and other tropical regions of Southeast Asia. 
These deciduous trees bloom in March and April, before the onset of the rainy season, and bear fruits that mature from May to November, mainly in July and August. There are approximately five species or varieties within this genus, including \u003cem\u003eG. forrestii\u003c/em\u003e W. W. Smith, \u003cem\u003eG. floribunda\u003c/em\u003e var. \u003cem\u003egamblei\u003c/em\u003e (King ex Smith) Kalkm., \u003cem\u003eG. floribunda\u003c/em\u003e var. \u003cem\u003efloribunda\u003c/em\u003e Decne., \u003cem\u003eG. pierrei\u003c/em\u003e Guill., and \u003cem\u003eG. pinnata\u003c/em\u003e Roxb. All of these except \u003cem\u003eG. floribunda\u003c/em\u003e var. \u003cem\u003efloribunda\u003c/em\u003e occur in China. \u003cem\u003eG. forrestii\u003c/em\u003e is endemic to China.\u003c/p\u003e \u003cp\u003e \u003cem\u003eGaruga forrestii\u003c/em\u003e, the Chinese endemic of this genus, is distributed in the warm, arid river valleys of Yunnan, southwest China, including those of the Jinshajiang River (JR), Lancangjiang River (LR), Red River (RR), Nanpanjiang River (NPR) and their tributaries. It is also the only species of the genus distributed north of the Tropic of Cancer, beyond the tropics, along the JR. It extends to Leibo, Sichuan, the highest-latitude distribution within the Burseraceae family. It reaches altitudes of nearly 1600 meters at many points along the JR. It demonstrates an extraordinary ability to flourish under arid conditions. \u003cem\u003eG. floribunda\u003c/em\u003e var. \u003cem\u003egamblei\u003c/em\u003e and \u003cem\u003eG. pinnata\u003c/em\u003e grow in dense tropical forests of southern Yunnan, extending to Guangxi, Guangdong and Hainan. Our survey along the middle LR revealed the coexistence of three species: \u003cem\u003eG. forrestii\u003c/em\u003e, \u003cem\u003eG\u003c/em\u003e. \u003cem\u003efloribunda\u003c/em\u003e var. 
\u003cem\u003egamblei\u003c/em\u003e and \u003cem\u003eG\u003c/em\u003e. \u003cem\u003epinnata\u003c/em\u003e. However, all members of this genus are at risk of extinction due to the loss of tropical forests and changes in land use.\u003c/p\u003e \u003cp\u003eThe taxonomy of \u003cem\u003eGaruga\u003c/em\u003e has undergone substantial revision, with changes to the classification of several taxa. \u003cem\u003eG. pierrei\u003c/em\u003e, previously considered a variety of \u003cem\u003eG. pinnata\u003c/em\u003e, has leaf characteristics similar to those of \u003cem\u003eG. pinnata\u003c/em\u003e. However, the axis and leaflets of \u003cem\u003eG. pierrei\u003c/em\u003e bear short, soft hairs, in contrast to the shorter hairs found on the latter. \u003cem\u003eG. pierrei\u003c/em\u003e bears spherical fruits, whereas the fruits of \u003cem\u003eG. pinnata\u003c/em\u003e are nearly spherical and occasionally bear soft hairs. However, we found no typical \u003cem\u003eG. pierrei\u003c/em\u003e specimens during our fieldwork. The fruits of \u003cem\u003eG. floribunda\u003c/em\u003e var. \u003cem\u003egamblei\u003c/em\u003e resemble those of \u003cem\u003eG\u003c/em\u003e. \u003cem\u003eforrestii\u003c/em\u003e, but differ notably in hair coverage, pedicel, and fruit dimensions and shape. Within the Lancang-Mekong River region, these three species coexist, displaying a continuum of variation in tree structure and leaf morphology. Consequently, the current taxonomic classification needs to be tested further with phylogenetic and phylogenomic evidence.\u003c/p\u003e \u003cp\u003eThe Sino-Himalaya region lies on the southeastern margin of the Qinghai-Tibetan Plateau (QTP) and hosts several Asian rivers, including the Mekong, RRB and JRB, along with their tributaries. 
Presently, these rivers flow separately to the ocean [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. However, millions of years ago they flowed southward into the Paleo-Red River (PRR), forming an extensive drainage network that ultimately emptied into the South China Sea [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. The PRR system subsequently broke up owing to the uplift of the QTP and river capture events since the late Miocene [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. This reorganization disrupted the previously continuous distribution pattern, resulting in distinctive genetic and biogeographical attributes and facilitating genetic differentiation that gave rise to new taxa [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. The river captures also made separate ranges unique, promoting lineage fusion and the accumulation of diversity [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eGenome sequencing technology enables the thorough investigation of species evolution [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. Next-generation sequencing (NGS) has greatly reduced the cost of DNA sequencing [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e] and is widely used in most fields of genetics and genomics, providing researchers with more convenient, refined, and comprehensive methods of investigation [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. 
A genome survey is the most convenient strategy for obtaining a rough reference genome for species that lack whole-genome data [\u003cspan additionalcitationids=\"CR12\" citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThis study used a genome survey to explore the genome size and characteristics of \u003cem\u003eG\u003c/em\u003e. \u003cem\u003eforrestii\u003c/em\u003e. The genomic information gathered will serve to 1) deepen our understanding of the \u003cem\u003eG\u003c/em\u003e. \u003cem\u003eforrestii\u003c/em\u003e genome; and 2) construct preliminary long contigs or even draft scaffolds, offering a foundation for future evolutionary studies of this genus.\u003c/p\u003e"},{"header":"2 Materials and methods","content":"\u003cdiv id=\"Sec3\"\u003e\n \u003ch2\u003e2.1 Sample collection and DNA extractions\u003c/h2\u003e\n \u003cp\u003eFresh leaves from a mature \u003cem\u003eG. forrestii\u003c/em\u003e individual were collected for the genome survey during the summer of 2023 near the Gangou Bridge in the Red River valley, Yunnan, China. Fresh leaves were also collected from another 25 individuals of three species of the genus \u003cem\u003eGaruga\u003c/em\u003e (\u003cem\u003eG. floribunda\u003c/em\u003e var. \u003cem\u003egamblei\u003c/em\u003e, \u003cem\u003eG. pinnata\u003c/em\u003e, \u003cem\u003eG. forrestii\u003c/em\u003e). These comprised 6 individuals of \u003cem\u003eG. floribunda\u003c/em\u003e var. \u003cem\u003egamblei\u003c/em\u003e, 8 of \u003cem\u003eG. pinnata\u003c/em\u003e, and 11 of \u003cem\u003eG. forrestii\u003c/em\u003e (Fig. 1; Table 1). No specific permissions were required for the collection of specimens for this study, which were neither privately owned nor protected, and the field study did not involve endangered or protected species. 
We complied with the IUCN Policy Statement on Research Involving Species at Risk of Extinction and the Convention on International Trade in Endangered Species of Wild Fauna and Flora. After the formal identification of the plant material was carried out by Liangliang Yue, voucher specimens were prepared and deposited at the herbarium of the College of Wetlands, Southwest Forestry University (Yue, accession number Yue2023051032-1).\u003c/p\u003e\n \u003cp\u003eA total of 26 samples were sequenced. Genomic DNA was isolated using the CTAB method [14]. The DNA was checked by 0.8% agarose gel electrophoresis and quantified with a UV spectrophotometer.\u003c/p\u003e\n \u003cdiv\u003e\n \u003ctable id=\"Tab1\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 1\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eSampling Information\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSpecies\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eNumber of Individuals\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003ePopulation\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eProvince\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eLocality\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eLongitude\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eLatitude\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" rowspan=\"4\"\u003e\n \u003cp\u003eG. 
pinnata\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePi_BB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYunnan\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eBubeng\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e101.585\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e21.60944\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePi_YXG\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYunnan\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYexianggu\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e100.8648\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e22.18184\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePi_SHZWY\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYunnan\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eShanghaizhiwuyuan\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e121.4429\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e31.14649\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePi_DTS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYunnan\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003eDatianshan\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e99.87337\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e24.7832\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" rowspan=\"4\"\u003e\n \u003cp\u003eG. floribunda var. gamblei\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFl_SMC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYunnan\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSanmaicun\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e100.6193\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e22.00044\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFl_YXG\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYunnan\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYexianggu\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e100.8648\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e22.18184\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFl_ML\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYunnan\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMenglong\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e100.7445\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e21.7359\u003c/p\u003e\n 
\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFl_DTBC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYunnan\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDatianbacun\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e99.88069\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e24.73952\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" rowspan=\"8\"\u003e\n \u003cp\u003eG. forrestii\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFo_XP\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYunnan\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eXinping\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e101.5044\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e24.14715\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFo_DC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYunnan\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDacun\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e100.4178\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e25.01894\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFo_LJZ\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
align=\"left\"\u003e\n \u003cp\u003eYunnan\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLuzhijiang\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e101.9574\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e24.6514\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFo_HN\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYunnan\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eHuaning\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e102.9734\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e24.0748\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFo_KYC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYunnan\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eKaiyuancheng\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e103.2985\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e23.89117\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFo_NXH\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYunnan\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNanxihe\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e101.8507\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e23.64461\u003c/p\u003e\n \u003c/td\u003e\n 
\u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFo_YJZ\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSichuan\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYinjiangzhen\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e101.7854\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e26.59825\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFo_FF\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSichuan\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFenfang\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e101.9786\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e26.76728\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec4\"\u003e\n \u003ch2\u003e2.2 Genome size estimation by flow cytometry\u003c/h2\u003e\n \u003cp\u003eAs the internal reference, we used tender leaves from tomato plants (genome size: 900 Mb) one month after seed germination. The samples were placed in 0.8 mL of pre-chilled MG\u003csup\u003eb\u003c/sup\u003e dissociation solution, composed of 45 mM MgCl\u003csub\u003e2\u003c/sub\u003e\u0026middot;6H\u003csub\u003e2\u003c/sub\u003eO, 20 mM MOPS, 30 mM sodium citrate, 1% (W/V) PVP 40, 0.2% (V/V) Triton X-100, 10 mM Na\u003csub\u003e2\u003c/sub\u003eEDTA, and 20 \u0026micro;L/mL \u0026beta;-mercaptoethanol, adjusted to pH 7.5. 
The tissues were quickly chopped vertically with a sharp blade and left in the dissociation solution on ice for 10 minutes. The mixture was then filtered through a 40-micron mesh to obtain a suspension of nuclei. This suspension was combined with appropriate volumes of pre-chilled propidium iodide (PI) solution (stock concentration 1 mg/mL) and RNase solution (stock concentration 1 mg/mL). The mixture was then stained on ice in the dark for 0.5-1 hour. The working concentration of both the PI staining solution and the RNase solution was 50 \u0026micro;g/mL [15, 16]. The stained suspension of nuclei was analyzed on a BD FACSCalibur flow cytometer, using 488 nm blue light excitation to measure the fluorescence intensity emitted by propidium iodide. For each run, 10,000 particles were collected. The coefficient of variation (CV%) was kept below 5%. ModFit 3.0 software was used for graphing and analysis. The genome size was calculated using the following formula [17]:\u003c/p\u003e\n \u003cdiv id=\"Equa\"\u003e\n \u003cdiv id=\"FileID_Equa\" name=\"EquationSource\"\u003e$$\\text{Sample genome size} = \\text{standard genome size} \\times \\frac{\\text{sample } {G}_{0}/{G}_{1} \\text{ peak mean}}{\\text{standard } {G}_{0}/{G}_{1} \\text{ peak mean}}$$\u003c/div\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec5\"\u003e\n \u003ch2\u003e2.3 Genome Sequencing and Data Quality Control\u003c/h2\u003e\n \u003cp\u003eDried leaves from the mature tree of \u003cem\u003eG. forrestii\u003c/em\u003e were sent to Personalbio for paired-end sequencing. We followed the standard protocol using Illumina\u0026apos;s TruSeq DNA PCR-free prep kit reagents for sequencing library preparation. 
Extracted DNA sample underwent random shearing through ultrasonication, followed by end repair to eliminate overhanging bases at the 5\u0026apos; end and to add a phosphate group while filling in missing bases at the 3\u0026apos; end. To prevent self-ligation of DNA fragments and ensure compatibility with sequencing adaptors, an A base was added to the 3\u0026apos; end of the DNA sequence. Sequencing adaptors with library-specific tags were ligated to the 5\u0026apos; end of the DNA sequence, facilitating the immobilization of DNA molecules onto the Flow Cell. We used AMPureXP beads (Beckman Coulter, Brea, CA) for a selective removal of adaptor-ligated fragments and purification of the resulting library system. Subsequently, PCR amplification was performed on DNA fragments ligated with adaptors to enrich the sequencing library templates. A second purification step with BECKMAN AMPure XP Beads was carried out to purify the enriched library products. Finally, we conducted 2% agarose gel electrophoresis to select and purify the library\u0026apos;s final fragments. The resulting library insert fragments were approximately 400 bp in size. We performed paired-end sequencing with 2 \u0026times; 150 bp reads using an Illumina NovaSeq instrument (Table 2). 
The remaining 25 samples were also sequenced, including chloroplast genome sequencing.\u003c/p\u003e\n \u003cdiv\u003e\n \u003ctable id=\"Tab2\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 2\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eSequencing Overview\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSample\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eInsert Size\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSequencing platform\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSequencing Mode\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cem\u003eG. forrestii\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e400 bp\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eIllumina NovaSeq\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePaired-end, 2 \u0026times; 150 bp\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec6\"\u003e\n \u003ch2\u003e2.4 High-quality data acquisition\u003c/h2\u003e\n \u003cp\u003eRaw reads were filtered with fastp [18] using a sliding-window method to generate high-quality data. The window size was set to 5 bp. The window was slid from the 3\u0026apos; end toward the 5\u0026apos; end and the average Q value of the bases in the window was calculated; if the average Q value was \u0026lt;\u0026thinsp;20, the bases in the window were removed, and sliding stopped once the average Q value was \u0026ge;\u0026thinsp;20. 
For length filtering, a read pair was removed if either read was\u0026thinsp;\u0026le;\u0026thinsp;50 bp long. For ambiguous-base (N) filtering, a read pair was removed if it contained\u0026thinsp;\u0026ge;\u0026thinsp;5 N bases.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec7\"\u003e\n \u003ch2\u003e2.5 Ploidy analysis and k-mer analysis\u003c/h2\u003e\n \u003cp\u003eSmudgeplot [19] was used to analyze the genome structure and infer ploidy. It counts heterozygous k-mer pairs and compares the total coverage of each pair (CovA\u0026thinsp;+\u0026thinsp;CovB) with its relative coverage (CovB / (CovA\u0026thinsp;+\u0026thinsp;CovB)). The k-mer distribution was computed with Jellyfish [20]; the heterozygosity of \u003cem\u003eG. forrestii\u003c/em\u003e, the proportion of repetitive sequences, and the genome size were then estimated from the distribution of k-mer frequencies and the number of 19-mers.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec8\"\u003e\n \u003ch2\u003e2.6 De novo assembly and GC content analysis\u003c/h2\u003e\n \u003cp\u003eThe paired-end sequencing data were assembled de novo with MEGAHIT [21] using standard parameters. Contigs were further joined into scaffolds using SOAPdenovo [22]. Sliding window calculations were performed using a window size of 10 kb. The average depth and GC content of each window were computed. 
The quality of the genome assembly was assessed using the Assembly-stats utility, which calculated metrics such as N50, scaffold count, scaffold size, scaffold length, and genome length.\u003c/p\u003e\n \u003cp\u003eBefore chloroplast assembly, low-quality sequences were filtered using SOAPnuke [23], and de novo assembly was performed following the pipeline of GetOrganelle [24].\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec9\"\u003e\n \u003ch2\u003e2.7 Identification of microsatellite motifs\u003c/h2\u003e\n \u003cp\u003eMicrosatellite Identification Software (MISA) [25] was used to detect microsatellite patterns within the scaffolds generated above. The minimum number of repeats required for identification was set to 10 for mononucleotide repeats, 6 for dinucleotide repeats, and 5 for all other repeat types (trinucleotide, tetranucleotide, pentanucleotide, and hexanucleotide) [26].\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec10\"\u003e\n \u003ch2\u003e2.8 Population resequencing variation detection and filtering of SNP data\u003c/h2\u003e\n \u003cp\u003ePicard [27], BWA [28], and Samtools [29] were used to build the genome index and to map, sort, and deduplicate reads in preparation for variant detection. We used \u003cem\u003eG. forrestii\u003c/em\u003e (sampled, sequenced, and assembled in this study to ensure its usability as a reference genome) and \u003cem\u003eBoswellia sacra\u003c/em\u003e Fl\u0026uuml;ck. (downloaded from NCBI, accession: SNVD00000000) as reference genomes for mapping the 25 individuals. Variants were called for each sample with GATK [30] HaplotypeCaller. This study involved the assembly of the genome of an individual of \u003cem\u003eG. 
forrestii\u003c/em\u003e at the scaffold level, as well as the genomes of the remaining 25 individuals, which included 6 \u003cem\u003eG. floribunda\u003c/em\u003e var. \u003cem\u003egamblei\u003c/em\u003e, 8 \u003cem\u003eG. pinnata\u003c/em\u003e, and 11 \u003cem\u003eG. forrestii\u003c/em\u003e. Each of these genomes was divided into 10 pseudo-chromosomes. GATK was then used to filter and extract SNPs and INDELs.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec11\"\u003e\n \u003ch2\u003e2.9 Phylogenetic reconstruction and PCA\u003c/h2\u003e\n \u003cp\u003eWe used Plink [31] to filter the SNP data by linkage disequilibrium (LD), producing a refined VCF file from which an alignment was generated for phylogenomic reconstruction and PCA. The dataset encompassed nucleotide polymorphisms from the whole genomes of 25 individuals of the three species. We used IQ-TREE [32] to construct two maximum likelihood (ML) trees for the nuclear genome, using \u003cem\u003eB. sacra\u003c/em\u003e Fl\u0026uuml;ck. and \u003cem\u003eGaruga forrestii\u003c/em\u003e, respectively, as reference genomes. Phylogenomic trees were visualized using FigTree [33]. The PCA of the SNP data from the 25 samples was conducted using Plink [31] and GCTA [34], and the results were visualized with an R script. We also constructed ML trees using IQ-TREE [32] for the chloroplast genomes of the 25 individuals.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"3 Results","content":"\u003cdiv id=\"Sec13\"\u003e\n \u003ch2\u003e3.1 Genome size estimation by flow cytometry\u003c/h2\u003e\n \u003cp\u003eFlow cytometry produced a high-resolution histogram (Fig. 2; Table 3), with a fluorescence intensity of 17.04 for the sample and 29.67 for the internal reference. 
The ratio between these values is 0.57, giving an estimated genome size of approximately 0.52 Gb.\u003c/p\u003e\n \u003cdiv\u003e\n \u003ctable id=\"Tab3\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 3\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eFlow Cytometry Information\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSample\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eReference Selection\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eReference Fluorescence Intensity\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eFluorescence Intensity of the Test Sample\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eRatio\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eGenome Size (Gb)\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cem\u003eG. 
forrestii\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003etomato\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e29.67\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e17.04\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.57\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.52\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec14\"\u003e\n \u003ch2\u003e3.2 Sequence filtering and data quality control\u003c/h2\u003e\n \u003cp\u003eInsertion of paired-end libraries with a size of 400 bp (Table\u0026nbsp;2) yielded approximately\u0026thinsp;~\u0026thinsp;54.56 GB of raw sequence (Table\u0026nbsp;4). The percentage of bases with quality score\u0026thinsp;\u0026ge;\u0026thinsp;Q20 was 97.46% and \u0026ge;\u0026thinsp;Q30 was 92.86% (Table\u0026nbsp;4). 
Approximately 1.1% of the data was removed as low quality, and the remaining\u0026thinsp;~\u0026thinsp;53.97 GB was used for downstream analysis (Table\u0026nbsp;5).\u003c/p\u003e\n \u003cdiv\u003e\n \u003ctable id=\"Tab4\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 4\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eSequencing Data Statistics\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSample\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eRead Num\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eTotal base\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eN_rate\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eGC_Content (%)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eQ20_rate\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eQ30_rate\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cem\u003eG. 
forrestii\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e361299934\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e54556290034\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e35.1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e97.46\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e92.86\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003cdiv\u003e\n \u003ctable id=\"Tab5\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cp\u003e\u003cbr\u003e\u003c/p\u003e\n \u003cdiv\u003eTable 5\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eHigh-Quality Data Statistics\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSample\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eHQ Reads\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eHQ Reads (%)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eHQ Data (bp)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eHQ Data (%)\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cem\u003eG. 
forrestii\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e358261400\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e99.16\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e53967212246\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e98.92\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003cp\u003eThe GC content is 35.10% (Table 4). The base distribution showed no separation of AT and GC either at the R1 end or the R2 end (Fig. 3), which laid the foundation for the subsequent quantitative analysis. The average sequencing error rates for single-base loci were all less than 0.1% (Fig. 4)\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec15\"\u003e\n \u003ch2\u003e3.3 K-mer analysis and ploidy estimation\u003c/h2\u003e\n \u003cp\u003eK-mer analysis provides an estimate of the genome size based on the substrings of length k contained in the biological sequence. The data volume was multiplied to achieve a coverage of 112x. Besides, this it also indicates low quality or contamination in the sequences. 19-mer frequency analysis identified the 90X depth as the dominant peak based on the k-mer count (Fig. 5). After dividing the k-mer number by 90, the genome size was predicted to be ~\u0026thinsp;483 Mb (483,508,243 bp). The other peak at 1/2 the depth of the main peak (45X) is most likely due to heterozygosity (~\u0026thinsp;0.54%), While the peak at twice the depth of the main peak (180X) is caused by the repetitive sequence (~\u0026thinsp;51.54%).\u003c/p\u003e\n \u003cp\u003eThe results of ploidy assessment yielded 83% AB, 12% AAB, and 5% AABB (Fig. 6), from which \u003cem\u003eG. 
forrestii\u003c/em\u003e was predicted to be diploid; this ploidy assessment should be regarded as indicative only.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec16\"\u003e\n \u003ch2\u003e3.4 De novo assembly and gene prediction\u003c/h2\u003e\n \u003cp\u003eA total of 360 million paired-end reads were used to build the initial genome assembly of \u003cem\u003eG. forrestii\u003c/em\u003e. The genome, assembled using SOAPdenovo, comprises 339,729 scaffolds (509.9 Mb); only scaffolds exceeding 200 bp were retained, to exclude low-quality sequences. The assembly size excluding Ns is 499,986,768 bp, and the final assembly has a low N content of roughly 0.01%. The largest scaffold spans 345,990 bp, with N50\u0026thinsp;=\u0026thinsp;17,344 bp and N90\u0026thinsp;=\u0026thinsp;293 bp (Table\u0026nbsp;6).\u003c/p\u003e\n \u003cdiv\u003e\n \u003ctable id=\"Tab6\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 6\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eStatistics of Contigs and Scaffolds\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eAssembly Level\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eNumber\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSum (bp)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eLongest Seq\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eN50\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eN90\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eN count\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eGaps\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSoftware\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n 
\u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003econtig\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e514298\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e516660543\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e179938\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e6972\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e280\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMEGAHIT\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003escaffold\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e339729\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e509934743\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e345990\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e17344\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e293\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e6439439\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e146457\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSOAPdenovo\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec17\"\u003e\n \u003ch2\u003e3.5 Analysis of GC content\u003c/h2\u003e\n \u003cp\u003eThe resultant scaffolds longer than 200 bases in length were chosen. A window size of 10 kb was used for non-repetitive advancement in the sequence and calculation of the mean depth and GC content of every window to generate a GC depth plot. 
Most windows had a GC content between 25% and 50%, giving an estimated overall GC content of ~\u0026thinsp;35.16% (Fig. 7). We did not identify any significant abnormal clusters in the plot, which suggests that the DNA sample used for sequencing was not mixed with DNA from other species.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec18\"\u003e\n \u003ch2\u003e3.6 Identification of SSRs\u003c/h2\u003e\n \u003cp\u003eThe assembled scaffolds were searched for SSR markers using the MISA software, which identified 330,999 SSR motifs (Table\u0026nbsp;7). Mononucleotide repeats were the most abundant (232,225; 70.16%), followed by dinucleotide (67,518; 20.40%), trinucleotide (23,506; 7.10%), tetranucleotide (5,103; 1.54%), pentanucleotide (1,472; 0.44%), and hexanucleotide (1,175; 0.35%) SSR markers (Fig.\u0026nbsp;8a).\u003c/p\u003e\n \u003cdiv\u003e\n \u003ctable id=\"Tab7\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 7\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eSSR Information\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eTotal number of sequences examined:\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e339729\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eTotal size of examined sequences (bp):\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e506426207\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eTotal number of identified SSRs:\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e330999\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003eNumber of SSR containing sequences:\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e6914\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNumber of sequences containing more than 1 SSR:\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e34647\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNumber of SSRs present in compound formation:\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e51666\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003cp\u003eAmong mononucleotide repeats, A/T was by far the most abundant (230,708; 99.35%) (Fig. 8b). Among dinucleotide repeats, the predominant motif was AT/AT (48,604; 71.99%), followed by AG/CT (12,133; 17.97%), AC/GT (6,723; 9.96%), and CG/CG (58; 0.09%) (Fig. 8c). Among trinucleotide repeats, the principal motifs were AAT/ATT, AAG/CTT, ATC/ATG, and AAC/GTT, with 16,427 (69.88%), 3,005 (12.78%), 1,337 (5.69%), and 1,062 (4.52%) occurrences, respectively (Fig. 8d).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec19\"\u003e\n \u003ch2\u003e3.7 Phylogenetic reconstruction and PCA\u003c/h2\u003e\n \u003cp\u003eBoth nuclear genomic reconstructions suggested that \u003cem\u003eG. forrestii\u003c/em\u003e is located at the basal position (Fig. 9a, b). However, the plastid phylogenomic reconstruction indicated a conflicting topology, in which \u003cem\u003eG. floribunda\u003c/em\u003e var. \u003cem\u003egamblei\u003c/em\u003e and \u003cem\u003eG. pinnata\u003c/em\u003e clustered at the basal position (Fig. 9c). The Maximum Likelihood (ML) reconstruction of the nuclear genome with \u003cem\u003eG. forrestii\u003c/em\u003e as the reference genome reveals that \u003cem\u003eG. 
forrestii\u003c/em\u003e constitutes a monophyletic lineage in Clade-A. \u003cem\u003eG. pinnata\u003c/em\u003e and \u003cem\u003eG. floribunda\u003c/em\u003e var. \u003cem\u003egamblei\u003c/em\u003e individuals clustered into mixed clades. In Clade-B, 2 individuals of \u003cem\u003eG. floribunda\u003c/em\u003e var. \u003cem\u003egamblei\u003c/em\u003e (from populations Fl_DTBC and Fl_YXG) were mixed with 6 individuals of \u003cem\u003eG. pinnata\u003c/em\u003e (from Pi_YXG, Pi_SHZWY and Pi_DTS). In Clade-C, 2 individuals of \u003cem\u003eG. pinnata\u003c/em\u003e (from Pi_YXG) were mixed with 4 individuals of \u003cem\u003eG. floribunda\u003c/em\u003e var. \u003cem\u003egamblei\u003c/em\u003e (from Fl_YXG, Fl_ML and Fl_SMC) (Fig. 9a). The phylogenomic reconstruction based on the dataset using \u003cem\u003eB. sacra\u003c/em\u003e as the reference genome revealed similar patterns, with differences in the internal topology of the branches (Fig. 9a, b). However, in this tree \u003cem\u003eG. forrestii\u003c/em\u003e does not form a monophyletic lineage (Clade-F, Clade-G and Clade-H) and is intermixed with the other two species (Fig. 9b). The plastid ML tree shows that \u003cem\u003eG. forrestii\u003c/em\u003e and \u003cem\u003eG. pinnata\u003c/em\u003e are mixed in Clade-D (Fig. 9c), whereas \u003cem\u003eG. pinnata\u003c/em\u003e and \u003cem\u003eG. floribunda\u003c/em\u003e var. \u003cem\u003egamblei\u003c/em\u003e are mixed in Clade-E (Fig. 9c).\u003c/p\u003e\n \u003cp\u003eThe 25 individuals clustered into three distinct groups in the PCA (Fig. 10). In the first group, two individuals of \u003cem\u003eG. floribunda\u003c/em\u003e var. \u003cem\u003egamblei\u003c/em\u003e (from Fl_DTBC and Fl_YXG) were found within the \u003cem\u003eG. pinnata\u003c/em\u003e group, whereas within the predominantly \u003cem\u003eG. floribunda\u003c/em\u003e var. \u003cem\u003egamblei\u003c/em\u003e group, two individuals of \u003cem\u003eG. 
pinnata\u003c/em\u003e (from Pi_YXG) were also mixed.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"4 Discussion","content":"\u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003e4.1 Genome Characterization\u003c/h2\u003e \u003cp\u003eThe genome size estimates obtained by k-mer analysis and by flow cytometry for the same individual differed. In this study, for the first time, we conducted genome survey sequencing on \u003cem\u003eG. forrestii\u003c/em\u003e and obtained\u0026thinsp;~\u0026thinsp;53.97 GB of clean data. The 19-mer analysis showed that the \u003cem\u003eG. forrestii\u003c/em\u003e genome was approximately 483.51 Mb, slightly smaller than the flow cytometry estimate (~\u0026thinsp;520.00 Mb) (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, \u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e). We observed similar discrepancies in other published studies and compared the differences among 15 species (Table\u0026nbsp;\u003cspan refid=\"Tab8\" class=\"InternalRef\"\u003e8\u003c/span\u003e). The genome size differences between the two methods ranged from 16.00 Mb to 138.09 Mb; \u003cem\u003eG. forrestii\u003c/em\u003e showed a difference of 37 Mb, which falls within this range and is lower than the average difference of 82.93 Mb. The differences stem from the different principles underlying the two methods. Flow cytometry is based on staining intact nuclei with a fluorescent dye that binds quantitatively to DNA, from which the amount of DNA is calculated. Sample preparation and staining strategies vary among laboratories, and random instrument drift may lead to significant deviation in genome size estimation [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e, \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e]. 
The k-mer approach is a computational technique that generates a k-mer frequency distribution (approximately Poisson) by plotting the coverage of all k-mers in the sequencing reads; the peak of the distribution is centered on the average sequencing depth of the genome. Genome size may be better inferred by directly sequencing reads and analyzing the frequency of k-mers [\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab8\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 8\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eGenome Size Difference between Flow Cytometry and K-mer\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSpecies\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eGenome Size by K-mer\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eGenome Size by Flow Cytometry\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAbsolute Value (Low Flow Cytometry 
- K-mer)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eAbsolute Value (High Flow Cytometry - K-mer)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eAverage\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eArticle\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003eCarex cristatella\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e317.3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e255.2\u0026thinsp;\u0026plusmn;\u0026thinsp;13.1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e75.2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e49\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e62.1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003eGenome Survey Sequencing for the Characterization of the Genetic Background of Rosa roxburghii Tratt and Leaf Ascorbate Metabolism Genes\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003eCarex scoparia\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e294.5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e268.8\u0026thinsp;\u0026plusmn;\u0026thinsp;9.7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e35.4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e25.7\u003c/p\u003e \u003c/td\u003e 
\u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003eJuncus effusus\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e225.5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e198.9\u0026thinsp;\u0026plusmn;\u0026thinsp;4.6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e31.2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e22\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e26.6\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003eJuncus inflexus\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e196\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e286.4\u0026thinsp;\u0026plusmn;\u0026thinsp;4.7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e85.7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e95.1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e90.4\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003eRaddia distichophylla\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e608\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e589\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e19\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e19\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e 
\u003cp\u003eThe draft genome sequence of herbaceous diploid bamboo\u0026nbsp;Raddia distichophylla\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003ePsammochloa villosa\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1564\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1503.27\u0026thinsp;\u0026plusmn;\u0026thinsp;3.41\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e64.14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e57.32\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e60.73\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eEstimation\u0026nbsp;of\u0026nbsp;genome size\u0026nbsp;for\u0026nbsp;Psammochloa villosa\u0026nbsp;by\u0026nbsp;flow cytometry\u0026nbsp;and\u0026nbsp;K-mer analysis\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003eReseda lutea\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e934\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e867.7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e66.3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e66.3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eEstimation of Genome Size in the Endemic Species\u0026nbsp;Reseda pentagyna\u0026nbsp;and the Locally Rare Species\u0026nbsp;Reseda lutea\u0026nbsp;Using comparative Analyses of Flow Cytometry and K-Mer Approaches\u003c/p\u003e 
\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003eR. pentagyna\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1022\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e896\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e126\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e126\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003eAspalathus linearis\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1070\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.24\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e180\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e160\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e170\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eRooibos (Aspalathus linearis) Genome Size Estimation Using Flow Cytometry and K-Mer Analyses\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003eAcer henryi\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e561.72\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e691.12\u0026thinsp;\u0026plusmn;\u0026thinsp;8.69\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e120.71\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c5\"\u003e \u003cp\u003e138.09\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e129.4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\" morerows=\"5\" rowspan=\"6\"\u003e \u003cp\u003eEstimation of genome sizes of six\u0026nbsp;Acer\u0026nbsp;species by flow cytometry and K-mer analysis\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003eA. buergerianum\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e743\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e863.90\u0026thinsp;\u0026plusmn;\u0026thinsp;8.69\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e112.21\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e129.59\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e120.9\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003eA. elegantulum\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e777.87\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e896.50\u0026thinsp;\u0026plusmn;\u0026thinsp;4.35\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e114.28\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e122.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e118.63\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003eA. 
griseum\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e771.51\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e893.24\u0026thinsp;\u0026plusmn;\u0026thinsp;8.69\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e113.04\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e130.42\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e121.73\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003eA. pentaphyllum\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e650.64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e766.10\u0026thinsp;\u0026plusmn;\u0026thinsp;8.69\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e106.77\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e124.15\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e115.46\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003eA. 
tegmentosum\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1103.46\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1154.04\u0026thinsp;\u0026plusmn;\u0026thinsp;13.04\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e37.9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e63.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e50.94\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe GC content of \u003cem\u003eG. forrestii\u003c/em\u003e is ~\u0026thinsp;35.16% (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e), which falls within the 30\u0026ndash;47% range reported for most plant genomes summarized by previous authors [\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e]. GC content can serve as an indicator of genome stability, as DNA sequences with higher GC content tend to exhibit tolerance to extreme temperatures and drought [\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e, \u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e]. We assessed the genomic GC content of the three species from shallow-depth sequencing data assembled to the contig level. \u003cem\u003eG. forrestii\u003c/em\u003e has the lowest GC content (~\u0026thinsp;33.60%), \u003cem\u003eG. floribunda\u003c/em\u003e var. \u003cem\u003egamblei\u003c/em\u003e has the highest (~\u0026thinsp;33.94%), and \u003cem\u003eG. pinnata\u003c/em\u003e falls in between (~\u0026thinsp;33.81%) (Table\u0026nbsp;\u003cspan refid=\"Tab9\" class=\"InternalRef\"\u003e9\u003c/span\u003e). 
These differences in GC content may reflect the species' adaptation to environmental stress [\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e]. A comparable case has been reported in gramineous plants, which were found to have the highest GC content [\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e]. During the global cooling of the Tertiary period, Poaceae plants differentiated and adapted to this stress, and consequently came to dominate under today's extreme climatic conditions [\u003cspan additionalcitationids=\"CR43\" citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e]. The two taxa \u003cem\u003eG. floribunda\u003c/em\u003e var. \u003cem\u003egamblei\u003c/em\u003e and \u003cem\u003eG. pinnata\u003c/em\u003e experienced stronger interspecific competition from taller rainforest tree species, as well as longer, hotter and drier conditions, in southern Yunnan.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab9\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 9\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eThree Species GC Content\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSpecies\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eIndividual\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e 
\u003cp\u003eGC Content (%)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"7\" rowspan=\"8\"\u003e \u003cp\u003eG. pinnata\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePi_BB\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e33.7\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePi_BB\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e33.67\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePi_BB\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e33.87\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePi_YXG\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e33.59\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePi_YXG\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e33.63\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePi_YXG\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e34.29\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePi_SHZWY\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e33.97\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePi_DTS\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e33.5\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e 
\u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAverage\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e \u003cp\u003e33.7775\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"4\" rowspan=\"5\"\u003e \u003cp\u003eG. floribunda var. gamblei\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFl_SMC\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e34.52\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFl_SMC\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e34.51\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFl_YXG\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e33.63\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFl_ML\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e33.56\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFl_DTBC\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e33.64\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAverage\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e \u003cp\u003e33.972\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"10\" rowspan=\"11\"\u003e \u003cp\u003eG. 
forrestii\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFo_XP\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e33.59\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFo_DC\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e33.69\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFo_DC\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e33.59\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFo_LJZ\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e33.62\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFo_HN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e33.81\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFo_HN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e33.6\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFo_KYC\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e33.91\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFo_KYC\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e33.7\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFo_NXH\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e33.7\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e 
\u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFo_YJZ\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e33.52\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFo_FF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e33.57\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAverage\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e \u003cp\u003e33.66363636\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe individual Fl_YXG of \u003cem\u003eG. floribunda\u003c/em\u003e var. \u003cem\u003egamblei\u003c/em\u003e was not included in the calculation.\u003c/p\u003e \u003cp\u003ePlant genomes are among the most challenging to sequence and assemble because of their high levels of heterozygosity, complex polyploidy, and abundant repeat content [\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e]. A genome with heterozygosity exceeding 1% is highly challenging to assemble de novo [\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e]. However, the K-mer analysis in this study indicated a relatively low heterozygosity of approximately 0.54% in the \u003cem\u003eG. forrestii\u003c/em\u003e genome (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e), which should make de novo assembly more accurate in this respect. 
The repeat rate of the genome is 51.54%. This high repeat rate is likely due to gene family expansion [\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e], gene transposition activity [\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e] and genome recombination [\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e]. A chromosome-level assembly of \u003cem\u003eG. forrestii\u003c/em\u003e is needed to gain deeper insights into its genome and to support a comprehensive understanding of the speciation and population dynamics of this species.\u003c/p\u003e \u003cp\u003eSSRs have consistently been among the most preferred molecular markers for plant genotyping owing to their high level of polymorphism, wide distribution in the majority of plant genomes, ease of use, and anticipated value as a tool for numerous species in the future [\u003cspan additionalcitationids=\"CR51\" citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e]. Mononucleotide repeats are the predominant type, in accordance with previous reports [\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e]. The high abundance of AT/AT (48,604; 71.99%) motifs (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003ec) in our study is consistent with some earlier genomic surveys. For example, \u003cem\u003eAkebia trifoliata\u003c/em\u003e exhibits an AT/AT content of 50.21% [\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e], while \u003cem\u003eCunninghamia lanceolata\u003c/em\u003e (Lamb.) Hook. 
shows a content of 59% [\u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e], and \u003cem\u003eAcer truncatum\u003c/em\u003e Bunge has a considerably higher AT/AT content of 71.31% [\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e], confirming that AT/AT is the most typical dinucleotide motif in higher plants.\u003c/p\u003e \u003cp\u003eFor a comparative analysis, we also looked at several species that occur in warm and hot valleys: \u003cem\u003eTerminalia franchetii\u003c/em\u003e Gagnep. [\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e], \u003cem\u003eOsteomeles schwerinae\u003c/em\u003e Schneid. [\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e], \u003cem\u003eNouelia insignis\u003c/em\u003e Franch. [\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e], \u003cem\u003eBuddleja crispa\u003c/em\u003e Benth. [\u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e] and \u003cem\u003eExcoecaria acerifolia\u003c/em\u003e Didr. [\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e]. No raw reads suitable for a genome survey could be found in GenBank for four species mentioned. Our data therefore provide a first view of the evolutionary pattern of this area.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec22\" class=\"Section2\"\u003e \u003ch2\u003e4.2 Nuclear-Cytoplasmic Conflict\u003c/h2\u003e \u003cp\u003eIn this study, the topology of the basal clades differs among the phylogenomic trees. The tree built with the \u003cem\u003eG. forrestii\u003c/em\u003e reference indicates that \u003cem\u003eG. forrestii\u003c/em\u003e is monophyletic, while the tree built with the \u003cem\u003eB. sacra\u003c/em\u003e reference suggests that it is polyphyletic. Using a reference genome from another genus could have led to an overestimation of more ancient characters in the dataset. 
Consequently, alignment to the \u003cem\u003eG. forrestii\u003c/em\u003e reference generated a substantially higher number of SNP loci, roughly three times as many as alignment to the \u003cem\u003eB. sacra\u003c/em\u003e reference (Table\u0026nbsp;\u003cspan refid=\"Tab10\" class=\"InternalRef\"\u003e10\u003c/span\u003e). This disparity in SNP locus counts results in the observed differences between the topologies of the two nuclear gene phylogenetic trees [\u003cspan citationid=\"CR62\" class=\"CitationRef\"\u003e62\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab10\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 10\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eSNP Calling Information\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eReference\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNumber of Individuals\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNumber of SNP Sites\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eVCF Size (Mb)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003eG. 
forrestii\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e26\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e244936\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e43\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003eB. sacra\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e26\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e79691\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e15.5\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe topology clearly shows a markedly longer branch for Fo_HN than for the other taxonomic units, placing it as the earliest-diverging lineage. This could be a result of long-branch attraction, which would explain its anomalous position in the phylogenetic tree [\u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e, \u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e64\u003c/span\u003e]. The basal clade of the chloroplast genome phylogenetic tree is composed of \u003cem\u003eG. pinnata\u003c/em\u003e and \u003cem\u003eG. floribunda\u003c/em\u003e var. \u003cem\u003egamblei\u003c/em\u003e, showing a nuclear-cytoplasmic conflict with the two nuclear gene trees. Hybridization could plausibly explain this conflict [\u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e65\u003c/span\u003e, \u003cspan citationid=\"CR66\" class=\"CitationRef\"\u003e66\u003c/span\u003e]. 
The rapid uplift of the QTP at the beginning of the Paleogene, together with the formation and intensification of the monsoon climate zone, brought abundant precipitation that led to river erosion; the resulting rapid downcutting and uptake events changed the spatial pattern of the PRR drainage system, which in turn affected the geographic distribution pattern of plants [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR67\" class=\"CitationRef\"\u003e67\u003c/span\u003e, \u003cspan citationid=\"CR68\" class=\"CitationRef\"\u003e68\u003c/span\u003e]. The sympatric distribution of these species in river valleys may promote interspecific hybridization, because linking otherwise disconnected river systems makes it easier for species to migrate and spread along river valleys. A previous study of three river valley species of the genus \u003cem\u003eOstryopsis\u003c/em\u003e Decne found that the inconsistent interspecific relationships of \u003cem\u003eOstryopsis intermedia\u003c/em\u003e B. Tian \u0026amp; J. Q. Liu with the other two species (\u003cem\u003eOstryopsis davidiana\u003c/em\u003e Decaisne and \u003cem\u003eOstryopsis nobilis\u003c/em\u003e I. B. Balfour et W. W. Smith), recovered from two different sets of molecular markers, strongly indicate its origin through hybrid speciation in the southeast Qinghai-Tibet Plateau [\u003cspan citationid=\"CR69\" class=\"CitationRef\"\u003e69\u003c/span\u003e]. To address this issue in the \u003cem\u003eGaruga\u003c/em\u003e genus, we plan to conduct Hi-C or Hi-Fi sequencing in the future and to work at the chromosomal level to acquire additional genetic information. 
By employing sophisticated models, we aim to consider this situation comprehensively and reconstruct the evolutionary history between species more accurately.\u003c/p\u003e \u003c/div\u003e"},{"header":"5 Conclusion","content":"\u003cp\u003eThis study conducted a preliminary investigation into the genome size and characteristics of \u003cem\u003eG. forrestii\u003c/em\u003e for the first time and reconstructed the phylogenetic relationships of three species within the \u003cem\u003eGaruga\u003c/em\u003e genus, namely \u003cem\u003eG. floribunda\u003c/em\u003e var. \u003cem\u003egamblei\u003c/em\u003e, \u003cem\u003eG. pinnata\u003c/em\u003e, and \u003cem\u003eG. forrestii\u003c/em\u003e. It provided valuable genomic resources for the further exploration and utilization of \u003cem\u003eG. forrestii\u003c/em\u003e and offered initial insights into the phylogenetic relationships within the \u003cem\u003eGaruga\u003c/em\u003e genus. However, there is further work to be done as the construction of two nuclear gene phylogenetic trees for these three species resulted in differences in the topological structure at the base clade (\u003cem\u003eG. forrestii\u003c/em\u003e) with one being monophyletic and the other polyphyletic. Additionally, conflicts arose between the two nuclear gene phylogenetic trees and the chloroplast genome phylogenetic tree (where the base clade in the nuclear gene trees is \u003cem\u003eG. forrestii\u003c/em\u003e, while in the chloroplast genome tree, it includes \u003cem\u003eG. floribunda\u003c/em\u003e var. \u003cem\u003egamblei\u003c/em\u003e, \u003cem\u003eG. pinnata\u003c/em\u003e, and \u003cem\u003eG. forrestii\u003c/em\u003e). To clearly explain the differences in the topology of nuclear trees and the nuclear-cytoplasmic conflict with the chloroplast tree, we plan to supplement the study with Hi-C or Hi-Fi sequencing at the chromosomal level in future whole-genome sequencing. 
This will lead to better assembly results, and we will use complex models to study the aforementioned issues.\u003c/p\u003e \u003cp\u003eThe genetic data presented in this study hold significant value for comprehensive whole-genome analyses, the evaluation of population genetic diversity, investigations into adaptive evolution, the advancement of artificial breeding efforts, and the support of species conservation and restoration initiatives. Ultimately, this research contributes to reinforcing the conservation and management of natural ecosystems, promoting biodiversity conservation, and advancing sustainable development.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch3\u003e\u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eNot applicable\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eNot applicable\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003eAvailability of data and materials\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eThe raw sequence data of \u003cem\u003eGaruga floribunda\u003c/em\u003e var. \u003cem\u003egamblei\u003c/em\u003e were deposited in NCBI under the BioProject ID: PRJNA783803. The other raw sequence data reported in this paper have been deposited in the Genome Sequence Archive (Genomics, Proteomics \u0026amp; Bioinformatics 2021) in the National Genomics Data Center (Nucleic Acids Res 2022), China National Center for Bioinformation / Beijing Institute of Genomics, Chinese Academy of Sciences (GSA: CRA014668, CRA014684, CRA014694, CRA014677) and are publicly accessible at https://ngdc.cncb.ac.cn/gsa. 
The whole genome sequence data reported in this paper have been deposited in the Genome Warehouse in the National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences / China National Center for Bioinformation, under accession number GWHERCJ00000000, which is publicly accessible at https://ngdc.cncb.ac.cn/gwh. The chloroplast assembly and annotation data for this study are deposited in NCBI with accession numbers PP337695-PP337718.\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eThe authors declare that they have no competing interests.\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eThis research was funded by the National Natural Science Foundation of China (grant No. 42161015).\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003eAuthors\u0026rsquo; contribution\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eLLY and HS conceived the study. CMM collected plant materials and drew the figure. RC organized the plant materials. RR performed data analysis for SNP calling and generated plots. DBZ carried out genome assembly, data analysis, figure generation, and contributed to the paper writing. DBZ and RR contributed equally to this study. 
All authors read and approved the final manuscript.\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003eAuthors\u0026rsquo; information\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003e\u003cstrong\u003e(1) Authors and Affiliations\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eSouthwest Forestry University, Kunming, PR, 650224, China\u003c/p\u003e\n\u003cp\u003eDongbo Zhu\u003c/p\u003e\n\u003cp\u003eSouthwest Forestry University, Kunming, PR, 650224, China\u003c/p\u003e\n\u003cp\u003eRui Rao\u003c/p\u003e\n\u003cp\u003eTechnology Center of Kunming Customs, Kunming, PR, 650228, China\u003c/p\u003e\n\u003cp\u003eYu Du\u003c/p\u003e\n\u003cp\u003eSouthwest Forestry University, Kunming, PR, 650224, China\u003c/p\u003e\n\u003cp\u003eChunmin Mao\u003c/p\u003e\n\u003cp\u003eSouthwest Forestry University, Kunming, PR, 650224, China\u003c/p\u003e\n\u003cp\u003eRong Chen\u003c/p\u003e\n\u003cp\u003eKunming Institute of Botany, Chinese Academy of Sciences, Kunming, PR, 650201, China\u003c/p\u003e\n\u003cp\u003eHang Sun\u003c/p\u003e\n\u003cp\u003eYunnan Key Laboratory of Plateau Wetland Conservation, Restoration and Ecological Services, Southwest Forestry University, Kunming, PR, 650224, China. National Plateau Wetland Research Center, Kunming, PR, 650224, China\u003c/p\u003e\n\u003cp\u003eLiang-Liang Yue\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e(2) Contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eLLY and HS conceived the study. CMM collected plant materials and drew the figure. RC organized the plant materials. RR performed data analysis for SNP calling and generated plots. DBZ carried out genome assembly, data analysis, figure generation, and contributed to the paper writing. DBZ and RR contributed equally to this study. 
All authors read and approved the final manuscript.\u003c/p\u003e\n\u003cp\u003e(3) \u003cstrong\u003eCorresponding authors\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eCorrespondence to Liangliang Yue and Hang Sun\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e(4) Acknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe appreciate the assistance of Yanchun Liu from the Shanghai Botanical Garden and Jinlong Dong from the Xishuangbanna Tropical Botanical Garden for their support in sample collection for this study.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eSpicer RA, Farnsworth A, Su T: \u003cstrong\u003eCenozoic topography, monsoons and biodiversity conservation within the Tibetan Region: An evolving story\u003c/strong\u003e. \u003cem\u003ePlant Diversity \u003c/em\u003e2020, \u003cstrong\u003e42\u003c/strong\u003e(4):229-254.\u003c/li\u003e\n\u003cli\u003eNie J, Ruetenik G, Gallagher K, Hoke G, Garzione CN, Wang W, Stockli D, Hu X, Wang Z, Wang Y\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eRapid incision of the Mekong River in the middle Miocene linked to monsoonal precipitation\u003c/strong\u003e. \u003cem\u003eNature Geoscience \u003c/em\u003e2018, \u003cstrong\u003e11\u003c/strong\u003e(12):944-948.\u003c/li\u003e\n\u003cli\u003eMing Q, Shi Z, Zhang H: \u003cstrong\u003eThe evolution of the landform and environment in the region of the three parallel rivers\u003c/strong\u003e. \u003cem\u003eTropical geography \u003c/em\u003e2006, \u003cstrong\u003e26\u003c/strong\u003e(2):122.\u003c/li\u003e\n\u003cli\u003eClark MK, Schoenbohm LM, Royden LH, Whipple KX, Burchfiel BC, Zhang X, Tang W, Wang E, Chen L: \u003cstrong\u003eSurface uplift, tectonics, and erosion of eastern Tibet from large\u003c/strong\u003e\u003cstrong\u003e‐scale drainage patterns\u003c/strong\u003e. 
\u003cem\u003eTectonics \u003c/em\u003e2004, \u003cstrong\u003e23\u003c/strong\u003e(1).\u003c/li\u003e\n\u003cli\u003eBrookfield ME: \u003cstrong\u003eThe evolution of the great river systems of southern Asia during the Cenozoic India-Asia collision: rivers draining southwards\u003c/strong\u003e. \u003cem\u003eGeomorphology \u003c/em\u003e1998, \u003cstrong\u003e22\u003c/strong\u003e(3-4):285-312.\u003c/li\u003e\n\u003cli\u003eSun H, Li Z, Landis JB, Qian L, Zhang T, Deng T: \u003cstrong\u003eEffects of drainage reorganization on phytogeographic pattern in Sino-Himalaya\u003c/strong\u003e. \u003cem\u003eAlpine Botany \u003c/em\u003e2021, \u003cstrong\u003e132\u003c/strong\u003e(1):141-151.\u003c/li\u003e\n\u003cli\u003eSun H, Zhang J, Deng T, Boufford DE: \u003cstrong\u003eOrigins and evolution of plant diversity in the Hengduan Mountains, China\u003c/strong\u003e. \u003cem\u003ePlant Diversity \u003c/em\u003e2017, \u003cstrong\u003e39\u003c/strong\u003e(4):161-166.\u003c/li\u003e\n\u003cli\u003eYu T, Hu Y, Zhang Y, Zhao R, Yan X, Dayananda B, Wang J, Jiao Y, Li J, Yi X\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eWhole-Genome Sequencing of Acer catalpifolium Reveals Evolutionary History of Endangered Species\u003c/strong\u003e. \u003cem\u003eGenome Biology and Evolution \u003c/em\u003e2021, \u003cstrong\u003e13\u003c/strong\u003e(12).\u003c/li\u003e\n\u003cli\u003eHert DG, Fredlake CP, Barron AE: \u003cstrong\u003eAdvantages and limitations of next\u003c/strong\u003e\u003cstrong\u003e‐generation sequencing technologies: A comparison of electrophoresis and non\u003c/strong\u003e\u003cstrong\u003e‐electrophoresis methods\u003c/strong\u003e. 
\u003cem\u003eElectrophoresis \u003c/em\u003e2008, \u003cstrong\u003e29\u003c/strong\u003e(23):4618-4626.\u003c/li\u003e\n\u003cli\u003eAird D, Ross MG, Chen W, Danielsson M, Fennell T, Russ C, Jaffe DB, Nusbaum C, Gnirke A: \u003cstrong\u003eAnalyzing and minimizing PCR amplification bias in Illumina sequencing libraries\u003c/strong\u003e. \u003cem\u003eGenome Biology \u003c/em\u003e2011, \u003cstrong\u003e12\u003c/strong\u003e(2):1-14.\u003c/li\u003e\n\u003cli\u003eBi Q, Zhao Y, Cui Y, Wang L: \u003cstrong\u003eGenome survey sequencing and genetic background characterization of yellow horn based on next-generation sequencing\u003c/strong\u003e. \u003cem\u003eMolecular Biology Reports \u003c/em\u003e2019, \u003cstrong\u003e46\u003c/strong\u003e(4):4303-4312.\u003c/li\u003e\n\u003cli\u003eHuang G, Cao J, Chen C, Wang M, Liu Z, Gao F, Yi M, Chen G, Lu M: \u003cstrong\u003eGenome Survey of Misgurnus Anguillicaudatus to Identify Genomic Information, Simple Sequence Repeat (SSR) Markers and Mitochondrial Genome\u003c/strong\u003e. \u003cem\u003eResearch Square \u003c/em\u003e2021.\u003c/li\u003e\n\u003cli\u003eLiang X, Bai T, Wang J, Jiang W: \u003cstrong\u003eGenome survey and development of 13 SSR markers in Eucalyptus cloeziana by NGS\u003c/strong\u003e. \u003cem\u003eJournal of Genetics \u003c/em\u003e2022, \u003cstrong\u003e101\u003c/strong\u003e(2).\u003c/li\u003e\n\u003cli\u003eDoyle JJ: \u003cstrong\u003eA rapid DNA isolation procedure for small quantities of fresh leaf tissue\u003c/strong\u003e. \u003cem\u003ePhytochem Bull \u003c/em\u003e1987, \u003cstrong\u003e19\u003c/strong\u003e:11-15.\u003c/li\u003e\n\u003cli\u003eDolezel J: \u003cstrong\u003ePlant DNA Flow Cytometry and Estimation of Nuclear Genome Size\u003c/strong\u003e. 
\u003cem\u003eAnnals of Botany \u003c/em\u003e2005, \u003cstrong\u003e95\u003c/strong\u003e(1):99-110.\u003c/li\u003e\n\u003cli\u003eDoležel J, Greilhuber J, Suda J: \u003cstrong\u003eEstimation of nuclear DNA content in plants using flow cytometry\u003c/strong\u003e. \u003cem\u003eNature Protocols \u003c/em\u003e2007, \u003cstrong\u003e2\u003c/strong\u003e(9):2233-2244.\u003c/li\u003e\n\u003cli\u003eXinming T, Xiangyan Z, Na G: \u003cstrong\u003eApplications of Flow Cytometry in Plant Research\u0026mdash;Analysis of Nuclear DNA Content and Ploidy Level in Plant Cells\u003c/strong\u003e. \u003cem\u003eChinese Agricultural Science Bulletin \u003c/em\u003e2011, \u003cstrong\u003e27\u003c/strong\u003e(9):21-27.\u003c/li\u003e\n\u003cli\u003eChen S, Zhou Y, Chen Y, Gu J: \u003cstrong\u003efastp: an ultra-fast all-in-one FASTQ preprocessor\u003c/strong\u003e. \u003cem\u003eBioinformatics \u003c/em\u003e2018, \u003cstrong\u003e34\u003c/strong\u003e(17):i884-i890.\u003c/li\u003e\n\u003cli\u003eRanallo-Benavidez TR, Jaron KS, Schatz MC: \u003cstrong\u003eGenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes\u003c/strong\u003e. \u003cem\u003eNature Communications \u003c/em\u003e2020, \u003cstrong\u003e11\u003c/strong\u003e(1):1432.\u003c/li\u003e\n\u003cli\u003eMar\u0026ccedil;ais G, Kingsford C: \u003cstrong\u003eA fast, lock-free approach for efficient parallel counting of occurrences of k-mers\u003c/strong\u003e. \u003cem\u003eBioinformatics \u003c/em\u003e2011, \u003cstrong\u003e27\u003c/strong\u003e(6):764-770.\u003c/li\u003e\n\u003cli\u003eLi D, Liu C, Luo R, Sadakane K, Lam T: \u003cstrong\u003eMEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph\u003c/strong\u003e. 
\u003cem\u003eBioinformatics \u003c/em\u003e2015, \u003cstrong\u003e31\u003c/strong\u003e(10):1674-1676.\u003c/li\u003e\n\u003cli\u003eLuo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, He G, Chen Y, Pan Q, Liu Y\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eSOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler\u003c/strong\u003e. \u003cem\u003eGigaScience \u003c/em\u003e2012, \u003cstrong\u003e1\u003c/strong\u003e(1):2047-2217X-2041-2018.\u003c/li\u003e\n\u003cli\u003eChen Y, Chen Y, Shi C, Huang Z, Zhang Y, Li S, Li Y, Ye J, Yu C, Li Z\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eSOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data\u003c/strong\u003e. \u003cem\u003eGigaScience \u003c/em\u003e2018, \u003cstrong\u003e7\u003c/strong\u003e(1):gix120.\u003c/li\u003e\n\u003cli\u003eJin J, Yu W, Yang J, Song Y, dePamphilis CW, Yi T, Li D: \u003cstrong\u003eGetOrganelle: a fast and versatile toolkit for accurate de novo assembly of organelle genomes\u003c/strong\u003e. \u003cem\u003eGenome Biology \u003c/em\u003e2020, \u003cstrong\u003e21\u003c/strong\u003e(1):241.\u003c/li\u003e\n\u003cli\u003eThiel T, Michalek W, Varshney R, Graner A: \u003cstrong\u003eExploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.)\u003c/strong\u003e. \u003cem\u003eTheoretical and Applied Genetics \u003c/em\u003e2003, \u003cstrong\u003e106\u003c/strong\u003e(3):411-422.\u003c/li\u003e\n\u003cli\u003eBeier S, Thiel T, M\u0026uuml;nch T, Scholz U, Mascher M, Valencia A: \u003cstrong\u003eMISA-web: a web server for microsatellite prediction\u003c/strong\u003e. 
\u003cem\u003eBioinformatics \u003c/em\u003e2017, \u003cstrong\u003e33\u003c/strong\u003e(16):2583-2585.\u003c/li\u003e\n\u003cli\u003eWysokar A, Tibbetts K, McCown M, Homer N, Fennell T: \u003cstrong\u003ePicard: A set of tools for working with next generation sequencing data in BAM format\u003c/strong\u003e. \u003cem\u003eRetrieved Aug \u003c/em\u003e2014.\u003c/li\u003e\n\u003cli\u003eLi H, Durbin R: \u003cstrong\u003eFast and accurate short read alignment with Burrows\u0026ndash;Wheeler transform\u003c/strong\u003e. \u003cem\u003eBioinformatics \u003c/em\u003e2009, \u003cstrong\u003e25\u003c/strong\u003e(14):1754-1760.\u003c/li\u003e\n\u003cli\u003eLi H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: \u003cstrong\u003eThe Sequence Alignment/Map format and SAMtools\u003c/strong\u003e. \u003cem\u003eBioinformatics \u003c/em\u003e2009, \u003cstrong\u003e25\u003c/strong\u003e(16):2078-2079.\u003c/li\u003e\n\u003cli\u003eMcKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eThe Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data\u003c/strong\u003e. \u003cem\u003eGenome Res \u003c/em\u003e2010, \u003cstrong\u003e20\u003c/strong\u003e(9):1297-1303.\u003c/li\u003e\n\u003cli\u003ePurcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, De Bakker PI, Daly M: \u003cstrong\u003ePLINK: a tool set for whole-genome association and population-based linkage analyses\u003c/strong\u003e. \u003cem\u003eThe American journal of human genetics \u003c/em\u003e2007, \u003cstrong\u003e81\u003c/strong\u003e(3):559-575.\u003c/li\u003e\n\u003cli\u003eNguyen L, Schmidt HA, von Haeseler A, Minh BQ: \u003cstrong\u003eIQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies\u003c/strong\u003e. 
\u003cem\u003eMolecular Biology and Evolution \u003c/em\u003e2015, \u003cstrong\u003e32\u003c/strong\u003e(1):268-274.\u003c/li\u003e\n\u003cli\u003eRambaut A: \u003cstrong\u003eFigTree v1. 4.2, a graphical viewer of phylogenetic trees\u003c/strong\u003e. In\u003cem\u003e.\u003c/em\u003e; 2014.\u003c/li\u003e\n\u003cli\u003eYang J, Lee SH, Goddard ME, Visscher PM: \u003cstrong\u003eGCTA: A Tool for Genome-wide Complex Trait Analysis\u003c/strong\u003e. \u003cem\u003eThe American Journal of Human Genetics \u003c/em\u003e2011, \u003cstrong\u003e88\u003c/strong\u003e(1):76-82.\u003c/li\u003e\n\u003cli\u003eDoležel J, Greilhuber J, Lucretti S, Meister A, Lys\u0026aacute;k MA, Nardi L, Obermayer R: \u003cstrong\u003ePlant Genome Size Estimation by Flow Cytometry: Inter-laboratory Comparison\u003c/strong\u003e. \u003cem\u003eAnnals of Botany \u003c/em\u003e1998, \u003cstrong\u003e82\u003c/strong\u003e(suppl_1):17-26.\u003c/li\u003e\n\u003cli\u003eLi X, Waterman MS: \u003cstrong\u003eEstimating the repeat structure and length of DNA sequences using ℓ-tuples\u003c/strong\u003e. \u003cem\u003eGenome Res \u003c/em\u003e2003, \u003cstrong\u003e13\u003c/strong\u003e(8):1916-1922.\u003c/li\u003e\n\u003cli\u003ePellegrini M, Shangguan L, Han J, Kayesh E, Sun X, Zhang C, Pervaiz T, Wen X, Fang J: \u003cstrong\u003eEvaluation of Genome Sequencing Quality in Selected Plant Species Using Expressed Sequence Tags\u003c/strong\u003e. \u003cem\u003ePLOS ONE \u003c/em\u003e2013, \u003cstrong\u003e8\u003c/strong\u003e(7):e69890.\u003c/li\u003e\n\u003cli\u003e\u0026Scaron;marda P, Kn\u0026aacute;pek O, Březinov\u0026aacute; A, Horov\u0026aacute; L, Grulich V, Danihelka J, Vesel\u0026yacute; P, \u0026Scaron;merda J, Rotreklov\u0026aacute; O, Bure\u0026scaron; P: \u003cstrong\u003eGenome sizes and genomic guanine+cytosine (GC) contents of the Czech vascular flora with new estimates for 1700 species\u003c/strong\u003e. 
\u003cem\u003ePreslia \u003c/em\u003e2019, \u003cstrong\u003e91\u003c/strong\u003e(2):117-142.\u003c/li\u003e\n\u003cli\u003e\u0026Scaron;marda P, Bure\u0026scaron; P, Horov\u0026aacute; L, Leitch IJ, Mucina L, Pacini E, Tich\u0026yacute; L, Grulich V, Rotreklov\u0026aacute; O: \u003cstrong\u003eEcological and evolutionary significance of genomic GC content diversity in monocots\u003c/strong\u003e. \u003cem\u003eProceedings of the National Academy of Sciences \u003c/em\u003e2014, \u003cstrong\u003e111\u003c/strong\u003e(39):E4096-E4102.\u003c/li\u003e\n\u003cli\u003eZanne AE, Tank DC, Cornwell WK, Eastman JM, Smith SA, FitzJohn RG, McGlinn DJ, O\u0026rsquo;Meara BC, Moles AT, Reich PB: \u003cstrong\u003eThree keys to the radiation of angiosperms into freezing environments\u003c/strong\u003e. \u003cem\u003eNature \u003c/em\u003e2014, \u003cstrong\u003e506\u003c/strong\u003e(7486):89-92.\u003c/li\u003e\n\u003cli\u003eSingh R, Ming R, Yu Q: \u003cstrong\u003eComparative Analysis of GC Content Variations in Plant Genomes\u003c/strong\u003e. \u003cem\u003eTropical Plant Biology \u003c/em\u003e2016, \u003cstrong\u003e9\u003c/strong\u003e(3):136-149.\u003c/li\u003e\n\u003cli\u003eStr\u0026ouml;mberg CA: \u003cstrong\u003eEvolution of grasses and grassland ecosystems\u003c/strong\u003e. \u003cem\u003eAnnual review of Earth and planetary sciences \u003c/em\u003e2011, \u003cstrong\u003e39\u003c/strong\u003e:517-544.\u003c/li\u003e\n\u003cli\u003eEdwards EJ, Osborne CP, Str\u0026ouml;mberg CA, Smith SA, Consortium CG, Bond WJ, Christin P-A, Cousins AB, Duvall MR, Fox DL: \u003cstrong\u003eThe origins of C4 grasslands: integrating evolutionary and ecosystem science\u003c/strong\u003e. \u003cem\u003escience \u003c/em\u003e2010, \u003cstrong\u003e328\u003c/strong\u003e(5978):587-591.\u003c/li\u003e\n\u003cli\u003eZachos J, Pagani M, Sloan L, Thomas E, Billups K: \u003cstrong\u003eTrends, rhythms, and aberrations in global climate 65 Ma to present\u003c/strong\u003e. 
\u003cem\u003escience \u003c/em\u003e2001, \u003cstrong\u003e292\u003c/strong\u003e(5517):686-693.\u003c/li\u003e\n\u003cli\u003eMichael TP, VanBuren R: \u003cstrong\u003eBuilding near-complete plant genomes\u003c/strong\u003e. \u003cem\u003eCurrent Opinion in Plant Biology \u003c/em\u003e2020, \u003cstrong\u003e54\u003c/strong\u003e:26-33.\u003c/li\u003e\n\u003cli\u003eZhou P, Zhang Q, Li J, Li F, Huang J, Zhang M: \u003cstrong\u003eA first insight into the genomic background of Ilex pubescens (Aquifoliaceae) by flow cytometry and genome survey sequencing\u003c/strong\u003e. \u003cem\u003eBMC Genomics \u003c/em\u003e2023, \u003cstrong\u003e24\u003c/strong\u003e(1).\u003c/li\u003e\n\u003cli\u003eHan Y, Luthe D: \u003cstrong\u003eIdentification and evolution analysis of the JAZ gene family in maize\u003c/strong\u003e. \u003cem\u003eBMC Genomics \u003c/em\u003e2021, \u003cstrong\u003e22\u003c/strong\u003e(1).\u003c/li\u003e\n\u003cli\u003eZhao D, Ferguson AA, Jiang N: \u003cstrong\u003eWhat makes up plant genomes: The vanishing line between transposable elements and genes\u003c/strong\u003e. \u003cem\u003eBiochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms \u003c/em\u003e2016, \u003cstrong\u003e1859\u003c/strong\u003e(2):366-380.\u003c/li\u003e\n\u003cli\u003eHufton AL, Panopoulou G: \u003cstrong\u003ePolyploidy and genome restructuring: a variety of outcomes\u003c/strong\u003e. \u003cem\u003eCurrent Opinion in Genetics \u0026amp; Development \u003c/em\u003e2009, \u003cstrong\u003e19\u003c/strong\u003e(6):600-606.\u003c/li\u003e\n\u003cli\u003eHayden MJ, Nguyen TM, Waterman A, McMichael GL, Chalmers KJ: \u003cstrong\u003eApplication of multiplex-ready PCR for fluorescence-based SSR genotyping in barley and wheat\u003c/strong\u003e. 
\u003cem\u003eMolecular Breeding \u003c/em\u003e2007, \u003cstrong\u003e21\u003c/strong\u003e(3):271-281.\u003c/li\u003e\n\u003cli\u003eGramazio P, Plesa IM, Truta AM, Sestras AF, Vilanova S, Plazas M, Vicente O, Boscaiu M, Prohens J, Sestras RE: \u003cstrong\u003eHighly informative SSR genotyping reveals large genetic diversity and limited differentiation in European larch (Larixdecidua) populations from Romania\u003c/strong\u003e. \u003cem\u003eTurkish Journal of Agriculture and Forestry \u003c/em\u003e2018, \u003cstrong\u003e42\u003c/strong\u003e(3):165-175.\u003c/li\u003e\n\u003cli\u003eLiu XB, Feng B, Li J, Yan C, Yang ZL: \u003cstrong\u003eGenetic diversity and breeding history of Winter Mushroom (Flammulina velutipes) in China uncovered by genomic SSR markers\u003c/strong\u003e. \u003cem\u003eGene \u003c/em\u003e2016, \u003cstrong\u003e591\u003c/strong\u003e(1):227-235.\u003c/li\u003e\n\u003cli\u003eManee MM, Al-Shomrani BM, Al-Fageeh MB: \u003cstrong\u003eGenome-wide characterization of simple sequence repeats in Palmae genomes\u003c/strong\u003e. \u003cem\u003eGenes \u0026amp; Genomics \u003c/em\u003e2020, \u003cstrong\u003e42\u003c/strong\u003e(5):597-608.\u003c/li\u003e\n\u003cli\u003eZhang Z, Zhang J, Yang Q, Li B, Zhou W, Wang Z: \u003cstrong\u003eGenome survey sequencing and genetic diversity of cultivated Akebia trifoliata assessed via phenotypes and SSR markers\u003c/strong\u003e. \u003cem\u003eMolecular Biology Reports \u003c/em\u003e2021, \u003cstrong\u003e48\u003c/strong\u003e(1):241-250.\u003c/li\u003e\n\u003cli\u003eLin E, Zhuang H, Yu J, Liu X, Huang H, Zhu M, Tong Z: \u003cstrong\u003eGenome survey of Chinese fir (Cunninghamia lanceolata): Identification of genomic SSRs and demonstration of their utility in genetic diversity analysis\u003c/strong\u003e. 
\u003cem\u003eScientific Reports \u003c/em\u003e2020, \u003cstrong\u003e10\u003c/strong\u003e(1).\u003c/li\u003e\n\u003cli\u003eWang R, Fan J, Chang P, Zhu L, Zhao M, Li L: \u003cstrong\u003eGenome Survey Sequencing of Acer truncatum Bunge to Identify Genomic Information, Simple Sequence Repeat (SSR) Markers and Complete Chloroplast Genome\u003c/strong\u003e. \u003cem\u003eForests \u003c/em\u003e2019, \u003cstrong\u003e10\u003c/strong\u003e(2).\u003c/li\u003e\n\u003cli\u003eZhang T, Comes HP, Sun H: \u003cstrong\u003eChloroplast phylogeography of Terminalia franchetii (Combretaceae) from the eastern Sino-Himalayan region and its correlation with historical river capture events\u003c/strong\u003e. \u003cem\u003eMolecular Phylogenetics and Evolution \u003c/em\u003e2011, \u003cstrong\u003e60\u003c/strong\u003e(1):1-12.\u003c/li\u003e\n\u003cli\u003eGomory D, Wang Z, Chen S, Nie Z, Zhang J, Zhou Z, Deng T, Sun H: \u003cstrong\u003eClimatic Factors Drive Population Divergence and Demography: Insights Based on the Phylogeography of a Riparian Plant Species Endemic to the Hengduan Mountains and Adjacent Regions\u003c/strong\u003e. \u003cem\u003ePLOS ONE \u003c/em\u003e2015, \u003cstrong\u003e10\u003c/strong\u003e(12).\u003c/li\u003e\n\u003cli\u003eZhao Y, Gong X: \u003cstrong\u003eGenetic divergence and phylogeographic history of two closely related species (Leucomeris decora and Nouelia insignis) across the \u0026apos;Tanaka Line\u0026apos; in Southwest China\u003c/strong\u003e. \u003cem\u003eBMC Evolutionary Biology \u003c/em\u003e2015, \u003cstrong\u003e15\u003c/strong\u003e(1):134.\u003c/li\u003e\n\u003cli\u003eYue L, Chen G, Sun W, Sun H: \u003cstrong\u003ePhylogeography of Buddleja crispa (Buddlejaceae) and its correlation with drainage system evolution in southwestern China\u003c/strong\u003e. 
\u003cem\u003eAmerican Journal of Botany \u003c/em\u003e2012, \u003cstrong\u003e99\u003c/strong\u003e(10):1726-1735.\u003c/li\u003e\n\u003cli\u003eWang Z, Zhang T, Luo D, Sun W, Sun H: \u003cstrong\u003ePhylogeography of Excoecaria acerifolia (Euphorbiaceae) suggests combined effects of historical drainage reorganization events and climatic changes on riparian plants in the Sino\u0026ndash;Himalayan region\u003c/strong\u003e. \u003cem\u003eBotanical Journal of the Linnean Society \u003c/em\u003e2019, \u003cstrong\u003e192\u003c/strong\u003e(2):350-368.\u003c/li\u003e\n\u003cli\u003eLeach\u0026eacute; AD, Banbury BL, Felsenstein J, De Oca AN-M, Stamatakis A: \u003cstrong\u003eShort tree, long tree, right tree, wrong tree: new acquisition bias corrections for inferring SNP phylogenies\u003c/strong\u003e. \u003cem\u003eSystematic biology \u003c/em\u003e2015, \u003cstrong\u003e64\u003c/strong\u003e(6):1032-1047.\u003c/li\u003e\n\u003cli\u003ePhilippe H: \u003cstrong\u003eOpinion: long branch attraction and protist phylogeny\u003c/strong\u003e. \u003cem\u003eProtist \u003c/em\u003e2000, \u003cstrong\u003e151\u003c/strong\u003e(4):307-316.\u003c/li\u003e\n\u003cli\u003eBergsten J: \u003cstrong\u003eA review of long\u003c/strong\u003e\u003cstrong\u003e‐branch attraction\u003c/strong\u003e. \u003cem\u003eCladistics \u003c/em\u003e2005, \u003cstrong\u003e21\u003c/strong\u003e(2):163-193.\u003c/li\u003e\n\u003cli\u003eDegnan JH, Rosenberg NA: \u003cstrong\u003eGene tree discordance, phylogenetic inference and the multispecies coalescent\u003c/strong\u003e. \u003cem\u003eTrends in ecology \u0026amp; evolution \u003c/em\u003e2009, \u003cstrong\u003e24\u003c/strong\u003e(6):332-340.\u003c/li\u003e\n\u003cli\u003ePelser PB, Kennedy AH, Tepe EJ, Shidler JB, Nordenstam B, Kadereit JW, Watson LE: \u003cstrong\u003ePatterns and causes of incongruence between plastid and nuclear Senecioneae (Asteraceae) phylogenies\u003c/strong\u003e. 
\u003cem\u003eAmerican Journal of Botany \u003c/em\u003e2010, \u003cstrong\u003e97\u003c/strong\u003e(5):856-873.\u003c/li\u003e\n\u003cli\u003eSpicer RA: \u003cstrong\u003eTibet, the Himalaya, Asian monsoons and biodiversity \u0026ndash; In what ways are they related?\u003c/strong\u003e \u003cem\u003ePlant Diversity \u003c/em\u003e2017, \u003cstrong\u003e39\u003c/strong\u003e(5):233-244.\u003c/li\u003e\n\u003cli\u003eTada R, Zheng H, Clift PD: \u003cstrong\u003eEvolution and variability of the Asian monsoon and its potential linkage with uplift of the Himalaya and Tibetan Plateau\u003c/strong\u003e. \u003cem\u003eProgress in Earth and Planetary Science \u003c/em\u003e2016, \u003cstrong\u003e3\u003c/strong\u003e(1):4.\u003c/li\u003e\n\u003cli\u003eLu Z, Tian B, Liu B, YANG C, Liu J: \u003cstrong\u003eOrigin of Ostryopsis intermedia (Betulaceae) in the southeast Qinghai\u0026ndash;Tibet Plateau through hybrid speciation\u003c/strong\u003e. \u003cem\u003eJournal of Systematics and Evolution \u003c/em\u003e2014, \u003cstrong\u003e52\u003c/strong\u003e(3):250-259.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"bmc-genomics","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"gics","sideBox":"Learn more about [BMC Genomics](http://bmcgenomics.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/gics","title":"BMC 
Genomics","twitterHandle":"#BMCGenomics","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Genome survey, K-mer, Flow cytometry, SNP, Phylogenetic, Garuga forrestii, Nuclear-cytoplasmic conflict","lastPublishedDoi":"10.21203/rs.3.rs-3905007/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-3905007/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBackground\u003c/h2\u003e \u003cp\u003e \u003cem\u003eGaruga\u003c/em\u003e Roxb. is a genus endemic to southwest China and other tropical regions in Southeast Asia facing risk of extinction due to the loss of tropical forests and changes in land use. Conducting a genome survey of \u003cem\u003eG. forrestii\u003c/em\u003e contribute to a deeper understanding and conservation of the genus.\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e \u003cp\u003eThis study utilized genome survey of \u003cem\u003eG. forrestii\u003c/em\u003e generated approximately 54.56 GB of sequence data, with approximately 112 \u0026times; coverage. K-mer analysis indicated a genome size of approximately 0.48 GB, smaller than 0.52GB estimated by flow cytometry. The heterozygosity is of about 0.54%, and a repeat rate of around 51.54%. All the shotgun data were assembled into 339,729 scaffolds, with an N50 of 17,344 bp. The average content of guanine and cytosine was approximately 35.16%. A total of 330,999 SSRs were detected, with mononucleotide repeats being the most abundant at 70.16%, followed by dinucleotide repeats at 20.40%. A pseudo chromosome of \u003cem\u003eG. forrestii\u003c/em\u003e and a gemone of \u003cem\u003eBoswellia sacra\u003c/em\u003e were used as reference genome to perform a primer population resequencing analysis within three \u003cem\u003eGaruga\u003c/em\u003e species. 
PCA indicated three distinct groups, but genome wide phylogenetics represented conflicting both between the dataset of different reference genomes and between maternal and nuclear genome.\u003c/p\u003e\u003ch2\u003eConclusion\u003c/h2\u003e \u003cp\u003eIn summary, the genome of \u003cem\u003eG. forrestii\u003c/em\u003e is small, and the phylogenetic relationships within the \u003cem\u003eGaruga\u003c/em\u003e genus are complex. The genetic data presented in this study holds significant value for comprehensive whole-genome analyses, the evaluation of population genetic diversity, investigations into adaptive evolution, the advancement of artificial breeding efforts, and the support of species conservation and restoration initiatives. Ultimately, this research contributes to reinforcing the conservation and management of natural ecosystems, promoting biodiversity conservation, and advancing sustainable development.\u003c/p\u003e","manuscriptTitle":"Genome Survey Indicated Complex Evolutionary History of Garuga Roxb. 
Species","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-02-29 19:16:59","doi":"10.21203/rs.3.rs-3905007/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2024-03-21T03:55:02+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2024-03-21T03:17:37+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2024-03-12T05:37:49+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2024-03-07T08:49:46+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"eb67811f-65f5-4fd2-b671-5fa6058dfc5e","date":"2024-03-05T05:57:45+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"e1bf4956-0064-4bb2-8d1b-281b94d6ccb4","date":"2024-03-01T00:43:34+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"8ff5d606-6d2f-4961-bf12-b1e07c05c1b2_SNPRID","date":"2024-02-29T09:37:14+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2024-02-29T02:21:04+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2024-02-28T06:02:54+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2024-02-28T05:55:39+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2024-02-28T05:38:44+00:00","index":"","fulltext":""},{"type":"submitted","content":"BMC Genomics","date":"2024-01-28T07:26:53+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"bmc-genomics","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"gics","sideBox":"Learn more about [BMC Genomics](http://bmcgenomics.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/gics","title":"BMC 
Genomics","twitterHandle":"#BMCGenomics","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"3b3ad84d-b579-4246-8a25-7e096ad14931","owner":[],"postedDate":"February 29th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[],"tags":[],"updatedAt":"2024-10-28T15:59:16+00:00","versionOfRecord":{"articleIdentity":"rs-3905007","link":"https://doi.org/10.1186/s12864-024-10917-8","journal":{"identity":"bmc-genomics","isVorOnly":false,"title":"BMC Genomics"},"publishedOn":"2024-10-23 15:57:02","publishedOnDateReadable":"October 23rd, 2024"},"versionCreatedAt":"2024-02-29 19:16:59","video":"","vorDoi":"10.1186/s12864-024-10917-8","vorDoiUrl":"https://doi.org/10.1186/s12864-024-10917-8","workflowStages":[]},"version":"v1","identity":"rs-3905007","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-3905007","identity":"rs-3905007","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

