Genome-wide Survey of Crataegus scabrifolia Provides New Insights into Its Genetic Evolution and Adaptation Mechanisms

Short Report

Baozheng Wang, Xien Wu, Dengli Luo, Jian Chen, Yingmin Zhang, and 2 more

Research Square preprint (Version 1). This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-4747077/v1 This work is licensed under a CC BY 4.0 License. Status: Published. Journal publication published 09 Oct, 2024 in Genetic Resources and Crop Evolution.

Abstract

Crataegus scabrifolia is a significant botanical resource in Southwest China, renowned for its medicinal properties and high potential for development due to its rich medicinal components. However, genomic research on C. scabrifolia remains limited. This study conducted a comprehensive genome-wide survey of C.
scabrifolia, employing flow cytometry in conjunction with genome K-mer analysis to assess its genomic characteristics in detail. Our findings reveal that, despite a genome size similar to that of cultivated hawthorn (Crataegus pinnatifida var. major), C. scabrifolia exhibits a markedly lower heterozygosity rate of 0.5%, compared with 1.77% in the cultivated variety. Additionally, we identified transposable elements comprising 51.79% of the assembled genome, with retrotransposons accounting for 35.05% of the total genome. Transposon analysis elucidated the genomic characteristics of transposons in C. scabrifolia, suggesting a mode of increase similar to that observed in cultivated hawthorn. Furthermore, this study identified numerous SSR marker loci and annotated the functions of single-copy genes, providing insights into the adaptive strategies and genetic stability of C. scabrifolia under varying environmental conditions. These findings offer crucial tools and resources for further genotype selection, genetic analysis, and breeding improvements.

Keywords: Genome-wide survey, Illumina sequencing, Crataegus scabrifolia, Transposable elements, SSR

1 Introduction

Crataegus L., belonging to the Rosaceae family, subfamily Maloideae, is widely distributed throughout the northern hemisphere, and certain species are important in both medicine and food. In China, hawthorn has a long history of traditional medicinal and culinary use (Yang et al. 2022). Known for its sour-sweet taste, it is valued for therapeutic effects that include aiding digestion, regulating qi, improving blood circulation, and reducing lipids (Orhan 2018). Modern medical research has extensively documented hawthorn's pharmacological activities, particularly in cardiovascular health, encompassing cardiotonic, antiarrhythmic, hypotensive, hypolipidemic, and antioxidant properties.
Presently, hawthorn is primarily employed in treating conditions such as congestive heart failure, hypertension, atherosclerosis, and digestive ailments (Chang et al. 2002).

The southwestern region of China benefits from unique geographical and climatic conditions, fostering exceptionally rich plant diversity and serving as a prominent area for hawthorn distribution (Zhang et al. 2014). Yunnan province, situated in southwestern China, is a principal cultivation area for C. scabrifolia, which is renowned locally for its medicinal use. Research indicates that C. scabrifolia exhibits high levels of bioactive compounds and potent pharmacological effects. For instance, C. scabrifolia fruits demonstrate significant scavenging activity against superoxide anion radicals in vitro (Zhou et al. 1999). Moreover, the total flavonoid content in C. scabrifolia fruits (84 g/kg) is more than double that of cultivated hawthorn fruits (31 g/kg) (Gao and Feng 1994), underscoring its substantial medicinal value and developmental potential.

Currently, the most extensively studied and cultivated member of the genus Crataegus is Crataegus pinnatifida var. major. Research on the second most cultivated hawthorn, C. scabrifolia, is scarce and mainly focuses on the chemical composition of its fruits, their biological activity, and the evaluation of variety resources (Wu et al. 2014; Liu et al. 2010). Although some scholars have conducted chloroplast genomic studies on C. scabrifolia (Wu et al. 2022), its genetic foundation remains underexplored. Genomic research can uncover the most comprehensive set of genetic variation sites in an individual, providing precise and extensive information for studying plant origin and evolution, gene function analysis, and genetic breeding.
Microsatellite markers, that is, simple sequence repeats (SSRs), developed from the genome are commonly used molecular markers in population genetics, gene targeting, and crop germplasm research. The identification of SSR loci can provide crucial data support for genetic improvement and breeding programs.

To elucidate the genomic characteristics and potential genetic resources of C. scabrifolia, and thereby to improve our understanding and facilitate future molecular breeding efforts, we employed flow cytometry and K-mer analysis based on whole-genome sequencing to estimate the genome size of C. scabrifolia and analyze its genomic characteristics. Using bioinformatics methods, transposons across the whole genome were identified and classified, followed by the extraction of microsatellite sequence information. Additionally, we conducted functional annotation and analysis of single-copy genes in the genome. These efforts aim to lay a solid foundation for further genomic analysis, gene discovery, and molecular genetic breeding.

2 Materials and methods

2.1 Plant and DNA sources

The experimental materials, fresh tender leaves, were sourced from the Kunming Institute of Botany, Chinese Academy of Sciences (25.14° N, 102.75° E, elevation 1900 meters). A voucher specimen, C. scabrifolia (YUNCM5301260363), is deposited in the herbarium of Yunnan University of Traditional Chinese Medicine. Genomic DNA was extracted using the CTAB method (Doyle and Doyle 1987), and the DNA concentration and quality were assessed using a NanoDrop 2000 spectrophotometer and agarose gel electrophoresis.

2.2 Flow cytometry estimation of genome size

Maize B73, sourced from the Kunming Institute of Botany, Chinese Academy of Sciences, was selected as the internal reference. Samples were placed in MG dissociation solution, finely chopped with vertical cuts, and left to stand for 10 minutes.
The nuclei were then filtered, treated with propidium iodide (PI) and RNase solution, and stained on ice in the dark for 1 hour (Dolezel and Bartos 2005; Dolezel et al. 2007). The stained nuclei suspension of the sample was mixed with the nuclei suspension of the reference. Fluorescence intensity was measured using a BD FACSCalibur flow cytometer (BD Biosciences, USA). The coefficient of variation (CV) was kept within 5%, and the data were analyzed using ModFit 3.0 software (Tian et al. 2011). The DNA content of the C. scabrifolia sample was calculated from the relative fluorescence intensity of the maize and C. scabrifolia samples using the formula:

DNA content of C. scabrifolia sample = DNA content of maize sample × (fluorescence intensity of C. scabrifolia sample / fluorescence intensity of maize sample)

2.3 DNA sequencing, genome size and ploidy estimation

DNA samples were randomly sheared using an ultrasonic disruptor, targeting fragments approximately 350 bp in length. These fragments were then used to construct libraries for paired-end sequencing on the Illumina HiSeq platform. Sequencing data quality was assessed with Q20 and Q30 scores, and the data were filtered and quality-controlled using fastp and FastQC. Filtered data were further analyzed using a K-mer based approach (K = 19), and genome size, repetitive sequence content, and heterozygosity were estimated using findGSE (Sun et al. 2018) and GenomeScope (Ranallo-Benavidez et al. 2020). Ploidy estimation was performed using Smudgeplot (Ranallo-Benavidez et al. 2020).

2.4 Genome assembly and GC content analysis

The high-quality filtered sequences were assembled using SOAPdenovo2 (Luo et al. 2012) with K-mer = 59 and otherwise default parameters, followed by the calculation of GC content. The completeness of the genome assembly and annotation was then assessed using BUSCO v5 (Simão et al. 2015). The genome sequence of C.
scabrifolia was compared against a single-copy ortholog database of embryophytes to determine how many of the single-copy ortholog genes in the database are covered by the C. scabrifolia genome.

2.5 Species matching analysis of C. scabrifolia using the NT database

To investigate the species-level relationships of C. scabrifolia and to assess whether the extracted sample DNA was contaminated, we analyzed reads from the constructed DNA library. From this library, we randomly selected 10,000 reads and performed BLAST (Basic Local Alignment Search Tool) comparisons against the NCBI nucleotide database (NT database). The origins of these sequences were determined based on their sequence similarity.

2.6 Identification and annotation of transposons

For transposon identification and annotation, we used RepeatModeler v1.0.8 (Abrusán et al. 2009) with default settings to construct a de novo repeat library. We then applied RepeatMasker v4.0.7 (Bedell et al. 2000) to identify and classify repetitive sequences within the genomic data by comparing the sequences against the custom repeat library generated in the previous step.

2.7 SSR site data mining

We used SSRMMD software (Gou et al. 2020) to search for and enumerate SSR sites within the whole genome of C. scabrifolia. The counting criteria were as follows: mononucleotide repeats required at least 10 repeats, dinucleotide repeats at least 6 repeats, and trinucleotide, tetranucleotide, pentanucleotide, and hexanucleotide repeats at least 5 repeats each. Additionally, for SSR comparison, we obtained the cultivated hawthorn genome (https://www.rosaceae.org/) and conducted SSR locus searches using the same parameters.

2.8 Gene annotation

Single-copy genes within C.
scabrifolia were systematically identified using BUSCO v5 (Simão et al. 2015). Functional annotation of these genes was conducted using eggNOG-mapper v2.1.5 (Cantalapiedra et al. 2021), drawing on the eggNOG database to assign EuKaryotic Orthologous Groups (KOG), Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) annotations.

3 Results

3.1 Genome size analysis of C. scabrifolia using flow cytometry

The genome size of C. scabrifolia was estimated using maize (genome size of 2.3 Gb) as a reference. The particle clusters were distinct, clear, and concentrated (Fig. 1), indicating that the samples were well distinguished under the experimental conditions. The fluorescence value for the maize internal standard peak was 61.85, while the fluorescence value for the C. scabrifolia peak was 22.76. The estimated genome size of C. scabrifolia is therefore 0.85 Gb, which is 0.37 times that of maize.

3.2 DNA sequencing

We sequenced the DNA of C. scabrifolia using the Illumina HiSeq PE platform, generating an initial dataset of 134 Gb. After quality control to remove low-quality reads, we retained 131 Gb of high-quality clean reads. The Q20 and Q30 quality scores exceeded 96% and 90%, respectively, confirming the quality of the sequencing data (Table 1). To further evaluate the sequencing quality, we plotted the distribution of base quality scores and base content at each read position (Fig. 2, Fig. 3). The base quality scores predominantly ranged between 30 and 40. The content of bases A and T, as well as G and C, was consistently balanced, with negligible N content, indicating high-quality base calling and no significant bias. Together, these checks support the reliability of the sequencing data.
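
The flow-cytometry estimate in Section 3.1 can be checked directly from the two reported peak values using the ratio formula given in Section 2.2. The following minimal Python sketch reproduces that arithmetic; the function and variable names are illustrative assumptions and are not taken from the authors' analysis.

```python
def genome_size_from_flow(ref_size_gb, ref_peak, sample_peak):
    """Scale the reference genome size by the sample/reference fluorescence ratio."""
    if ref_peak <= 0:
        raise ValueError("reference peak fluorescence must be positive")
    return ref_size_gb * sample_peak / ref_peak


# Values reported in Section 3.1 (names are hypothetical)
MAIZE_SIZE_GB = 2.3   # reference genome size used in the text
MAIZE_PEAK = 61.85    # fluorescence of the maize internal standard peak
SAMPLE_PEAK = 22.76   # fluorescence of the C. scabrifolia peak

ratio = SAMPLE_PEAK / MAIZE_PEAK
estimate_gb = genome_size_from_flow(MAIZE_SIZE_GB, MAIZE_PEAK, SAMPLE_PEAK)

print(f"fluorescence ratio: {ratio:.2f}")
print(f"estimated genome size: {estimate_gb:.2f} Gb")
```

Rounded to two decimals, the ratio and the estimate agree with the 0.37 and 0.85 Gb stated in the text.
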
Table 1 Sequencing data statistics

Item               Statistics
Raw reads          893,279,450
Raw bases (bp)     133,991,917,000
Clean reads        875,595,288
Clean bases (bp)   130,971,449,000
Q20 (%)            96.32
Q30 (%)            90.19

3.3 Genome size and ploidy estimation

To estimate the genome size and ploidy of C. scabrifolia, we used K-mer analysis with K = 19. The resulting K-mer frequency distribution (Fig. 4) indicated a genome size of approximately 0.87 Gb. This analysis also showed that the genome comprises about 57.86% repetitive sequences and exhibits a moderate heterozygosity rate of approximately 0.5% (Table 2). Initial ploidy estimation identified C. scabrifolia as an allotetraploid (AABB-type) (Fig. 5). However, the K-mer distribution analysis suggests that the current genome structure is diploid, which may be the result of a relatively recent whole-genome duplication event.

Table 2 Feature statistics of C. scabrifolia genome sequences

Item                         Statistics
K-mer                        19
K-mer count                  110,890,578,828
Genome size (Gb)             0.87
Heterozygosity rate (%)      0.5
Repeat sequence ratio (%)    57.86

3.4 Genome assembly and GC content analysis

We performed de novo genome assembly of C. scabrifolia using SOAPdenovo2 (K-mer = 59). This process yielded 3,516,690 raw contigs with a total length of 866,848,161 bp and a contig N50 of 306 bp. The final assembled genome comprised 2,361,006 scaffolds with a total length of 874,017,016 bp and a scaffold N50 of 3,587 bp. The GC content of the assembled genome was 37.98% (Table 3). Using BUSCO (Benchmarking Universal Single-Copy Orthologs) to evaluate the completeness of the genome assembly, we found that 87.6% of the benchmark genes were covered, with 67.8% identified as single-copy genes and 19.8% as multi-copy genes (Fig. 6). We also performed a correlation analysis between K-mer depth and GC content in the sequencing results (Fig. 7). The x-axis represents K-mer depth (k = 27), while the y-axis represents the product of K-mer depth and GC content.
The distribution shows no significant GC bias in sequencing, with sequencing depth primarily concentrated in the 50x to 150x range, which suggests that no contamination occurred during sequencing. The average depth was 126x, with a minor sub-peak around 63x, which, when analyzed alongside Fig. 4, may be attributed to heterozygosity within the genome.

Table 3 Genome assembly information of C. scabrifolia

Item                   Contigs        Scaffolds
Number of sequences    3,516,690      2,361,006
Total length (bp)      866,848,161    874,017,016
Max length (bp)        24,978         97,654
N50 length (bp)        306            3,587
N90 length (bp)        115            116

GC content of the assembled genome: 37.98%

3.5 Analysis of species matching in the C. scabrifolia NT database

To explore the species-level relationships of C. scabrifolia and verify the integrity of the extracted DNA samples, we compared genomic read data against public sequences. From the filtered high-quality data, we randomly selected 10,000 reads and subjected them to BLAST analysis against the NT database. Crataegus laevigata had the highest match rate, 81.63%, with the reads from the C. scabrifolia genome (Table 4). Both species belong to the genus Crataegus, so this result is consistent with a close genetic relationship. Other matches included reads from plants such as Malus domestica (apple), Pyrus communis (European pear), and Eriobotrya japonica (loquat), all of which are members of the Rosaceae family. This consistent match with other Rosaceae species supports the reliability of our genomic data. Importantly, our analysis did not identify any reads matching animal or microbial genomes, which indicates the absence of contamination in our samples.
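
Per-species match percentages of this kind can be tallied from a list of best-hit species names, one per read. The sketch below is illustrative only: the input list is toy data, the function name is hypothetical, and it assumes the denominator is the number of reads that returned a hit, which the text does not state.

```python
from collections import Counter


def top_hit_percentages(best_hit_species):
    """Return (species, percentage) pairs, most frequent species first."""
    counts = Counter(best_hit_species)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(species, 100.0 * n / total) for species, n in counts.most_common()]


# Toy input (not the study's data): one best-hit species name per read
toy_hits = ["Crataegus laevigata"] * 8 + ["Malus domestica"] * 2

for species, pct in top_hit_percentages(toy_hits):
    print(f"{species}\t{pct:.2f}")
```

In practice the species names would be parsed from the BLAST output, and any hit outside the expected plant family would be flagged for a closer look.
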
Table 4 Comparison results against the NT database

Species                 Family      Percentage (%)
Crataegus laevigata     Rosaceae    81.63
Malus domestica         Rosaceae    7.84
Malus hybrid 'Flame'    Rosaceae    2.97
Pyrus communis          Rosaceae    2.64
Malus sylvestris        Rosaceae    1.04
Pyrus salicifolia       Rosaceae    0.99
Malus × robusta         Rosaceae    0.30
Crataegus cuneata       Rosaceae    0.25
Eriobotrya japonica     Rosaceae    0.23
Pyrus bretschneideri    Rosaceae    0.13

3.6 Identification and annotation of transposons

In the C. scabrifolia genome, we identified a total of 2,361,006 transposon sequences, which collectively represent 51.79% of the entire genome. Among these transposons, long terminal repeat retrotransposons (LTRs) were the most prevalent, making up 35.05% of the genome. This predominance suggests that LTRs play a significant role in shaping the genomic architecture and evolutionary history of C. scabrifolia. DNA transposons were much less common, constituting only 5.73% of the genome (Table 5).

Table 5 Transposon statistics

Class             Order             Superfamily       Counts       Length (bp)    Percentage of sequence (%)
Retrotransposon   LTR               Copia             481,524      91,958,174     10.52
                                    Gypsy             1,123,183    204,779,784    23.43
                                    Retroviral        803          95,981         0.01
                  LINE              L1                32,076       9,627,386      1.10
                                    L2                869          80,960         0.01
                                    Jockey            58           9,011          0.01
                                    RTE               25,221       3,808,486      0.44
                  SINE              SINE              33,693       3,908,094      0.45
DNA transposons   DNA               hAT               213,435      27,884,436     3.19
                                    Tc1-IS630-Pogo    903          126,471        0.01
                                    PiggyBac          393          37,577         0.01
                                    Harbinger         116,540      15,557,650     1.78
                  Rolling-circles   Rolling-circles   35,125       6,451,050      0.74
Unclassified                                          899,729      88,162,902     10.09

3.7 SSR data mining

In our de novo search for SSR loci within the C. scabrifolia genome, we identified a total of 493,829 SSR loci.
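
The counting thresholds given in Section 2.7 (at least 10 repeats for mononucleotide motifs, 6 for dinucleotide motifs, and 5 for tri- to hexanucleotide motifs) can be illustrated with a small regular-expression scan. This is a minimal sketch of perfect-repeat detection under those thresholds and is not the SSRMMD algorithm; the function names and the toy sequence are hypothetical.

```python
import re
from collections import Counter

# Minimum repeat counts per motif length, as listed in Section 2.7
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_reducible(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # One motif of length k followed by at least (min_rep - 1) exact copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_reducible(motif):
                continue  # belongs to a shorter motif class
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


def count_by_motif_length(hits):
    """Tally SSR loci by motif length (1 = mono, 2 = di, ...)."""
    return Counter(len(motif) for _, motif, _ in hits)


# Toy sequence (not real data): a 12-base A run, (AT)x7, (CAG)x5, and a too-short (AC)x3
toy = "G" + "A" * 12 + "C" + "AT" * 7 + "T" + "CAG" * 5 + "T" + "AC" * 3
for start, motif, repeats in find_ssrs(toy):
    print(start, motif, repeats)
```

Dedicated tools additionally handle compound and imperfect repeats and flanking-sequence extraction, which this sketch ignores.
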
The most common type was mononucleotide repeats (281,350 loci, 56.97%), followed by dinucleotide repeats (174,220 loci, 35.28%), trinucleotide repeats (31,646 loci, 6.41%), tetranucleotide repeats (3,963 loci, 0.80%), pentanucleotide repeats (2,069 loci, 0.42%), and hexanucleotide repeats (581 loci, 0.12%). SSR locus searches in cultivated hawthorn identified 402,799 SSR loci with a distribution pattern mirroring that of C. scabrifolia. Mononucleotide repeats were predominant (223,698 loci, 55.54%), followed by dinucleotide repeats (144,694 loci, 35.92%), trinucleotide repeats (27,155 loci, 6.74%), tetranucleotide repeats (3,336 loci, 0.83%), pentanucleotide repeats (3,114 loci, 0.77%), and hexanucleotide repeats (802 loci, 0.20%).

3.8 Gene annotation

A total of 996 protein-coding genes were annotated against the KOG database, accounting for 91.04% of the total predicted protein-coding genes. The KOG annotation statistics indicate that, excluding genes of unknown function (S), the most frequently annotated KOG category in the C. scabrifolia genome is replication, recombination, and repair (L), with 100 protein-coding genes, representing 10.04% of the KOG-annotated genes. The next most frequent categories are posttranslational modification, protein turnover, chaperones (O) with 86 genes (8.63%); translation, ribosomal structure, and biogenesis (J) with 79 genes (7.93%); RNA processing and modification (A) with 68 genes (6.83%); and carbohydrate transport and metabolism (G) with 43 genes (4.32%).

The Gene Ontology (GO) project was developed to address inconsistencies in gene function definitions across different databases and species, aiming to provide a unified and standardized gene function annotation system. In the C. scabrifolia genome, 627 genes are annotated in the GO database, which is 57.31% of the total predicted genes.
Among the three major categories of the GO database, namely Molecular Function (MF), Biological Process (BP), and Cellular Component (CC), the most enriched terms are cellular anatomical entity, intracellular, cellular process, metabolic process, and catalytic activity, with 545, 545, 518, 459, and 325 genes, respectively.

The Kyoto Encyclopedia of Genes and Genomes (KEGG) is one of the databases used to understand higher-order functions of biological systems such as cells, organisms, and ecosystems, as well as to study pathways. In the C. scabrifolia genome, 301 genes are annotated in the KEGG database, accounting for 27.51% of the total predicted genes. The metabolism category has the highest number of enriched genes, with 156. The next most enriched pathways are translation, replication and repair, and metabolism of cofactors and vitamins.

4 Discussion

Genome size, also known as the DNA C-value, refers to the amount of DNA contained in a gamete of an organism. The C-value varies significantly among species and can be used to assess biological characteristics of plants. In this study, using flow cytometry combined with genome survey-based K-mer analysis, we estimated the genome size of C. scabrifolia to be 870 Mb, with a heterozygosity rate of 0.5% and a repeat sequence proportion of 57.86%. The genome size of cultivated hawthorn has been reported to be 856.88 Mb, with a heterozygosity rate of 1.77% and a repeat sequence proportion of 67.89%. Although the genome sizes of the two hawthorns are similar, the heterozygosity rate of C. scabrifolia is lower than that of cultivated hawthorn. This lower heterozygosity in C. scabrifolia may be attributed to its geographically restricted distribution, which limits gene flow between populations and results in lower genetic diversity (Favre et al. 2022).
In contrast, cultivated hawthorn has a wider distribution and has undergone extensive artificial selection and hybridization to enhance yield, disease resistance, and adaptability, which may have increased its heterozygosity (Zhuang et al. 2022).

The higher proportion of repeat sequences in the cultivated hawthorn genome compared with the C. scabrifolia genome is likely due to the sequencing techniques used. Our study used only second-generation sequencing data, which has shorter read lengths and is less effective at assembling highly repetitive genomic regions. Consequently, the identification of repeat sequences in the C. scabrifolia genome may be incomplete. In contrast, the genome of cultivated hawthorn was sequenced using a combination of second-generation and third-generation sequencing technologies, which provide longer reads that can span repetitive regions, resulting in a more comprehensive genome assembly. Thus, the lower proportion of detected repeat sequences in the C. scabrifolia genome may reflect the limitations of the sequencing technology employed in this study.

To examine the species-level relationships of C. scabrifolia, 10,000 randomly selected sequencing reads were aligned against the NT database using BLAST. Among the published sequences, C. scabrifolia shares the highest read match rate with C. laevigata, suggesting the closest phylogenetic relationship, followed by apple (Malus).

Large-scale genome sequencing technology has provided new opportunities for the study of transposons. Studies have shown that although transposon content varies among species, it is positively correlated with genome size. It is generally believed that genome size is influenced by both gain and loss of DNA, with an increase in transposon copies being a significant factor in genome enlargement (Hawkins et al. 2008). Analysis of the C.
scabrifolia genome revealed various types of transposons, which together account for 51.79% of the genome. LTRs are the predominant form, accounting for 35.05% of the genome, whereas DNA transposons are less common, making up only 5.73%. The high proportion of LTRs suggests that they may be involved in genome size expansion and functional diversification through mechanisms such as gene duplication, insertion mutations, and gene expression regulation (Vitte and Panaud 2003). Conversely, the relatively low presence of DNA transposons might reflect their different evolutionary pressures and proliferation mechanisms (Feschotte and Pritham 2007). Understanding the diverse roles and impacts of these transposons is crucial for elucidating the genetic and evolutionary processes that shape the genome of C. scabrifolia. Based on Fig. 7, it is inferred that the increase in transposon copy numbers in the C. scabrifolia genome likely occurred during a specific period rather than through multiple independent events, resembling the transposon insertion patterns of cultivated hawthorn (Zhang et al. 2022).

Breeding research on C. scabrifolia has been relatively limited, but molecular marker technology can facilitate the selection and preservation of genotypes. Genomic SSR markers are highly polymorphic molecular markers known for their co-dominant inheritance, excellent reproducibility, and stability. These characteristics make SSR markers highly valuable for genomic research, genetic analysis, and breeding improvement (Lei et al. 2021). Due to the lack of genomic data specific to Crataegus species, previous studies have used SSR loci developed from apple and pear genomes to analyze various hawthorn genotypes (Güney et al. 2018). In this study, we identified 493,829 SSR loci in the C. scabrifolia genome, providing a comprehensive molecular marker resource for genotype selection and preservation.
These newly identified markers are expected to improve the precision and effectiveness of related research. The distribution patterns of SSR loci in C. scabrifolia were similar to those in cultivated hawthorn, suggesting considerable genomic stability within the genus Crataegus (Li et al. 2002). The observed differences in SSR sequence abundance between C. scabrifolia and cultivated hawthorn could be attributed to selective pressures or genetic drift (Bagshaw 2017). Investigating the genomic and evolutionary mechanisms underlying these differences will offer valuable insights into the genetic diversity and adaptability of hawthorn species.

Single-copy genes, due to their high conservation and functional importance, are of particular value in phylogenetic analysis and gene function studies (Li et al. 2003). These genes are typically unique within the genome and are integral to fundamental biological processes. They remain highly conserved across species evolution, reflecting their essential roles. In our study, we conducted a comprehensive search for single-copy genes within the genome of C. scabrifolia and subsequently annotated them using the KOG, GO, and KEGG databases. The categorization of these annotated genes revealed their involvement in crucial biological functions, with significant classifications including "Replication, Recombination, and Repair," "Cellular Organization and Structure," and "Metabolism." These genes are integral to maintaining genomic stability, ensuring cellular structural complexity, and regulating metabolic processes. This may reflect the adaptive strengths of C. scabrifolia in varied and stressful environments, potentially aiding its survival and reproduction under adverse conditions (Chinnusamy and Zhu 2009). Detailed analysis of the expression patterns of these genes can offer significant insights into their dynamic regulatory mechanisms across different environmental contexts.
Such insights are crucial for advancing genetic improvement, informing conservation strategies, and supporting further functional genomics research.

5 Conclusion

This research, through the inaugural comprehensive analysis of the genomic characteristics and pivotal genetic elements of C. scabrifolia, unveils significant advantages related to genome stability and adaptability. It provides new insights into its genetic evolution and environmental adaptation mechanisms and offers valuable theoretical foundations for future functional genomics research, molecular breeding, and conservation management.

Declarations

Acknowledgements

The authors thank Guodong Li and Ticao Zhang for their assistance with the molecular laboratory work. Additionally, we appreciate the bioinformatics high-performance computing server at Yunnan University of Traditional Chinese Medicine for providing computational resources for data analysis.

Author Contributions

TCZ and GDL conceptualized the study; XEW and DLL collected the experimental materials; BZW, JC, and YMZ participated in the analysis of the experimental results; BZW drafted the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was jointly funded by the National Natural Science Foundation of China Regional Project (32260094), the Yunnan Provincial Traditional Chinese Medicine Joint Key Project (202101AZ070001-166), and the Yunnan Provincial Major Science and Technology Special Project (202102AE090031).

Conflict of interest

The authors declare that there is no conflict of interest.

Consent for publication

All authors participated in, read and approved the final version of the article before publication.

Ethical approval

Not applicable.

References

Abrusán G, Grundmann N, DeMester L, Makalowski W (2009) TEclass–a tool for automated classification of unknown eukaryotic transposable elements. Bioinformatics 25:1329–1330.
https://doi.org/10.1093/bioinformatics/btp084 Bagshaw ATM (2017) Functional Mechanisms of Microsatellite DNA in Eukaryotic Genomes. Genome Biol Evol 9:2428–2443. https://doi.org/10.1093/gbe/evx164 Bedell JA, Korf I, Gish W (2000) MaskerAid: a performance enhancement to RepeatMasker. Bioinformatics 16:1040–1041. https://doi.org/10.1093/bioinformatics/16.11.1040 Chinnusamy V, Zhu JK (2009) Epigenetic regulation of stress responses in plants. Curr Opin Plant Biol 12:133–139. https://doi.org/10.1016/j.pbi.2008.12.006 Chang Q, Zuo Z, Harrison F, Chow MS (2002) Hawthorn. J Clin Pharmacol 42:605–612. https://doi.org/10.1177/00970002042006003 Chen W, Hasegawa DK, Arumuganathan K et al (2015) Estimation of the Whitefly Bemisia tabaci Genome Size Based on k-mer and Flow Cytometric Analyses. Insects 6:704–715. https://doi.org/10.3390/insects6030704 Cantalapiedra CP, Hernández-Plaza A, Letunic I et al (2021) eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale. Mol Biol Evol 38:5825–5829. https://doi.org/10.1093/molbev/msab293 Dolezel J, Bartos J (2005) Plant DNA flow cytometry and estimation of nuclear genome size. Ann Bot 95:99–110. https://doi.org/10.1093/aob/mci005 Doyle JJ, Doyle JL (1987) A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem Bull 19:11–15 Dolezel J, Greilhuber J, Suda J (2007) Estimation of nuclear DNA content in plants using flow cytometry. Nat Protoc 2:2233–2244. https://doi.org/10.1038/nprot.2007.310 Feschotte C, Pritham EJ (2007) DNA transposons and the evolution of eukaryotic genomes. Annu Rev Genet 41:331–368. https://doi.org/10.1146/annurev.genet.40.110405.090448 Favre F, Jourda C, Grisoni M et al (2022) A genome-wide assessment of the genetic diversity, evolution and relationships with allied species of the clonally propagated crop Vanilla planifolia Jacks. ex Andrews. Genet Resour Crop Evol 69:2125–2139.
https://doi.org/10.1007/s10722-022-01362-1 Gao GY, Feng YX (1994) Pharmacognocy and resource utilization of Yunnan-Hawthorn. The Chinese Pharmaceutical Journal 06:329–331 Güney M, Kafkas S, Keles H et al (2018) Characterization of hawthorn ( Crataegus spp.) genotypes by SSR markers. Physiol Mol Biol Plants 24:1221–1230. https://doi.org/10.1007/s12298-018-0604-6 Gou X, Shi H, Yu S et al (2020) SSRMMD: A Rapid and Accurate Algorithm for Mining SSR Feature Loci and Candidate Polymorphic SSRs Based on Assembled Sequences. Front Genet 11:706. https://doi.org/10.3389/fgene.2020.00706 Hawkins JS, Grover CE, Wendel JF (2008) Repeated big bangs and the expanding universe: Directionality in plant genome size evolution. Plant Science 174:557–562. https://doi.org/10.1016/j.plantsci.2008.03.015 Li YC, Korol AB, Fahima T et al (2002) Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review. Mol Ecol 11:2453–2465 Li L, Stoeckert CJ Jr, Roos DS (2003) OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 13:2178–2189. https://doi.org/10.1101/gr.1224503 Liu P, Kallio H, Lü D et al (2010) Acids, sugars, and sugar alcohols in Chinese hawthorn (Crataegus spp.) fruits. J Agric Food Chem 58:1012–1019. https://doi.org/10.1021/jf902773v Luo R, Liu B, Xie Y, et al (2012) SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience 1:18. https://doi.org/10.1186/2047-217X-1-18 Lei Y, Zhou Y, Price M, Song Z (2021) Genome-wide characterization of microsatellite DNA in fishes: survey and analysis of their abundance and frequency in genome-specific regions. BMC Genomics 22:421. https://doi.org/10.1186/s12864-021-07752-6 Orhan IE (2018) Phytochemical and Pharmacological Activity Profile of Crataegus oxyacantha L. (Hawthorn) - A Cardiotonic Herb. Curr Med Chem 25:4854–4865. 
https://doi.org/10.2174/0929867323666160919095519 Ranallo-Benavidez TR, Jaron KS, Schatz MC (2020) GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun 11:1432. https://doi.org/10.1038/s41467-020-14998-3 Simão FA, Waterhouse RM, Ioannidis P et al (2015) BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31:3210–3212. https://doi.org/10.1093/bioinformatics/btv351 Sun H, Ding J, Piednoël M, Schneeberger K (2018) findGSE: estimating genome size variation within human and Arabidopsis using k-mer frequencies. Bioinformatics 34:550–557. https://doi.org/10.1093/bioinformatics/btx637 Tian XM, Zhou XY, Gong N (2011) Applications of Flow Cytometry in Plant Research–Analysis of Nuclear DNA Content and Ploidy Level in Plant Cells. Chinese Agricultural Science Bulletin 27: 21–27 Vitte C, Panaud O (2003) Formation of solo-LTRs through unequal homologous recombination counterbalances amplifications of LTR retrotransposons in rice Oryza sativa L. Mol Biol Evol 20:528–540. https://doi.org/10.1093/molbev/msg055 Wu J, Peng W, Qin R, Zhou H (2014) Crataegus pinnatifida : chemical constituents, pharmacology, and potential applications. Molecules 19:1685–1712. https://doi.org/10.3390/molecules19021685 Wu X, Luo D, Zhang Y, et al (2022) Comparative Genomic and Phylogenetic Analysis of Chloroplast Genomes of Hawthorn ( Crataegus spp.) in Southwest China. Front Genet 13:900357. https://doi.org/10.3389/fgene.2022.900357 Xie T, Zheng JF, Liu S, et al (2015) De novo plant genome assembly based on chromatin interactions: a case study of Arabidopsis thaliana . Mol Plant 8:489–492. https://doi.org/10.1016/j.molp.2014.12.015 Yang YC, Wang EH, Wang JQ et al (2022) History of Traditional Chinese Medicine Crataegi Fructus. Asia-Pacific Traditional Medicine 18:157–163 Zhou QP, Wang LW, Gao GY (1999) Stuty on Antioxidative and Decreasing Blood-fat Effect in Four Kinds of Fructus Crataegi . 
Research and Practice on Chinese Medicines 03:3–5 Zhang HP, Zhang JY, Liu QL (2014) Research progress on hawthorn germplasm resources and breeding varieties in China. China Seed Industry 02:15–17 Zhuang Y, Li X, Hu J et al (2022) Expanding the gene pool for soybean improvement with its wild relatives. aBIOTECH 3:115–125. https://doi.org/10.1007/s42994-022-00072-7 Zhang T, Qiao Q, Du X et al (2022) Cultivated hawthorn ( Crataegus pinnatifida var. major) genome sheds light on the evolution of Maleae (apple tribe). J Integr Plant Biol 64:1487–1501. https://doi.org/10.1111/jipb.13318 Additional Declarations No competing interests reported. Cite Share Download PDF Status: Published Journal Publication published 09 Oct, 2024 Read the published version in Genetic Resources and Crop Evolution → Version 1 posted Editorial decision: Revision requested 08 Aug, 2024 Reviews received at journal 08 Aug, 2024 Reviewers agreed at journal 21 Jul, 2024 Reviewers invited by journal 20 Jul, 2024 Editor assigned by journal 17 Jul, 2024 Submission checks completed at journal 17 Jul, 2024 First submitted to journal 16 Jul, 2024 You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. 
{"props":{"pageProps":{"initialData":{"identity":"rs-4747077","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Short Report","associatedPublications":[],"authors":[{"id":337764267,"identity":"28d6bed8-2b22-4e31-bdd4-f0047ec21312","order_by":0,"name":"Baozheng Wang","email":"","orcid":"","institution":"Yunnan University of Chinese Medicine college of traditional Chinese medicine","correspondingAuthor":false,"prefix":"","firstName":"Baozheng","middleName":"","lastName":"Wang","suffix":""},{"id":337764268,"identity":"45f7bf3c-8893-449f-8e1b-7f8e7536cc38","order_by":1,"name":"Xien Wu","email":"","orcid":"","institution":"Yunnan University of Chinese Medicine college of traditional Chinese medicine","correspondingAuthor":false,"prefix":"","firstName":"Xien","middleName":"","lastName":"Wu","suffix":""},{"id":337764269,"identity":"779f3f91-0430-4bc6-8080-74e32f2c75c7","order_by":2,"name":"Dengli Luo","email":"","orcid":"","institution":"Yunnan University of Chinese Medicine college of traditional Chinese medicine","correspondingAuthor":false,"prefix":"","firstName":"Dengli","middleName":"","lastName":"Luo","suffix":""},{"id":337764270,"identity":"f85a63e9-f6b3-4277-9947-4cbe4467dd95","order_by":3,"name":"Jian Chen","email":"","orcid":"","institution":"Yunnan University of Chinese Medicine college of traditional Chinese medicine","correspondingAuthor":false,"prefix":"","firstName":"Jian","middleName":"","lastName":"Chen","suffix":""},{"id":337764271,"identity":"5c7ff268-18fe-4c3b-b5a9-9b6686fbb67b","order_by":4,"name":"Yingmin Zhang","email":"","orcid":"","institution":"Yunnan University of Chinese Medicine college of
traditional Chinese medicine","correspondingAuthor":false,"prefix":"","firstName":"Yingmin","middleName":"","lastName":"Zhang","suffix":""},{"id":337764272,"identity":"43beb649-731e-4336-a0bf-70d1d9ef6aac","order_by":5,"name":"Guodong Li","email":"","orcid":"","institution":"Yunnan University of Chinese Medicine college of traditional Chinese medicine","correspondingAuthor":false,"prefix":"","firstName":"Guodong","middleName":"","lastName":"Li","suffix":""},{"id":337764273,"identity":"526def65-7935-4d00-9fcd-17e048819b65","order_by":6,"name":"Ticao Zhang","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA1UlEQVRIie3PsQqCQBzH8QvhXE7cQnHwFRShCApfReklGh2iKWg1eoWGG2v7y381XIUaiqAtOGjMIXSo8a4t6D7DccPvC3eEaNovMroT2huAmCkU7JPQtMhLlYS8ExahNVdIYtO4CNGc/KG9FWBlxLf7IHsYjdyc3cJ9fufg7ki43iSyhBGPOdjj9YFDWJIkOMoT48kCjHldniFdqCXUYwmmvFoSKJQSpINRDjjlNQ2KrHTkfzFXeK1FgxNe4fXRzMa+7UmSD6dbOqrzlg3frDVN0/7JC7PeSoij4AEuAAAAAElFTkSuQmCC","orcid":"","institution":"Kunming Institute of Botany Chinese Academy of Sciences","correspondingAuthor":true,"prefix":"","firstName":"Ticao","middleName":"","lastName":"Zhang","suffix":""}],"badges":[],"createdAt":"2024-07-16 05:03:47","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-4747077/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-4747077/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1007/s10722-024-02186-x","type":"published","date":"2024-10-09T15:57:22+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":62453584,"identity":"29108bcd-9ca8-4f76-97b5-aa0c0e4961b1","added_by":"auto","created_at":"2024-08-14 11:08:23","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":434725,"visible":true,"origin":"","legend":"\u003cp\u003eFlow cytometry 
analysis\u003c/p\u003e","description":"","filename":"Fig1.png","url":"https://assets-eu.researchsquare.com/files/rs-4747077/v1/0064ceb58db1136147e05849.png"},{"id":62455104,"identity":"563382e4-084a-4535-b530-10fd523e9e27","added_by":"auto","created_at":"2024-08-14 11:24:23","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":213297,"visible":true,"origin":"","legend":"\u003cp\u003eBase mass distribution diagram. \u003cstrong\u003ea\u003c/strong\u003e Quality scores across all bases in read1. \u003cstrong\u003eb\u003c/strong\u003e Quality scores across all bases in read2\u003c/p\u003e","description":"","filename":"Fig2.png","url":"https://assets-eu.researchsquare.com/files/rs-4747077/v1/7d592b0db4802eb67b0c4d60.png"},{"id":62453583,"identity":"5df914df-6406-4640-b542-8fa543d66706","added_by":"auto","created_at":"2024-08-14 11:08:23","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":98450,"visible":true,"origin":"","legend":"\u003cp\u003eDistribution of base content. \u003cstrong\u003ea\u003c/strong\u003e Sequence content across all bases in read 1. 
\u003cstrong\u003eb\u003c/strong\u003eSequence content across all bases in read 2\u003c/p\u003e","description":"","filename":"Fig3.png","url":"https://assets-eu.researchsquare.com/files/rs-4747077/v1/62397757e91ab4927c8e2fd1.png"},{"id":62454301,"identity":"2cf807a6-f2e4-413f-904e-674b3f450842","added_by":"auto","created_at":"2024-08-14 11:16:23","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":69666,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eK-mer\u003c/em\u003efrequency distribution (K=19)\u003c/p\u003e","description":"","filename":"Fig4.png","url":"https://assets-eu.researchsquare.com/files/rs-4747077/v1/202bce9d8abce810b0070ddb.png"},{"id":62454304,"identity":"229c71ba-8233-405f-b00c-c3c49b73208f","added_by":"auto","created_at":"2024-08-14 11:16:24","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":73559,"visible":true,"origin":"","legend":"\u003cp\u003eC\u003cem\u003e. 
scabrifolia\u003c/em\u003e ploidy analysis\u003c/p\u003e","description":"","filename":"Fig5.png","url":"https://assets-eu.researchsquare.com/files/rs-4747077/v1/7a6831516ad149fe60993652.png"},{"id":62453592,"identity":"8def51e9-6162-4518-a698-eae5ff333c3c","added_by":"auto","created_at":"2024-08-14 11:08:24","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":42197,"visible":true,"origin":"","legend":"\u003cp\u003eBUSCO assessment\u003c/p\u003e","description":"","filename":"Fig6.png","url":"https://assets-eu.researchsquare.com/files/rs-4747077/v1/54a5aa54655efb934a28e5bc.png"},{"id":62453591,"identity":"68614a4f-2686-4953-a2d1-4594f14b2a94","added_by":"auto","created_at":"2024-08-14 11:08:24","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":146374,"visible":true,"origin":"","legend":"\u003cp\u003eDistribution of GC content and average sequencing depth\u003c/p\u003e","description":"","filename":"Fig7.png","url":"https://assets-eu.researchsquare.com/files/rs-4747077/v1/73bfbe171326e1e80269fb38.png"},{"id":62453589,"identity":"4e4d5b8b-a6ee-4a97-b2ab-8d32a682c367","added_by":"auto","created_at":"2024-08-14 11:08:24","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":55091,"visible":true,"origin":"","legend":"\u003cp\u003eTransposon distribution statistics. \u003cstrong\u003ea\u003c/strong\u003e Dispersed distribution of transposable elements in the \u003cem\u003eC. scabrifolia\u003c/em\u003e genome. Kimura substitution levels (CpG adjusted) are provided, with each repeat type represented as a percentage of the genome shown on the y-axis. \u003cstrong\u003eb\u003c/strong\u003e Distribution of transposon types in \u003cem\u003eC. 
scabrifolia\u003c/em\u003e\u003c/p\u003e","description":"","filename":"Fig8.png","url":"https://assets-eu.researchsquare.com/files/rs-4747077/v1/0474c0733760bd53fb1ae6c8.png"},{"id":62454302,"identity":"3e269b7d-4398-463e-ba09-50d4f1281bac","added_by":"auto","created_at":"2024-08-14 11:16:23","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":70238,"visible":true,"origin":"","legend":"\u003cp\u003eQuantities and distribution of different SSR types in\u003cstrong\u003e \u003c/strong\u003e\u003cem\u003eC. scabrifolia\u003c/em\u003e. \u003cstrong\u003ea\u003c/strong\u003e Distribution of Various Repeat Types. \u003cstrong\u003eb\u003c/strong\u003e Repeat frequency of different mononucleotides. \u003cstrong\u003ec\u003c/strong\u003e Repeat frequency of different dinucleotides.\u003cstrong\u003ed\u003c/strong\u003e Repeat frequency of different trinucleotides\u003c/p\u003e","description":"","filename":"Fig9.png","url":"https://assets-eu.researchsquare.com/files/rs-4747077/v1/539d145112f93bcc1f62776e.png"},{"id":62453587,"identity":"00dcd156-8126-4932-a58d-bb751ba3d25e","added_by":"auto","created_at":"2024-08-14 11:08:24","extension":"png","order_by":10,"title":"Figure 10","display":"","copyAsset":false,"role":"figure","size":245642,"visible":true,"origin":"","legend":"\u003cp\u003eFunctional Classification of All Predicted Single-Copy Genes. \u003cstrong\u003ea\u003c/strong\u003e KOG Annotations of Predicted Single-Copy Genes. \u003cstrong\u003eb\u003c/strong\u003e GO Annotations of Predicted Single-Copy Genes. 
\u003cstrong\u003ec\u003c/strong\u003eKEGG Annotations of Predicted Single-Copy Genes\u003c/p\u003e","description":"","filename":"Fig10.png","url":"https://assets-eu.researchsquare.com/files/rs-4747077/v1/8bd69f0026f2c6c7b72ea04e.png"},{"id":66597121,"identity":"1e3e6a47-5037-44e3-88ee-4603da06a092","added_by":"auto","created_at":"2024-10-14 16:07:24","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2126004,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4747077/v1/6920efd5-e3d6-4e8b-b152-7d2af4887d42.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Genome-wide Survey of Crataegus scabrifolia Provides New Insights into Its Genetic Evolution and Adaptation Mechanisms","fulltext":[{"header":"1 Introduction","content":"\u003cp\u003e \u003cem\u003eCrataegus\u003c/em\u003e L., belonging to the Rosaceae family, subfamily Maloideae, is widely distributed throughout the northern hemisphere, with certain species being significant in both medicinal and culinary domains. In China, hawthorn boasts a long history of traditional medicinal and culinary use (Yang et al. \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Known for its sour-sweet taste, it is valued for its therapeutic effects including digestion aid, qi regulation, blood circulation improvement, and lipid reduction (Orhan \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). Modern medical research has extensively documented hawthorn's pharmacological activities, particularly in cardiovascular health, encompassing cardiotonic, antiarrhythmic, hypotensive, hypolipidemic, and antioxidant properties. Presently, hawthorn is primarily employed in treating conditions such as congestive heart failure, hypertension, atherosclerosis, and digestive ailments (Chang et al. 
\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2002\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThe southwestern region of China benefits from unique geographical and climatic conditions, fostering exceptionally rich plant diversity and serving as a prominent area for hawthorn distribution (Zhang et al. \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2014\u003c/span\u003e). Yunnan province, situated in southwestern China, stands out as a principal cultivation area for \u003cem\u003eC. scabrifolia\u003c/em\u003e, renowned locally for its medicinal use. Research indicates that \u003cem\u003eC. scabrifolia\u003c/em\u003e exhibits high levels of bioactive compounds and potent pharmacological effects. For instance, \u003cem\u003eC. scabrifolia\u003c/em\u003e fruits demonstrate significant scavenging activity against superoxide anion radicals in vitro (Zhou et al. \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e1999\u003c/span\u003e). Moreover, the total flavonoid content in \u003cem\u003eC. scabrifolia\u003c/em\u003e fruits (84g/kg) surpasses that of cultivated hawthorn fruits (31g/kg) by more than double (Gao and Feng \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e1994\u003c/span\u003e), underscoring its substantial medicinal value and developmental potential.\u003c/p\u003e \u003cp\u003eCurrently, the most extensively studied and cultivated species of the \u003cem\u003eCrataegus\u003c/em\u003e genus is the \u003cem\u003eCrataegus pinnatifida\u003c/em\u003e var. major. However, research on the second most cultivated variety, the \u003cem\u003eC. scabrifolia\u003c/em\u003e, is scarce and mainly focuses on the chemical composition of its fruits, their biological activity, and the evaluation of variety resources (Wu et al. \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2014\u003c/span\u003e; Liu et al. \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2010\u003c/span\u003e). 
Although chloroplast genomic studies have been conducted on \u003cem\u003eC. scabrifolia\u003c/em\u003e (Wu et al. \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e2022\u003c/span\u003e), its genetic foundation remains underexplored. Genomic research can uncover genetic variation across an individual's entire genome, providing precise and extensive information for studying plant origin and evolution, gene function analysis, and genetic breeding. Microsatellite markers, that is, simple sequence repeats (SSRs), developed from genomic sequences, are commonly used molecular markers in population genetics, gene targeting, and crop germplasm research. The identification of SSR loci can provide crucial data support for genetic improvement and breeding programs.\u003c/p\u003e \u003cp\u003eTo elucidate the genomic characteristics and potential genetic resources of \u003cem\u003eC. scabrifolia\u003c/em\u003e, and thereby to facilitate future molecular breeding efforts, we employed flow cytometry and \u003cem\u003eK-mer\u003c/em\u003e analysis of whole-genome sequencing data to estimate the genome size of \u003cem\u003eC. scabrifolia\u003c/em\u003e and analyze its genomic characteristics. Using bioinformatics methods, transposons across the whole genome were identified and classified, followed by the extraction of microsatellite sequence information. Additionally, we conducted functional annotation and analysis of single-copy genes in the genome.
These efforts aim to lay a solid foundation for further genomic analysis, gene discovery, and molecular genetic breeding.\u003c/p\u003e"},{"header":"2 Materials and methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1 Plant and DNA sources\u003c/h2\u003e \u003cp\u003eFresh, tender leaves were collected at the Kunming Institute of Botany, Chinese Academy of Sciences (25\u0026deg;14\u0026prime; N, 102\u0026deg;75\u0026prime; E, elevation 1900 meters). A voucher specimen of \u003cem\u003eC. scabrifolia\u003c/em\u003e (YUNCM5301260363) is deposited in the herbarium of Yunnan University of Traditional Chinese Medicine. Genomic DNA was extracted using the CTAB method (Doyle and Doyle \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e1987\u003c/span\u003e), and the DNA concentration and quality were assessed using a NanoDrop 2000 spectrophotometer and agarose gel electrophoresis.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.2 Flow cytometry estimation of genome size\u003c/h2\u003e \u003cp\u003eMaize B73, sourced from the Kunming Institute of Botany, Chinese Academy of Sciences, was selected as the internal reference.\u003c/p\u003e \u003cp\u003eSamples were placed in MG dissociation solution, finely chopped vertically, and left to stand for 10 minutes. The nuclei were then filtered, treated with propidium iodide (PI) and RNase solution, and stained on ice in the dark for 1 hour (Dolezel and Bartos \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2005\u003c/span\u003e; Dolezel et al. \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2007\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThe stained nuclei suspension of the sample was mixed with the nuclei suspension of the reference sample. The fluorescence intensity of the stained nuclei was measured using a BD FACSCalibur (BD Biosciences, USA) flow cytometer.
The coefficient of variation (CV) was controlled within 5%, and the data were analyzed using ModFit 3.0 software (Tian et al. \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2011\u003c/span\u003e). The DNA content of the \u003cem\u003eC. scabrifolia\u003c/em\u003e sample was calculated from the relative fluorescence intensities of the maize and \u003cem\u003eC. scabrifolia\u003c/em\u003e samples using the formula: DNA content of \u003cem\u003eC. scabrifolia\u003c/em\u003e sample\u0026thinsp;=\u0026thinsp;DNA content of maize sample \u0026times; (fluorescence intensity of \u003cem\u003eC. scabrifolia\u003c/em\u003e sample / fluorescence intensity of maize sample).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e2.3 DNA sequencing, genome size and ploidy estimation\u003c/h2\u003e \u003cp\u003eDNA samples were randomly sheared using an ultrasonic disruptor, targeting fragments approximately 350 bp in length. These fragments were then used to construct libraries for paired-end sequencing on the Illumina HiSeq platform. Sequencing quality was assessed with Q20 and Q30 scores, and the reads were filtered and quality-checked using fastp and FastQC. Filtered data were further analyzed using a \u003cem\u003eK-mer\u003c/em\u003e based approach (K\u0026thinsp;=\u0026thinsp;19), and genome size, repetitive sequence content, and heterozygosity were estimated using findGSE (Sun et al. 2018) and GenomeScope (Ranallo-Benavidez et al. \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). Ploidy estimation was performed using Smudgeplot (Ranallo-Benavidez et al.
\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2020\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003e2.4 Genome assembly and GC content analysis\u003c/h2\u003e \u003cp\u003eThe high-quality filtered sequences were assembled using SOAPdenovo2 (Luo et al. \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2012\u003c/span\u003e) with \u003cem\u003eK-mer\u003c/em\u003e\u0026thinsp;=\u0026thinsp;59 and default parameters, followed by the calculation of GC content. The completeness of the genome assembly and annotation was then assessed using BUSCO v5 (Sim\u0026atilde;o et al. \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2015\u003c/span\u003e). The \u003cem\u003eC. scabrifolia\u003c/em\u003e genome sequence was compared against the embryophyte single-copy ortholog database to determine the proportion of the database's single-copy orthologs covered by the \u003cem\u003eC. scabrifolia\u003c/em\u003e genome.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e2.5 Species matching analysis of \u003cem\u003eC. scabrifolia\u003c/em\u003e using the NT database\u003c/h2\u003e \u003cp\u003eTo check the species origin of the sequenced \u003cem\u003eC. scabrifolia\u003c/em\u003e DNA and to assess whether the extracted sample was contaminated, we randomly selected 10,000 reads from the constructed DNA library and performed BLAST (Basic Local Alignment Search Tool) comparisons against the NCBI nucleotide database (NT database).
The origins of these sequences were determined based on their sequence similarity.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003e2.6 Identification and annotation of transposons\u003c/h2\u003e \u003cp\u003eIn the process of transposon identification and annotation, we utilized RepeatModeler v.1.08 (Abrus\u0026aacute;n et al. \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2009\u003c/span\u003e) to construct a comprehensive de novo repeat library, employing its default settings for initial training. Following this, we applied RepeatMasker v4.0.7 (Bedell et al. \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2000\u003c/span\u003e) to systematically identify and classify repetitive sequences within the genomic data. RepeatMasker enabled detailed annotation by comparing the sequences against the custom repeat library generated in the previous step.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e2.7 SSR site data mining\u003c/h2\u003e \u003cp\u003eWe utilized SSRMMD software (Gou et al. \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2020\u003c/span\u003e) to search for and enumerate SSR sites within the whole genome of \u003cem\u003eC. scabrifolia\u003c/em\u003e. The criteria for counting were as follows: mononucleotide repeats were cataloged with a minimum of 10 consecutive nucleotides, dinucleotide repeats with at least 6 repeats, and trinucleotide, tetranucleotide, pentanucleotide, and hexanucleotide repeats were identified with a minimum of 5 repeats each. 
Additionally, for SSR comparison, we incorporated the cultivated hawthorn genome (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.rosaceae.org/\u003c/span\u003e\u003cspan address=\"https://www.rosaceae.org/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) and conducted SSR locus searches using the same parameters.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003e2.8 Gene annotation\u003c/h2\u003e \u003cp\u003eSingle-copy genes within \u003cem\u003eC. scabrifolia\u003c/em\u003e were systematically identified using BUSCO v5 (Sim\u0026atilde;o et al. \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2015\u003c/span\u003e). Functional annotation of these genes was conducted using eggNOG-mapper v2.1.5 (Cantalapiedra et al. \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2021\u003c/span\u003e), leveraging the eggNOG database to assign euKaryotic Orthologous Groups (KOG), Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) annotations.\u003c/p\u003e \u003c/div\u003e"},{"header":"3 Results","content":"\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003e3.1 Genome size analysis of \u003cem\u003eC. scabrifolia\u003c/em\u003e using flow cytometry\u003c/h2\u003e \u003cp\u003eThe genome size of \u003cem\u003eC. scabrifolia\u003c/em\u003e was estimated using maize (genome size of 2.3 Gb) as a reference. The particle clusters were distinct, clear, and concentrated (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e), indicating that the samples were well distinguished under the experimental conditions. The results showed that the fluorescence value for the maize internal standard peak was 61.85, while the fluorescence value for the \u003cem\u003eC. scabrifolia\u003c/em\u003e peak was 22.76. The genome size of \u003cem\u003eC.
scabrifolia\u003c/em\u003e is therefore approximately 0.85 Gb (2.3 Gb \u0026times; 22.76 / 61.85), which is 0.37 times that of maize.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003e3.2 DNA sequencing\u003c/h2\u003e \u003cp\u003eWe sequenced the DNA of \u003cem\u003eC. scabrifolia\u003c/em\u003e using the Illumina HiSeq PE platform, generating an initial dataset of 134 Gb. After quality control to remove low-quality reads, 131 Gb of high-quality clean reads remained. The Q20 and Q30 quality scores exceeded 96% and 90%, respectively, indicating high base-calling accuracy (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eTo further evaluate the sequencing quality, we plotted the distribution of base quality scores and base content at each read position (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). The base quality scores predominantly ranged between 30 and 40. The content of bases A and T, as well as G and C, was consistently balanced, with negligible N content, indicating high-quality base calling and no significant bias.
This comprehensive quality assessment underscores the reliability of our sequencing efforts.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eSequencing data statistics\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eItem\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eStatistics\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRaw reads\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e893,279,450\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRaw bases(bp)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e133,991,917,000\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eClean reads\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e875,595,288\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eClean bases(bp)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e130,971,449,000\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eQ20(%)\u003c/p\u003e \u003c/td\u003e
\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e96.32\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eQ30(%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e90.19\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003e3.3 Genome size and ploidy estimation\u003c/h2\u003e \u003cp\u003eTo estimate the genome size and ploidy of \u003cem\u003eC. scabrifolia\u003c/em\u003e, we utilized \u003cem\u003eK-mer\u003c/em\u003e analysis with K\u0026thinsp;=\u0026thinsp;19. The resulting \u003cem\u003eK-mer\u003c/em\u003e frequency distribution (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e) revealed a genome size of approximately 0.87 Gb. This analysis also showed that the genome comprises about 57.86% repetitive sequences and exhibits a moderate heterozygosity rate of approximately 0.5% (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). Initial ploidy estimation identified \u003cem\u003eC. scabrifolia\u003c/em\u003e as an allotetraploid (AABB-type) (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e). 
However, the \u003cem\u003eK-mer\u003c/em\u003e distribution analysis suggests that the current genome structure is diploid, which may be the result of a relatively recent whole-genome duplication event.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eFeature statistics of \u003cem\u003eC. scabrifolia\u003c/em\u003e genome sequences\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eItem\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eStatistics\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003eK-mer\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e19\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003eK-mer\u003c/em\u003e numbers\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e110,890,578,828\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGenome size(Gb)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.87\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHeterozygous 
ratio(%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.5\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRepeat sequence ratio(%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e57.86\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003e3.4 Genome assembly and GC content analysis\u003c/h2\u003e \u003cp\u003eWe performed de novo genome assembly of \u003cem\u003eC. scabrifolia\u003c/em\u003e using SOAPdenovo2 (\u003cem\u003eK-mer\u003c/em\u003e\u0026thinsp;=\u0026thinsp;59). This process yielded 3,516,690 raw contigs with a total length of 866,848,161 bp and a contig N50 of 306 bp. The final assembled genome comprised 2,361,006 scaffolds with a total length of 874,017,016 bp and a scaffold N50 of 3,587 bp. The GC content of the assembled genome was 37.98% (Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eUsing BUSCO (Benchmarking Universal Single-Copy Orthologs) to evaluate the completeness of the genome assembly annotation, we found that 87.6% of the genes were covered, with 67.8% identified as single-copy genes and 19.8% as multi-copy genes (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eWe also performed a correlation analysis between \u003cem\u003eK-mer\u003c/em\u003e depth and GC content in the sequencing results (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e). The x-axis represents \u003cem\u003eK-mer\u003c/em\u003e depth (k\u0026thinsp;=\u0026thinsp;27), while the y-axis represents the product of \u003cem\u003eK-mer\u003c/em\u003e depth and GC content. 
The distribution shows no significant GC bias in sequencing, with sequencing depth primarily concentrated in the 50x-150x range, indicating no contamination during sequencing. The average depth was 126x, with a minor sub-peak around 63x, which, when analyzed alongside Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e, may be attributed to heterozygosity within the genome.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eGenome assembly information of \u003cem\u003eC. scabrifolia\u003c/em\u003e\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eItem\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eStatistics\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"2\" nameend=\"c2\" namest=\"c1\"\u003e \u003cp\u003eContigs\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNumber of sequences\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e3,516,690\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTotal length\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e866,848,161\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e 
\u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMax length\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e24,978\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eN50 length\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e306\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eN90 length\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e115\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"2\" nameend=\"c2\" namest=\"c1\"\u003e \u003cp\u003eScaffolds\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNumber of sequences\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2,361,006\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTotal length\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e874,017,016\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMax length\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e97,654\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eN50 length\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e3,587\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eN90 length\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e116\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd 
align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGC content\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e37.98%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003e3.5 Species matching of \u003cem\u003eC. scabrifolia\u003c/em\u003e reads against the NT database\u003c/h2\u003e \u003cp\u003eTo explore the genetic diversity of \u003cem\u003eC. scabrifolia\u003c/em\u003e and verify the integrity of the extracted DNA samples, we compared genomic reads against public sequence data. From the filtered high-quality data, we randomly selected 10,000 reads and subjected them to BLAST analysis against the NT database.\u003c/p\u003e \u003cp\u003eThe analysis revealed that \u003cem\u003eCrataegus laevigata\u003c/em\u003e had the highest match rate, at 81.63% of the reads from the \u003cem\u003eC. scabrifolia\u003c/em\u003e genome. This result is consistent with both species belonging to the genus \u003cem\u003eCrataegus\u003c/em\u003e and indicates a close genetic relationship. Other matches included \u003cem\u003eMalus domestica\u003c/em\u003e (apple), \u003cem\u003ePyrus communis\u003c/em\u003e (European pear), and \u003cem\u003eEriobotrya japonica\u003c/em\u003e (loquat), all of which are members of the Rosaceae family. This consistent match with other Rosaceae species supports the reliability of our genomic data. 
Importantly, our analysis did not identify any reads matching animal or microbial genomes, indicating no detectable contamination in our samples.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eComparison results against the NT database\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSpecies\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFamily\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePercentage (%)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003eCrataegus laevigata\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRosaceae\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e81.63%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003eMalus domestica\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRosaceae\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e7.84%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003eMalus hybrid 'Flame'\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRosaceae\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2.97%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003ePyrus communis\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRosaceae\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2.64%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003eMalus sylvestris\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRosaceae\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1.04%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003ePyrus salicifolia\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRosaceae\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.99%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003eMalus \u0026times; robusta\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRosaceae\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.30%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003eCrataegus cuneata\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003eRosaceae\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.25%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003eEriobotrya japonica\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRosaceae\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.23%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cem\u003ePyrus\u0026nbsp;bretschneideri\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRosaceae\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.13%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003e3.6 Identification and annotation of transposons\u003c/h2\u003e \u003cp\u003eIn our study of the \u003cem\u003eC. scabrifolia\u003c/em\u003e genome, we identified a total of 2,361,006 transposon sequences, which collectively represent 51.79% of the entire genome. Among these transposons, long terminal repeat retrotransposons (LTRs) were the most prevalent, making up 35.05% of the genome. This predominance suggests that LTRs have played a significant role in shaping the genomic architecture and evolutionary history of \u003cem\u003eC. scabrifolia\u003c/em\u003e. 
On the other hand, DNA transposons were found to be much less common, constituting only 5.73% of the genome.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eTransposon statistics\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eClass\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eOrder\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSuperfamily\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eCounts\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003elength (bp)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003epercentage of sequence\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRetrotransposon\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLTR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c3\"\u003e \u003cp\u003eCopia\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e481,524\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e91,958,174\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e10.52\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eGypsy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1,123,183\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e204,779,784\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e23.43\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRetroviral\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e803\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e95,981\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.01\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLINE\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eL1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e32,076\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c5\"\u003e \u003cp\u003e9,627,386\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e1.10\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eL2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e869\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e80,960\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.01\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eJockey\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e58\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e9,011\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.01\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRTE\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e25,221\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e3,808,486\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.44\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSINE\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSINE\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e33,693\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e3,908,094\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.45\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDNA transposons\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDNA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ehAT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e213,435\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e27,884,436\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e3.19\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eTc1-IS630-Pogo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e903\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e126,471\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.01\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e 
\u003cp\u003ePiggyBac\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e393\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e37,577\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.01\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eHarbinger\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e116,540\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e15,557,650\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e1.78\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRolling-circles\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRolling-circles\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e35,125\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e6,451,050\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.74\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUnclassified\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e899,729\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" 
char=\".\" colname=\"c5\"\u003e \u003cp\u003e88,162,902\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e10.09\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003e3.7 SSR data mining\u003c/h2\u003e \u003cp\u003eIn our de novo search for SSR loci within the \u003cem\u003eC. scabrifolia\u003c/em\u003e genome, we identified a total of 493,829 SSR loci. The most common type was mononucleotide repeats (281,350 loci, 56.97%), followed by dinucleotide repeats (174,220 loci, 35.28%), trinucleotide repeats (31,646 loci, 6.41%), tetranucleotide repeats (3,963 loci, 0.80%), pentanucleotide repeats (2,069 loci, 0.42%), and hexanucleotide repeats (581 loci, 0.12%).\u003c/p\u003e \u003cp\u003eSimilarly, an SSR locus search in cultivated hawthorn identified 402,799 SSR loci, with a distribution pattern mirroring that of \u003cem\u003eC. scabrifolia\u003c/em\u003e. Mononucleotide repeats were predominant (223,698 loci, 55.54%), followed by dinucleotide repeats (144,694 loci, 35.92%), trinucleotide repeats (27,155 loci, 6.74%), tetranucleotide repeats (3,336 loci, 0.83%), pentanucleotide repeats (3,114 loci, 0.77%), and hexanucleotide repeats (802 loci, 0.20%).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003e3.8 Gene annotation\u003c/h2\u003e \u003cp\u003eA total of 996 protein-coding genes were annotated in the KOG database, accounting for 91.04% of the total predicted protein-coding genes. The KOG annotation statistics indicate that, excluding genes of unknown function (S), the most frequently annotated KOG function in the \u003cem\u003eC. 
scabrifolia\u003c/em\u003e genome is replication, recombination, and repair (L), with 100 protein-coding genes, representing 10% of all protein-coding genes. The next most annotated functions are posttranslational modification, protein turnover, chaperones (O) with 86 genes (8.63%), translation, ribosomal structure, and biogenesis (J) with 79 genes (7.93%), RNA processing and modification (A) with 68 genes (6.83%), and carbohydrate transport and metabolism (G) with 43 genes (4.32%).\u003c/p\u003e \u003cp\u003eThe Gene Ontology (GO) project was developed to address inconsistencies in gene function definitions across different databases and species, aiming to provide a unified and standardized gene function annotation system. In the \u003cem\u003eC. scabrifolia\u003c/em\u003e genome, 627 genes are annotated in the GO database, which is 57.31% of the total predicted genes. Among the three major categories of the GO database\u0026mdash;Molecular Function (MF), Biological Process (BP), and Cellular Component (CC)\u0026mdash;the most enriched terms are cellular anatomical entity, intracellular, cellular process, metabolic process, and catalytic activity, with 545, 545, 518, 459, and 325 genes, respectively.\u003c/p\u003e \u003cp\u003eThe Kyoto Encyclopedia of Genes and Genomes (KEGG) is a database used to study pathways and to understand higher-order functions of biological systems such as cells, organisms, and ecosystems. In the \u003cem\u003eC. scabrifolia\u003c/em\u003e genome, 301 genes are annotated in the KEGG database, accounting for 27.51% of the total predicted genes. The metabolism category has the highest number of enriched genes with 156. 
The next most enriched pathways are translation, replication and repair, and metabolism of cofactors and vitamins.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"4 Discussion","content":"\u003cp\u003eGenome size, also known as DNA C-value, refers to the amount of DNA contained in a gamete of an organism. The C-value varies significantly among different species and can be used to assess biological characteristics of plants. In this study, we estimated the genome size of \u003cem\u003eC. scabrifolia\u003c/em\u003e to be 870 Mb with a heterozygosity rate of 0.5% and a repeat sequence proportion of 57.86%, using flow cytometry combined with genome survey-based \u003cem\u003eK-mer\u003c/em\u003e analysis. The genome size of cultivated hawthorn was determined to be 856.88 Mb, with a heterozygosity rate of 1.77% and a repeat sequence proportion of 67.89%. Although the genome sizes of the two hawthorn species are similar, the heterozygosity rate of \u003cem\u003eC. scabrifolia\u003c/em\u003e is lower than that of cultivated hawthorn. This lower heterozygosity in \u003cem\u003eC. scabrifolia\u003c/em\u003e may be attributed to its geographically restricted distribution, which limits gene flow between populations and results in lower genetic diversity (Favre et al. \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). In contrast, cultivated hawthorn has a wider distribution and has undergone extensive artificial selection and hybridization to enhance yield, disease resistance, and adaptability, thus increasing its heterozygosity (Zhuang et al. \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2022\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThe higher proportion of repeat sequences in the cultivated hawthorn genome compared to the \u003cem\u003eC. scabrifolia\u003c/em\u003e genome is likely due to the sequencing techniques used. 
Our study used only second-generation sequencing data, whose shorter reads are less effective for accurately assembling highly repetitive genomic regions. Consequently, the identification of repeat sequences in the \u003cem\u003eC. scabrifolia\u003c/em\u003e genome may be incomplete. In contrast, the genome of cultivated hawthorn was sequenced using a combination of second-generation and third-generation sequencing technologies; the longer third-generation reads can span repetitive regions, resulting in a more comprehensive genome assembly. Thus, the lower proportion of detected repeat sequences in the \u003cem\u003eC. scabrifolia\u003c/em\u003e genome is probably attributable, at least in part, to the limitations of the sequencing technology employed in this study.\u003c/p\u003e \u003cp\u003eTo examine the relationship of \u003cem\u003eC. scabrifolia\u003c/em\u003e to other species, the top 10,000 sequencing reads were extracted and aligned against the NT database using BLAST. Among the published sequences, \u003cem\u003eC. scabrifolia\u003c/em\u003e shares the highest read match rate with \u003cem\u003eC. laevigata\u003c/em\u003e, suggesting that it is the closest relative represented in the database, followed by apple.\u003c/p\u003e \u003cp\u003eLarge-scale genome sequencing technology has provided new opportunities for the study of transposons. Studies have shown that although transposon content varies among species, it is positively correlated with genome size. It is generally believed that genome size is influenced by both the gain and loss of DNA, with an increase in transposon copies being a significant factor in genome enlargement (Hawkins et al. \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2008\u003c/span\u003e). Analysis of the \u003cem\u003eC. 
scabrifolia\u003c/em\u003e genome revealed various types of transposons, which together account for 51.79% of the assembled genome. Retrotransposons are the predominant form, accounting for 35.05%, whereas DNA transposons are less common, making up only 5.73%. The high proportion of retrotransposons suggests that they may be involved in genome size expansion and functional diversification through mechanisms such as gene duplication, insertion mutations, and gene expression regulation (Vitte and Panaud \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2003\u003c/span\u003e). Conversely, the relatively low abundance of DNA transposons might reflect the different evolutionary pressures acting on them and their different transposition mechanisms (Feschotte and Pritham 2007). Understanding the diverse roles and impacts of these transposons is crucial for elucidating the genetic and evolutionary processes that shape the genome of \u003cem\u003eC. scabrifolia\u003c/em\u003e. Based on Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e, it is inferred that the increase in transposon copy numbers in the \u003cem\u003eC. scabrifolia\u003c/em\u003e genome likely occurred during a specific period rather than through multiple independent events, resembling the transposon insertion patterns of cultivated hawthorn (Zhang et al. \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2022\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eBreeding research on \u003cem\u003eC. scabrifolia\u003c/em\u003e has been relatively limited, but molecular marker technology can facilitate the selection and preservation of genotypes. Genomic SSR markers are highly polymorphic molecular markers known for their co-dominant inheritance, excellent reproducibility, and stability. These characteristics make SSR markers highly valuable for genomic research, genetic analysis, and breeding improvement (Lei et al. \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). 
Due to the lack of genomic data specific to Crataegus species, previous studies have utilized SSR loci developed from apple and pear genomes to analyze various hawthorn genotypes (G\u0026uuml;ney et al. \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). In this study, we identified 493,829 SSR loci in the \u003cem\u003eC. scabrifolia\u003c/em\u003e genome, providing a comprehensive molecular marker resource for genotype selection and preservation. These newly identified markers are expected to significantly enhance the precision and effectiveness of related research. The distribution patterns of SSR loci in \u003cem\u003eC. scabrifolia\u003c/em\u003e were found to be similar to those in cultivated hawthorn, indicating significant genomic stability within the Crataegus genus (Li et al. \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2002\u003c/span\u003e). The observed differences in SSR sequence abundance between \u003cem\u003eC. scabrifolia\u003c/em\u003e and cultivated hawthorn could be attributed to selective pressures or genetic drift (Bagshaw \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). Investigating the genomic and evolutionary mechanisms underlying these differences will offer valuable insights into the genetic diversity and adaptability of hawthorn species.\u003c/p\u003e \u003cp\u003eSingle-copy genes, due to their high conservation and functional importance, exhibit unique value in phylogenetic analysis and gene function studies (Li et al. \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2003\u003c/span\u003e). These genes are typically unique within the genome and are integral to fundamental biological processes. They maintain a high degree of conservation across species evolution, reflecting their essential roles. In our study, we conducted a comprehensive search for single-copy genes within the genome of \u003cem\u003eC. 
scabrifolia\u003c/em\u003e and subsequently annotated them using the KOG, GO, and KEGG databases. The categorization of these annotated genes revealed their involvement in crucial biological functions, with significant classifications including \"Replication, Recombination, and Repair,\" \"Cellular Organization and Structure,\" and \"Metabolism.\" These genes are integral to maintaining genomic stability, ensuring cellular structural complexity, and regulating metabolic processes. This may contribute to the adaptability of \u003cem\u003eC. scabrifolia\u003c/em\u003e in varied and stressful environments, potentially aiding its survival and reproduction under adverse conditions (Chinnusamy and Zhu 2009). Detailed analysis of the expression patterns of these genes could offer insights into their dynamic regulatory mechanisms across different environmental contexts. Such insights are crucial for advancing genetic improvement, informing conservation strategies, and supporting further functional genomics research.\u003c/p\u003e"},{"header":"5 Conclusion","content":"\u003cp\u003eThis study presents the first comprehensive analysis of the genomic characteristics and key genetic elements of \u003cem\u003eC. scabrifolia\u003c/em\u003e and reveals features related to genome stability and adaptability. It provides new insights into the genetic evolution and environmental adaptation mechanisms of the species and offers a theoretical foundation for future functional genomics research, molecular breeding, and conservation management.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors thank Guodong Li and Ticao Zhang for their assistance with the molecular laboratory work. 
Additionally, we appreciate the bioinformatics high-performance computing server at Yunnan University of Traditional Chinese Medicine for providing computational resources for data analysis.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTCZ and GDL conceptualized the study; XEW and DLL collected the experimental materials; BZW, JC, and YMZ participated in the analysis of the experimental results; BZW drafted the manuscript. All authors have read and agreed to the published version of the manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis research was jointly funded by the National Natural Science Foundation of China Regional Project (32260094), the Yunnan Provincial Traditional Chinese Medicine Joint Key Project (202101AZ070001-166), and the Yunnan Provincial Major Science and Technology Special Project (202102AE090031).\u003c/p\u003e\n\u003cp\u003eConflict of interest\u003c/p\u003e\n\u003cp\u003eThe authors declare that there is no conflict of interest.\u003c/p\u003e\n\u003cp\u003eConsent for publication\u003c/p\u003e\n\u003cp\u003eAll authors participated in, read and approved the final version of the article before publication.\u003c/p\u003e\n\u003cp\u003eEthical approval\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eAbrus\u0026aacute;n G, Grundmann N, DeMester L, Makalowski W (2009) TEclass\u0026ndash;a tool for automated classification of unknown eukaryotic transposable elements. Bioinformatics 25:1329\u0026ndash;1330. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/bioinformatics/btp084\u003c/span\u003e\u003cspan address=\"10.1093/bioinformatics/btp084\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBagshaw ATM (2017) Functional Mechanisms of Microsatellite DNA in Eukaryotic Genomes. Genome Biol Evol 9:2428\u0026ndash;2443. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/gbe/evx164\u003c/span\u003e\u003cspan address=\"10.1093/gbe/evx164\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBedell JA, Korf I, Gish W (2000) MaskerAid: a performance enhancement to RepeatMasker. Bioinformatics 16:1040\u0026ndash;1041. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/bioinformatics/16.11.1040\u003c/span\u003e\u003cspan address=\"10.1093/bioinformatics/16.11.1040\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChinnusamy V, Zhu JK (2009) Epigenetic regulation of stress responses in plants. Curr Opin Plant Biol 12:133\u0026ndash;139. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.pbi.2008.12.006\u003c/span\u003e\u003cspan address=\"10.1016/j.pbi.2008.12.006\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChang Q, Zuo Z, Harrison F, Chow MS (2002) Hawthorn. J Clin Pharmacol 42:605\u0026ndash;612. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1177/00970002042006003\u003c/span\u003e\u003cspan address=\"10.1177/00970002042006003\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen W, Hasegawa DK, Arumuganathan K et al (2015) Estimation of the Whitefly Bemisia tabaci Genome Size Based on k-mer and Flow Cytometric Analyses. Insects 6:704\u0026ndash;715. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/insects6030704\u003c/span\u003e\u003cspan address=\"10.3390/insects6030704\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCantalapiedra CP, Hern\u0026aacute;ndez-Plaza A, Letunic I et al (2021) eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale. Mol Biol Evol 38:5825\u0026ndash;5829. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/molbev/msab293\u003c/span\u003e\u003cspan address=\"10.1093/molbev/msab293\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDolezel J, Bartos J (2005) Plant DNA flow cytometry and estimation of nuclear genome size. Ann Bot 95:99\u0026ndash;110. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/aob/mci005\u003c/span\u003e\u003cspan address=\"10.1093/aob/mci005\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDoyle JJ, Doyle JL (1987) A rapid DNA isolation procedure for small quantities of fresh leaf tissue. 
Phytochem Bull 19:11\u0026ndash;15\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDolezel J, Greilhuber J, Suda J (2007) Estimation of nuclear DNA content in plants using flow cytometry. Nat Protoc 2:2233\u0026ndash;2244. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/nprot.2007.310\u003c/span\u003e\u003cspan address=\"10.1038/nprot.2007.310\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFeschotte C, Pritham EJ (2007) DNA transposons and the evolution of eukaryotic genomes. Annu Rev Genet 41:331\u0026ndash;368. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1146/annurev.genet.40.110405.090448\u003c/span\u003e\u003cspan address=\"10.1146/annurev.genet.40.110405.090448\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFavre F, Jourda C, Grisoni M et al (2022) A genome-wide assessment of the genetic diversity, evolution and relationships with allied species of the clonally propagated crop \u003cem\u003eVanilla planifolia\u003c/em\u003e Jacks. ex Andrews. Genet Resour Crop Evol 69:2125\u0026ndash;2139. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s10722-022-01362-1\u003c/span\u003e\u003cspan address=\"10.1007/s10722-022-01362-1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGao GY, Feng YX (1994) Pharmacognosy and resource utilization of Yunnan-Hawthorn. The Chinese Pharmaceutical Journal 06:329\u0026ndash;331\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eG\u0026uuml;ney M, Kafkas S, Keles H et al (2018) Characterization of hawthorn (\u003cem\u003eCrataegus\u003c/em\u003e spp.) genotypes by SSR markers. 
Physiol Mol Biol Plants 24:1221\u0026ndash;1230. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s12298-018-0604-6\u003c/span\u003e\u003cspan address=\"10.1007/s12298-018-0604-6\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGou X, Shi H, Yu S et al (2020) SSRMMD: A Rapid and Accurate Algorithm for Mining SSR Feature Loci and Candidate Polymorphic SSRs Based on Assembled Sequences. Front Genet 11:706. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fgene.2020.00706\u003c/span\u003e\u003cspan address=\"10.3389/fgene.2020.00706\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHawkins JS, Grover CE, Wendel JF (2008) Repeated big bangs and the expanding universe: Directionality in plant genome size evolution. Plant Science 174:557\u0026ndash;562. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.plantsci.2008.03.015\u003c/span\u003e\u003cspan address=\"10.1016/j.plantsci.2008.03.015\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi YC, Korol AB, Fahima T et al (2002) Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review. Mol Ecol 11:2453\u0026ndash;2465\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi L, Stoeckert CJ Jr, Roos DS (2003) OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 13:2178\u0026ndash;2189. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1101/gr.1224503\u003c/span\u003e\u003cspan address=\"10.1101/gr.1224503\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu P, Kallio H, L\u0026uuml; D et al (2010) Acids, sugars, and sugar alcohols in Chinese hawthorn (Crataegus spp.) fruits. J Agric Food Chem 58:1012\u0026ndash;1019. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1021/jf902773v\u003c/span\u003e\u003cspan address=\"10.1021/jf902773v\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLuo R, Liu B, Xie Y, et al (2012) SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience 1:18. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/2047-217X-1-18\u003c/span\u003e\u003cspan address=\"10.1186/2047-217X-1-18\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLei Y, Zhou Y, Price M, Song Z (2021) Genome-wide characterization of microsatellite DNA in fishes: survey and analysis of their abundance and frequency in genome-specific regions. BMC Genomics 22:421. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s12864-021-07752-6\u003c/span\u003e\u003cspan address=\"10.1186/s12864-021-07752-6\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOrhan IE (2018) Phytochemical and Pharmacological Activity Profile of \u003cem\u003eCrataegus oxyacantha\u003c/em\u003e L. (Hawthorn) - A Cardiotonic Herb. Curr Med Chem 25:4854\u0026ndash;4865. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2174/0929867323666160919095519\u003c/span\u003e\u003cspan address=\"10.2174/0929867323666160919095519\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRanallo-Benavidez TR, Jaron KS, Schatz MC (2020) GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun 11:1432. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41467-020-14998-3\u003c/span\u003e\u003cspan address=\"10.1038/s41467-020-14998-3\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSim\u0026atilde;o FA, Waterhouse RM, Ioannidis P et al (2015) BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31:3210\u0026ndash;3212. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/bioinformatics/btv351\u003c/span\u003e\u003cspan address=\"10.1093/bioinformatics/btv351\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSun H, Ding J, Piedno\u0026euml;l M, Schneeberger K (2018) findGSE: estimating genome size variation within human and Arabidopsis using k-mer frequencies. Bioinformatics 34:550\u0026ndash;557. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/bioinformatics/btx637\u003c/span\u003e\u003cspan address=\"10.1093/bioinformatics/btx637\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTian XM, Zhou XY, Gong N (2011) Applications of Flow Cytometry in Plant Research\u0026ndash;Analysis of Nuclear DNA Content and Ploidy Level in Plant Cells. Chinese Agricultural Science Bulletin 27: 21\u0026ndash;27\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVitte C, Panaud O (2003) Formation of solo-LTRs through unequal homologous recombination counterbalances amplifications of LTR retrotransposons in rice Oryza sativa L. Mol Biol Evol 20:528\u0026ndash;540. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/molbev/msg055\u003c/span\u003e\u003cspan address=\"10.1093/molbev/msg055\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWu J, Peng W, Qin R, Zhou H (2014) \u003cem\u003eCrataegus pinnatifida\u003c/em\u003e: chemical constituents, pharmacology, and potential applications. Molecules 19:1685\u0026ndash;1712. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/molecules19021685\u003c/span\u003e\u003cspan address=\"10.3390/molecules19021685\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWu X, Luo D, Zhang Y, et al (2022) Comparative Genomic and Phylogenetic Analysis of Chloroplast Genomes of Hawthorn (\u003cem\u003eCrataegus\u003c/em\u003e spp.) in Southwest China. Front Genet 13:900357. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fgene.2022.900357\u003c/span\u003e\u003cspan address=\"10.3389/fgene.2022.900357\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXie T, Zheng JF, Liu S et al (2015) De novo plant genome assembly based on chromatin interactions: a case study of \u003cem\u003eArabidopsis thaliana\u003c/em\u003e. Mol Plant 8:489\u0026ndash;492. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.molp.2014.12.015\u003c/span\u003e\u003cspan address=\"10.1016/j.molp.2014.12.015\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYang YC, Wang EH, Wang JQ et al (2022) History of Traditional Chinese Medicine \u003cem\u003eCrataegi\u003c/em\u003e Fructus. Asia-Pacific Traditional Medicine 18:157\u0026ndash;163\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhou QP, Wang LW, Gao GY (1999) Study on Antioxidative and Decreasing Blood-fat Effect in Four Kinds of Fructus \u003cem\u003eCrataegi\u003c/em\u003e. Research and Practice on Chinese Medicines 03:3\u0026ndash;5\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang HP, Zhang JY, Liu QL (2014) Research progress on hawthorn germplasm resources and breeding varieties in China. China Seed Industry 02:15\u0026ndash;17\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhuang Y, Li X, Hu J et al (2022) Expanding the gene pool for soybean improvement with its wild relatives. aBIOTECH 3:115\u0026ndash;125. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s42994-022-00072-7\u003c/span\u003e\u003cspan address=\"10.1007/s42994-022-00072-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang T, Qiao Q, Du X et al (2022) Cultivated hawthorn (\u003cem\u003eCrataegus pinnatifida\u003c/em\u003e var. major) genome sheds light on the evolution of Maleae (apple tribe). J Integr Plant Biol 64:1487\u0026ndash;1501. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/jipb.13318\u003c/span\u003e\u003cspan address=\"10.1111/jipb.13318\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"genetic-resources-and-crop-evolution","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"gres","sideBox":"Learn more about [Genetic Resources and Crop Evolution](https://www.springer.com/journal/10722)","snPcode":"10722","submissionUrl":"https://submission.nature.com/new-submission/10722/3","title":"Genetic Resources and Crop Evolution","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"Genome-wide survey, Illumina sequencing, Crataegus scabrifolia, Transposable elements, SSR","lastPublishedDoi":"10.21203/rs.3.rs-4747077/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4747077/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003e \u003cem\u003eCrataegus scabrifolia\u003c/em\u003e is a significant botanical resource in Southwest China, renowned for its medicinal properties and high potential for development due to its rich medicinal components. However, genomic research on \u003cem\u003eC. scabrifolia\u003c/em\u003e remains limited. This study conducted a comprehensive genome-wide survey of \u003cem\u003eC. scabrifolia\u003c/em\u003e, employing flow cytometry in conjunction with genome \u003cem\u003eK-mer\u003c/em\u003e analysis to assess its genomic characteristics in detail. Our findings reveal that despite a genome size similar to cultivated hawthorn (\u003cem\u003eCrataegus pinnatifida\u003c/em\u003e var. major), \u003cem\u003eC. scabrifolia\u003c/em\u003e exhibits a significantly lower heterozygosity rate of 0.5% compared to 1.77% in cultivated varieties. Additionally, we identified transposable elements comprising 51.79% of the assembled genome, with retrotransposons accounting for 35.05% of the total genome. 
Transposon analysis elucidated the genomic characteristics of transposons in \u003cem\u003eC. scabrifolia\u003c/em\u003e, suggesting a mode of increase similar to that observed in cultivated hawthorn. Furthermore, this study identified numerous SSR marker loci and annotated the functions of single-copy genes, providing insights into \u003cem\u003eC. scabrifolia\u003c/em\u003e 's adaptive strategies and genetic stability under varying environmental conditions. These findings offer crucial tools and resources for further genotype selection, genetic analysis, and breeding improvements.\u003c/p\u003e","manuscriptTitle":"Genome-wide Survey of Crataegus scabrifolia Provides New Insights into Its Genetic Evolution and Adaptation Mechanisms","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-08-14 11:08:19","doi":"10.21203/rs.3.rs-4747077/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2024-08-08T15:02:50+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2024-08-08T14:14:38+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"259374771804179727618863774997703236378","date":"2024-07-21T10:21:09+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2024-07-20T16:38:33+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2024-07-17T14:15:02+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2024-07-17T14:14:59+00:00","index":"","fulltext":""},{"type":"submitted","content":"Genetic Resources and Crop Evolution","date":"2024-07-16T05:02:28+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"genetic-resources-and-crop-evolution","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"gres","sideBox":"Learn more about [Genetic Resources and Crop Evolution](https://www.springer.com/journal/10722)","snPcode":"10722","submissionUrl":"https://submission.nature.com/new-submission/10722/3","title":"Genetic Resources and Crop Evolution","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"ed171f4e-0b80-42c6-91b1-2e294afb71cf","owner":[],"postedDate":"August 14th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[],"tags":[],"updatedAt":"2024-10-14T16:00:40+00:00","versionOfRecord":{"articleIdentity":"rs-4747077","link":"https://doi.org/10.1007/s10722-024-02186-x","journal":{"identity":"genetic-resources-and-crop-evolution","isVorOnly":false,"title":"Genetic Resources and Crop Evolution"},"publishedOn":"2024-10-09 15:57:22","publishedOnDateReadable":"October 9th, 2024"},"versionCreatedAt":"2024-08-14 11:08:19","video":"","vorDoi":"10.1007/s10722-024-02186-x","vorDoiUrl":"https://doi.org/10.1007/s10722-024-02186-x","workflowStages":[]},"version":"v1","identity":"rs-4747077","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4747077","identity":"rs-4747077","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}