Enhancing the Accuracy of Reference-Guided Genomic Assemblies: Implementing Ragtag Correction for Reference-Guided Scaffolds

Short Report

Kai Liu, Nan Xie

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-4621443/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted. Version 1 posted.

Abstract

Recent advancements in long-read sequencing technologies provide extended read lengths and lower error rates, which enhance the assembly of complex genomes. However, high costs and stringent sample quality requirements limit their widespread adoption, especially for degraded DNA samples. In contrast, short-read technologies can work with shorter DNA fragments, but the short reads they produce make it difficult to achieve continuous genome assemblies.
Reference-guided assembly offers a practical solution by aligning contigs with a reference genome, thereby improving scaffold continuity. However, the reference-guided assembly can introduce more misassemblies. To address this limitation, this study explores using Ragtag's Correct function integrated with in silico libraries to correct misassemblies in reference-guided assemblies. Using three draft genomes from two fish species, we demonstrate that this hybrid strategy significantly improves scaffold assembly accuracy. Specifically, in Megalobrama amblycephala, misassemblies were reduced from 8298 to 4920, and cross-links between different chromosomes decreased from 192 to zero in the corrected assemblies. In two Culter alburnus draft genomes, misassemblies were reduced from 5689 and 6582 to 4728 and 5861, respectively, while cross-links between different chromosomes were significantly reduced from 132 and 13 to five and ten in the corrected assemblies. This approach allowed precise correction of scaffold assembly errors, showcasing its potential to enhance the accuracy of genomic assemblies. Our findings underscore the importance of integrating additional genomic data to achieve reliable genome assemblies, especially for species with significant structural variations. This research provides valuable insights into optimizing genome assembly processes, contributing to advancements in genomic studies.

Keywords: Reference-guided assembly, In silico libraries, Misassembly correction, Megalobrama amblycephala, Culter alburnus

1 INTRODUCTION

In biological research, long-read sequencing technologies are widely praised for delivering extended read lengths, showcasing significant advantages, especially in deciphering complex genomic structures and genetic variations. Long-read sequencing technologies have demonstrated substantial benefits in resolving the assembly of complex genomes and studying genetic diversity (Blom 2021; Zhang et al. 2022; Dijk et al.
2023). Furthermore, the application of long-read sequencing in clinical diagnostics is on the rise, promising to address genetic diseases that short-read sequencing fails to diagnose (Mantere et al. 2019; Logsdon et al. 2020; Conlin et al. 2022). Additionally, its use in environmental microbiology is rapidly evolving, particularly in analyzing microbial community compositions and genetic diversity (Patin and Goodwin 2022). Recent advancements in high-fidelity sequencing by Pacific Biosciences and ultra-long sequencing by Oxford Nanopore Technologies have significantly mitigated the high error rates traditionally associated with long-read sequencing, providing even longer reads (Lang et al. 2020). However, the high costs (Chen et al. 2020; Gehrig et al. 2022) and stringent sample quality requirements (Blom 2021) continue to pose major barriers to its widespread adoption. This is especially true for long-stored samples, as they often contain degraded DNA, which does not meet the high molecular weight DNA requirements of long-read sequencing (Tomas et al. 2018; Blom 2021). For instance, HiFi sequencing requires DNA fragments to be over 30 kb in length.

In contrast to long-read sequencing technologies, short-read sequencing technologies, such as Illumina sequencing, have less stringent requirements for DNA fragment length, needing only fragments longer than 1 kb. However, the shorter reads obtained from Illumina sequencing pose challenges to the continuity of genome assembly: contigs assembled from short reads are difficult to join into long scaffolds. The reference-guided assembly provides an effective solution to improve the continuity of genome scaffold assembly. This technique involves aligning contigs with a reference genome, facilitating extension along the reference genome's chromosomal scaffolds (Lischer and Shimizu 2017).
Using genomes of closely related species as references, we significantly improved the draft genome quality of Clarias batrachus and Culter alburnus in our preliminary studies (Liu et al. 2023a). While effective, this method inevitably introduces misassemblies that can mislead downstream gene function annotation and evolutionary analysis. In previous research, we found that the reference-guided assembly of C. batrachus resulted in 6046 misassemblies (Liu et al. 2023a). The Ragout software leverages phylogenetic relationships and synteny across multiple reference genomes to address potential misassemblies (Kolmogorov et al. 2018). This method has been shown to mitigate the risk of misassemblies (Guo et al. 2022), yet it is not foolproof, as actual structural variations do not always conform to existing phylogenetic constructs. In contrast, Ragtag software offers a Correct function that uses the reference genome to detect and rectify misassemblies (Alonge et al. 2022). This function depends on either short-read or long-read sequencing libraries to confirm the accuracy of identified misassemblies; without this validation, misassemblies identified solely from the reference genome could be erroneous.

To tackle the challenge of misassemblies inherent in reference-guided assembly techniques, this study implemented a hybrid strategy that integrates Ragtag's Correct function with in silico libraries. When real sequencing library data are available, they can be used directly. In this study, we used three draft genomes from two fish species as examples: Megalobrama amblycephala (GCA_009869865.1) (Liu et al. 2017) and Culter alburnus (GCA_009869775.1, GCA_028476615.1) (Ren et al. 2019; Liu et al. 2023b). Since the original sequencing data for the three draft genomes were unavailable, we constructed in silico libraries from the draft genomes of the target species.
This approach allowed for precise correction of scaffold assembly errors. Recent releases of chromosome-level reference genomes for these two species (Liu et al. 2021; Jiang et al. 2023) allowed us to conveniently compare reference-guided and de novo assembly results and evaluate the hybrid strategy.

2 MATERIALS AND METHODS

2.1 Data for the target and reference genomes

We downloaded the draft genomes for M. amblycephala (GCA_009869865.1) (Liu et al. 2017) and C. alburnus (GCA_009869775.1, GCA_028476615.1) (Ren et al. 2019; Liu et al. 2023b) from the National Center for Biotechnology Information (NCBI) Assembly database. Additionally, chromosome-level genome assemblies for M. amblycephala (GCA_018812025.1) (Liu et al. 2021) and Chanodichthys erythropterus (GCA_024489055.1) (Zhao et al. 2022) were also retrieved from NCBI. Furthermore, we accessed the chromosome-level genome assembly for C. alburnus (GWHBOSX00000000) (Jiang et al. 2023) from the China National Center for Bioinformation. The median divergence time between M. amblycephala and C. alburnus is estimated at 7.18 million years ago (MYA). In comparison, the median divergence time between C. alburnus and C. erythropterus is estimated at 4.54 MYA (Liu et al. 2023a).

2.2 Reference-guided scaffold assembly

We employed the reference-guided assembler Ragout v2.3 to enhance the draft genome assembly of the target species, M. amblycephala (GCA_009869865.1) and C. alburnus (GCA_009869775.1, GCA_028476615.1). For Ragout's alignment process, we used SibeliaZ (Minkin and Medvedev 2020). C. erythropterus (GCA_024489055.1), C. alburnus (GWHBOSX00000000), and M. amblycephala (GCA_018812025.1) were used as multiple reference genomes for guided scaffold assembly.

2.3 Misassembly correction

We utilized C. erythropterus (GCA_024489055.1), M. amblycephala (GCA_018812025.1), and C.
alburnus (GWHBOSX00000000) as reference genomes and employed the Ragtag Correct function with an in silico paired-end 500 bp library to rectify misassemblies in the target genomes of M. amblycephala (GCA_009869865.1) and C. alburnus (GCA_009869775.1, GCA_028476615.1). The median divergence time between C. erythropterus and C. alburnus is 4.54 MYA (Liu et al. 2023a). Due to the lack of real sequencing data for the target genomes, the in silico paired-end library was generated using ART v2.5.8 (Huang et al. 2012) with 500 bp insert lengths, creating simulated Illumina sequence data. Fastp (Chen 2023) was used with the parameters -c -D to correct base errors and remove duplicates in the generated in silico library.

To compare the effectiveness of misassembly correction, we constructed three types of genomes based on the reference-guided scaffold assemblies obtained from Ragout. First, we performed Ragtag assembly directly on the Ragout scaffold assemblies of the target genomes, using M. amblycephala (GCA_018812025.1) or C. alburnus (GWHBOSX00000000) as reference genomes. Second, we performed misassembly correction without in silico libraries on the target genomes, followed by Ragtag assembly using the same references. Third, we performed misassembly correction with in silico libraries on the target genomes, followed by Ragtag assembly using the same references. For Ragtag's alignment process, we utilized nucmer (Marcais et al. 2018), one of Ragtag's in-built aligners. When correcting misassemblies in M. amblycephala (GCA_009869865.1), we also performed a de novo scaffold assembly (ma_denovo) for comparison using SSPACE Basic v2.1.1 (Boetzer et al. 2011). This was done after correcting the ma_ragout assembly with in silico libraries.

2.4 Evaluation of misassembly correction

We used Quast-LG v5.2.0 (--large) to evaluate the quality of our assemblies, including misassemblies, contiguity, and other relevant statistics (Mikheenko et al. 2018).
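The workflow of Sections 2.3 and 2.4 can be sketched as a sequence of command lines. The Python snippet below only builds the argument lists and does not run anything. It is a minimal sketch under stated assumptions: the file names, output directory layout, read length, fold coverage, and insert-size standard deviation are hypothetical (the text gives only the 500 bp insert length and the fastp options -c -D), treating ART's mean fragment size option as the "insert length" is an assumption, and the flags and output file names should be checked against the installed versions of ART, fastp, RagTag, and QUAST.

```python
import shlex
from typing import List, Tuple


def build_correction_commands(
    reference: str,
    draft: str,
    scaffolds: str,
    outdir: str = "work",
    read_length: int = 150,   # assumption: not stated in the text
    fold_coverage: int = 30,  # assumption: not stated in the text
    insert_mean: int = 500,   # 500 bp insert length from Section 2.3
    insert_sd: int = 50,      # assumption: not stated in the text
) -> Tuple[List[List[str]], str]:
    """Return (commands, fofn_text). Nothing is executed here."""
    prefix = f"{outdir}/insilico_"
    # With -p, art_illumina writes <prefix>1.fq and <prefix>2.fq.
    raw1, raw2 = f"{prefix}1.fq", f"{prefix}2.fq"
    clean1, clean2 = f"{outdir}/clean_1.fq", f"{outdir}/clean_2.fq"
    fofn = f"{outdir}/reads.fofn"  # hypothetical file listing the read files
    # Output names as produced by RagTag v2; verify for other versions.
    corrected = f"{outdir}/correct/ragtag.correct.fasta"
    scaffolded = f"{outdir}/scaffold/ragtag.scaffold.fasta"

    commands = [
        # 1. Simulate a paired-end Illumina library from the draft genome.
        ["art_illumina", "-ss", "HS25", "-i", draft, "-p",
         "-l", str(read_length), "-f", str(fold_coverage),
         "-m", str(insert_mean), "-s", str(insert_sd), "-o", prefix],
        # 2. fastp with base correction (-c) and deduplication (-D).
        ["fastp", "-i", raw1, "-I", raw2, "-o", clean1, "-O", clean2,
         "-c", "-D"],
        # 3. Reference-based correction, validated with the short reads.
        ["ragtag.py", "correct", reference, scaffolds,
         "-o", f"{outdir}/correct", "-F", fofn, "-T", "sr",
         "--aligner", "nucmer"],
        # 4. Scaffold the corrected sequences against the same reference.
        ["ragtag.py", "scaffold", reference, corrected,
         "-o", f"{outdir}/scaffold", "--aligner", "nucmer"],
        # 5. Evaluate the result with QUAST in large-genome mode.
        ["quast.py", "--large", "-r", reference,
         "-o", f"{outdir}/quast", scaffolded],
    ]
    # Content to write to the fofn before running step 3.
    fofn_text = f"{clean1}\n{clean2}\n"
    return commands, fofn_text


if __name__ == "__main__":
    cmds, _ = build_correction_commands("ref.fa", "draft.fa", "ragout.fa")
    for cmd in cmds:
        print(shlex.join(cmd))
```

Omitting the read options in step 3 would correspond to the "without in silico libraries" variant, in which correction relies on the reference alone.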
The assessment of misassemblies was also conducted using synteny plots and dot plots. Synteny plots were generated using NGenomeSyn software (He et al. 2023), and dot plots were created with dotPlotly (available at https://github.com/tpoorten/dotPlotly). For synteny comparisons, we utilized nucmer with alignment parameters set to --mum -c 30000 for M. amblycephala and --mum -c 10000 for C. alburnus.

3 RESULTS

3.1 Misassembly correction of the M. amblycephala scaffold assembly

We performed multi-reference guided scaffold assembly of the M. amblycephala draft genome (GCA_009869865.1) using Ragout, incorporating chromosome-level reference genomes of M. amblycephala (GCA_018812025.1), C. erythropterus (GCA_024489055.1), and C. alburnus (GWHBOSX00000000). This version of the assembly was designated as ma_ragout. To compare the effectiveness of misassembly correction, we constructed three types of genomes based on ma_ragout. The first type of assembly, ma_ragtag, involved performing Ragtag assembly on the ma_ragout assembly using the M. amblycephala chromosome-level reference genome (GCA_018812025.1) as the reference genome. The second type, ma_without_insilico, involved misassembly correction without in silico libraries on the ma_ragout assembly, using the same references. The third type, ma_with_insilico, involved misassembly correction with in silico libraries on the ma_ragout assembly, using the same references. Finally, we used M. amblycephala (GCA_018812025.1) as the reference genome and evaluated the quality of ma_ragout, ma_ragtag, ma_without_insilico, and ma_with_insilico using Quast-LG v5.2.0, including metrics such as misassemblies, contiguity, and other relevant statistics.

As shown in Figure 1A and Table 1, the continuity of the assemblies ma_ragout, ma_ragtag, ma_without_insilico, and ma_with_insilico differs only slightly. The ma_ragtag assembly demonstrates the highest continuity but introduces the most misassemblies, totaling 8298.
Regardless of whether in silico libraries were used, the number of misassemblies in the assemblies obtained after misassembly correction, ma_without_insilico and ma_with_insilico, was significantly reduced to 4853 and 4920, respectively (Figure 1A). The difference between ma_without_insilico and ma_with_insilico is not significant, although ma_with_insilico contains slightly more misassemblies than ma_without_insilico. This may be because the ma_without_insilico assembly relied entirely on misassembly correction using the reference genome GCA_018812025.1, disregarding the differences between GCA_018812025.1 and GCA_009869865.1. Additionally, we found that after misassembly correction with in silico libraries, the Ragtag assembly had higher continuity (N50: 45322139 > 9908590) and lower misassembly values (4920 < 9787.62). Figure 1 also illustrates that prior to misassembly correction, the ma_ragtag assembly exhibited 192 cross-links between different chromosomes compared to GCA_018812025.1 (Figure 1B), and the dot plot showed discontinuous and non-linear diagonals (Figure 1D). After misassembly correction, the ma_with_insilico assembly had zero cross-links compared to GCA_018812025.1 (Figure 1C), with the corresponding dot plot displaying continuous and significantly more linear diagonals (Figure 1E).

We also corrected misassemblies using C. erythropterus (GCA_024489055.1) and C. alburnus (GWHBOSX00000000) as reference genomes, comparing the continuity and dot plots of assemblies using these references individually or in combination. In terms of assembly continuity (Supplementary Material Table S1), while using Ragtag's Merge function for multi-genome assembly resulted in lower scaffold N50 values, the synteny was superior to that of the combined assembly strategy, ma_gc, as shown in Supplementary Material Table S1 and Figures S2, S3, S4, and S5.
After misassembly correction, genomes guided solely by the three chromosome-level reference genomes exhibited similar scaffold N50 values to the combined strategy, ma_gc, but with significantly improved synteny. Notably, the genome corrected and assembled using the M. amblycephala genome (GCA_018812025.1) as a reference demonstrated the best scaffold N50 values and synteny performance (Supplementary Material Table S1, Figures S6, S7, S8). 3.2 Misassembly correction of the C. alburnus scaffold assembly Similar to the misassembly correction of the M. amblycephala scaffold assembly, we performed multi-reference guided scaffold assembly of two C. alburnus draft genomes (GCA_009869775.1, GCA_028476615.1) using Ragout, incorporating chromosome-level reference genomes of C. alburnus (GWHBOSX00000000), C. erythropterus (GCA_024489055.1), and M. amblycephala (GCA_018812025.1). These two versions of the assembly were designated as ca_ragout and ca_hu_ragout. To simplify the comparison, we assembled only two types of genomes based on ca_ragout or ca_hu_ragout. The first type of assembly, ca_ragtag and ca_hu_ragtag, involved performing Ragtag assembly on the ca_ragout or ca_hu_ragout assembly using the C. alburnus chromosome-level reference genome (GWHBOSX00000000) as the reference genome. The second type, ca_with_insilico and ca_hu_with_insilico, involved misassembly correction with in silico libraries on the ca_ragout or ca_hu_ragout assembly, using the same references. Finally, we used C. alburnus (GWHBOSX00000000) as the reference genome and evaluated the quality of ca_ragout, ca_ragtag, ca_with_insilico, ca_hu_ragout, ca_hu_ragtag, and ca_hu_with_insilico using Quast-LG v5.2.0, assessing metrics such as misassemblies, contiguity, and other relevant statistics. 
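Contiguity in these evaluations is summarized mainly by N50. As a reminder of what that statistic measures, the following is a minimal sketch of the standard N50 definition (the length of the sequence at which the cumulative sum of lengths, sorted from longest to shortest, first reaches half of the total assembly size). The function name and the toy lengths are illustrative assumptions and are not taken from the study's data; Quast-LG computes this value itself.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running sum to half of the
        # total assembly size defines N50.
        if running >= half_total:
            return length
    return ordered[-1]  # not reached; kept so every path returns an int


if __name__ == "__main__":
    toy_scaffolds = [80, 70, 50, 40, 30, 20]  # illustrative lengths only
    print(n50(toy_scaffolds))
```

Because N50 depends only on sequence lengths, joining sequences incorrectly can raise it just as much as joining them correctly, which is why the misassembly counts are reported alongside it.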
As shown in Table 2, Ragtag assemblies of ca_ragout and ca_hu_ragout (ca_ragtag, ca_with_insilico, ca_hu_ragtag, and ca_hu_with_insilico) can improve the N50 but introduce more gaps, increasing the N's per 100 kbp value. Misassembly correction with in silico libraries can reduce gaps to some extent, but not significantly. Regardless of the processing method, the mismatches per 100 kbp value showed slight variation among ca_ragout, ca_ragtag, ca_with_insilico, ca_hu_ragout, ca_hu_ragtag, and ca_hu_with_insilico, which may be related to the differences between the reference genome and the target genome. Significantly, misassembly correction with in silico libraries can reduce misassemblies. For the two versions of the C. alburnus genome, misassemblies decreased from 5689 and 6582 to 4728 and 5861, respectively. Additionally, we observed that performing Ragtag assembly on the ca_ragout or ca_hu_ragout assembly significantly increased misassemblies. According to Table 2, the increase in total misassemblies is related to increased scaffold misassemblies. This indicates that Ragtag can cause further misassemblies without misassembly correction. Additionally, comparing the two C. alburnus genomes with different assembly qualities, the high-quality genome (GCA_028476615.1) has lower N's per 100 kbp and mismatches per 100 kbp values compared to the lower-quality genome (GCA_009869775.1), but the improvement in misassemblies is not significant. However, as shown in Figure 2, the synteny of the two C. alburnus draft genomes was significantly enhanced, particularly for GCA_009869775.1. Before misassembly correction, this draft contained 312 cross-links between different chromosomes (Figure 2A), which were reduced to only five after correction (Figure 2B). In contrast, improvements in synteny for GCA_028476615.1 were less pronounced, with the number of cross-links changing from 13 before correction to 10 afterward. 
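To put the reported misassembly counts on a common scale, the relative reduction can be computed directly from the before and after values given in the text. The snippet below is a small worked calculation; the helper name and dictionary labels are illustrative, the two C. alburnus drafts are labeled only in the order in which the counts are reported, and the cross-link counts are left out.

```python
def percent_reduction(before: int, after: int) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("'before' must be positive")
    return 100.0 * (before - after) / before


# Misassembly counts (before, after correction with in silico libraries)
# as reported in the text.
reported = {
    "M. amblycephala": (8298, 4920),
    "C. alburnus, first reported draft": (5689, 4728),
    "C. alburnus, second reported draft": (6582, 5861),
}

if __name__ == "__main__":
    for label, (before, after) in reported.items():
        print(f"{label}: {percent_reduction(before, after):.1f}% fewer misassemblies")
```

Under these figures the reductions are roughly 41%, 17%, and 11%, which is consistent with the text's observation that the effect on Quast-reported misassemblies is smaller for the C. alburnus drafts than for M. amblycephala.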
Further analysis of the dot plots (Supplementary Material Figures S9 to S12) revealed that misassembly correction notably enhanced synteny, significantly reducing breakpoints along the diagonals and highlighting their linearity. These results show that the high-quality genome (GCA_028476615.1) has fewer cross-links between different chromosomes than the lower-quality genome (GCA_009869775.1). 4 DISCUSSION In contrast to long-read sequencing technologies, short-read sequencing technologies, such as Illumina sequencing, have less stringent requirements for DNA fragment length. However, the shorter reads obtained from Illumina sequencing pose challenges to the continuity of genome assembly. In such cases, reference-guided genome assembly has emerged as an effective solution. For instance, in our preliminary research, using genomes of closely related species to C. batrachus and C. alburnus as references, we significantly enhanced the quality of their draft genomes (Liu et al. 2023a). However, reference-guided assembly is not without its flaws, as it can lead to misassemblies due to significant structural differences between the actual and reference genomes (Alonge et al. 2022). To address this issue, the Ragout software attempts to correct potential misassemblies by utilizing the phylogenetic and synteny relationships among multiple reference genomes (Kolmogorov et al. 2018). Guo et al. (2022) demonstrated that Ragout's strategy of employing multiple reference genomes helps correct genomic misassemblies by identifying chimeric adjacencies. However, in this study, the improvements were not markedly evident (Figures 1 and 2). Consequently, we further assessed using Ragtag's Correct function combined with in silico library strategies to enhance the synteny of the M. amblycephala and C. alburnus genomes. We believe introducing numerous misassemblies partly relates to Ragout's strategy of employing multiple reference genomes. 
Despite the efforts to correct genomic misassemblies by identifying chimeric adjacencies, this approach is not foolproof, as actual structural variations do not always conform to existing phylogenetic constructs. In this study, Ragtag's correct function has proven effective in misassembly correction. Utilizing data generated from in silico libraries, this function accurately identifies and corrects errors during scaffold assembly. It is important to note that reference-guided assembly technology relies on high-quality reference genomes, which is particularly effective in species with highly conserved genome structures (Lischer and Shimizu 2017). However, for species with substantial structural variations or lacking comprehensive genomic information, this method can introduce erroneous assemblies (Whibley et al. 2021). Therefore, its performance depends on the similarity between the reference and target genomes. By comparing Supplementary Material Figures S6, S7, and S8, we observed that using the genome of the same species as a reference markedly improves the correction of misassemblies compared to other species. Additionally, even when using closely related species, Ragtag's Correct function combined with in silico libraries significantly enhances genome synteny (Supplementary Material Figures S2, S6, S7). Although Guo et al. (2022) indicated that Ragout could improve the correction of genomic misassemblies, this conclusion was drawn considering the solid-scaffolds parameter within Ragout. Our results suggest that Ragout's ability to correct misassemblies using multiple species genomes' phylogenetic and synteny relationships is limited. Moreover, comparisons from Supplementary Material Figures S1, S2, and Figures 1 show that the earlier strategy of using Ragout in conjunction with Ragtag for assembly (Liu et al. 2023a) did not significantly enhance genome synteny compared to using Ragout alone. 
However, this strategy notably improved assembly continuity, as evidenced by the N50 of the M. amblycephala draft genome increasing from 41172990 to 52506231 (Table 1). This improvement is likely related to Ragtag's ability to assemble contigs into scaffolds further. Another intriguing finding was that using Ragtag's assembly strategy alone was more effective in reducing misassemblies than when combined with Ragout (Supplementary Material Figures S1). However, when the misassembly correction was applied, the combined strategy of Ragout and Ragtag significantly reduced misassemblies (Figure 1 and Supplementary Material Figures S1), demonstrating that Ragout, when utilizing the phylogenetic information of multiple reference genomes, does indeed play a role in correcting misassemblies (Kolmogorov et al. 2018; Guo et al. 2022). Additionally, we observed that using the Ragtag Correct function with in silico libraries resulted in more misassemblies than using the Ragtag Correct function without in silico libraries. We speculate that this is due to the structural differences between the draft genome of the target species and the reference genome, which are from different individuals. Structural variations are widespread among individuals (Kidd et al. 2008); therefore, using in silico libraries to correct misassemblies is crucial. Furthermore, comparing the scaffold assembly results of two C. alburnus draft genomes (Figure 2), the high-quality genome (GCA_028476615.1) exhibited significantly fewer cross-links between different chromosomes than the lower-quality genome (GCA_009869775.1). This indicates that the quality of contig assembly significantly affects scaffold assembly. Clearly, longer contigs provide more information and less uncertainty (Luo et al. 2021; Rayamajhi et al. 2022), leading to fewer errors during scaffold assembly. 
However, the misassembly values provided by Quast-LG did not change significantly before and after misassembly correction in these two genome versions. This may be due to differences between the reference and target genomes. As mentioned, structural variations are widespread among individuals (Kidd et al. 2008). We speculate that the misassemblies values are due to sample differences rather than assembly errors, as the two versions of the genome and the reference genome are from different individuals. In silico library technology offers a cost-effective strategy for simulating high-throughput sequencing data, thereby guiding assembly optimization. This technique avoids the cost and time associated with physically constructing libraries. Since the original sequencing data for the three draft genomes of the two species were unavailable, we constructed in silico libraries from the draft genomes of the target species. This approach allowed for precise correction of scaffold assembly errors. Without this validation, misassemblies based solely on the reference genome could be erroneously identified (Alonge et al. 2022). It is important to note that the success of in silico libraries highly depends on the quality and coverage of existing sequencing data (Luo et al. 2020). As shown in Figure 2, the cross-links between different chromosomes in the improved genome using in silico libraries constructed from the high-quality genome (GCA_028476615.1) were significantly lower than those from the lower-quality genome (GCA_009869775.1). Moreover, our study suggests that when the target genome is evolutionarily distant from the reference genome, additional data may be needed to verify and guide the correction of misassemblies, such as using Hi-C sequencing data. In summary, our research highlights both the potential and limitations of reference-guided genome assembly and the use of in silico libraries in improving genome synteny and reducing misassemblies. 
While reference-guided techniques are highly effective in species with conserved genome structures, they may introduce errors in species with significant genomic variations. The Correct function of Ragtag, when used in conjunction with in silico libraries, has shown promise in accurately identifying and correcting assembly errors, although its success largely depends on the phylogenetic closeness of the reference genomes used. This study underscores the necessity of integrating additional genomic data, such as Hi-C sequencing, especially when dealing with evolutionarily distant genomes, to enhance the accuracy and reliability of genome assemblies. Declarations Funding Science & Technology Innovation Program of Hangzhou Academy of Agricultural Sciences [Grant numbers 2022HNCT-01]. Conflicts of interest/Competing interests The authors declare that they have no conflict of interest in the publication. Availability of data and material This study utilized several genome assemblies publicly available in the National Center for Biotechnology Information (NCBI) Assembly database, including GCA_009869865.1, GCA_009869775.1, GCA_028476615.1, GCA_018812025.1, and GCA_024489055.1. Additionally, we accessed the chromosome-level genome assembly GWHBOSX00000000 from the China National Center for Bioinformation. The improved versions of the draft genomes for Megalobrama amblycephala (GCA_009869865.1) and Culter alburnus (GCA_009869775.1, GCA_028476615.1), labeled as cor_ma, cor_ca, and cor_ca_hu respectively, have been uploaded to Figshare with the DOI: 10.6084/m9.figshare.25621809. References Alonge M, Lebeigle L, Kirsche M, Jenike K, Ou S, Aganezov S, Wang X, Lippman ZB, Schatz MC, Soyk S (2022) Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing. Genome Biol 23(1):258. 
doi:https://doi.org/10.1186/s13059-022-02823-7 Blom MPK (2021) Opportunities and challenges for high-quality biodiversity tissue archives in the age of long-read sequencing. Molecular Ecology 30(23):5935-5948. doi:https://doi.org/10.1111/mec.15909 Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W (2011) Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27(4):578-579. doi:https://doi.org/10.1093/bioinformatics/btq683 Chen S (2023) Ultrafast one‐pass FASTQ data preprocessing, quality control, and deduplication using fastp. iMeta 2(2). doi:https://doi.org/10.1002/imt2.107 Chen Z, Pham L, Wu T-C, Mo G, Xia Y, Chang PL, Porter D, Phan T, Che H, Tran H, Bansal V, Shaffer J, Belda-Ferre P, Humphrey G, Knight R, Pevzner P, Pham S, Wang Y, Lei M (2020) Ultralow-input single-tube linked-read library method enables short-read second-generation sequencing systems to routinely generate highly accurate and economical long-range sequencing information. Genome Research 30(6):898-909. doi:https://doi.org/10.1101/gr.260380.119 Conlin LK, Aref-Eshghi E, McEldrew DA, Luo M, Rajagopalan R (2022) Long-read sequencing for molecular diagnostics in constitutional genetic disorders. Human Mutation 43(11):1531-1544. doi:https://doi.org/10.1002/humu.24465 Dijk ELv, Naquin D, Gorrichon K, Jaszczyszyn Y, Ouazahrou R, Thermes C, Hernandez C (2023) Genomics in the long-read sequencing era. Trends in Genetics 39(9):649-671. doi:https://doi.org/10.1016/j.tig.2023.04.006 Gehrig JL, Portik DM, Driscoll MD, Jackson E, Chakraborty S, Gratalo D, Ashby M, Valladares R (2022) Finding the right fit: evaluation of short-read and long-read sequencing approaches to maximize the utility of clinical microbiome data. Microbial Genomics 8(3):000794. doi:https://doi.org/10.1099/mgen.0.000794 Guo R, Papanicolaou A, Fritz ML (2022) Validation of reference-assisted assembly using existing and novel Heliothine genomes. Genomics:110441. 
doi:https://doi.org/10.1016/j.ygeno.2022.110441 He W, Yang J, Jing Y, Xu L, Yu K, Fang X (2023) NGenomeSyn: an easy-to-use and flexible tool for publication-ready visualization of syntenic relationships across multiple genomes. Bioinformatics. doi:https://doi.org/10.1093/bioinformatics/btad121 Huang W, Li L, Myers JR, Marth GT (2012) ART: a next-generation sequencing read simulator. Bioinformatics 28(4):593-594. doi:https://doi.org/10.1093/bioinformatics/btr708 Jiang H, Qian Y, Zhang Z, Meng M, Deng Y, Wang G, He S, Yang L (2023) Chromosome-level genome assembly and whole-genome resequencing of topmouth culter (Culter alburnus) provide insights into the intraspecific variation of its semi-buoyant and adhesive eggs. Mol Ecol Resour. doi:https://doi.org/10.1111/1755-0998.13845 Kidd JM, Cooper GM, Donahue WF, Hayden HS, Sampas N, Graves T, Hansen N, Teague B, Alkan C, Antonacci F, Haugen E, Zerr T, Yamada NA, Tsang P, Newman TL, Tüzün E, Cheng Z, Ebling HM, Tusneem N, David R, Gillett W, Phelps KA, Weaver M, Saranga D, Brand A, Tao W, Gustafson E, McKernan K, Chen L, Malig M, Smith JD, Korn JM, McCarroll SA, Altshuler DA, Peiffer DA, Dorschner M, Stamatoyannopoulos J, Schwartz D, Nickerson DA, Mullikin JC, Wilson RK, Bruhn L, Olson MV, Kaul R, Smith DR, Eichler EE (2008) Mapping and sequencing of structural variation from eight human genomes. Nature 453(7191):56-64. doi:https://doi.org/10.1038/nature06862 Kolmogorov M, Armstrong J, Raney BJ, Streeter I, Dunn M, Yang F, Odom D, Flicek P, Keane TM, Thybert D, Paten B, Pham S (2018) Chromosome assembly of large and complex genomes using multiple references. Genome Res 28(11):1720-1732. doi:https://doi.org/10.1101/gr.236273.118 Lang D, Zhang S, Ren P, Liang F, Sun Z, Meng G, Tan Y, Li X, Lai Q, Han L, Wang D, Hu F, Wang W, Liu S (2020) Comparison of the two up-to-date sequencing technologies for genome assembly: HiFi reads of Pacific Biosciences Sequel II system and ultralong reads of Oxford Nanopore. 
GigaScience 9(12):giaa123. doi:https://doi.org/10.1093/gigascience/giaa123 Lischer HEL, Shimizu KK (2017) Reference-guided de novo assembly approach improves genome reconstruction for related species. BMC Bioinformatics 18(1):474. doi:https://doi.org/10.1186/s12859-017-1911-6 Liu H, Chen C, Gao Z, Min J, Gu Y, Jian J, Jiang X, Cai H, Ebersberger I, Xu M, Zhang X, Chen J, Luo W, Chen B, Chen J, Liu H, Li J, Lai R, Bai M, Wei J, Yi S, Wang H, Cao X, Zhou X, Zhao Y, Wei K, Yang R, Liu B, Zhao S, Fang X, Schartl M, Qian X, Wang W (2017) The draft genome of blunt snout bream (Megalobrama amblycephala) reveals the development of intermuscular bone and adaptation to herbivorous diet. Gigascience 6(7):1-13. doi:https://doi.org/10.1093/gigascience/gix039 Liu H, Chen C, Lv M, Liu N, Hu Y, Zhang H, Enbody ED, Gao Z, Andersson L, Wang W (2021) A Chromosome-Level Assembly of Blunt Snout Bream (Megalobrama amblycephala) Genome Reveals an Expansion of Olfactory Receptor Genes in Freshwater Fish. Mol Biol Evol 38(10):4238-4251. doi:https://doi.org/10.1093/molbev/msab152 Liu K, Xie N, Wang Y, Liu X (2023a) The Utilization of Reference-Guided Assembly and In Silico Libraries Improves the Draft Genome of Clarias batrachus and Culter alburnus. Mar Biotechnol (NY) 25(6):907-917. doi:https://doi.org/10.1007/s10126-023-10248-x Liu S, Zheng J, Li F, Chi M, Cheng S, Jiang W, Liu Y, Gu Z, Zhao J (2023b) Chromosome-scale assembly and quantitative trait locus mapping for major economic traits of the Culter alburnus genome using Illumina and PacBio sequencing with Hi-C mapping information. Frontiers in Genetics 14. doi:https://doi.org/10.3389/fgene.2023.1072506 Logsdon GA, Vollger MR, Eichler EE (2020) Long-read human genome sequencing and its applications. Nature Reviews Genetics 21(10):597-614. doi:https://doi.org/10.1038/s41576-020-0236-x Luo J, Wei Y, Lyu M, Wu Z, Liu X, Luo H, Yan C (2021) A comprehensive review of scaffolding methods in genome assembly. 
Briefings in Bioinformatics 22(5):bbab033. doi:https://doi.org/10.1093/bib/bbab033 Luo Y, Liao X, Wu F-X, Wang J (2020) Computational Approaches for Transcriptome Assembly Based on Sequencing Technologies. Current Bioinformatics 15(1):2-16 Mantere T, Kersten S, Hoischen A (2019) Long-Read Sequencing Emerging in Medical Genetics. Frontiers in Genetics 10. doi:https://doi.org/10.3389/fgene.2019.00426 Marcais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A (2018) MUMmer4: A fast and versatile genome alignment system. PLoS Comput Biol 14(1):e1005944. doi:https://doi.org/10.1371/journal.pcbi.1005944 Mikheenko A, Prjibelski A, Saveliev V, Antipov D, Gurevich A (2018) Versatile genome assembly evaluation with QUAST-LG. Bioinformatics 34(13):i142-i150. doi:https://doi.org/10.1093/bioinformatics/bty266 Minkin I, Medvedev P (2020) Scalable multiple whole-genome alignment and locally collinear block construction with SibeliaZ. Nature Communications 11(1):6327. doi:https://doi.org/10.1038/s41467-020-19777-8 Patin NV, Goodwin KD (2022) Long-Read Sequencing Improves Recovery of Picoeukaryotic Genomes and Zooplankton Marker Genes from Marine Metagenomes. mSystems 7(6):e00595-00522. doi:https://doi.org/10.1128/msystems.00595-22 Rayamajhi N, Cheng C-HC, Catchen JM (2022) Evaluating Illumina-, Nanopore-, and PacBio-based genome assembly strategies with the bald notothen, Trematomus borchgrevinki. G3 Genes|Genomes|Genetics 12(11):jkac192. doi:https://doi.org/10.1093/g3journal/jkac192 Ren L, Li W, Qin Q, Dai H, Han F, Xiao J, Gao X, Cui J, Wu C, Yan X, Wang G, Liu G, Liu J, Li J, Wan Z, Yang C, Zhang C, Tao M, Wang J, Luo K, Wang S, Hu F, Zhao R, Li X, Liu M, Zheng H, Zhou R, Shu Y, Wang Y, Liu Q, Tang C, Duan W, Liu S (2019) The subgenomes show asymmetric expression of alleles in hybrid lineages of Megalobrama amblycephala x Culter alburnus. Genome Res 29(11):1805-1815. 
doi:https://doi.org/10.1101/gr.249805.119 Tomas K, Erik B-R, Olga Vinnere P (2018) A comprehensive model of DNA fragmentation for the preservation of High Molecular Weight DNA. bioRxiv:254276. doi:https://doi.org/10.1101/254276 Whibley A, Kelley JL, Narum SR (2021) The changing face of genome assemblies: Guidance on achieving high-quality reference genomes. Molecular Ecology Resources 21(3):641-652. doi:https://doi.org/10.1111/1755-0998.13312 Zhang T, Zhou J, Gao W, Jia Y, Wei Y, Wang G (2022) Complex genome assembly based on long-read sequencing. Brief Bioinform 23(5):bbac305. doi:https://doi.org/10.1093/bib/bbac305 Zhao S, Yang X, Pang B, Zhang L, Wang Q, He S, Dou H, Zhang H (2022) A chromosome-level genome assembly of the redfin culter (Chanodichthys erythropterus). Sci Data 9(1):535. doi:https://doi.org/10.1038/s41597-022-01648-0

Tables

Table 1 Multi-reference Guided Scaffold Assembly and Misassembly Correction Metrics for Megalobrama amblycephala Draft Genome

| Metric | ma_ragout | ma_ragtag | ma_without_insilico | ma_with_insilico | ma_correct | ma_denovo |
| --- | --- | --- | --- | --- | --- | --- |
| N50 | 41172990 | 52506231 | 45776042 | 45322139 | 479171 | 990859 |
| auN | 46548936.2 | 63052515.7 | 49543677 | 48923487 | 1257298.6 | 2140853.2 |
| N's per 100 kbp | 9787.59 | 11117.97 | 13554.59 | 13239.3 | 9784.49 | 9787.62 |
| mismatches per 100 kbp | 509.28 | 506.17 | 499.8 | 503.88 | 513.88 | 514.3 |
| misassemblies | 7635 | 8298 | 4853 | 4920 | 7134 | 8080 |
| contig misassemblies | 5519 | 5173 | 2772 | 2923 | 5927 | 6325 |
| c. relocations | 2672 | 2646 | 1860 | 1894 | 2366 | 2566 |
| c. translocations | 2700 | 2416 | 864 | 969 | 3118 | 3363 |
| c. inversions | 147 | 111 | 48 | 60 | 443 | 396 |
| scaffold misassemblies | 2116 | 3125 | 2081 | 1997 | 1207 | 1755 |
| s. relocations | 987 | 1713 | 1517 | 1461 | 546 | 735 |
| s. translocations | 1117 | 1403 | 557 | 529 | 648 | 1000 |
| s. inversions | 12 | 9 | 7 | 7 | 13 | 20 |
| misassembled contigs | 558 | 90 | 311 | 336 | 3290 | 2215 |
| Misassembled contigs length | 1140376612 | 1177589568 | 1185337931 | 1176999057 | 774645442 | 966903610 |
| local misassemblies | 36333 | 36424 | 34678 | 35600 | 35329 | 35774 |
| scaffold gap ext. mis. | 211 | 236 | 423 | 432 | 163 | 188 |
| scaffold gap loc. mis. | 5539 | 5992 | 10650 | 9398 | 5007 | 5269 |
| possible TEs | 504 | 462 | 394 | 454 | 808 | 884 |
| unaligned mis. contigs | 363 | 162 | 459 | 475 | 913 | 590 |
| mismatches | 4732683 | 4684596 | 4649345 | 4680545 | 4843479 | 4826478 |
| indels | 1542737 | 1521082 | 1525034 | 1526830 | 1629164 | 1607328 |
| indels ( 5 bp) | 291188 | 289373 | 281144 | 290486 | 296529 | 295420 |
| Indels length | 16047289 | 15990323 | 15678591 | 16207677 | 15553476 | 15926178 |

We performed multi-reference guided scaffold assembly of the M. amblycephala draft genome (GCA_009869865.1) using Ragout, incorporating chromosome-level reference genomes of M. amblycephala (GCA_018812025.1), C. erythropterus (GCA_024489055.1), and C. alburnus (GWHBOSX00000000). This version of the assembly was designated as ma_ragout. To compare the effectiveness of misassembly correction, we constructed three types of genomes based on ma_ragout. The first type of corrected assembly, ma_ragtag, involved performing Ragtag assembly on the ma_ragout assembly using M. amblycephala chromosome-level reference genome (GCA_018812025.1) as the reference genome. The second type, ma_without_insilico, involved misassembly correction without in silico libraries on the ma_ragout assembly, using the same references. The third type, ma_with_insilico, involved misassembly correction with in silico libraries on the ma_ragout assembly, using the same references. ma_correct represents the result of misassembly correction with in silico libraries on the ma_ragout assembly. ma_denovo represents the de novo scaffold assembly using sspace_basic v2.1.1 after misassembly correction with in silico libraries on the ma_ragout assembly. All statistics are based on contigs of size >= 3000 bp, unless otherwise noted.
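The two contiguity metrics in Table 1, N50 and auN, can be illustrated with a short calculation. The following is a minimal sketch based on the standard definitions of these metrics, not on QUAST's own code; the function names and the toy scaffold lengths are hypothetical and serve only as an illustration.

```python
def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together cover at least half of the total assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def aun(lengths):
    """Return the auN (area under the Nx curve), which equals the
    length-weighted mean sequence length: sum(L^2) / sum(L)."""
    total = sum(lengths)
    if total == 0:
        return 0.0
    return sum(length * length for length in lengths) / total


# Hypothetical toy scaffold lengths (not taken from the paper's assemblies).
toy_lengths = [50, 40, 30, 20, 10]
print(n50(toy_lengths))            # 40
print(round(aun(toy_lengths), 2))  # 36.67
```

Because auN weights every sequence by its own length, it responds to changes anywhere in the length distribution, whereas N50 only changes when the sequence at the half-assembly point changes. This may help explain why the two metrics do not always move in step across the columns of Table 1.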
Table 2 Multi-reference Guided Scaffold Assembly and Misassembly Correction Metrics for Culter alburnus Draft Genome

| Metric | ca_ragout | ca_ragtag | ca_with_insilico | ca_hu_ragout | ca_hu_ragtag | ca_hu_with_insilico |
| --- | --- | --- | --- | --- | --- | --- |
| N50 | 39946378 | 44813369 | 40471471 | 38666041 | 44219289 | 41217093 |
| auN | 40571763.8 | 47915037.3 | 43560315.9 | 39949117.8 | 48056358.6 | 43386478.2 |
| N's per 100 kbp | 9721.27 | 11841.42 | 11659.28 | 6728.37 | 8091.10 | 8307.72 |
| mismatches per 100 kbp | 859.15 | 855.88 | 857.80 | 617.27 | 613.39 | 614.36 |
| misassemblies | 5689 | 7300 | 4728 | 6582 | 7126 | 5861 |
| contig misassemblies | 3952 | 3864 | 3026 | 4929 | 4447 | 3772 |
| c. relocations | 2091 | 2086 | 1765 | 2083 | 1998 | 1781 |
| c. translocations | 1593 | 1563 | 1114 | 2779 | 2403 | 1948 |
| c. inversions | 268 | 215 | 147 | 67 | 46 | 43 |
| scaffold misassemblies | 1737 | 3436 | 1702 | 1653 | 2679 | 2089 |
| s. relocations | 1019 | 2321 | 1246 | 1396 | 1979 | 1766 |
| s. translocations | 708 | 1109 | 436 | 255 | 696 | 321 |
| s. inversions | 10 | 6 | 20 | 2 | 4 | 2 |
| misassembled contigs | 517 | 137 | 535 | 1130 | 359 | 922 |
| Misassembled contigs length | 1062185200 | 1112049297 | 1074032609 | 1082512646 | 1131457991 | 1100912456 |
| local misassemblies | 41802 | 41930 | 41263 | 41287 | 41534 | 40995 |
| scaffold gap ext. mis. | 278 | 302 | 363 | 206 | 253 | 294 |
| scaffold gap loc. mis. | 13086 | 13680 | 14458 | 798 | 1255 | 1638 |
| possible TEs | 528 | 532 | 454 | 324 | 348 | 328 |
| unaligned mis. contigs | 465 | 187 | 817 | 680 | 361 | 793 |
| mismatches | 7479038 | 7435742 | 7468359 | 5704753 | 5625575 | 5652469 |
| indels | 1406914 | 1397794 | 1406569 | 2842456 | 2792777 | 2811356 |
| indels ( 5 bp) | 393433 | 392117 | 393302 | 457553 | 453420 | 454786 |
| Indels length | 15377377 | 15344830 | 15406340 | 19023843 | 18875571 | 18933458 |

Similar to the misassembly correction of the M. amblycephala scaffold assembly, we performed multi-reference guided scaffold assembly of two C. alburnus draft genomes (GCA_009869775.1, GCA_028476615.1) using Ragout, incorporating chromosome-level reference genomes of C. alburnus (GWHBOSX00000000), C. erythropterus (GCA_024489055.1), and M. amblycephala (GCA_018812025.1). These two versions of the assembly were designated as ca_ragout and ca_hu_ragout. 
To simplify the comparison, we assembled only two types of genomes based on ca_ragout or ca_hu_ragout. The first type of assembly, ca_ragtag and ca_hu_ragtag, involved performing Ragtag assembly on the ca_ragout or ca_hu_ragout assembly using the C. alburnus chromosome-level reference genome (GWHBOSX00000000) as the reference genome. The second type, ca_with_insilico and ca_hu_with_insilico, involved misassembly correction with in silico libraries on the ca_ragout or ca_hu_ragout assembly, using the same references. Finally, we used C. alburnus (GWHBOSX00000000) as the reference genome and evaluated the quality of ca_ragout, ca_hu_ragout, ca_ragtag, ca_hu_ragtag, ca_with_insilico, and ca_hu_with_insilico using Quast-LG v5.2.0, assessing metrics such as misassemblies, contiguity, and other relevant statistics.

Additional Declarations
The authors declare no competing interests.

Supplementary Files
1SupportMaterials.docx
© Research Square 2026 | ISSN 2693-5015 (online) {"props":{"pageProps":{"initialData":{"identity":"rs-4621443","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Short Report","associatedPublications":[],"authors":[{"id":317630061,"identity":"e87b96c4-2228-45bf-bf6f-f1ede18846e6","order_by":0,"name":"Kai Liu","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA6ElEQVRIiWNgGAWjYHACxgcJP2zkGA6DORYGxGhhNvjYk2YM1nKAQYIoLWySM9gOJzYcIFaLwfHkZ9I8PGnpfcd5D7/+UCNhzMB++OgGfFoke54ZW/NY2OTOPMyXZnHgmIQZA09a2g18WvglEgxvA23J3XCYx8zgAJuEDYMEjxleLWwS6R+kedgOpxuAtfwjQgu/RI4RyPsJQC3GDw62AR1GSItkz5tiUCAbzgTawnC2T8KYjZBfDI6nbwRFpTzf+TPGHyq+2Rj2sx8+hlcLA0MCkr/AJH7lqFqYPxBWPQpGwSgYBSMRAAAZ+UupDe//JwAAAABJRU5ErkJggg==","orcid":"https://orcid.org/0000-0001-7034-7235","institution":"Hangzhou Academy of Agricultural Sciences","correspondingAuthor":true,"prefix":"","firstName":"Kai","middleName":"","lastName":"Liu","suffix":""},{"id":317630082,"identity":"3d618488-857f-4483-b0a8-c19d408c7eb3","order_by":1,"name":"Nan Xie","email":"","orcid":"","institution":"Hangzhou Academy of Agricultural Sciences","correspondingAuthor":false,"prefix":"","firstName":"Nan","middleName":"","lastName":"Xie","suffix":""}],"badges":[],"createdAt":"2024-06-22 
10:20:15","currentVersionCode":1,"declarations":{"humanSubjects":false,"vertebrateSubjects":false,"conflictsOfInterestStatement":false,"humanSubjectEthicalGuidelines":false,"humanSubjectConsent":false,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":false},"doi":"10.21203/rs.3.rs-4621443/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-4621443/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":58989631,"identity":"c1ce36ad-560c-467e-b642-37cba4c4db2a","added_by":"auto","created_at":"2024-06-25 04:38:49","extension":"jpg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":3129390,"visible":true,"origin":"","legend":"\u003cp\u003eSynteny Analysis Pre- and Post-Correction in \u003cem\u003eMegalobrama amblycephala\u003c/em\u003e Genome Assemblies. (A) Comparison of misassemblies in genomic assemblies with and without in silico libraries. (B) Synteny mapping before correction, comparing the reference-guided assembly (ma_ragtag) with the chromosome-level reference genome (ma). (C) Synteny mapping after correction, comparing the reference-guided assembly (ma_with_insilico) with the chromosome-level reference genome (ma). (D) Dot plot visualization before correction, contrasting the reference-guided assembly (ma_ragtag) with the chromosome-level reference genome (ma). (E) Dot plot visualization post-correction, contrasting the reference-guided assembly (ma_with_insilico) with the chromosome-level reference genome (ma).\u003c/p\u003e\n\u003cp\u003eThe \u003cem\u003ede novo\u003c/em\u003e assembly of the chromosome-level genome (GCA_018812025.1) is referred to as \"ma.\" The assembly labeled \"ma_ragtag\" corresponds to the top 24 chromosomes of the draft genome (GCA_009869865.1), which was scaffolded using multiple reference genomes via Ragout and subsequently refined using Ragtag, with GCA_018812025.1 serving as the reference genome. 
Following this, the assembly \"ma_with_insilico\" corresponds to the top 24 chromosomes of the draft genome, established by using GCA_018812025.1 for correcting misassemblies and as the scaffold reference in the post-\"ma_ragtag\" assembly construction.\u003c/p\u003e","description":"","filename":"fig1.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4621443/v1/865fdaff432d730daba70df6.jpg"},{"id":58989632,"identity":"f3b6384f-be34-4dae-8be6-72e9446f5e96","added_by":"auto","created_at":"2024-06-25 04:38:49","extension":"jpg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":3775534,"visible":true,"origin":"","legend":"\u003cp\u003eComparative Synteny Analysis in\u003cem\u003e Culter alburnus\u003c/em\u003e Draft Genomes Relative to Chromosome-Level Reference Genome. (A) Synteny comparison of the lower-quality genome assembly before misassembly correction (ca_ragtag) and after misassembly correction (ca_with_insilico) with the chromosome-level reference genome (ca_af). (B) Synteny comparison of the high-quality genome assembly before misassembly correction (ca_hu_ragtag) and after misassembly correction (ca_hu_with_insilico) with the chromosome-level reference genome (ca_af).\u003c/p\u003e\n\u003cp\u003eThe \u003cem\u003ede novo\u003c/em\u003e assembly of the chromosome-level genome (GWHBOSX00000000), denoted \"ca_af,\" was established. 
Prior to and after correction within the reference-guided framework, two draft genomes (GCA_009869775.1, GCA_028476615.1) were developed, referred to as \"ca\" and \"ca_hu,\" respectively.\u003c/p\u003e","description":"","filename":"fig2.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4621443/v1/20007080b889acd12c78fa5d.jpg"},{"id":58989972,"identity":"5a3dce52-c7fc-413e-91b3-55b69f8df7a1","added_by":"auto","created_at":"2024-06-25 04:46:51","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":7456177,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4621443/v1/16ca4fc2-e995-4bbe-973c-c44d8399a5b5.pdf"},{"id":58989633,"identity":"6486f313-f852-46d6-8856-e97ceff13c96","added_by":"auto","created_at":"2024-06-25 04:38:49","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":10436182,"visible":true,"origin":"","legend":"","description":"","filename":"1SupportMaterials.docx","url":"https://assets-eu.researchsquare.com/files/rs-4621443/v1/488a5b346c1ca8866c1a1297.docx"}],"financialInterests":"The authors declare no competing interests.","formattedTitle":"\u003cp\u003eEnhancing the Accuracy of Reference-Guided Genomic Assemblies: Implementing Ragtag Correction for Reference-Guided Scaffolds\u003c/p\u003e","fulltext":[{"header":"1 INTRODUCTION","content":"\u003cp\u003eIn biological research, long-read sequencing technologies are widely praised for delivering extended read lengths, showcasing significant advantages, especially in deciphering complex genomic structures and genetic variations. Long-read sequencing technologies have demonstrated substantial benefits in resolving the assembly of complex genomes and studying genetic diversity (Blom \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Zhang et al. 
\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Dijk et al. 2023). Furthermore, the application of long-read sequencing in clinical diagnostics is on the rise, promising to address genetic diseases that short-read sequencing fails to diagnose (Mantere et al. \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Logsdon et al. \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Conlin et al. \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Additionally, its use in environmental microbiology is rapidly evolving, particularly in analyzing microbial community compositions and genetic diversity (Patin and Goodwin \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Recent advancements in high-fidelity sequencing by Pacific Biosciences and ultra-long sequencing by Nanopore Technologies have significantly mitigated the high error rates traditionally associated with long-read sequencing, providing even longer reads (Lang et al. \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). However, the high costs (Chen et al. \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Gehrig et al. \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2022\u003c/span\u003e) and stringent sample quality requirements (Blom \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2021\u003c/span\u003e) continue to pose major barriers to its widespread adoption. This is especially true for long-stored samples, as they often contain degraded DNA, which does not meet the high molecular weight DNA requirements of long-read sequencing (Tomas et al. \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Blom \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). For instance, HiFi sequencing requires DNA fragments to be over 30 kb in length. 
In contrast to long-read sequencing technologies, short-read sequencing technologies, such as Illumina sequencing, have less stringent requirements for DNA fragment length, needing only fragments longer than 1 kb. However, the shorter reads obtained from Illumina sequencing pose challenges to the continuity of genome assembly. After assembling contigs from Illumina sequencing reads, the short length of these reads hinders the assembly of long scaffolds, resulting in scaffolds that are not sufficiently long.\u003c/p\u003e \u003cp\u003eThe reference-guided assembly provides an effective solution to improve the continuity of genome scaffold assembly. This technique involves aligning contigs with a reference genome, facilitating extension along the reference genome's chromosomal scaffolds (Lischer and Shimizu \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). Using genomes of closely related species as references, we significantly improved the draft genome quality of \u003cem\u003eClarias batrachus\u003c/em\u003e and \u003cem\u003eCulter alburnus\u003c/em\u003e in our preliminary studies (Liu et al. \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2023a\u003c/span\u003e). While effective, this method inevitably introduces misassemblies that can mislead downstream gene function annotation and evolutionary analysis. In previous research, we found that the reference-guided assembly of \u003cem\u003eC. batrachus\u003c/em\u003e resulted in 6046 misassemblies (Liu et al. \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2023a\u003c/span\u003e). The Ragout software leverages phylogenetic relationships and synteny across multiple reference genomes to address potential misassemblies (Kolmogorov et al. \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). 
Despite these efforts, it is not foolproof against misassemblies, as actual structural variations do not always conform to existing phylogenetic constructs. This method has been shown to mitigate the risk of misassemblies (Guo et al. \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2022\u003c/span\u003e), but it does not eliminate them. In contrast, Ragtag software offers a Correct function that uses the reference genome to detect and rectify misassemblies (Alonge et al. \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). This function depends on either short-read or long-read sequencing libraries to confirm the accuracy of identified misassemblies; without this validation, misassemblies based solely on the reference genome could be erroneously identified.\u003c/p\u003e \u003cp\u003eTo tackle the challenge of misassemblies inherent in reference-guided assembly techniques, this study implemented a hybrid strategy that integrates Ragtag's Correct function with in silico libraries. When real sequencing library data are available, they can be used directly instead. In this study, we used three draft genomes from two fish species as examples: \u003cem\u003eMegalobrama amblycephala\u003c/em\u003e (GCA_009869865.1) (Liu et al. \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2017\u003c/span\u003e) and \u003cem\u003eC. alburnus\u003c/em\u003e (GCA_009869775.1, GCA_028476615.1) (Ren et al. \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Liu et al. \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2023b\u003c/span\u003e). Since the original sequencing data for the three draft genomes were unavailable, we constructed in silico libraries from the draft genomes of the target species. This approach allowed for precise correction of scaffold assembly errors. Recent releases of chromosome-level reference genomes for these two species (Liu et al. 
\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Jiang et al. \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2023\u003c/span\u003e) allowed us to conveniently compare reference-guided and \u003cem\u003ede novo\u003c/em\u003e assembly results and evaluate the hybrid strategy.\u003c/p\u003e"},{"header":"2 MATERIALS AND METHODS","content":"\u003cp\u003e2.1 Data for the target and reference genomes\u003c/p\u003e\n\u003cp\u003eWe downloaded the draft genomes for \u003cem\u003eM. amblycephala\u003c/em\u003e (GCA_009869865.1) (Liu et al. 2017) and \u003cem\u003eC. alburnus\u003c/em\u003e (GCA_009869775.1, GCA_028476615.1) (Ren et al. 2019; Liu et al. 2023b) from the National Center for Biotechnology Information (NCBI) Assembly database. Additionally, chromosome-level genome assemblies for \u003cem\u003eM. amblycephala\u003c/em\u003e (GCA_018812025.1) (Liu et al. 2021) and \u003cem\u003eChanodichthys erythropterus\u003c/em\u003e (GCA_024489055.1) (Zhao et al. 2022) were also retrieved from NCBI. Furthermore, we accessed the chromosome-level genome assembly for \u003cem\u003eC. alburnus\u003c/em\u003e (GWHBOSX00000000) (Jiang et al. 2023) from the China National Center for Bioinformation, ensuring comprehensive genomic resources for our analyses. The median divergence time between \u003cem\u003eM. amblycephala\u003c/em\u003e and \u003cem\u003eC. alburnus\u003c/em\u003e is estimated to be 7.18 million years ago (MYA). In comparison, the median divergence time between \u003cem\u003eC. alburnus\u003c/em\u003e and \u003cem\u003eC. erythropterus\u003c/em\u003e is estimated at 4.54 MYA (Liu et al. 2023a).\u003c/p\u003e\n\u003cp\u003e2.2\u0026nbsp;Reference-guided scaffold assembly\u003c/p\u003e\n\u003cp\u003eWe employed the reference-guided assembler Ragout v2.3 to enhance the draft genome assembly of the target species, \u003cem\u003eM. amblycephala\u003c/em\u003e (GCA_009869865.1) and \u003cem\u003eC. 
alburnus\u003c/em\u003e (GCA_009869775.1, GCA_028476615.1). For Ragout\u0026apos;s alignment process, we used SibeliaZ\u0026nbsp;(Minkin and Medvedev 2020). \u003cem\u003eC. erythropterus\u003c/em\u003e (GCA_024489055.1), \u003cem\u003eC. alburnus\u003c/em\u003e (GWHBOSX00000000), and \u003cem\u003eM. amblycephala\u003c/em\u003e (GCA_018812025.1) were used as multiple reference genomes for guided scaffold\u0026nbsp;assembly.\u003c/p\u003e\n\u003cp\u003e2.3 Misassembly correction\u003c/p\u003e\n\u003cp\u003eWe utilized \u003cem\u003eC. erythropterus\u003c/em\u003e (GCA_024489055.1), \u003cem\u003eM. amblycephala\u003c/em\u003e (GCA_018812025.1), and \u003cem\u003eC. alburnus\u003c/em\u003e (GWHBOSX00000000) as reference genomes and employed the Ragtag Correct function with an in silico paired-end 500 bp library to rectify misassemblies in the target genomes of \u003cem\u003eM. amblycephala\u003c/em\u003e (GCA_009869865.1) and \u003cem\u003eC. alburnus\u003c/em\u003e (GCA_009869775.1, GCA_028476615.1).\u0026nbsp;The median divergence time between \u003cem\u003eC. erythropterus\u003c/em\u003e and \u003cem\u003eC. alburnus\u003c/em\u003e is 4.54 MYA (Liu et al. 2023a). Due to the lack of real sequencing data for the target genomes, the in silico paired-end library was generated using ART v2.5.8 (Huang et al. 2012) with 500 bp insert lengths, creating simulated Illumina sequence data. Fastp\u0026nbsp;(Chen 2023)\u0026nbsp;was used to trim reads, correct base errors, and remove duplicates in the generated in silico library, with parameters set to -c -D.\u003c/p\u003e\n\u003cp\u003eTo compare the effectiveness of misassembly correction, we constructed three types of genomes based on reference-guided scaffold assembly obtained from Ragout.\u0026nbsp;First, using the scaffold assembly results from Ragout, we performed Ragtag assembly on the target genomes using \u003cem\u003eM. amblycephala\u003c/em\u003e (GCA_018812025.1) or \u003cem\u003eC. 
alburnus\u003c/em\u003e (GWHBOSX00000000) as reference genomes. Second, we performed misassembly correction without in silico libraries on the target genomes, followed by Ragtag assembly using the same references. Third, we performed misassembly correction with in silico libraries on the target genomes, followed by Ragtag assembly using the same references.\u0026nbsp;For Ragtag\u0026apos;s alignment process, we utilized nucmer\u0026nbsp;(Marcais et al. 2018), one of Ragtag\u0026apos;s in-built aligners. When correcting misassemblies in \u003cem\u003eM. amblycephala\u003c/em\u003e (GCA_009869865.1), we also performed a \u003cem\u003ede novo\u003c/em\u003e scaffold assembly (ma_denovo) for comparison using SSPACE Basic v2.1.1\u0026nbsp;(Boetzer et al. 2011). This was done after correcting the ma_ragout assembly with in silico libraries.\u003c/p\u003e\n\u003cp\u003e2.4 Evaluation of misassembly correction\u003c/p\u003e\n\u003cp\u003eWe used Quast-LG v5.2.0 (-large) to evaluate the quality of our assemblies, including misassemblies, contiguity, and other relevant statistics\u0026nbsp;(Mikheenko et al. 2018). The assessment of misassemblies was also conducted using synteny plots and dot plots. Synteny plots were generated using NGenomeSyn software\u0026nbsp;(He et al. 2023), and dot plots were created with dotPlotly (available at https://github.com/tpoorten/dotPlotly). For synteny comparisons, we utilized nucmer with alignment parameters set to --mum -c 30000 for \u003cem\u003eM. amblycephala\u003c/em\u003e and --mum -c 10000 for \u003cem\u003eC. alburnus\u003c/em\u003e.\u003c/p\u003e"},{"header":"3 RESULTS","content":"\u003ch2\u003e3.1 Misassembly correction of the \u003cem\u003eM. amblycephala\u003c/em\u003e scaffold assembly\u003c/h2\u003e\n\u003cp\u003eWe performed multi-reference guided scaffold assembly of the \u003cem\u003eM. 
amblycephala\u003c/em\u003e draft genome (GCA_009869865.1) using Ragout, incorporating chromosome-level reference genomes of \u003cem\u003eM. amblycephala\u003c/em\u003e (GCA_018812025.1), \u003cem\u003eC. erythropterus\u003c/em\u003e (GCA_024489055.1), and \u003cem\u003eC. alburnus\u003c/em\u003e (GWHBOSX00000000). This version of the assembly was designated as ma_ragout. To compare the effectiveness of misassembly correction, we constructed three types of genomes based on ma_ragout. The first type of assembly, ma_ragtag, involved performing Ragtag assembly on the ma_ragout assembly using the \u003cem\u003eM. amblycephala\u003c/em\u003e chromosome-level reference genome (GCA_018812025.1) as the reference genome. The second type, ma_without_insilico, involved misassembly correction without in silico libraries on the ma_ragout assembly, using the same references. The third type, ma_with_insilico, involved misassembly correction with in silico libraries on the ma_ragout assembly, using the same references. Finally, we used \u003cem\u003eM. amblycephala\u003c/em\u003e (GCA_018812025.1) as the reference genome and evaluated the quality of ma_ragout, ma_ragtag, ma_without_insilico, and ma_with_insilico using Quast-LG v5.2.0, including metrics such as misassemblies, contiguity, and other relevant statistics.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eAs shown in Figure 1A and Table 1, the continuity of the assemblies ma_ragout, ma_ragtag, ma_without_insilico, and ma_with_insilico differs slightly. The ma_ragtag assembly demonstrates the highest continuity but introduces the most misassemblies, totaling 8298. Regardless of whether in silico libraries were used, the misassembly counts in the corrected assemblies, ma_without_insilico and ma_with_insilico, were substantially reduced to 4853 and 4920, respectively (Figure 1A). 
The misassembly counts of ma_without_insilico and ma_with_insilico differ only slightly (4853 vs. 4920), with the ma_with_insilico assembly marginally higher. This may be because the ma_without_insilico assembly relied entirely on misassembly correction using the reference genome GCA_018812025.1, disregarding the differences between GCA_018812025.1 and GCA_009869865.1. Additionally, we found that after misassembly correction with in silico libraries, the Ragtag assembly had higher continuity (N50: 45322139 \u0026gt; 990859) and fewer misassemblies (4920 \u0026lt; 8080) compared to the \u003cem\u003ede novo\u003c/em\u003e scaffold assembly by SSPACE (ma_denovo), but it introduced more gaps (N's per 100 kbp: 13239.3 \u0026gt; 9787.62). Figure 1 also illustrates that prior to misassembly correction, the ma_ragtag assembly exhibited 192 cross-links between different chromosomes compared to GCA_018812025.1 (Figure 1B), and the dot plot showed discontinuous and non-linear diagonals (Figure 1D). After misassembly correction, the ma_with_insilico assembly had zero cross-links compared to GCA_018812025.1 (Figure 1C), with the corresponding dot plot displaying continuous and significantly more linear diagonals (Figure 1E).\u003c/p\u003e\n\u003cp\u003eWe also corrected misassemblies using \u003cem\u003eC. erythropterus\u003c/em\u003e (GCA_024489055.1) and \u003cem\u003eC. alburnus\u003c/em\u003e (GWHBOSX00000000) as reference genomes, comparing the continuity and dot plots of assemblies using these references individually or in combination. In terms of assembly continuity (Supplementary Material Table S1), while using Ragtag's Merge function for multi-genome assembly resulted in lower scaffold N50 values, the synteny was superior to that of the combined assembly strategy, ma_gc, as shown in Supplementary Material Table S1, Figures S2, S3, S4, S5. 
After misassembly correction, genomes guided solely by the three chromosome-level reference genomes exhibited similar scaffold N50 values to the combined strategy, ma_gc, but with significantly improved synteny. Notably, the genome corrected and assembled using the \u003cem\u003eM. amblycephala\u003c/em\u003e genome (GCA_018812025.1) as a reference demonstrated the best scaffold N50 values and synteny performance (Supplementary Material Table S1, Figures S6, S7, S8).\u003c/p\u003e\n\u003ch2\u003e3.2 Misassembly correction of the\u003cem\u003e\u0026nbsp;C. alburnus\u003c/em\u003e scaffold assembly\u003c/h2\u003e\n\u003cp\u003eSimilar to the misassembly correction of the \u003cem\u003eM. amblycephala\u003c/em\u003e scaffold assembly, we performed multi-reference guided scaffold assembly of two \u003cem\u003eC. alburnus\u003c/em\u003e draft genomes (GCA_009869775.1, GCA_028476615.1) using Ragout, incorporating chromosome-level reference genomes of \u003cem\u003eC. alburnus\u003c/em\u003e (GWHBOSX00000000), \u003cem\u003eC. erythropterus\u003c/em\u003e (GCA_024489055.1), and\u003cem\u003e\u0026nbsp;M. amblycephala\u003c/em\u003e (GCA_018812025.1). These two versions of the assembly were designated as ca_ragout and ca_hu_ragout. To simplify the comparison, we assembled only two types of genomes based on ca_ragout or ca_hu_ragout. The first type of assembly, ca_ragtag and ca_hu_ragtag, involved performing Ragtag assembly on the ca_ragout or ca_hu_ragout assembly using the \u003cem\u003eC. alburnus\u003c/em\u003e chromosome-level reference genome (GWHBOSX00000000) as the reference genome. The second type, ca_with_insilico and ca_hu_with_insilico, involved misassembly correction with in silico libraries on the ca_ragout or ca_hu_ragout assembly, using the same references. Finally, we used \u003cem\u003eC. 
alburnus\u003c/em\u003e (GWHBOSX00000000) as the reference genome and evaluated the quality of ca_ragout, ca_ragtag, ca_with_insilico, ca_hu_ragout, ca_hu_ragtag, and ca_hu_with_insilico using Quast-LG v5.2.0, assessing metrics such as misassemblies, contiguity, and other relevant statistics.\u003c/p\u003e\n\u003cp\u003eAs shown in Table 2, Ragtag assemblies of ca_ragout and ca_hu_ragout (ca_ragtag, ca_with_insilico, ca_hu_ragtag, and ca_hu_with_insilico) can improve the N50 but introduce more gaps, increasing the N's per 100 kbp value. Misassembly correction with in silico libraries can reduce gaps to some extent, but not significantly. Regardless of the processing method, the mismatches per 100 kbp value showed slight variation among ca_ragout, ca_ragtag, ca_with_insilico, ca_hu_ragout, ca_hu_ragtag, and ca_hu_with_insilico, which may be related to the differences between the reference genome and the target genome. Significantly, misassembly correction with in silico libraries can reduce misassemblies. For the two versions of the\u003cem\u003e\u0026nbsp;C. alburnus\u003c/em\u003e genome, misassemblies decreased from 5689 and 6582 to 4728 and 5861, respectively. Additionally, we observed that performing Ragtag assembly on the ca_ragout or ca_hu_ragout assembly significantly increased misassemblies. According to Table 2, the increase in total misassemblies is related to increased scaffold misassemblies. This indicates that Ragtag can cause further misassemblies without misassembly correction. Additionally, comparing the two\u003cem\u003e\u0026nbsp;C. alburnus\u003c/em\u003e genomes with different assembly qualities, the high-quality genome (GCA_028476615.1) has lower N's per 100 kbp and mismatches per 100 kbp values compared to the lower-quality genome (GCA_009869775.1), but the improvement in misassemblies is not significant.\u003c/p\u003e\n\u003cp\u003eHowever, as shown in Figure 2, the synteny of the two \u003cem\u003eC. 
alburnus\u003c/em\u003e draft genomes was significantly enhanced, particularly for GCA_009869775.1. Before misassembly correction, this draft contained 312 cross-links between different chromosomes (Figure 2A), which were reduced to only five after correction (Figure 2B). In contrast, improvements in synteny for GCA_028476615.1 were less pronounced, with the number of cross-links changing from 13 before correction to 10 afterward. Further analysis of the dot plots (Supplementary Material Figures S9 to S12) revealed that misassembly correction notably enhanced synteny, significantly reducing breakpoints along the diagonals and highlighting their linearity. These results show that the high-quality genome (GCA_028476615.1) has fewer cross-links between different chromosomes than the lower-quality genome (GCA_009869775.1).\u0026nbsp;\u003c/p\u003e"},{"header":"4 DISCUSSION","content":"\u003cp\u003eIn contrast to long-read sequencing technologies, short-read sequencing technologies, such as Illumina sequencing, have less stringent requirements for DNA fragment length. However, the shorter reads obtained from Illumina sequencing pose challenges to the continuity of genome assembly. In such cases, reference-guided genome assembly has emerged as an effective solution. For instance, in our preliminary research, using genomes of closely related species to \u003cem\u003eC. batrachus\u003c/em\u003e and \u003cem\u003eC. alburnus\u003c/em\u003e as references, we significantly enhanced the quality of their draft genomes (Liu et al. 2023a). However, reference-guided assembly is not without its flaws, as it can lead to misassemblies due to significant structural differences between the actual and reference genomes\u0026nbsp;(Alonge et al. 2022). To address this issue, the Ragout software attempts to correct potential misassemblies by utilizing the phylogenetic and synteny relationships among multiple reference genomes\u0026nbsp;(Kolmogorov et al. 2018).\u0026nbsp;Guo et al. 
(2022) demonstrated that Ragout's strategy of employing multiple reference genomes helps correct genomic misassemblies by identifying chimeric adjacencies. However, in this study, the improvements were not markedly evident (Figures 1 and 2). Consequently, we further assessed using Ragtag's Correct function combined with in silico library strategies to enhance the synteny of the \u003cem\u003eM. amblycephala\u003c/em\u003e and \u003cem\u003eC. alburnus\u003c/em\u003e genomes. We believe the introduction of numerous misassemblies is partly related to Ragout's strategy of employing multiple reference genomes. Although Ragout attempts to correct genomic misassemblies by identifying chimeric adjacencies, this approach is not foolproof, as actual structural variations do not always conform to existing phylogenetic constructs.\u003c/p\u003e\n\u003cp\u003eIn this study, Ragtag's Correct function proved effective in misassembly correction. Utilizing data generated from in silico libraries, this function identifies and corrects errors during scaffold assembly. It is important to note that reference-guided assembly relies on high-quality reference genomes and is particularly effective in species with highly conserved genome structures (Lischer and Shimizu 2017). However, for species with substantial structural variations or lacking comprehensive genomic information, this method can introduce erroneous assemblies (Whibley et al. 2021). Therefore, its performance depends on the similarity between the reference and target genomes. By comparing Supplementary Material Figures S6, S7, and S8, we observed that using the genome of the same species as a reference markedly improves the correction of misassemblies compared with using the genomes of other species. Additionally, even when using closely related species, Ragtag's Correct function combined with in silico libraries significantly enhances genome synteny (Supplementary Material Figures S2, S6, and S7). Although Guo et al. 
(2022) indicated that Ragout could improve the correction of genomic misassemblies, this conclusion was drawn with the solid-scaffolds parameter within Ragout taken into account. Our results suggest that Ragout's ability to correct misassemblies using the phylogenetic and synteny relationships of multiple species' genomes is limited.\u003c/p\u003e\n\u003cp\u003eMoreover, comparisons of Supplementary Material Figures S1 and S2 and Figure 1 show that the earlier strategy of using Ragout in conjunction with Ragtag for assembly (Liu et al. 2023a) did not significantly enhance genome synteny compared to using Ragout alone. However, this strategy notably improved assembly continuity, as evidenced by the N50 of the \u003cem\u003eM. amblycephala\u003c/em\u003e draft genome increasing from 41172990 to 52506231 (Table 1). This improvement is likely related to Ragtag's ability to further assemble contigs into scaffolds. Another intriguing finding was that using Ragtag's assembly strategy alone was more effective in reducing misassemblies than when combined with Ragout (Supplementary Material Figure S1). However, when misassembly correction was applied, the combined strategy of Ragout and Ragtag significantly reduced misassemblies (Figure 1 and Supplementary Material Figure S1), demonstrating that Ragout, when utilizing the phylogenetic information of multiple reference genomes, does indeed play a role in correcting misassemblies (Kolmogorov et al. 2018; Guo et al. 2022). Additionally, we observed that using the Ragtag Correct function with in silico libraries resulted in more misassemblies than using it without in silico libraries. We speculate that this is due to the structural differences between the draft genome of the target species and the reference genome, which are from different individuals. Structural variations are widespread among individuals (Kidd et al. 
2008); therefore, using in silico libraries to correct misassemblies is crucial.\u003c/p\u003e\n\u003cp\u003eFurthermore, comparing the scaffold assembly results of two \u003cem\u003eC. alburnus\u003c/em\u003e draft genomes (Figure 2), the high-quality genome (GCA_028476615.1) exhibited significantly fewer cross-links between different chromosomes than the lower-quality genome (GCA_009869775.1). This indicates that the quality of contig assembly significantly affects scaffold assembly. Clearly, longer contigs provide more information and less uncertainty (Luo et al. 2021; Rayamajhi et al. 2022), leading to fewer errors during scaffold assembly. However, the misassembly values provided by Quast-LG did not change significantly before and after misassembly correction in these two genome versions. This may be due to differences between the reference and target genomes. As mentioned, structural variations are widespread among individuals (Kidd et al. 2008). We speculate that the misassemblies values are due to sample differences rather than assembly errors, as the two versions of the genome and the reference genome are from different individuals.\u003c/p\u003e\n\u003cp\u003eIn silico library technology offers a cost-effective strategy for simulating high-throughput sequencing data, thereby guiding assembly optimization. This technique avoids the cost and time associated with physically constructing libraries. Since the original sequencing data for the three draft genomes of the two species were unavailable, we constructed in silico libraries from the draft genomes of the target species. This approach allowed for precise correction of scaffold assembly errors. Without this validation, misassemblies based solely on the reference genome could be erroneously identified (Alonge et al. 2022). It is important to note that the success of in silico libraries highly depends on the quality and coverage of existing sequencing data (Luo et al. 2020). 
As shown in Figure 2, the cross-links between different chromosomes in the improved genome using in silico libraries constructed from the high-quality genome (GCA_028476615.1) were significantly lower than those from the lower-quality genome (GCA_009869775.1). Moreover, our study suggests that when the target genome is evolutionarily distant from the reference genome, additional data may be needed to verify and guide the correction of misassemblies, such as using Hi-C sequencing data.\u003c/p\u003e\n\u003cp\u003eIn summary, our research highlights both the potential and limitations of reference-guided genome assembly and the use of in silico libraries in improving genome synteny and reducing misassemblies. While reference-guided techniques are highly effective in species with conserved genome structures, they may introduce errors in species with significant genomic variations. The Correct function of Ragtag, when used in conjunction with in silico libraries, has shown promise in accurately identifying and correcting assembly errors, although its success largely depends on the phylogenetic closeness of the reference genomes used. 
This study underscores the necessity of integrating additional genomic data, such as Hi-C sequencing, especially when dealing with evolutionarily distant genomes, to enhance the accuracy and reliability of genome assemblies.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eFunding\u003c/h2\u003e\n\u003cp\u003eScience \u0026amp; Technology Innovation Program of Hangzhou Academy of Agricultural Sciences [Grant numbers 2022HNCT-01].\u003c/p\u003e\n\u003ch2\u003eConflicts of interest/Competing interests\u003c/h2\u003e\n\u003cp\u003eThe authors declare that they have no conflict of interest in the publication.\u003c/p\u003e\n\u003ch2\u003eAvailability of data and material\u003c/h2\u003e\n\u003cp\u003eThis study utilized several genome assemblies publicly available in the National Center for Biotechnology Information (NCBI) Assembly database, including GCA_009869865.1, GCA_009869775.1, GCA_028476615.1, GCA_018812025.1, and GCA_024489055.1. Additionally, we accessed the chromosome-level genome assembly GWHBOSX00000000 from the China National Center for Bioinformation. The improved versions of the draft genomes for \u003cem\u003eMegalobrama amblycephala\u003c/em\u003e (GCA_009869865.1) and \u003cem\u003eCulter alburnus\u003c/em\u003e (GCA_009869775.1, GCA_028476615.1), labeled as cor_ma, cor_ca, and cor_ca_hu respectively, have been uploaded to Figshare with the DOI: 10.6084/m9.figshare.25621809.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n \u003cli\u003eAlonge M, Lebeigle L, Kirsche M, Jenike K, Ou S, Aganezov S, Wang X, Lippman ZB, Schatz MC, Soyk S (2022) Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing. Genome Biol 23(1):258. doi:https://doi.org/10.1186/s13059-022-02823-7\u003c/li\u003e\n \u003cli\u003eBlom MPK (2021) Opportunities and challenges for high-quality biodiversity tissue archives in the age of long-read sequencing. Molecular Ecology 30(23):5935-5948. 
doi:https://doi.org/10.1111/mec.15909\u003c/li\u003e\n \u003cli\u003eBoetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W (2011) Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27(4):578-579. doi:https://doi.org/10.1093/bioinformatics/btq683\u003c/li\u003e\n \u003cli\u003eChen S (2023) Ultrafast one‐pass FASTQ data preprocessing, quality control, and deduplication using fastp. iMeta 2(2). doi:https://doi.org/10.1002/imt2.107\u003c/li\u003e\n \u003cli\u003eChen Z, Pham L, Wu T-C, Mo G, Xia Y, Chang PL, Porter D, Phan T, Che H, Tran H, Bansal V, Shaffer J, Belda-Ferre P, Humphrey G, Knight R, Pevzner P, Pham S, Wang Y, Lei M (2020) Ultralow-input single-tube linked-read library method enables short-read second-generation sequencing systems to routinely generate highly accurate and economical long-range sequencing information. Genome Research 30(6):898-909. doi:https://doi.org/10.1101/gr.260380.119\u003c/li\u003e\n \u003cli\u003eConlin LK, Aref-Eshghi E, McEldrew DA, Luo M, Rajagopalan R (2022) Long-read sequencing for molecular diagnostics in constitutional genetic disorders. Human Mutation 43(11):1531-1544. doi:https://doi.org/10.1002/humu.24465\u003c/li\u003e\n \u003cli\u003eDijk ELv, Naquin D, Gorrichon K, Jaszczyszyn Y, Ouazahrou R, Thermes C, Hernandez C (2023) Genomics in the long-read sequencing era. Trends in Genetics 39(9):649-671. doi:https://doi.org/10.1016/j.tig.2023.04.006\u003c/li\u003e\n \u003cli\u003eGehrig JL, Portik DM, Driscoll MD, Jackson E, Chakraborty S, Gratalo D, Ashby M, Valladares R (2022) Finding the right fit: evaluation of short-read and long-read sequencing approaches to maximize the utility of clinical microbiome data. Microbial Genomics 8(3):000794. doi:https://doi.org/10.1099/mgen.0.000794\u003c/li\u003e\n \u003cli\u003eGuo R, Papanicolaou A, Fritz ML (2022) Validation of reference-assisted assembly using existing and novel Heliothine genomes. Genomics:110441. 
doi:https://doi.org/10.1016/j.ygeno.2022.110441\u003c/li\u003e\n \u003cli\u003eHe W, Yang J, Jing Y, Xu L, Yu K, Fang X (2023) NGenomeSyn: an easy-to-use and flexible tool for publication-ready visualization of syntenic relationships across multiple genomes. Bioinformatics. doi:https://doi.org/10.1093/bioinformatics/btad121\u003c/li\u003e\n \u003cli\u003eHuang W, Li L, Myers JR, Marth GT (2012) ART: a next-generation sequencing read simulator. Bioinformatics 28(4):593-594. doi:https://doi.org/10.1093/bioinformatics/btr708\u003c/li\u003e\n \u003cli\u003eJiang H, Qian Y, Zhang Z, Meng M, Deng Y, Wang G, He S, Yang L (2023) Chromosome-level genome assembly and whole-genome resequencing of topmouth culter (Culter alburnus) provide insights into the intraspecific variation of its semi-buoyant and adhesive eggs. Mol Ecol Resour. doi:https://doi.org/10.1111/1755-0998.13845\u003c/li\u003e\n \u003cli\u003eKidd JM, Cooper GM, Donahue WF, Hayden HS, Sampas N, Graves T, Hansen N, Teague B, Alkan C, Antonacci F, Haugen E, Zerr T, Yamada NA, Tsang P, Newman TL, T\u0026uuml;z\u0026uuml;n E, Cheng Z, Ebling HM, Tusneem N, David R, Gillett W, Phelps KA, Weaver M, Saranga D, Brand A, Tao W, Gustafson E, McKernan K, Chen L, Malig M, Smith JD, Korn JM, McCarroll SA, Altshuler DA, Peiffer DA, Dorschner M, Stamatoyannopoulos J, Schwartz D, Nickerson DA, Mullikin JC, Wilson RK, Bruhn L, Olson MV, Kaul R, Smith DR, Eichler EE (2008) Mapping and sequencing of structural variation from eight human genomes. Nature 453(7191):56-64. doi:https://doi.org/10.1038/nature06862\u003c/li\u003e\n \u003cli\u003eKolmogorov M, Armstrong J, Raney BJ, Streeter I, Dunn M, Yang F, Odom D, Flicek P, Keane TM, Thybert D, Paten B, Pham S (2018) Chromosome assembly of large and complex genomes using multiple references. Genome Res 28(11):1720-1732. 
doi:https://doi.org/10.1101/gr.236273.118\u003c/li\u003e\n \u003cli\u003eLang D, Zhang S, Ren P, Liang F, Sun Z, Meng G, Tan Y, Li X, Lai Q, Han L, Wang D, Hu F, Wang W, Liu S (2020) Comparison of the two up-to-date sequencing technologies for genome assembly: HiFi reads of Pacific Biosciences Sequel II system and ultralong reads of Oxford Nanopore. GigaScience 9(12):giaa123. doi:https://doi.org/10.1093/gigascience/giaa123\u003c/li\u003e\n \u003cli\u003eLischer HEL, Shimizu KK (2017) Reference-guided de novo assembly approach improves genome reconstruction for related species. BMC Bioinformatics 18(1):474. doi:https://doi.org/10.1186/s12859-017-1911-6\u003c/li\u003e\n \u003cli\u003eLiu H, Chen C, Gao Z, Min J, Gu Y, Jian J, Jiang X, Cai H, Ebersberger I, Xu M, Zhang X, Chen J, Luo W, Chen B, Chen J, Liu H, Li J, Lai R, Bai M, Wei J, Yi S, Wang H, Cao X, Zhou X, Zhao Y, Wei K, Yang R, Liu B, Zhao S, Fang X, Schartl M, Qian X, Wang W (2017) The draft genome of blunt snout bream (Megalobrama amblycephala) reveals the development of intermuscular bone and adaptation to herbivorous diet. Gigascience 6(7):1-13. doi:https://doi.org/10.1093/gigascience/gix039\u003c/li\u003e\n \u003cli\u003eLiu H, Chen C, Lv M, Liu N, Hu Y, Zhang H, Enbody ED, Gao Z, Andersson L, Wang W (2021) A Chromosome-Level Assembly of Blunt Snout Bream (Megalobrama amblycephala) Genome Reveals an Expansion of Olfactory Receptor Genes in Freshwater Fish. Mol Biol Evol 38(10):4238-4251. doi:https://doi.org/10.1093/molbev/msab152\u003c/li\u003e\n \u003cli\u003eLiu K, Xie N, Wang Y, Liu X (2023a) The Utilization of Reference-Guided Assembly and In Silico Libraries Improves the Draft Genome of Clarias batrachus and Culter alburnus. Mar Biotechnol (NY) 25(6):907-917. 
doi:https://doi.org/10.1007/s10126-023-10248-x\u003c/li\u003e\n \u003cli\u003eLiu S, Zheng J, Li F, Chi M, Cheng S, Jiang W, Liu Y, Gu Z, Zhao J (2023b) Chromosome-scale assembly and quantitative trait locus mapping for major economic traits of the Culter alburnus genome using Illumina and PacBio sequencing with Hi-C mapping information. Frontiers in Genetics 14. doi:https://doi.org/10.3389/fgene.2023.1072506\u003c/li\u003e\n \u003cli\u003eLogsdon GA, Vollger MR, Eichler EE (2020) Long-read human genome sequencing and its applications. Nature Reviews Genetics 21(10):597-614. doi:https://doi.org/10.1038/s41576-020-0236-x\u003c/li\u003e\n \u003cli\u003eLuo J, Wei Y, Lyu M, Wu Z, Liu X, Luo H, Yan C (2021) A comprehensive review of scaffolding methods in genome assembly. Briefings in Bioinformatics 22(5):bbab033. doi:https://doi.org/10.1093/bib/bbab033\u003c/li\u003e\n \u003cli\u003eLuo Y, Liao X, Wu F-X, Wang J (2020) Computational Approaches for Transcriptome Assembly Based on Sequencing Technologies. Current Bioinformatics 15(1):2-16\u003c/li\u003e\n \u003cli\u003eMantere T, Kersten S, Hoischen A (2019) Long-Read Sequencing Emerging in Medical Genetics. Frontiers in Genetics 10. doi:https://doi.org/10.3389/fgene.2019.00426\u003c/li\u003e\n \u003cli\u003eMarcais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A (2018) MUMmer4: A fast and versatile genome alignment system. PLoS Comput Biol 14(1):e1005944. doi:https://doi.org/10.1371/journal.pcbi.1005944\u003c/li\u003e\n \u003cli\u003eMikheenko A, Prjibelski A, Saveliev V, Antipov D, Gurevich A (2018) Versatile genome assembly evaluation with QUAST-LG. Bioinformatics 34(13):i142-i150. doi:https://doi.org/10.1093/bioinformatics/bty266\u003c/li\u003e\n \u003cli\u003eMinkin I, Medvedev P (2020) Scalable multiple whole-genome alignment and locally collinear block construction with SibeliaZ. Nature Communications 11(1):6327. 
doi:https://doi.org/10.1038/s41467-020-19777-8\u003c/li\u003e\n \u003cli\u003ePatin NV, Goodwin KD (2022) Long-Read Sequencing Improves Recovery of Picoeukaryotic Genomes and Zooplankton Marker Genes from Marine Metagenomes. mSystems 7(6):e00595-00522. doi:https://doi.org/10.1128/msystems.00595-22\u003c/li\u003e\n \u003cli\u003eRayamajhi N, Cheng C-HC, Catchen JM (2022) Evaluating Illumina-, Nanopore-, and PacBio-based genome assembly strategies with the bald notothen, Trematomus borchgrevinki. G3 Genes|Genomes|Genetics 12(11):jkac192. doi:https://doi.org/10.1093/g3journal/jkac192\u003c/li\u003e\n \u003cli\u003eRen L, Li W, Qin Q, Dai H, Han F, Xiao J, Gao X, Cui J, Wu C, Yan X, Wang G, Liu G, Liu J, Li J, Wan Z, Yang C, Zhang C, Tao M, Wang J, Luo K, Wang S, Hu F, Zhao R, Li X, Liu M, Zheng H, Zhou R, Shu Y, Wang Y, Liu Q, Tang C, Duan W, Liu S (2019) The subgenomes show asymmetric expression of alleles in hybrid lineages of Megalobrama amblycephala x Culter alburnus. Genome Res 29(11):1805-1815. doi:https://doi.org/10.1101/gr.249805.119\u003c/li\u003e\n \u003cli\u003eTomas K, Erik B-R, Olga Vinnere P (2018) A comprehensive model of DNA fragmentation for the preservation of High Molecular Weight DNA. bioRxiv:254276. doi:https://doi.org/10.1101/254276\u003c/li\u003e\n \u003cli\u003eWhibley A, Kelley JL, Narum SR (2021) The changing face of genome assemblies: Guidance on achieving high-quality reference genomes. Molecular Ecology Resources 21(3):641-652. doi:https://doi.org/10.1111/1755-0998.13312\u003c/li\u003e\n \u003cli\u003eZhang T, Zhou J, Gao W, Jia Y, Wei Y, Wang G (2022) Complex genome assembly based on long-read sequencing. Brief Bioinform 23(5):bbac305. doi:https://doi.org/10.1093/bib/bbac305\u003c/li\u003e\n \u003cli\u003eZhao S, Yang X, Pang B, Zhang L, Wang Q, He S, Dou H, Zhang H (2022) A chromosome-level genome assembly of the redfin culter (Chanodichthys erythropterus). Sci Data 9(1):535. 
doi:https://doi.org/10.1038/s41597-022-01648-0\u003c/li\u003e\n\u003c/ol\u003e"},{"header":"Tables","content":"\u003cp\u003eTable 1\u0026nbsp;Multi-reference Guided Scaffold Assembly and Misassembly Correction Metrics for \u003cem\u003eMegalobrama amblycephala\u0026nbsp;\u003c/em\u003eDraft Genome\u003c/p\u003e\n\u003cdiv align=\"center\"\u003e\n \u003ctable border=\"0\" cellpadding=\"0\" width=\"100%\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd width=\"24.742268041237114%\"\u003e\u003cbr\u003e\u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003ema_ragout\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003ema_ragtag\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"13.402061855670103%\"\u003e\n \u003cp\u003ema_without_insilico\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003ema_with_insilico\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003ema_correct\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003ema_denovo\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"24.742268041237114%\"\u003e\n \u003cp\u003eN50\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e41172990\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e52506231\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"13.402061855670103%\"\u003e\n \u003cp\u003e45776042\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e45322139\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e479171\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e990859\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"24.742268041237114%\"\u003e\n 
\u003cp\u003eauN\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e46548936.2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e63052515.7\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"13.402061855670103%\"\u003e\n \u003cp\u003e49543677\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e48923487\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e1257298.6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e2140853.2\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"24.742268041237114%\"\u003e\n \u003cp\u003eN\u0026apos;s per 100 kbp\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\" valign=\"top\"\u003e\n \u003cp\u003e9787.59\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\" valign=\"top\"\u003e\n \u003cp\u003e11117.97\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"13.402061855670103%\" valign=\"top\"\u003e\n \u003cp\u003e13554.59\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\" valign=\"top\"\u003e\n \u003cp\u003e13239.3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\" valign=\"top\"\u003e\n \u003cp\u003e9784.49\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\" valign=\"top\"\u003e\n \u003cp\u003e9787.62\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"24.742268041237114%\"\u003e\n \u003cp\u003emismatches per 100 kbp\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\" valign=\"top\"\u003e\n \u003cp\u003e509.28\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\" valign=\"top\"\u003e\n \u003cp\u003e506.17\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"13.402061855670103%\" valign=\"top\"\u003e\n 
\u003cp\u003e499.8\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\" valign=\"top\"\u003e\n \u003cp\u003e503.88\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\" valign=\"top\"\u003e\n \u003cp\u003e513.88\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\" valign=\"top\"\u003e\n \u003cp\u003e514.3\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"24.742268041237114%\"\u003e\n \u003cp\u003emisassemblies\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e7635\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e8298\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"13.402061855670103%\"\u003e\n \u003cp\u003e4853\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e4920\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e7134\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e8080\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"24.742268041237114%\"\u003e\n \u003cp\u003econtig misassemblies\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e5519\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e5173\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"13.402061855670103%\"\u003e\n \u003cp\u003e2772\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e2923\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e5927\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e6325\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"24.742268041237114%\"\u003e\n \u003cp\u003e\u0026nbsp; c. 
relocations\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e2672\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e2646\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"13.402061855670103%\"\u003e\n \u003cp\u003e1860\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e1894\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e2366\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e2566\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"24.742268041237114%\"\u003e\n \u003cp\u003e\u0026nbsp; c. translocations\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e2700\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e2416\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"13.402061855670103%\"\u003e\n \u003cp\u003e864\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e969\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e3118\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e3363\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"24.742268041237114%\"\u003e\n \u003cp\u003e\u0026nbsp; c. 
inversions\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e147\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e111\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"13.402061855670103%\"\u003e\n \u003cp\u003e48\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e60\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e443\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e396\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"24.742268041237114%\"\u003e\n \u003cp\u003escaffold misassemblies\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e2116\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e3125\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"13.402061855670103%\"\u003e\n \u003cp\u003e2081\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e1997\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e1207\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e1755\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"24.742268041237114%\"\u003e\n \u003cp\u003e\u0026nbsp; s. 
relocations\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e987\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e1713\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"13.402061855670103%\"\u003e\n \u003cp\u003e1517\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e1461\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e546\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e735\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"24.742268041237114%\"\u003e\n \u003cp\u003e\u0026nbsp; s. translocations\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e1117\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e1403\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"13.402061855670103%\"\u003e\n \u003cp\u003e557\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e529\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e648\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e1000\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"24.742268041237114%\"\u003e\n \u003cp\u003e\u0026nbsp; s. 
inversions\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e12\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"13.402061855670103%\"\u003e\n \u003cp\u003e7\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e7\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e13\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e20\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"24.742268041237114%\"\u003e\n \u003cp\u003emisassembled contigs\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e558\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e90\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"13.402061855670103%\"\u003e\n \u003cp\u003e311\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e336\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e3290\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e2215\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"24.742268041237114%\"\u003e\n \u003cp\u003eMisassembled contigs length\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e1140376612\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e1177589568\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"13.402061855670103%\"\u003e\n \u003cp\u003e1185337931\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e1176999057\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n 
\u003cp\u003e774645442\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e966903610\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"24.742268041237114%\"\u003e\n \u003cp\u003elocal misassemblies\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e36333\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e36424\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"13.402061855670103%\"\u003e\n \u003cp\u003e34678\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e35600\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e35329\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e35774\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"24.742268041237114%\"\u003e\n \u003cp\u003escaffold gap ext. mis.\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e211\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e236\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"13.402061855670103%\"\u003e\n \u003cp\u003e423\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e432\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e163\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e188\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"24.742268041237114%\"\u003e\n \u003cp\u003escaffold gap loc. 
mis.\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e5539\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e5992\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"13.402061855670103%\"\u003e\n \u003cp\u003e10650\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e9398\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e5007\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e5269\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"24.742268041237114%\"\u003e\n \u003cp\u003epossible TEs\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e504\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e462\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"13.402061855670103%\"\u003e\n \u003cp\u003e394\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e454\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e808\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e884\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"24.742268041237114%\"\u003e\n \u003cp\u003eunaligned mis. 
contigs\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e363\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e162\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"13.402061855670103%\"\u003e\n \u003cp\u003e459\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e475\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e913\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e590\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"24.742268041237114%\"\u003e\n \u003cp\u003emismatches\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e4732683\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e4684596\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"13.402061855670103%\"\u003e\n \u003cp\u003e4649345\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e4680545\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e4843479\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e4826478\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"24.742268041237114%\"\u003e\n \u003cp\u003eindels\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e1542737\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e1521082\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"13.402061855670103%\"\u003e\n \u003cp\u003e1525034\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e1526830\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n 
\u003cp\u003e1629164\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e1607328\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"24.742268041237114%\"\u003e\n \u003cp\u003eindels (\u0026lt;= 5 bp)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e1251549\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e1231709\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"13.402061855670103%\"\u003e\n \u003cp\u003e1236890\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e1236344\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e1332635\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e1311908\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"24.742268041237114%\"\u003e\n \u003cp\u003eindels (\u0026gt; 5 bp)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e291188\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e289373\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"13.402061855670103%\"\u003e\n \u003cp\u003e281144\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e290486\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e296529\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e295420\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"24.742268041237114%\"\u003e\n \u003cp\u003eIndels length\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e16047289\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n 
\u003cp\u003e15990323\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"13.402061855670103%\"\u003e\n \u003cp\u003e15678591\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e16207677\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e15553476\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.371134020618557%\"\u003e\n \u003cp\u003e15926178\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003eWe performed multi-reference guided scaffold assembly of the \u003cem\u003eM. amblycephala\u003c/em\u003e draft genome (GCA_009869865.1) using Ragout, with the chromosome-level genomes of \u003cem\u003eM. amblycephala\u003c/em\u003e (GCA_018812025.1), \u003cem\u003eC. erythropterus\u003c/em\u003e (GCA_024489055.1), and \u003cem\u003eC. alburnus\u003c/em\u003e (GWHBOSX00000000) as references. This assembly was designated ma_ragout. To compare the effectiveness of misassembly correction, we constructed three types of genomes based on ma_ragout. The first, ma_ragtag, was produced by running Ragtag on the ma_ragout assembly with the \u003cem\u003eM. amblycephala\u003c/em\u003e chromosome-level genome (GCA_018812025.1) as the reference. The second, ma_without_insilico, was produced by misassembly correction of the ma_ragout assembly without in silico libraries, using the same references. The third, ma_with_insilico, was produced by misassembly correction of the ma_ragout assembly with in silico libraries, using the same references. ma_correct represents the result of misassembly correction with in silico libraries on the ma_ragout assembly. ma_denovo represents the \u003cem\u003ede novo\u003c/em\u003e scaffold assembly using sspace_basic v2.1.1 after misassembly correction with in silico libraries on the ma_ragout assembly. 
All statistics are based on contigs of size \u0026gt;= 3000 bp, unless otherwise noted.\u003c/p\u003e\n\u003cp\u003eTable 2\u0026nbsp;Multi-reference Guided Scaffold Assembly and Misassembly Correction Metrics for \u003cem\u003eCulter alburnus\u003c/em\u003e Draft Genomes\u003c/p\u003e\n\u003cdiv align=\"center\"\u003e\n \u003ctable border=\"0\" cellpadding=\"0\" width=\"89%\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\"\u003e\u003cbr\u003e\u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003eca_ragout\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003eca_ragtag\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003eca_with_insilico\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003eca_hu_ragout\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003eca_hu_ragtag\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"15.217391304347826%\"\u003e\n \u003cp\u003eca_hu_with_insilico\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\"\u003e\n \u003cp\u003eN50\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e39946378\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e44813369\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e40471471\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e38666041\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e44219289\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"15.217391304347826%\" valign=\"top\"\u003e\n \u003cp\u003e41217093\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd 
width=\"25%\"\u003e\n \u003cp\u003eauN\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e40571763.8\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e47915037.3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e43560315.9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e39949117.8\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e48056358.6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"15.217391304347826%\" valign=\"top\"\u003e\n \u003cp\u003e43386478.2\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\"\u003e\n \u003cp\u003eN\u0026apos;s per 100 kbp\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e9721.27\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e11841.42\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e11659.28\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e6728.37\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e8091.10\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"15.217391304347826%\" valign=\"top\"\u003e\n \u003cp\u003e8307.72\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\"\u003e\n \u003cp\u003emismatches per 100 kbp\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e859.15\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e855.88\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n 
\u003cp\u003e857.80\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e617.27\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e613.39\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"15.217391304347826%\" valign=\"top\"\u003e\n \u003cp\u003e614.36\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\"\u003e\n \u003cp\u003emisassemblies\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e5689\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e7300\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e4728\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e6582\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e7126\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"15.217391304347826%\" valign=\"top\"\u003e\n \u003cp\u003e5861\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\"\u003e\n \u003cp\u003econtig misassemblies\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e3952\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e3864\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e3026\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e4929\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e4447\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"15.217391304347826%\" valign=\"top\"\u003e\n \u003cp\u003e3772\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd 
width=\"25%\"\u003e\n \u003cp\u003e\u0026nbsp; c. relocations\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e2091\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e2086\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e1765\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e2083\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e1998\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"15.217391304347826%\" valign=\"top\"\u003e\n \u003cp\u003e1781\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\"\u003e\n \u003cp\u003e\u0026nbsp; c. translocations\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e1593\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e1563\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e1114\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e2779\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e2403\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"15.217391304347826%\" valign=\"top\"\u003e\n \u003cp\u003e1948\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\"\u003e\n \u003cp\u003e\u0026nbsp; c. 
inversions\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e268\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e215\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e147\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e67\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e46\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"15.217391304347826%\" valign=\"top\"\u003e\n \u003cp\u003e43\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\"\u003e\n \u003cp\u003escaffold misassemblies\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e1737\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e3436\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e1702\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e1653\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e2679\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"15.217391304347826%\" valign=\"top\"\u003e\n \u003cp\u003e2089\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\"\u003e\n \u003cp\u003e\u0026nbsp; s. 
relocations\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e1019\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e2321\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e1246\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e1396\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e1979\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"15.217391304347826%\" valign=\"top\"\u003e\n \u003cp\u003e1766\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\"\u003e\n \u003cp\u003e\u0026nbsp; s. translocations\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e708\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e1109\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e436\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e255\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e696\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"15.217391304347826%\" valign=\"top\"\u003e\n \u003cp\u003e321\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\"\u003e\n \u003cp\u003e\u0026nbsp; s. 
inversions\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e10\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e20\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"15.217391304347826%\" valign=\"top\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\"\u003e\n \u003cp\u003emisassembled contigs\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e517\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e137\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e535\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e1130\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e359\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"15.217391304347826%\" valign=\"top\"\u003e\n \u003cp\u003e922\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\"\u003e\n \u003cp\u003eMisassembled contigs length\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e1062185200\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e1112049297\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e1074032609\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n 
\u003cp\u003e1082512646\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e1131457991\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"15.217391304347826%\" valign=\"top\"\u003e\n \u003cp\u003e1100912456\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\"\u003e\n \u003cp\u003elocal misassemblies\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e41802\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e41930\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e41263\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e41287\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e41534\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"15.217391304347826%\" valign=\"top\"\u003e\n \u003cp\u003e40995\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\"\u003e\n \u003cp\u003escaffold gap ext. mis.\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e278\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e302\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e363\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e206\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e253\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"15.217391304347826%\" valign=\"top\"\u003e\n \u003cp\u003e294\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\"\u003e\n \u003cp\u003escaffold gap loc. 
mis.\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e13086\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e13680\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e14458\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e798\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e1255\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"15.217391304347826%\" valign=\"top\"\u003e\n \u003cp\u003e1638\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\"\u003e\n \u003cp\u003epossible TEs\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e528\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e532\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e454\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e324\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e348\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"15.217391304347826%\" valign=\"top\"\u003e\n \u003cp\u003e328\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\"\u003e\n \u003cp\u003eunaligned mis. 
contigs\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e465\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e187\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e817\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e680\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e361\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"15.217391304347826%\" valign=\"top\"\u003e\n \u003cp\u003e793\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\"\u003e\n \u003cp\u003emismatches\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e7479038\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e7435742\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e7468359\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e5704753\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e5625575\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"15.217391304347826%\" valign=\"top\"\u003e\n \u003cp\u003e5652469\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\"\u003e\n \u003cp\u003eindels\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e1406914\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e1397794\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e1406569\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n 
\u003cp\u003e2842456\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e2792777\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"15.217391304347826%\" valign=\"top\"\u003e\n \u003cp\u003e2811356\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\"\u003e\n \u003cp\u003eindels (\u0026lt;= 5 bp)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e1013481\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e1005677\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e1013267\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e2384903\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e2339357\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"15.217391304347826%\" valign=\"top\"\u003e\n \u003cp\u003e2356570\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\"\u003e\n \u003cp\u003eindels (\u0026gt; 5 bp)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e393433\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e392117\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e393302\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e457553\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e453420\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"15.217391304347826%\" valign=\"top\"\u003e\n \u003cp\u003e454786\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\"\u003e\n \u003cp\u003eIndels 
length\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e15377377\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e15344830\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e15406340\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\"\u003e\n \u003cp\u003e19023843\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.956521739130435%\" valign=\"top\"\u003e\n \u003cp\u003e18875571\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"15.217391304347826%\" valign=\"top\"\u003e\n \u003cp\u003e18933458\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003eSimilar to the misassembly correction of the \u003cem\u003eM. amblycephala\u003c/em\u003e scaffold assembly, we performed multi-reference guided scaffold assembly of two \u003cem\u003eC. alburnus\u003c/em\u003e draft genomes (GCA_009869775.1, GCA_028476615.1) using Ragout, incorporating chromosome-level reference genomes of \u003cem\u003eC. alburnus\u003c/em\u003e (GWHBOSX00000000), \u003cem\u003eC. erythropterus\u003c/em\u003e (GCA_024489055.1), and \u003cem\u003eM. amblycephala\u003c/em\u003e (GCA_018812025.1). These two versions of the assembly were designated as ca_ragout and ca_hu_ragout. To simplify the comparison, we assembled only two types of genomes based on ca_ragout or ca_hu_ragout. The first type of assembly, ca_ragtag and ca_hu_ragtag, involved performing Ragtag assembly on the ca_ragout or ca_hu_ragout assembly using the \u003cem\u003eC. alburnus\u003c/em\u003e chromosome-level reference genome (GWHBOSX00000000) as the reference genome. The second type, ca_with_insilico and ca_hu_with_insilico, involved misassembly correction with in silico libraries on the ca_ragout or ca_hu_ragout assembly, using the same references. Finally, we used \u003cem\u003eC. 
alburnus\u003c/em\u003e (GWHBOSX00000000) as the reference genome and evaluated the quality of ca_ragout, ca_hu_ragout, ca_ragtag, ca_hu_ragtag, ca_with_insilico, and ca_hu_with_insilico using Quast-LG v5.2.0, assessing metrics such as misassemblies, contiguity, and other relevant statistics.\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"Hangzhou Academy of Agricultural Sciences","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Reference-guided assembly, In silico libraries, Misassembly correction, Megalobrama amblycephala, Culter alburnus","lastPublishedDoi":"10.21203/rs.3.rs-4621443/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4621443/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eRecent advancements in long-read sequencing technologies are renowned for providing extended read lengths and lower error rates, which enhance the assembly of complex genomes. However, high costs and stringent sample quality requirements limit their widespread adoption, especially for degraded DNA samples. 
In contrast, short-read technologies require only short DNA fragments, but their reads make it difficult to assemble a contiguous genome. Reference-guided assembly offers a practical solution by aligning contigs to a reference genome, thereby improving scaffold continuity. However, reference-guided assembly can introduce additional misassemblies. To address this limitation, this study explores using Ragtag's Correct function together with in silico libraries to correct misassemblies in reference-guided assemblies. Using three draft genomes from two fish species, we demonstrate that this hybrid strategy significantly improves scaffold assembly accuracy. Specifically, in \u003cem\u003eMegalobrama amblycephala\u003c/em\u003e, misassemblies were reduced from 8298 to 4920, and cross-links between different chromosomes decreased from 192 to zero in the corrected assemblies. In two \u003cem\u003eCulter alburnus\u003c/em\u003e draft genomes, misassemblies were reduced from 5689 and 6582 to 4728 and 5861, respectively, while cross-links between different chromosomes decreased from 132 and 13 to 5 and 10, respectively, in the corrected assemblies. This approach allowed precise correction of scaffold assembly errors, showcasing its potential to enhance the accuracy of genomic assemblies. Our findings underscore the importance of integrating additional genomic data to achieve reliable genome assemblies, especially for species with significant structural variations. 
This research provides valuable insights into optimizing genome assembly processes, contributing to advancements in genomic studies.\u003c/p\u003e","manuscriptTitle":"Enhancing the Accuracy of Reference-Guided Genomic Assemblies: Implementing Ragtag Correction for Reference-Guided Scaffolds","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-06-25 04:38:44","doi":"10.21203/rs.3.rs-4621443/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"9e340bbe-3288-47c4-a3f2-d78ce54adaa4","owner":[],"postedDate":"June 25th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2026-02-14T03:23:02+00:00","versionOfRecord":[],"versionCreatedAt":"2024-06-25 04:38:44","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-4621443","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4621443","identity":"rs-4621443","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
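To put the counts reported in the abstract on a common scale, the relative decrease for each before/after pair can be computed directly from the stated numbers. The sketch below is only an illustration and is not part of the authors' pipeline: the function name `relative_reduction` and the dictionary labels are hypothetical, and the two C. alburnus drafts are listed in the order the abstract gives them, without assuming which assembly accession each corresponds to.

```python
def relative_reduction(before: int, after: int) -> float:
    """Return the percentage decrease from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be a positive count")
    return 100.0 * (before - after) / before


# (before, after) counts quoted in the abstract; labels are illustrative only.
reported = {
    "M. amblycephala misassemblies": (8298, 4920),
    "M. amblycephala cross-chromosome links": (192, 0),
    "C. alburnus draft 1 misassemblies": (5689, 4728),
    "C. alburnus draft 2 misassemblies": (6582, 5861),
    "C. alburnus draft 1 cross-chromosome links": (132, 5),
    "C. alburnus draft 2 cross-chromosome links": (13, 10),
}

for label, (before, after) in reported.items():
    pct = relative_reduction(before, after)
    print(f"{label}: {before} -> {after} ({pct:.1f}% fewer)")
```

Percentages derived this way are simple arithmetic on the published counts. They say nothing about statistical significance, and they are not a substitute for the full Quast-LG reports.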

