OrganPipe: An automated tool to facilitate the assembly, annotation, and curation of mitochondrial and chloroplast genomes

Renato R. Moreira-Oliveira†, Bruno Marques Silva†, Michele Molina, Marx Oliveira-Lima, Tiago Ferreira Leão, Santelmo Vasconcelos, and Gisele Lopes Nunes
Instituto Tecnológico Vale

This is a preprint (Research Square, Version 1); it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-5686696/v1
This work is licensed under a CC BY 4.0 License.

Abstract
Analyzing organellar genomes (plastomes and mitogenomes) is important for understanding evolution, cellular function, and genetic diversity. Their stable mutation rates and inheritance patterns make them valuable for phylogenetic studies and taxonomic delimitation. Assembling organellar genomes is considerably more straightforward than assembling nuclear genomes because of their smaller size and simpler structure.
Existing tools such as NOVOPlasty, GetOrganelle, and MitoHifi handle different sequencing data types but have limitations, such as reliance on a single k-mer size or a single reference genome, which can lead to suboptimal results and require manual parameter adjustment. OrganPipe, a newly developed pipeline, addresses these limitations by performing iterative assembly and annotation with multiple combinations of seeds and k-mer sizes. To demonstrate its capabilities, OrganPipe was used to assemble and annotate the first mitogenomes of two invertebrate species and the plastomes of two plant species, successfully circularizing and annotating the genomes with high accuracy while also identifying key genomic features and correcting errors missed by other tools. Accessible to beginners and non-bioinformaticians, OrganPipe enables diverse users to perform high-quality organellar genome analyses, supporting comprehensive exploration and efficient curation. With its user-friendly and efficient design, this versatile tool supports large-scale genomic studies and is available at https://github.com/itvgenomics/OrganPipe.
† joint first authors

Subject areas: Biological sciences/Computational biology and bioinformatics/Genome informatics/Genome assembly algorithms; Biological sciences/Computational biology and bioinformatics/Software; Biological sciences/Genetics/Genome/Mitochondrial genome; Biological sciences/Plant sciences/Plant cell biology/Chloroplasts
Keywords: de novo assembly, NGS, organelle, pipeline, Snakemake

INTRODUCTION
Organellar genomes (plastomes and mitogenomes) and their components have been widely employed in evolutionary studies and are also useful for understanding cellular function. Owing to their uniparental (usually maternal) inheritance and more stable mutation rates than the nuclear genome, organellar genes serve as valuable markers for phylogenetic studies (Barrett et al., 2013).
Additionally, organellar genomes aid species conservation by providing insights into genetic diversity (Nunes et al., 2018), which is essential for conservation efforts and habitat management. Plastomes and animal mitogenomes have well-known structures: plastomes range from 140 to 180 kbp (Arias-Agudelo et al., 2019; Weng et al., 2017), and mitogenomes usually range from 15 to 20 kbp (Cameron et al., 2011; McCartney et al., 2022). Plastomes typically have a quadripartite structure comprising two inverted-repeat regions (IRa and IRb), a large single-copy region (LSC), and a small single-copy region (SSC), and they contain 110–130 genes (Ruhlman & Jansen, 2014). Animal mitogenomes, in contrast, are considerably more compact, comprising 37 genes, a few short intergenic regions, and a control region (Zardoya, 2020). Because they are smaller and contain fewer repeated regions than nuclear genomes, plastomes and mitogenomes are relatively straightforward to assemble and usually demand modest computational resources. Several assemblers have been developed for organellar genomes; each is designed for particular read types and organellar genomes and relies on seed information, which can be a single gene or an entire reference genome. NOVOPlasty (Dierckxsens et al., 2016) assembles plastomes and mitogenomes from paired-end reads using an iterative, k-mer-based seed-and-extend heuristic. GetOrganelle (Jin et al., 2018) operates similarly to NOVOPlasty but can also extract organellar genomes from an assembly graph.
MitoHifi (Uliano-Silva et al., 2023) handles high-fidelity (HiFi) long reads: it either assembles the HiFi reads with hifiasm (Cheng et al., 2021) and extracts the mitochondrial contigs, or selects them from previously assembled contigs, producing an annotated mitogenome together with informative coverage plots (Table 1).

Table 1. Summary of assemblers developed to aid in organellar genome assembly.
Assembler      Mitogenome   Plastome   Read type                   Multi-k-mer or multi-reference   Annotation
NOVOPlasty     Yes          Yes        Paired-end                  No                               No
GetOrganelle   Yes          Yes        Paired-end                  No                               No
MitoHifi       Yes          No         Long reads                  No                               Yes
OrganPipe      Yes          Yes        Paired-end and long reads   Yes                              Yes

NOVOPlasty and GetOrganelle use a single k-mer size per run, and neither annotates the resulting organellar genomes. MitoHifi, in turn, does not use k-mers, accepts a single reference genome at a time, and annotates mitogenomes with MitoFinder (Allio et al., 2020) and MITOS2 (Bernt et al., 2013) without curating the annotation. Relying on a single k-mer size or a single reference sequence as seed may lead to suboptimal results, forcing users to adjust parameters manually by trial and error rather than refining them automatically.
Here, we present OrganPipe, a tool for iterative assembly and annotation that aids the curation of mitochondrial and chloroplast genomes. The pipeline accepts multiple k-mer sizes and different reference genes or genomes as seeds, and it halts the assembly once the organellar genome is circularized or all provided input combinations have been exhausted, after which annotation begins. OrganPipe lets users explore multiple parameters and samples with a single command, removing the need to rerun the software repeatedly to adjust k-mer and reference values, and it produces a comprehensive final report integrating metrics from assembly through automated annotation.
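The seed-by-k-mer iteration described above can be sketched in a few lines. This is a minimal Python illustration, not OrganPipe's Snakemake implementation: `assemble` is a hypothetical callable standing in for one NOVOPlasty run that reports whether the result circularized, and `exhaustive` is a hypothetical switch for continuing through all seeds (the Methods note that OrganPipe can keep analyzing seeds after circularization). Following the Methods, every k-mer is tried with a seed before the next seed is considered.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Attempt:
    seed: str
    kmer: int
    circular: bool


def iterate_assemblies(
    seeds: Sequence[str],
    kmers: Sequence[int],
    assemble: Callable[[str, int], bool],  # hypothetical stand-in for one assembler run
    exhaustive: bool = False,              # hypothetical: keep going after circularization
) -> List[Attempt]:
    """Try every k-mer for a seed, then stop at the first seed that circularized."""
    attempts: List[Attempt] = []
    for seed in seeds:
        # All k-mers are assembled with the current seed before moving on.
        seed_attempts = [Attempt(seed, k, assemble(seed, k)) for k in kmers]
        attempts.extend(seed_attempts)
        if not exhaustive and any(a.circular for a in seed_attempts):
            break  # a circular genome was found; remaining seeds are skipped
    return attempts


if __name__ == "__main__":
    # Toy assembler: pretend that only the rrnL seed circularizes.
    def toy_assemble(seed: str, kmer: int) -> bool:
        return seed == "rrnL"

    for run in iterate_assemblies(["COX1", "rrnL", "CYTB"], [23, 33, 39], toy_assemble):
        print(run)
```

With three seeds and the k-mer values 23, 33, and 39 used in this study, the toy run performs six assemblies (three with COX1, three with rrnL) and never reaches CYTB; with `exhaustive=True` it would perform all nine.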
In addition, to demonstrate its capabilities, we used OrganPipe to assemble and annotate the first mitogenome sequences of two invertebrate species [Eulimnadia colombiensis (Limnadiidae, Diplostraca) and Pyrearinus pumilus (Elateridae, Coleoptera)] and the plastomes of two plant species [Furtadoa mixta (Araceae, Alismatales) and Melanoxylon brauna (Fabaceae, Fabales)], successfully circularizing and annotating the organellar genomes with high accuracy while also identifying key genomic features and correcting errors missed by other tools.

RESULTS

Genome sequencing and assembly
Illumina sequencing of the five E. colombiensis specimens yielded 11.6 Gb to 23.1 Gb of data, with an average of 17.4 Gb per sample. For P. pumilus, sequencing yielded 8.1 Gb and 12.0 Gb for specimens ITV19301 and ITV50378, respectively. For the plant species, sequencing yielded 10.6 Gb for M. brauna and 20.1 Gb for F. mixta (Table 2).

Table 2. Sequencing and assembly results for the invertebrate specimens of P. pumilus and E. colombiensis and for the plant species M. brauna and F. mixta. Total reads and bases from sequencing are given, together with the estimated sequencing coverage calculated using the genome-size estimates in GoaT as reference. Assembly results and NCBI accession numbers for each specimen are also included.
Group          Organism (estimated genome size)   SampleID   Raw reads     Raw bases        Estimated coverage (×)   Organellar genome length (bp)   Accession
Invertebrates  Pyrearinus pumilus (694 Mb)        ITV19301   54,267,364    8,164,000,028    11.76                    15,891                          PQ572766
                                                  ITV50378   79,973,086    12,075,935,986   17.40                    15,891                          PQ572767
               Eulimnadia colombiensis (281 Mb)   ITV57671   153,512,362   23,180,366,662   82.49                    15,752                          PQ572761
                                                  ITV57672   154,009,760   11,627,736,882   41.38                    15,753                          PQ572762
                                                  ITV57673   107,250,032   16,194,754,832   57.63                    15,752                          PQ572763
                                                  ITV57674   89,101,746    13,454,363,646   47.88                    15,753                          PQ572764
                                                  ITV57675   152,178,616   22,978,971,016   81.78                    15,753                          PQ572765
Plants         Melanoxylon brauna (685 Mb)        ITV00582   98,806,962    10,610,228,885   15.49                    157,162                         PQ535232
               Furtadoa mixta (4.89 Gb)           ITV08189   134,238,185   20,184,737,794   4.13                     167,613                         PQ539039

OrganPipe generated complete, annotated organellar genomes for all nine specimens. All five E. colombiensis specimens were circularized with rrnL as the seed and every k-mer value (Fig. 1A). With COX1, in contrast, one specimen (ITV57671) did not circularize with k-mer 39, and with CYTB only specimen ITV57675 was circularized, with k-mer values of 23 and 33. Both P. pumilus specimens were circularized with all seeds (ATP6, CYTB, NAD1, and NAD4) and all k-mer values, except for specimen ITV19301 with ATP6 and k-mer 33 (Fig. 1B). For the plant species, OrganPipe circularized the M. brauna plastome with trnK-matK, rbcL, trnL-trnF, and rps16 as seeds and all k-mer values; with trnL, however, it did not circularize the genome with k-mer 39 (Fig. 1C). For F. mixta, OrganPipe generated a correct circular genome only with matK (k-mers 23 and 33), and it generated an incorrectly circularized genome with the other two markers using k-mer 39 (Fig. 1D).
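The estimated coverage column in Table 2 is the number of raw sequenced bases divided by the GoaT genome-size estimate; for ITV19301, for example, 8,164,000,028 bases / 694 Mb ≈ 11.76×. The short Python sketch below reproduces three of the table values; the function name is illustrative and not part of OrganPipe. Note that it estimates depth over the nuclear genome size, not organellar read depth.

```python
def estimated_coverage(raw_bases: int, genome_size_bp: float) -> float:
    """Estimated sequencing depth = total sequenced bases / estimated genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return raw_bases / genome_size_bp


# (raw bases, GoaT genome-size estimate in bp), taken from Table 2
SAMPLES = {
    "ITV19301 (P. pumilus)": (8_164_000_028, 694e6),
    "ITV57671 (E. colombiensis)": (23_180_366_662, 281e6),
    "ITV08189 (F. mixta)": (20_184_737_794, 4.89e9),
}

if __name__ == "__main__":
    for name, (bases, size) in SAMPLES.items():
        print(f"{name}: {estimated_coverage(bases, size):.2f}x")
```

Running it prints 11.76x, 82.49x, and 4.13x, matching Table 2 to two decimals and making the low (~4×) coverage of the F. mixta specimen explicit.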

Genome annotation
OrganPipe's automatic annotation correctly identified the 13 protein-coding genes (PCGs), 22 transfer RNA (tRNA) genes, and two ribosomal RNA (rRNA) genes in all circularized E. colombiensis genomes (Fig. 2A, Table S1). For P. pumilus, OrganPipe flagged that MITOS2 had missed only one tRNA gene (trnY) (Fig. 2B, Table S1), which we annotated by homology during curation, using the P. termitilluminans genome (NC_030059) as reference. For the plant species, OrganPipe identified 22 tRNA genes, 74 PCGs, and four rRNA genes in all circularized M. brauna genomes (Fig. 3A, Table S1), whereas the rrn3.5 gene was missing from the annotated F. mixta plastomes (Fig. 3B, Table S1); we also annotated this gene during curation, using GeSeq in Chlorobox (Tillich et al., 2017). Annotations of the incorrectly circularized genomes generated by NOVOPlasty within the OrganPipe workflow are available in the Supplementary Material (Figure S1, Table S1).

DISCUSSION
OrganPipe is an efficient pipeline for large-scale analyses of mitogenomes and plastomes that handles multiple seeds and k-mer values simultaneously. This adaptive approach broadens the exploration of assembly parameters by operating iteratively and halting once a circularized genome is detected. Its efficiency lies in exploring many parameter combinations in a single run. By letting users specify seeds, k-mer values, and other parameters up front, OrganPipe automates the combination of parameters, minimizing the need for multiple interactive runs and streamlining the overall analysis workflow.
Because OrganPipe simplifies the selection of suitable seeds and k-mer values for organellar genome assembly, users can specify multiple seeds and k-mer values at once, saving the time and effort of preparing the environment for each combination. Our results show that the choice of k-mer value affects whether a correct circular genome is obtained, regardless of the seed or reference used. For E. colombiensis, for instance, CYTB was a poor seed, whereas rrnL consistently yielded correct circular genomes with all k-mer values. For F. mixta, intergenic regions such as atpF-atpH and trnL-trnF may not be suitable seeds for chloroplast assembly, as they did not yield correct circular genomes. Sequencing coverage is another factor that can affect the correct circularization of organellar genomes. In our analysis, a correct F. mixta chloroplast genome was obtained only when matK was used as the seed. Notably, this specimen had the lowest estimated sequencing coverage (~4×, Table 2), based on the predicted genome size in the GoaT database (Challis et al., 2023), highlighting the importance of adequate sequencing depth for obtaining consistently reliable organellar genome assemblies. OrganPipe uses nHMMER (Wheeler & Eddy, 2013) to refine the annotation of rRNA and tRNA genes and to reassess the intergenic regions of the assembled sequence. This reevaluation, which searches the RNAcentral database, enables the identification of any missing PCG, rRNA, or tRNA gene and provides an additional quality-control step toward accurate and comprehensive annotations. In addition, to standardize the output, OrganPipe rotates all circularized genomes to start at the rrnS gene (mitogenomes) or at the trnH-psbA intergenic region (plastomes).
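Rotation and intergenic-region scanning are both simple interval operations on a circular sequence. The sketch below is a hypothetical Python illustration, not OrganPipe's code: it rotates a genome so that a chosen origin coordinate (for instance, the start of rrnS) becomes position 0, shifts the feature coordinates by the same offset, and then lists the uncovered stretches that a tool such as nHMMER could rescan. It assumes 0-based, half-open coordinates and that no feature crosses the new origin, which is typically the case when the origin is a gene start.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def rotate(seq: str, features: List[Interval], origin: int) -> Tuple[str, List[Interval]]:
    """Rotate a circular sequence so that `origin` becomes position 0.

    Feature coordinates are shifted by the same offset. Assumes no feature
    spans the new origin (e.g. `origin` is the start of a gene such as rrnS).
    """
    n = len(seq)
    origin %= n
    shifted = []
    for start, end in features:
        new_start = (start - origin) % n
        shifted.append((new_start, new_start + (end - start)))
    return seq[origin:] + seq[:origin], sorted(shifted)


def intergenic(length: int, features: List[Interval]) -> List[Interval]:
    """Return stretches of [0, length) not covered by any feature (overlaps merged)."""
    gaps: List[Interval] = []
    cursor = 0
    for start, end in sorted(features):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < length:
        gaps.append((cursor, length))
    return gaps


if __name__ == "__main__":
    genome = "AAAACCCCGGGGTTTT"           # toy 16-bp circular genome
    genes = [(8, 12), (13, 16), (1, 4)]   # (8, 12) plays the role of rrnS
    rotated, moved = rotate(genome, genes, origin=8)
    print(rotated, moved, intergenic(len(rotated), moved))
```

Rotating first means that no feature or gap wraps around position 0, so the gap scan can treat the genome as a linear string. In the toy example, the stretch between the end of the last gene (position 16) and the start of the first gene (position 1) in the original coordinates, which wraps around the origin, becomes the ordinary interval (8, 9) after rotation.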
OrganPipe also aids manual curation by compiling all data and necessary files into a single file, simplifying downstream analysis. The pipeline further generates graphical outputs, including depth and recruitment graphs, that help users quickly assess the quality and characteristics of the assembled genomes. Because users do not need expertise in running each individual software tool, OrganPipe is accessible even to beginners and non-bioinformaticians, who can flexibly adjust essential parameters such as seeds and k-mer values. In conclusion, OrganPipe is a versatile and efficient tool that accommodates a wide range of genomic scenarios, facilitates comprehensive parameter exploration, and supports efficient manual curation for accurate, tailored organellar genome analyses.

METHODS
OrganPipe runs in Unix-like environments (including Ubuntu, macOS, and Windows via WSL) and processes raw sequence data in FASTQ and FASTA formats. It is implemented in Snakemake (Köster & Rahmann, 2012), a workflow management system that offers reproducibility, scalability across computing environments (from local machines to high-performance clusters and cloud computing), and easier debugging through its rule-based structure. All processing steps are encapsulated in Singularity (Kurtzer et al., 2017) images. The following sections describe species sequencing, seed selection, and how OrganPipe iteratively assembles, annotates, and reports the results (Fig. 4).

Species and seed selection
To demonstrate OrganPipe's functionality, we generated new sequencing data for two invertebrate species and two flowering plants, totaling nine specimens: five of the water flea E. colombiensis (voucher nos. ITV57671–ITV57675); two of the click beetle P. pumilus (ITV19301 and ITV50378); one of the aroid F. mixta (ITV08189); and one of the braúna-preta M. brauna (ITV00582). DNA from the invertebrate samples was extracted with the DNeasy Blood & Tissue kit (Qiagen), following the protocol recommended for insect tissues; for the plant specimens, we used the CTAB protocol described in Vasconcelos et al. (2021). Paired-end libraries were then constructed from 10–50 ng of genomic DNA using the Illumina DNA Prep kit (Illumina) with xGen adapters for Illumina (Integrated DNA Technologies). The libraries were diluted in a solution of 0.1% Tris-HCl and Tween, pooled, and sequenced on an Illumina NextSeq 500 with the high-output v2 kit (300 cycles, 2 × 150 bp). For the OrganPipe analyses, we used 11 seeds for the E. colombiensis specimens (one rrnL, five COX1, and five CYTB, from different species), four for P. pumilus (ATP6, CYTB, NAD1, and NAD4 from P. termitilluminans), six for M. brauna (two matK and one each of rbcL, trnL-trnF, rps16, and trnL), and three for F. mixta (matK, atpF-atpH, and trnL-trnF). We used k-mer values of 23, 33, and 39 in all analyses. Accession numbers of the seeds are provided in the Supplementary Material (Table S1).

Assembly
Mitogenomes can be assembled from Illumina short reads or PacBio HiFi long reads, whereas chloroplast assembly currently relies exclusively on short reads. The pipeline accepts both YAML and CSV configuration files, which consolidate all parameters required for execution.
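Seeds such as the 11 used for E. colombiensis are conveniently collected in a single multi-FASTA file. The minimal Python sketch below, an illustration rather than OrganPipe's own parser, splits such a file into one single-record FASTA string per seed so that each seed can be passed to a separate assembly run; the function names and example headers are hypothetical.

```python
import io
from typing import Dict, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from a multi-FASTA stream.

    The identifier is the first whitespace-delimited word after '>'.
    """
    name, chunks = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line.upper())  # normalize soft-masked (lowercase) bases
    if name is not None:
        yield name, "".join(chunks)


def split_seeds(handle: TextIO) -> Dict[str, str]:
    """Map each seed identifier to its own single-record FASTA text."""
    return {name: f">{name}\n{seq}\n" for name, seq in read_fasta(handle)}


if __name__ == "__main__":
    example = io.StringIO(">COX1_seedA some description\nATGCAT\ngcaTGC\n>CYTB_seedB\nGGCCAA\n")
    for name, record in split_seeds(example).items():
        print(name, repr(record))
```

Each value returned by `split_seeds` could be written to its own file and used as the seed for one assembly; if two records share an identifier, the later one overwrites the earlier in this sketch.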
For short reads, OrganPipe can perform quality control with AdapterRemoval v2 (Schubert et al., 2016). The reads are then assembled with NOVOPlasty, and the assembly is validated and corrected with Pilon 1.24 (Walker et al., 2014) after the reads are aligned to it with BWA (H. Li & Durbin, 2009). Users can select multiple k-mers and seeds for the assembly. Seeds can be supplied as a multi-FASTA or GenBank file, from which users specify the features (PCGs, rRNA, or tRNA genes) to use, or downloaded directly from the NCBI Nucleotide database by entering search terms and the target genes. Long-read assembly is performed with MitoHifi v3.2, for which users can supply a multi-GenBank file or search for and download GenBank files of taxonomically related organisms from the NCBI Nucleotide database.

Annotation
In Fast Mode, OrganPipe annotates mitochondrial genomes with MITOS2 and MiTFi (Jühling et al., 2012). Users can instead opt for Slow Mode, which uses nHMMER to annotate intergenic regions and re-annotate rRNA and tRNA sequences. For this nHMMER annotation, we built an HMM database from RNAcentral (Petrov et al., 2017) FASTA sequences clustered with CD-HIT (W. Li & Godzik, 2006) at a 97% identity threshold. After annotation, the genome is rotated to begin at the rrnS gene to standardize the mitogenome representation. Plastomes are annotated with CPGAVAS2 (Shi et al., 2019) and Chloe (https://github.com/ian-small/Chloe.jl), rotated to start at the trnH-psbA intergenic region, and their annotation is likewise refined with nHMMER.

Software reports
After annotation, OrganPipe generates a diverse set of output files (TSV and XLSX) compiling data from the assembly and annotation steps.
The NOVOPlasty report includes software parameters, read library characteristics, pairing details, contig assembly details, assembly statistics, organelle coverage, and contig circularization status, while MitoHiFi outputs include contig statistics and gene information. The Pilon report provides genome size, read statistics, alignment details, coverage metrics, and the corrections made. MITOS2 results (gene order and duplicated or missing genes, rRNAs, and tRNAs) and the MiTFi rRNA and tRNA files are parsed to organize all annotation information. The CPGAVAS2 report includes gene characteristics, codon usage, and annotation issues. OrganPipe also checks the start and stop codons of each coding sequence, curating the annotation and flagging problematic genes that may need attention. All nHMMER TSV files are combined with RNAcentral metadata for straightforward interpretation.

Iteration
OrganPipe accepts multiple seeds and k-mers for assembling mitochondrial and chloroplast genomes. The pipeline iterates until a circularized genome is detected or all seeds have been used; when multiple k-mers are provided, all of them are assembled with a given seed before the pipeline moves on to the next seed. OrganPipe can also continue analyzing further seeds after the organellar genome has been circularized. If a circular genome is obtained at the end of the process, depth and recruitment graphs are generated, and CIRCOS (Krzywinski et al., 2009) is used to create an annotated circular genome visualization. All files needed for manual curation are compiled into a ZIP file for easy access.

Declarations

ACKNOWLEDGEMENTS
This research was funded by Vale S.A. [Projects Genômica da Herpetofauna (R100603.GH), Diversidade Biológica de Cavernas (R100603.CD), and Genômica da BSE (R100603.85.0X.CB01)].

AUTHOR CONTRIBUTIONS STATEMENT
RRMO conceived and developed the first version of OrganPipe and wrote the manuscript. BMS developed the current version of OrganPipe and ran the bioinformatic analyses. MM and SV conducted the wet-lab protocols and sequenced the samples. ML performed the final curation and submission of genomes to NCBI. TFL helped implement some features. SV and GN provided essential suggestions during the development stage. All authors reviewed the manuscript.

COMPETING INTERESTS
The authors declare no competing interests.

References
Allio, R., Schomaker-Bastos, A., Romiguier, J., Prosdocimi, F., Nabholz, B., & Delsuc, F. (2020). MitoFinder: Efficient automated large-scale extraction of mitogenomic data in target enrichment phylogenomics. Molecular Ecology Resources, 20(4), 892–905. https://doi.org/10.1111/1755-0998.13160
Arias-Agudelo, L. M., González, F., Isaza, J. P., Alzate, J. F., & Pabón-Mora, N. (2019). Plastome reduction and gene content in New World Pilostyles (Apodanthaceae) unveils high similarities to African and Australian congeners. Molecular Phylogenetics and Evolution, 135, 193–202. https://doi.org/10.1016/J.YMPEV.2019.03.014
Barrett, C. F., Davis, J. I., Leebens-Mack, J., Conran, J. G., & Stevenson, D. W. (2013). Plastid genomes and deep relationships among the commelinid monocot angiosperms. Cladistics, 29(1), 65–87. https://doi.org/10.1111/J.1096-0031.2012.00418.X
Bernt, M., Donath, A., Jühling, F., Externbrink, F., Florentz, C., Fritzsch, G., Pütz, J., Middendorf, M., & Stadler, P. F. (2013). MITOS: Improved de novo metazoan mitochondrial genome annotation. Molecular Phylogenetics and Evolution, 69(2), 313–319. https://doi.org/10.1016/J.YMPEV.2012.08.023
Cameron, S. L., Yoshizawa, K., Mizukoshi, A., Whiting, M. F., & Johnson, K. P. (2011). Mitochondrial genome deletions and minicircles are common in lice (Insecta: Phthiraptera). BMC Genomics, 12(1), 1–15.
https://doi.org/10.1186/1471-2164-12-394
Challis, R., Kumar, S., Sotero-Caio, C., Brown, M., & Blaxter, M. (2023). Genomes on a Tree (GoaT): A versatile, scalable search engine for genomic and sequencing project metadata across the eukaryotic tree of life. Wellcome Open Research, 8. https://doi.org/10.12688/WELLCOMEOPENRES.18658.1
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H., & Li, H. (2021). Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods, 18(2), 170–175. https://doi.org/10.1038/s41592-020-01056-5
Dierckxsens, N., Mardulyn, P., & Smits, G. (2016). NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic Acids Research, 45(4), gkw955. https://doi.org/10.1093/nar/gkw955
Jin, J.-J., Yu, W.-B., Yang, J.-B., Song, Y., Yi, T.-S., & Li, D.-Z. (2018). GetOrganelle: a simple and fast pipeline for de novo assembly of a complete circular chloroplast genome using genome skimming data. bioRxiv, 256479. https://doi.org/10.1101/256479
Jühling, F., Pütz, J., Bernt, M., Donath, A., Middendorf, M., Florentz, C., & Stadler, P. F. (2012). Improved systematic tRNA gene annotation allows new insights into the evolution of mitochondrial tRNA structures and into the mechanisms of mitochondrial genome rearrangements. Nucleic Acids Research, 40(7), 2833–2845. https://doi.org/10.1093/NAR/GKR1131
Köster, J., & Rahmann, S. (2012). Snakemake—a scalable bioinformatics workflow engine. Bioinformatics, 28(19), 2520–2522. https://doi.org/10.1093/BIOINFORMATICS/BTS480
Kurtzer, G. M., Sochat, V., & Bauer, M. W. (2017). Singularity: Scientific containers for mobility of compute. PLOS ONE, 12(5), e0177459. https://doi.org/10.1371/JOURNAL.PONE.0177459
Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 25(14), 1754–1760. https://doi.org/10.1093/BIOINFORMATICS/BTP324
Li, W., & Godzik, A. (2006).
Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics, 22(13), 1658–1659. https://doi.org/10.1093/BIOINFORMATICS/BTL158
McCartney, M. A., Auch, B., Kono, T., Mallez, S., Zhang, Y., Obille, A., Becker, A., Abrahante, J. E., Garbe, J., Badalamenti, J. P., Herman, A., Mangelson, H., Liachko, I., Sullivan, S., Sone, E. D., Koren, S., Silverstein, K. A. T., Beckman, K. B., & Gohl, D. M. (2022). The genome of the zebra mussel, Dreissena polymorpha: a resource for comparative genomics, invasion genetics, and biocontrol. G3 Genes|Genomes|Genetics, 12(2). https://doi.org/10.1093/G3JOURNAL/JKAB423
Nunes, G. L., Oliveira, R. R. M., Guimarães, J. T. F., Giulietti, A. M., Caldeira, C., Vasconcelos, S., Pires, E., Dias, M., Watanabe, M., Pereira, J., Jaffé, R., Bandeira, C. H. M. M., Carvalho-Filho, N., da Silva, E. F., Rodrigues, T. M., dos Santos, F. M. G., Fernandes, T., Castilho, A., Souza-Filho, P. W. M., … Oliveira, G. (2018). Quillworts from the Amazon: A multidisciplinary populational study on Isoetes serracarajensis and Isoetes cangae. PLoS ONE, 13(8). https://doi.org/10.1371/journal.pone.0201417
Petrov, A. I., Kay, S. J. E., Kalvari, I., Howe, K. L., Gray, K. A., Bruford, E. A., Kersey, P. J., Cochrane, G., Finn, R. D., Bateman, A., Kozomara, A., Griffiths-Jones, S., Frankish, A., Zwieb, C. W., Lau, B. Y., Williams, K. P., Chan, P. P., Lowe, T. M., Cannone, J. J., … Dinger, M. E. (2017). RNAcentral: a comprehensive database of non-coding RNA sequences. Nucleic Acids Research, 45(D1), D128–D134. https://doi.org/10.1093/NAR/GKW1008
Ruhlman, T. A., & Jansen, R. K. (2014). The plastid genomes of flowering plants. Methods in Molecular Biology, 1132, 3–38. https://doi.org/10.1007/978-1-62703-995-6_1
Schubert, M., Lindgreen, S., & Orlando, L. (2016). AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Research Notes, 9(1), 1–7.
https://doi.org/10.1186/S13104-016-1900-2
Shi, L., Chen, H., Jiang, M., Wang, L., Wu, X., Huang, L., & Liu, C. (2019). CPGAVAS2, an integrated plastome sequence annotator and analyzer. Nucleic Acids Research, 47(W1), W65–W73. https://doi.org/10.1093/NAR/GKZ345
Tillich, M., Lehwark, P., Pellizzer, T., Ulbricht-Jones, E. S., Fischer, A., Bock, R., & Greiner, S. (2017). GeSeq – versatile and accurate annotation of organelle genomes. Nucleic Acids Research, 45(W1), W6–W11. https://doi.org/10.1093/NAR/GKX391
Uliano-Silva, M., Ferreira, J. G. R. N., Krasheninnikova, K., Blaxter, M., Mieszkowska, N., Hall, N., Holland, P., Durbin, R., Richards, T., Kersey, P., Hollingsworth, P., Wilson, W., Twyford, A., Gaya, E., Lawniczak, M., Lewis, O., Broad, G., Martin, F., Hart, M., … McCarthy, S. A. (2023). MitoHiFi: a python pipeline for mitochondrial genome assembly from PacBio high fidelity reads. BMC Bioinformatics, 24(1), 1–13. https://doi.org/10.1186/S12859-023-05385-Y
Vasconcelos, S., Nunes, G. L., Dias, M. C., Lorena, J., Oliveira, R. R. M., Lima, T. G. L., Pires, E. S., Valadares, R. B. S., Alves, R., Watanabe, M. T. C., Zappi, D. C., Hiura, A. L., Pastore, M., Vasconcelos, L. V., Mota, N. F. O., Viana, P. L., Gil, A. S. B., Simões, A. O., Imperatriz-Fonseca, V. L., Harley, R. M., Giulietti, A. M., & Oliveira, G. (2021). Unraveling the plant diversity of the Amazonian canga through DNA barcoding. Ecology and Evolution, 11(19), 13348–13362. https://doi.org/10.1002/ece3.8057
Walker, B. J., Abeel, T., Shea, T., Priest, M., Abouelliel, A., Sakthikumar, S., Cuomo, C. A., Zeng, Q., Wortman, J., Young, S. K., & Earl, A. M. (2014). Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. PLoS ONE, 9(11), e112963. https://doi.org/10.1371/journal.pone.0112963
Weng, M. L., Ruhlman, T. A., & Jansen, R. K. (2017). Expansion of inverted repeat does not decrease substitution rates in Pelargonium plastid genomes.
New Phytologist, 214(2), 842–851. https://doi.org/10.1111/NPH.14375
Wheeler, T. J., & Eddy, S. R. (2013). nhmmer: DNA homology search with profile HMMs. Bioinformatics, 29(19), 2487–2489. https://doi.org/10.1093/BIOINFORMATICS/BTT403
Zardoya, R. (2020). Recent advances in understanding mitochondrial genome diversity. F1000Research, 9, 270. https://doi.org/10.12688/f1000research.21490.1

Supplementary Files
FigureS1.docx
TableS1.xlsx
Moreira-Oliveira†","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA/UlEQVRIie3PMUvEMBTA8Xc8yC3FrE+Q+BV6BKqD1K/yQkGX21wCDp4c1En8KgeuDimFuyU4O0ldnBxOXDoapC5K6I2C+UNDC/3xXgBSqb/cHoaDAZTEXX534RED0fvLXcwXGd7Nyo2Qo5vb5mNrQYlpNuteHkqtN3JNYJ/LRYQc+MeKnActMNM5v1aqaFEQ+IsqRojmOTU1mBrFGbFDHQjmk5qr2GJ0+Kb7gZz37K7M/XKMUFYMU3AN7FqzQpx0gZRRks2LY+8p3AXbsNhGU4uzjj1zlEy9frL2REnZXL/37lLJu6ZzW8unMfItf3yHEWYxYn43NiWVSqX+T58JG00dUXC7oAAAAABJRU5ErkJggg==","orcid":"","institution":"Instituto Tecnológico Vale","correspondingAuthor":true,"prefix":"","firstName":"Renato","middleName":"R.","lastName":"Moreira-Oliveira†","suffix":""},{"id":400278826,"identity":"6838afcc-7fc0-4783-bab3-7caf462e192b","order_by":1,"name":"Bruno Marques Silva†","email":"","orcid":"","institution":"Instituto Tecnológico Vale","correspondingAuthor":false,"prefix":"","firstName":"Bruno","middleName":"Marques","lastName":"Silva†","suffix":""},{"id":400278829,"identity":"ff04d141-699a-4969-8959-c22f5f1fe540","order_by":2,"name":"Michele Molina","email":"","orcid":"","institution":"Instituto Tecnológico Vale","correspondingAuthor":false,"prefix":"","firstName":"Michele","middleName":"","lastName":"Molina","suffix":""},{"id":400278830,"identity":"d9529a67-98d4-4694-968a-b6d185c8287e","order_by":3,"name":"Marx Oliveira-Lima","email":"","orcid":"","institution":"Instituto Tecnológico Vale","correspondingAuthor":false,"prefix":"","firstName":"Marx","middleName":"","lastName":"Oliveira-Lima","suffix":""},{"id":400278832,"identity":"49026c78-9761-4a87-a56c-43621694e6bd","order_by":4,"name":"Tiago Ferreira Leão","email":"","orcid":"","institution":"Instituto Tecnológico Vale","correspondingAuthor":false,"prefix":"","firstName":"Tiago","middleName":"Ferreira","lastName":"Leão","suffix":""},{"id":400278834,"identity":"ac13d834-5f3a-4be4-a81e-a950f04f9c78","order_by":5,"name":"Santelmo Vasconcelos","email":"","orcid":"","institution":"Instituto 
Tecnológico Vale","correspondingAuthor":false,"prefix":"","firstName":"Santelmo","middleName":"","lastName":"Vasconcelos","suffix":""},{"id":400278838,"identity":"1d83833e-cc3f-4bdc-a9ad-98ff1a87feba","order_by":6,"name":"Gisele Lopes Nunes","email":"","orcid":"","institution":"Instituto Tecnológico Vale","correspondingAuthor":false,"prefix":"","firstName":"Gisele","middleName":"Lopes","lastName":"Nunes","suffix":""}],"badges":[],"createdAt":"2024-12-20 20:53:08","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-5686696/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-5686696/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":73626121,"identity":"403c34f2-c29f-4942-82aa-7409d4c71e94","added_by":"auto","created_at":"2025-01-13 05:28:13","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":138974,"visible":true,"origin":"","legend":"\u003cp\u003eCombinations of k-mer values and seeds (x-axis) used to assemble five E. colombiensis specimens (A), two P. pumilus specimens (B), one M. brauna specimen (C), and one F. mixta specimen (D). Green squares represent correctly circularized genomes, while yellow and red squares represent incorrectly circularized and non-circularized genomes, respectively.\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-5686696/v1/7128e9aa4ff9ac9bfc2894ea.png"},{"id":73626799,"identity":"44ce9ba1-c37d-435b-8d33-fd03b46e8d81","added_by":"auto","created_at":"2025-01-13 05:36:13","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":140311,"visible":true,"origin":"","legend":"\u003cp\u003eThe mitochondrial genomes of specimen ITV57671 of Eulimnadia colombiensis (A) and specimen ITV19301 of Pyrearinus pumilus (B). 
Mitochondrial features are represented in two concentric rings: the inner ring displays GC content percentages, while the outer ring illustrates the gene structure, with genes color-coded to reflect their functional categories.\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-5686696/v1/109b28b9ddd89344c69387f6.png"},{"id":73626805,"identity":"eb4cd4dd-7f92-4882-8eef-e0b1e3d60b66","added_by":"auto","created_at":"2025-01-13 05:36:13","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":195105,"visible":true,"origin":"","legend":"\u003cp\u003eThe chloroplast genomes of M. brauna (A) and F. mixta (B). Chloroplast features are represented in two concentric rings. The inner ring depicts the GC content and the quadripartite structure, with the large single-copy (LSC) region, the small single-copy (SSC) region, and the two inverted repeat regions (IRA and IRB). The outer ring depicts the gene structure of the plastome, with genes color-coded according to their functional categories.\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-5686696/v1/40f34ce4024b63b78ab0908f.png"},{"id":73626129,"identity":"76313da0-4dcc-4319-bc0c-29f3a26cae23","added_by":"auto","created_at":"2025-01-13 05:28:13","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":109268,"visible":true,"origin":"","legend":"\u003cp\u003eOrganPipe's workflow. 
Gray rectangles indicate input files, diamonds indicate decision points, and white rectangles indicate tool steps.\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-5686696/v1/c1767b635933003c474bca97.png"},{"id":100735784,"identity":"22608217-fb02-45a1-a473-a99c281cfdcd","added_by":"auto","created_at":"2026-01-20 22:29:53","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1015990,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5686696/v1/24f3c3c3-064b-488e-b70a-bd8bd9fdf095.pdf"},{"id":73626125,"identity":"ba7d8cd4-aa5f-450e-8fb1-111e71fb4180","added_by":"auto","created_at":"2025-01-13 05:28:13","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":592895,"visible":true,"origin":"","legend":"","description":"","filename":"FigureS1.docx","url":"https://assets-eu.researchsquare.com/files/rs-5686696/v1/8a21a111eee4a26bb3f0bce2.docx"},{"id":73626124,"identity":"44a2761e-4be8-4ce4-91e7-fd6ae603c27e","added_by":"auto","created_at":"2025-01-13 05:28:13","extension":"xlsx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":25510,"visible":true,"origin":"","legend":"","description":"","filename":"TableS1.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-5686696/v1/c753b04a252a315b0ec6e600.xlsx"}],"financialInterests":"No competing interests reported.","formattedTitle":"OrganPipe: An automated tool to facilitate the assembly, annotation, and curation of mitochondrial and chloroplast genomes","fulltext":[{"header":"INTRODUCTION","content":"\u003cp\u003eOrganellar genomes (plastomes and mitogenomes) and their components have been widely employed in evolutionary studies and are also useful for understanding cellular function. 
Due to their uniparental inheritance pattern (usually maternal) and more stable mutation rates compared to the nuclear genome, organellar genes serve as valuable markers for phylogenetic studies (Barrett et al., \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2013\u003c/span\u003e). Additionally, organellar genomes aid species conservation by providing insights into genetic diversity (Nunes et al., \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2018\u003c/span\u003e), which is essential for conservation efforts and habitat management.\u003c/p\u003e \u003cp\u003ePlastomes and animal mitogenomes have well-known structures, with plastomes ranging from 140 to 180 kbp (Arias-Agudelo et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Weng et al., \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2017\u003c/span\u003e) and mitogenomes usually ranging from 15 to 20 kbp (Cameron et al., \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2011\u003c/span\u003e; McCartney et al., \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Plastomes typically have a quadripartite structure, comprising two inverted-repeat regions (IRa and IRb), a large single-copy region (LSC), and a small single-copy region (SSC), and contain 110\u0026ndash;130 genes (Ruhlman \u0026amp; Jansen, \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2014\u003c/span\u003e). Animal mitogenomes, in contrast, are considerably more compact, comprising 37 genes, a few short intergenic regions, and a control region (Zardoya, \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2020\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eBecause they are smaller and contain fewer repeated regions than nuclear genomes, plastomes and mitogenomes are relatively straightforward to assemble and usually demand modest computational resources. 
Several assemblers have been developed for organellar genomes; they differ in the type of sequencing reads they process and in the organellar genomes they target, and they rely on seed information, which can be a single gene or an entire reference genome. NOVOPlasty (Dierckxsens et al., \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2016\u003c/span\u003e) assembles plastomes and mitogenomes from paired-end reads using an iterative, k-mer-based seed-and-extend heuristic. GetOrganelle (Jin et al., \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2018\u003c/span\u003e) follows a similar approach but can also use an assembly graph to extract organellar genomes. MitoHifi (Uliano-Silva et al., \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2023\u003c/span\u003e) handles PacBio high-fidelity (HiFi) long reads, either assembling them with HifiAsm (Cheng et al., \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2021\u003c/span\u003e) and extracting the mitochondrial contigs or selecting mitochondrial contigs from an existing assembly; it outputs an annotated mitogenome together with informative coverage plots (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eSummary of assemblers developed to aid in organellar genome assembly.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" 
colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAssembler\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMitogenome\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePlastome\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eRead type\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eMulti-k-mer or multi-reference\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eAnnotation\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNOVOPlasty\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003ePaired-end\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGetOrganelle\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003ePaired-end\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMitoHifi\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLong reads\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eOrganPipe\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003ePaired-end and long reads\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eNOVOPlasty and GetOrganelle allow only a single k-mer size per run, and neither annotates the assembled organellar genomes. 
In contrast, MitoHifi does not use k-mers, accepts only one reference genome at a time, and annotates mitogenomes with Mitofinder (Allio et al., \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2020\u003c/span\u003e) and MITOS2 (Bernt et al., \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2013\u003c/span\u003e) without curating the resulting annotation. Relying on a single k-mer size or a single reference sequence as seed may lead to suboptimal assemblies, forcing users to adjust parameters manually by trial and error because no automatic refinement is available.\u003c/p\u003e \u003cp\u003eHere, we present OrganPipe, a tool for the iterative assembly and annotation of mitochondrial and chloroplast genomes that also aids in their curation. The pipeline accepts multiple k-mer sizes and different reference genes or genomes as seeds, and it stops assembling as soon as the organellar genome is circularized, or once all provided input combinations have been exhausted, before the annotation step begins. OrganPipe lets users explore multiple parameters and samples with a single command, removing the need to rerun the software repeatedly to adjust k-mer and reference values, and it produces a comprehensive final report integrating metrics from assembly through automated annotation. 
In addition, to demonstrate its capabilities, we used OrganPipe to assemble and annotate the first mitogenome sequences of two invertebrate species [\u003cem\u003eEulimnadia colombiensis\u003c/em\u003e (Limnadiidae, Diplostraca) and \u003cem\u003ePyrearinus pumilus\u003c/em\u003e (Elateridae, Coleoptera)] and the plastomes of two plant species [\u003cem\u003eFurtadoa mixta\u003c/em\u003e (Araceae, Alismatales) and \u003cem\u003eMelanoxylon brauna\u003c/em\u003e (Fabaceae, Fabales)], circularizing and annotating the organellar genomes with high accuracy, identifying key genomic features, and correcting errors missed by other tools.\u003c/p\u003e"},{"header":"RESULTS","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eGenome sequencing and assembly\u003c/h2\u003e \u003cp\u003eIllumina sequencing of the five \u003cem\u003eE. colombiensis\u003c/em\u003e specimens yielded 11.6 Gb to 23.1 Gb of data, with an average of 17.4 Gb per sample. For \u003cem\u003eP. pumilus\u003c/em\u003e, sequencing yielded 8.1 Gb and 12.0 Gb for the ITV19301 and ITV50378 specimens, respectively. For the plant species, sequencing yielded 10.6 Gb for \u003cem\u003eM. brauna\u003c/em\u003e and 20.1 Gb for \u003cem\u003eF. mixta\u003c/em\u003e (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eSequencing and assembly results for the invertebrate specimens of \u003cem\u003eP. pumilus\u003c/em\u003e and \u003cem\u003eE. colombiensis\u003c/em\u003e and for the plant species \u003cem\u003eM. brauna\u003c/em\u003e and \u003cem\u003eF. mixta\u003c/em\u003e. 
Total raw reads and bases from sequencing are given, together with the estimated sequencing coverage, calculated as raw bases divided by the genome size predicted in the GoaT database. Assembly results and NCBI accession numbers for each specimen are also included.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"8\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGroup\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eOrganism (estimated genome size)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSample ID\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eRaw reads\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eRaw bases\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eEstimated coverage (\u0026times;)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eOrganellar genome length (bp)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eAccession\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e 
\u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"6\" rowspan=\"7\"\u003e \u003cp\u003eInvertebrates\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003ePyrearinus pumilus (694 Mb)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eITV19301\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e54267364\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e8164000028\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e11.76\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e15891\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePQ572766\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eITV50378\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e79973086\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e12075935986\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e17.40\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e15891\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePQ572767\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"4\" rowspan=\"5\"\u003e \u003cp\u003eEulimnadia colombiensis (281 Mb)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eITV57671\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e153512362\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e23180366662\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e82.49\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e15752\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePQ572761\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eITV57672\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e154009760\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e11627736882\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e41.38\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e15753\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePQ572762\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eITV57673\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e107250032\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e16194754832\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e57.63\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e15752\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePQ572763\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eITV57674\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e89101746\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e13454363646\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e47.88\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e15753\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePQ572764\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eITV57675\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e152178616\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e22978971016\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e81.78\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e15753\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePQ572765\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003ePlants\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMelanoxylon brauna (685 Mb)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eITV00582\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e98806962\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e10610228885\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e15.49\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e157162\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePQ535232\u003c/p\u003e \u003c/td\u003e 
\u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFurtadoa mixta (4.89 Gb)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eITV08189\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e134238185\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e20184737794\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e4.13\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e167613\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePQ539039\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eOrganPipe generated complete, annotated organellar genomes for all nine specimens. All five \u003cem\u003eE. colombiensis\u003c/em\u003e specimens were circularized with rrnL as seed at every k-mer value (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eA). With COX1, in contrast, one specimen (ITV57671) did not circularize at k-mer 39, and with CYTB only specimen ITV57675 was circularized, at k-mer values of 23 and 33. Both \u003cem\u003eP. pumilus\u003c/em\u003e specimens were circularized with all four seeds (ATP6, CYTB, NAD1, and NAD4) and all k-mer values, except specimen ITV19301 with ATP6 at k-mer 33 (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eB).\u003c/p\u003e \u003cp\u003eFor the plant species, OrganPipe circularized the plastome of \u003cem\u003eM. brauna\u003c/em\u003e with trnK-matK, rbcL, trnL-trnF, and rps16 as seeds at every k-mer value. 
However, with trnL as seed, OrganPipe did not circularize the genome at k-mer 39 (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eC). For \u003cem\u003eF. mixta\u003c/em\u003e, OrganPipe generated a correct circular genome only with matK as seed (k-mers 23 and 33), whereas the other two markers yielded incorrectly circularized genomes at k-mer 39 (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eD).\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eGenome annotation\u003c/h3\u003e\n\u003cp\u003eOrganPipe's automatic annotation correctly identified the 13 protein-coding genes (PCGs), 22 transfer RNA (tRNA) genes, and two ribosomal RNA (rRNA) genes in all circularized genomes of \u003cem\u003eE. colombiensis\u003c/em\u003e (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA, Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e). For \u003cem\u003eP. pumilus\u003c/em\u003e, OrganPipe flagged that MITOS2 had missed a single tRNA gene (trnY) (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eB, Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e), which we annotated by homology during curation, using the \u003cem\u003eP. termitilluminans\u003c/em\u003e genome (NC_030059) as reference. For the plant species, OrganPipe identified 22 tRNA, 74 PCGs, and four rRNA genes in all circularized plastomes of \u003cem\u003eM. brauna\u003c/em\u003e (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eA, Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e), whereas the rrn4.5 gene was missing from the annotated plastomes of \u003cem\u003eF. mixta\u003c/em\u003e (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eB, Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e); we also annotated this gene during curation using GeSeq (Chlorobox) (Tillich et al., \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). Annotations of the incorrectly circularized genomes generated by NOVOPlasty within the OrganPipe workflow are available in the Supplementary Material (Figure \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e, Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e).\u003c/p\u003e"},{"header":"DISCUSSION","content":"\u003cp\u003eOrganPipe is an efficient pipeline for large-scale analyses of mitogenomes and plastomes that handles multiple seeds and k-mer values simultaneously. It explores assembly parameters iteratively and halts as soon as a circularized genome is detected, so that various parameter combinations can be tested in a single run. By automating the combination of seeds, k-mer values, and other parameters, OrganPipe minimizes the need for multiple interactive runs and streamlines the overall analysis workflow.\u003c/p\u003e \u003cp\u003eOrganPipe also simplifies the selection of seeds and k-mer values for organellar genome assembly: users can specify several of each at once, saving the time and effort of preparing a separate run for each combination. Our results show that the k-mer value influences whether a correct circular genome is obtained, regardless of the seed or reference used, and that the choice of seed matters as well. For \u003cem\u003eE.
colombiensis\u003c/em\u003e, for instance, CYTB performed poorly as a seed, whereas rrnL consistently yielded correct circular genomes with all k-mer values. For \u003cem\u003eF. mixta\u003c/em\u003e, intergenic regions such as atpF-atpH and trnL-trnF may not be suitable seeds for plastome assembly, as they did not yield correctly circularized genomes.\u003c/p\u003e \u003cp\u003eSequencing coverage can also affect the correct circularization of organellar genomes. In our analysis, a correct plastome of \u003cem\u003eF. mixta\u003c/em\u003e was obtained only with matK as seed. Notably, this specimen had the lowest estimated sequencing coverage (~\u0026thinsp;4\u0026times;, Table \u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003eS2\u003c/span\u003e), calculated from the genome size predicted in the GoaT database (Challis et al., \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2023\u003c/span\u003e), which highlights the importance of adequate sequencing depth for consistently reliable organellar genome assemblies.\u003c/p\u003e \u003cp\u003eOrganPipe uses nhmmer (Wheeler \u0026amp; Eddy, \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2013\u003c/span\u003e) to refine the annotation of rRNA and tRNA genes and to reassess intergenic regions of the assembled sequence. By searching against the RNAcentral database, this reassessment can reveal any missing PCGs, rRNAs, and tRNAs, providing an additional quality-control step toward accurate and comprehensive annotations. For standardization, OrganPipe also rotates every circularized genome to start at the rrnS gene (mitogenomes) or at the trnH-psbA intergenic region (plastomes). 
The pipeline also supports manual curation by compiling all data and required files into a single file, simplifying downstream analysis.\u003c/p\u003e \u003cp\u003eIn addition, OrganPipe generates informative graphical outputs, including depth and recruitment plots, that give users quick insight into the quality and characteristics of the assembled genomes. Because users need no expertise in running each individual tool, OrganPipe is accessible to beginners and non-bioinformaticians, allowing researchers from diverse backgrounds to assemble organellar genomes and to adjust essential parameters, such as seeds and k-mer values, quickly and flexibly. In conclusion, OrganPipe is a versatile and efficient tool that accommodates a wide range of genomic scenarios, facilitates comprehensive parameter exploration, and supports manual curation for accurate organellar genome analyses.\u003c/p\u003e"},{"header":"METHODS","content":"\u003cp\u003eOrganPipe runs in Unix-like environments (including Ubuntu, macOS, and Windows via WSL) and processes raw sequence data in FASTQ and FASTA formats. It is implemented in Snakemake (K\u0026ouml;ster \u0026amp; Rahmann, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2012\u003c/span\u003e), a workflow management system that offers reproducibility, scalability across computing environments (from local machines to high-performance clusters and cloud platforms), and ease of debugging through its rule-based structure. All processing steps are encapsulated in Singularity (Kurtzer et al., \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2017\u003c/span\u003e) images. 
In the following sections, we describe specimen sequencing, seed selection, and how OrganPipe assembles, annotates, and reports the results using an iterative approach (Fig. 4).

Species and seed selection

To demonstrate all the functionalities of OrganPipe, we generated new sequencing data for two invertebrate species and two flowering plant species, totaling nine specimens: five of the water flea E. colombiensis (vouchers ITV57671–ITV57675), two of the click beetle P. pumilus (ITV19301 and ITV50378), one of the aroid F. mixta (ITV08189), and one of the braúna-preta M. brauna (ITV00582). DNA was extracted from the invertebrate samples with the DNeasy Blood & Tissue kit (Qiagen), following the protocol recommended for insect tissues, and from the plant specimens with the CTAB protocol described in Vasconcelos et al. (2021). Paired-end libraries were then constructed from 10–50 ng of genomic DNA using the Illumina DNA Prep kit (Illumina) with xGen adapters for Illumina (Integrated DNA Technologies). The resulting libraries were diluted in a solution of 0.1% Tris-HCl and Tween and pooled for sequencing on an Illumina NextSeq 500 with the high-output v2 kit (300 cycles, 2 × 150 bp).

When testing OrganPipe inputs, we used 11 seeds (one rrnL, five COX1, and five CYTB, from different species) for the E. colombiensis specimens; four (ATP6, CYTB, NAD1, and NAD4 from P. termitilluminans) for P. pumilus; six (two matK, and one each of rbcL, trnL-trnF, rps16, and trnL) for M. brauna; and three (matK, atpF-atpH, and trnL-trnF) for F. mixta. We used k-mer values of 23, 33, and 39 in all analyses. Accession numbers of the seeds are listed in the Supplementary Material (Table S1).

Assembly

Mitogenome assembly supports both Illumina short reads and PacBio HiFi long reads, whereas chloroplast assembly currently relies exclusively on short reads. The pipeline accepts configuration files in either YAML or CSV format, which consolidate all essential parameters for execution.

OrganPipe can perform quality control of short-read data with AdapterRemoval v2 (Schubert et al., 2016). The genomes are then assembled with NOVOPlasty and checked for errors with Pilon 1.24 (Walker et al., 2014), using reads aligned to the assembly with BWA (H. Li & Durbin, 2009). The pipeline also allows users to select multiple k-mers and seeds for the assembly process. Seeds can be supplied as a multi-FASTA file or as a GenBank file from which users specify the features (PCGs, rRNA, or tRNA) to use as seeds. Alternatively, seeds can be downloaded directly from the NCBI Nucleotide database by entering search terms and specifying the target genes.
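The Iteration section describes how multiple seeds and k-mers are combined. Every k-mer is assembled for a seed before the next seed is tried, and the run stops once a circular genome appears unless the user chooses to keep analyzing seeds. The control flow can be sketched in Python as below. In this hypothetical sketch, `assemble` is a placeholder for one NOVOPlasty run that only reports whether the result circularized, and `stop_when_circular` is an illustrative parameter name. Neither is OrganPipe's actual code or configuration.

```python
"""Sketch of the seed x k-mer iteration order described for OrganPipe.

Hypothetical code: `assemble` stands in for one NOVOPlasty run and only
reports whether the result circularized.
"""
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class AssemblyResult:
    seed: str
    kmer: int
    circular: bool


def iterate_assemblies(
    seeds: Sequence[str],
    kmers: Sequence[int],
    assemble: Callable[[str, int], AssemblyResult],
    stop_when_circular: bool = True,
) -> List[AssemblyResult]:
    """Run every k-mer for a seed before moving on to the next seed.

    With stop_when_circular=True the loop ends after the first seed that
    produced at least one circular assembly; with False, all seeds are used.
    """
    results: List[AssemblyResult] = []
    for seed in seeds:
        seed_results = [assemble(seed, k) for k in kmers]
        results.extend(seed_results)
        if stop_when_circular and any(r.circular for r in seed_results):
            break
    return results


if __name__ == "__main__":
    # Toy assembler: pretend only "seed_2" circularizes, at any k-mer.
    def fake_assemble(seed: str, k: int) -> AssemblyResult:
        return AssemblyResult(seed, k, circular=(seed == "seed_2"))

    runs = iterate_assemblies(["seed_1", "seed_2", "seed_3"], [23, 33, 39], fake_assemble)
    for r in runs:
        print(r.seed, r.kmer, "circular" if r.circular else "not circular")
```

Circularity is checked only after all k-mers of a seed have run, which matches the described order. Passing `stop_when_circular=False` corresponds to continuing with the remaining seeds after a genome has been circularized.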
Long-read assembly, in turn, is performed with MitoHiFi v3.2. For this step, users can supply a multi-GenBank file, or search for and download GenBank files of taxonomically related organisms from the NCBI Nucleotide database.

Annotation

In Fast Mode, OrganPipe annotates mitochondrial genomes with MITOS2 and MiTFi (Jühling et al., 2012). Users can instead select Slow Mode, which additionally uses nHMMER to annotate intergenic regions and re-annotate rRNA and tRNA sequences. For this nHMMER step, we built an HMM database from RNAcentral (Petrov et al., 2017) FASTA sequences clustered with CD-HIT (W. Li & Godzik, 2006) at a 97% identity threshold. After annotation, the mitogenome is rotated to begin at the rrnS gene to standardize its representation. For plastomes, OrganPipe performs annotation with CPGAVAS2 (Shi et al., 2019) and Chloe (https://github.com/ian-small/Chloe.jl), refines it with nHMMER, and rotates the genome to begin at the trnH-psbA intergenic region.

Software reports

After annotation, OrganPipe generates a set of output files (TSV and XLSX) that compile data from the assembly and annotation steps.
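As an example of turning one tool's raw output into such a TSV table, the Python sketch below parses nhmmer tabular output (the file written with nhmmer's `--tblout` option) and keeps hits below an E-value cutoff. The column positions follow the nhmmer tabular layout: target, accession, query, accession, hmmfrom, hmm to, alifrom, ali to, envfrom, env to, sequence length, strand, E-value, score, bias, and a free-text description. Verify them against the header your HMMER version writes. The 1e-5 cutoff, the selected columns, and all names are hypothetical, not OrganPipe's parser or settings.

```python
"""Sketch: turn nhmmer --tblout output into a filtered TSV table.

Hypothetical helper, not OrganPipe's parser. Column positions follow the
nhmmer tabular format (16 whitespace-separated fields, the last one being a
free-text description); check them against the header of your HMMER version.
"""
import csv
import io
from dataclasses import astuple, dataclass, fields
from typing import Iterable, List


@dataclass
class NhmmerHit:
    target: str
    query: str
    ali_from: int
    ali_to: int
    strand: str
    evalue: float
    score: float
    description: str


def parse_nhmmer_tblout(lines: Iterable[str], max_evalue: float = 1e-5) -> List[NhmmerHit]:
    """Parse tabular lines, skipping comments and hits above `max_evalue`."""
    hits: List[NhmmerHit] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or header/comment
        cols = line.split(maxsplit=15)  # keep the description as one field
        if len(cols) < 15:
            raise ValueError(f"unexpected tblout line: {line!r}")
        evalue = float(cols[12])
        if evalue > max_evalue:
            continue
        hits.append(
            NhmmerHit(
                target=cols[0],
                query=cols[2],
                ali_from=int(cols[6]),  # on the minus strand, ali_from > ali_to
                ali_to=int(cols[7]),
                strand=cols[11],
                evalue=evalue,
                score=float(cols[13]),
                description=cols[15] if len(cols) > 15 else "",
            )
        )
    return hits


def hits_to_tsv(hits: List[NhmmerHit]) -> str:
    """Render hits as a TSV string with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow([f.name for f in fields(NhmmerHit)])
    for hit in hits:
        writer.writerow(astuple(hit))
    return buf.getvalue()


if __name__ == "__main__":
    # Synthetic tblout line for illustration only.
    demo = ["contig_1 - cluster_0001 - 1 950 1200 2150 1195 2155 15800 + 3.2e-120 400.1 5.0 example hit"]
    print(hits_to_tsv(parse_nhmmer_tblout(demo)), end="")
```

Splitting with `maxsplit=15` keeps a multi-word description intact as the last field. A real report would then join each hit's query identifier to RNAcentral metadata.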
The NOVOPlasty report includes software parameters, read library characteristics, pairing details, contig assembly details, assembly statistics, organelle coverage, and contig circularization status, while the MitoHiFi outputs include contig statistics and gene information. The Pilon report provides genome size, read statistics, alignment details, coverage metrics, and the corrections made. The MITOS2 results (gene order and duplicated or missing genes, rRNAs, and tRNAs) and the MiTFi rRNA and tRNA files are parsed to organize all annotation information. The CPGAVAS2 report includes gene characteristics, codon usage, and annotation issues. OrganPipe also checks the start and stop codons of each coding sequence and flags problematic genes that may need manual attention. All nHMMER TSV files are parsed together with RNAcentral database metadata for straightforward interpretation.

Iteration

OrganPipe can use multiple seeds and k-mers to assemble mitochondrial and chloroplast genomes. The pipeline keeps iterating until a circularized genome is detected or all seeds have been used. When multiple k-mers are provided, assemblies are run with all of them before moving on to the next seed. OrganPipe can also continue analyzing the remaining seeds after the organelle genome has been circularized. If a circular genome is obtained at the end of the process, depth and recruitment graphs are generated, and CIRCOS (Krzywinski et al., 2009) is used to create an annotated circular genome visualization. All essential files for manual curation are compiled into a ZIP file for easy access.

Declarations

ACKNOWLEDGEMENTS

This research was funded by Vale S.A. [Projects Genômica da Herpetofauna (R100603.GH), Diversidade Biológica de Cavernas (R100603.CD), and Genômica da BSE (R100603.85.0X.CB01)].

AUTHOR CONTRIBUTIONS STATEMENT

RRMO conceived and developed the first version of OrganPipe and wrote the manuscript. BMS developed the current version of OrganPipe and ran the bioinformatic analyses. MM and SV conducted the wet-lab protocols and sequenced the samples. ML performed the final curation and submission of the genomes to NCBI. TFL helped implement some features. SV and GN provided essential suggestions during development. All authors reviewed the manuscript.

COMPETING INTERESTS

The authors declare no competing interests.

References

Allio, R., Schomaker-Bastos, A., Romiguier, J., Prosdocimi, F., Nabholz, B., & Delsuc, F. (2020). MitoFinder: Efficient automated large-scale extraction of mitogenomic data in target enrichment phylogenomics. Molecular Ecology Resources, 20(4), 892–905. https://doi.org/10.1111/1755-0998.13160
Arias-Agudelo, L. M., González, F., Isaza, J. P., Alzate, J. F., & Pabón-Mora, N. (2019). Plastome reduction and gene content in New World Pilostyles (Apodanthaceae) unveils high similarities to African and Australian congeners. Molecular Phylogenetics and Evolution, 135, 193–202. https://doi.org/10.1016/j.ympev.2019.03.014
Barrett, C. F., Davis, J. I., Leebens-Mack, J., Conran, J. G., & Stevenson, D. W. (2013). Plastid genomes and deep relationships among the commelinid monocot angiosperms. Cladistics, 29(1), 65–87. https://doi.org/10.1111/j.1096-0031.2012.00418.x
Bernt, M., Donath, A., Jühling, F., Externbrink, F., Florentz, C., Fritzsch, G., Pütz, J., Middendorf, M., & Stadler, P. F. (2013). MITOS: Improved de novo metazoan mitochondrial genome annotation. Molecular Phylogenetics and Evolution, 69(2), 313–319. https://doi.org/10.1016/j.ympev.2012.08.023
Cameron, S. L., Yoshizawa, K., Mizukoshi, A., Whiting, M. F., & Johnson, K. P. (2011). Mitochondrial genome deletions and minicircles are common in lice (Insecta: Phthiraptera). BMC Genomics, 12(1), 1–15. https://doi.org/10.1186/1471-2164-12-394
Challis, R., Kumar, S., Sotero-Caio, C., Brown, M., & Blaxter, M. (2023). Genomes on a Tree (GoaT): A versatile, scalable search engine for genomic and sequencing project metadata across the eukaryotic tree of life. Wellcome Open Research, 8. https://doi.org/10.12688/wellcomeopenres.18658.1
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H., & Li, H. (2021). Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods, 18(2), 170–175. https://doi.org/10.1038/s41592-020-01056-5
Dierckxsens, N., Mardulyn, P., & Smits, G. (2016). NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic Acids Research, 45(4), e18. https://doi.org/10.1093/nar/gkw955
Jin, J.-J., Yu, W.-B., Yang, J.-B., Song, Y., Yi, T.-S., & Li, D.-Z. (2018). GetOrganelle: a simple and fast pipeline for de novo assembly of a complete circular chloroplast genome using genome skimming data. bioRxiv, 256479. https://doi.org/10.1101/256479
Jühling, F., Pütz, J., Bernt, M., Donath, A., Middendorf, M., Florentz, C., & Stadler, P. F. (2012). Improved systematic tRNA gene annotation allows new insights into the evolution of mitochondrial tRNA structures and into the mechanisms of mitochondrial genome rearrangements. Nucleic Acids Research, 40(7), 2833–2845. https://doi.org/10.1093/nar/gkr1131
Köster, J., & Rahmann, S. (2012). Snakemake—a scalable bioinformatics workflow engine. Bioinformatics, 28(19), 2520–2522. https://doi.org/10.1093/bioinformatics/bts480
Kurtzer, G. M., Sochat, V., & Bauer, M. W. (2017). Singularity: Scientific containers for mobility of compute. PLOS ONE, 12(5), e0177459. https://doi.org/10.1371/journal.pone.0177459
Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 25(14), 1754–1760. https://doi.org/10.1093/bioinformatics/btp324
Li, W., & Godzik, A. (2006). Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics, 22(13), 1658–1659. https://doi.org/10.1093/bioinformatics/btl158
McCartney, M. A., Auch, B., Kono, T., Mallez, S., Zhang, Y., Obille, A., Becker, A., Abrahante, J. E., Garbe, J., Badalamenti, J. P., Herman, A., Mangelson, H., Liachko, I., Sullivan, S., Sone, E. D., Koren, S., Silverstein, K. A. T., Beckman, K. B., & Gohl, D. M. (2022). The genome of the zebra mussel, Dreissena polymorpha: a resource for comparative genomics, invasion genetics, and biocontrol. G3 Genes|Genomes|Genetics, 12(2). https://doi.org/10.1093/g3journal/jkab423
Nunes, G. L., Oliveira, R. R. M., Guimarães, J. T. F., Giulietti, A. M., Caldeira, C., Vasconcelos, S., Pires, E., Dias, M., Watanabe, M., Pereira, J., Jaffé, R., Bandeira, C. H. M. M., Carvalho-Filho, N., da Silva, E. F., Rodrigues, T. M., dos Santos, F. M. G., Fernandes, T., Castilho, A., Souza-Filho, P. W. M., … Oliveira, G. (2018). Quillworts from the Amazon: A multidisciplinary populational study on Isoetes serracarajensis and Isoetes cangae. PLoS ONE, 13(8). https://doi.org/10.1371/journal.pone.0201417
Petrov, A. I., Kay, S. J. E., Kalvari, I., Howe, K. L., Gray, K. A., Bruford, E. A., Kersey, P. J., Cochrane, G., Finn, R. D., Bateman, A., Kozomara, A., Griffiths-Jones, S., Frankish, A., Zwieb, C. W., Lau, B. Y., Williams, K. P., Chan, P. P., Lowe, T. M., Cannone, J. J., … Dinger, M. E. (2017). RNAcentral: a comprehensive database of non-coding RNA sequences. Nucleic Acids Research, 45(D1), D128–D134. https://doi.org/10.1093/nar/gkw1008
Ruhlman, T. A., & Jansen, R. K. (2014). The plastid genomes of flowering plants. Methods in Molecular Biology, 1132, 3–38. https://doi.org/10.1007/978-1-62703-995-6_1
Schubert, M., Lindgreen, S., & Orlando, L. (2016). AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Research Notes, 9(1), 1–7. https://doi.org/10.1186/s13104-016-1900-2
Shi, L., Chen, H., Jiang, M., Wang, L., Wu, X., Huang, L., & Liu, C. (2019). CPGAVAS2, an integrated plastome sequence annotator and analyzer. Nucleic Acids Research, 47(W1), W65–W73. https://doi.org/10.1093/nar/gkz345
Tillich, M., Lehwark, P., Pellizzer, T., Ulbricht-Jones, E. S., Fischer, A., Bock, R., & Greiner, S. (2017). GeSeq – versatile and accurate annotation of organelle genomes. Nucleic Acids Research, 45(W1), W6–W11. https://doi.org/10.1093/nar/gkx391
Uliano-Silva, M., Ferreira, J. G. R. N., Krasheninnikova, K., Blaxter, M., Mieszkowska, N., Hall, N., Holland, P., Durbin, R., Richards, T., Kersey, P., Hollingsworth, P., Wilson, W., Twyford, A., Gaya, E., Lawniczak, M., Lewis, O., Broad, G., Martin, F., Hart, M., … McCarthy, S. A. (2023). MitoHiFi: a python pipeline for mitochondrial genome assembly from PacBio high fidelity reads. BMC Bioinformatics, 24(1), 1–13. https://doi.org/10.1186/s12859-023-05385-y
Vasconcelos, S., Nunes, G. L., Dias, M. C., Lorena, J., Oliveira, R. R. M., Lima, T. G. L., Pires, E. S., Valadares, R. B. S., Alves, R., Watanabe, M. T. C., Zappi, D. C., Hiura, A. L., Pastore, M., Vasconcelos, L. V., Mota, N. F. O., Viana, P. L., Gil, A. S. B., Simões, A. O., Imperatriz-Fonseca, V. L., Harley, R. M., Giulietti, A. M., & Oliveira, G. (2021). Unraveling the plant diversity of the Amazonian canga through DNA barcoding. Ecology and Evolution, 11(19), 13348–13362. https://doi.org/10.1002/ece3.8057
Walker, B. J., Abeel, T., Shea, T., Priest, M., Abouelliel, A., Sakthikumar, S., Cuomo, C. A., Zeng, Q., Wortman, J., Young, S. K., & Earl, A. M. (2014). Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. PLoS ONE, 9(11), e112963. https://doi.org/10.1371/journal.pone.0112963
Weng, M. L., Ruhlman, T. A., & Jansen, R. K. (2017). Expansion of inverted repeat does not decrease substitution rates in Pelargonium plastid genomes. New Phytologist, 214(2), 842–851. https://doi.org/10.1111/nph.14375
Wheeler, T. J., & Eddy, S. R. (2013). nhmmer: DNA homology search with profile HMMs. Bioinformatics, 29(19), 2487–2489. https://doi.org/10.1093/bioinformatics/btt403
Zardoya, R. (2020). Recent advances in understanding mitochondrial genome diversity. F1000Research, 9, 270. https://doi.org/10.12688/f1000research.21490.1

Posted on Research Square: January 13th, 2025 (Version 1). DOI: https://doi.org/10.21203/rs.3.rs-5686696/v1. License: CC BY 4.0. Keywords: de novo assembly, NGS, organelle, pipeline, Snakemake. OrganPipe is available at https://github.com/itvgenomics/OrganPipe.
† Joint first authors.
