Repeat-rich regions cause false positive detection of NUMTs: a case study in amphibians using an improved cane toad reference genome

preprint OA: closed
📄 Open PDF Full text JSON View at publisher
Full text 2,149 characters · extracted from oa-doi-fallback · click to expand
Abstract Mitochondrial DNA (mtDNA) has been widely used in genetics research for decades. Contamination from nuclear DNA of mitochondrial origin (NUMT) can confound studies of phylogenetic relationships and mtDNA heteroplasmy. Homology searches with mtDNA are widely used to detect NUMTs in the nuclear genome. Nevertheless, false positive detection of NUMTs is common when handling repeat-rich sequences, whilst fragmented genomes might result in missing true NUMTs. In this study, we investigated different NUMT detection methods and how the quality of the genome assembly affects them. We presented an improved nuclear genome assembly (aRhiMar1.3) of the invasive cane toad (Rhinella marina) with additional long-read Nanopore and 10x linked-read sequencing. The final assembly was 3.47 Gb in length with 91.3% of tetrapod universal single-copy orthologs (n=5,310), indicating the gene-containing regions were well assembled. We used three complementary methods (NUMTFinder, dinumt and PALMER) to study the NUMT landscape of the cane toad genome. All three methods yielded consistent results, showing very few NUMTs in the cane toad genome. Furthermore, we expanded NUMT detection analyses to other amphibians and confirmed a weak relationship between genome size and the number of NUMTs present in the nuclear genome. Amphibians are repeat-rich, and we show that the number of NUMTs found in highly repetitive genomes is prone to inflation when using homology-based detection without filters. Together, this study provides an exemplar of how to robustly identify NUMTs in complex genomes when confounding effects on mtDNA analyses are a concern. Significance This study uses an updated cane toad nuclear genome assembly and multiple NUMT detection methods to confirm a lack of NUMTs that might confound the use of mtDNA as a population genetic marker in the cane toad. We provide an exemplar study for NUMT detection accounting for genome assembly quality and composition, and highlight the risks of using BLASTN-based approaches in highly repetitive nuclear genomes. Competing Interest Statement The authors have declared no competing interest.

Text is read by the "Ask this paper" AI Q&A widget below. Extraction quality varies by source — PMC NXML preserves structure cleanly, OA-HTML may include some navigation residue, and OA-PDF can have broken hyphenation. The publisher copy (via DOI) is the canonical version.

My notes (saved in your browser only)

Ask this paper AI returns verbatim quotes from the full text · source: oa-doi-fallback

Answers must be backed by verbatim quotes from this paper's full text. Hallucinated quotes are dropped automatically; if no verbatim passage answers the question, we say so. How this works

Citation neighborhood (no data yet)

We don't have any in-corpus citations linked to this paper yet. This is a recent paper (2024) — citers typically take a year or two to land, and the OpenAlex reference graph may still be filling in.

Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00