Disinfecting eukaryotic reference genomes to improve taxonomic inference from ancient environmental metagenomic data

preprint OA: closed
Abstract

Ancient environmental DNA is increasingly essential for reconstructing past ecosystems, particularly when palaeontological and archaeological tissue remains are absent. Detecting ancient plant and animal DNA in environmental samples often relies on using extensive eukaryotic reference genome databases for profiling shotgun metagenomics data. However, microbial contamination in these references can introduce substantial biases in taxonomic assignments, especially given the typical low abundance of plant and animal DNA in such samples. In this study, we present a method for identifying bacterial and archaeal-like sequences in eukaryotic genomes and apply it to nearly 3,000 reference genomes from NCBI RefSeq and GenBank (vertebrates, invertebrates, plants) as well as the 1,323 PhyloNorway plant genome assemblies from herbarium material from northern high-latitude regions. Our analysis reveals microbial-like sequences in many eukaryotic reference genomes, which are most pronounced in the PhyloNorway dataset. We provide a detailed map of the microbial-like regions, including genomic coordinates and taxonomic annotations. This resource enables the masking of microbial-like regions during profiling analyses, thereby improving the reliability of ancient environmental metagenomic datasets for downstream analyses.

Competing Interest Statement

The authors have declared no competing interest.
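To illustrate the masking step the abstract refers to, here is a minimal Python sketch of how a set of genomic coordinates could be used to hard-mask regions of a reference sequence. It is not the authors' implementation. The function names `merge_intervals` and `mask_sequence` are hypothetical, and the sketch assumes the coordinates are 0-based, half-open intervals (as in the BED convention); the actual format of the published coordinate map is not stated in the abstract.

```python
from typing import List, Tuple

# Assumption: intervals are 0-based and half-open, [start, end), as in BED.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def mask_sequence(seq: str, intervals: List[Interval], mask_char: str = "N") -> str:
    """Replace every base inside the given intervals with mask_char."""
    parts = []
    pos = 0
    for start, end in merge_intervals(intervals):
        # Clamp coordinates to the sequence bounds.
        start = max(0, min(start, len(seq)))
        end = max(start, min(end, len(seq)))
        parts.append(seq[pos:start])             # unmasked stretch
        parts.append(mask_char * (end - start))  # masked stretch
        pos = end
    parts.append(seq[pos:])  # remainder after the last interval
    return "".join(parts)


if __name__ == "__main__":
    # Hypothetical contig with two overlapping microbial-like regions.
    print(mask_sequence("ACGTACGTAC", [(2, 5), (4, 6)]))
```

Hard-masking with `N` keeps the contig length and coordinates unchanged, so downstream tools that index into the reference still work; reads that would have aligned to the masked stretches no longer find a match there.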


Published: 2025 (preprint)

Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00