A Bioinformatic Pipeline for Consensus Taxonomic Classification of Long-Read Amplicons

preprint OA: closed
ABSTRACT Characterizing community composition is fundamental to understanding microbial community function. Recent advances in Oxford Nanopore Technology (ONT) long-read sequencing now allow community profiling using full-length gene amplicons, affording better taxonomic resolution than standard short-amplicon Illumina sequencing. However, robust ONT-compatible profiling workflows are lacking. To address this, we have created the Amplicon Consensus Taxonomy (ACT) pipeline for classifying long-read amplicons. ACT combines output from three existing pipelines – Emu, Sintax, and LACA – to leverage the strengths of each while offsetting their individual limitations. We also developed the ACT database (ACT-DB), a sequence-similarity-aware reference database that clusters highly similar sequences into multi-taxa groups to reduce overclassification. We benchmarked ACT performance against Emu and Sintax using a defined simple mock community, simulated datasets, and a complex rhizosphere community supplemented with novel species. While ACT exhibited generally comparable or superior performance across datasets, ACT demonstrated a marked advantage over Emu and Sintax in identifying novel and low-abundance taxa in both simple and complex communities, resulting in significantly higher species-richness estimates that better reflected those observed in prior Illumina amplicon studies. Furthermore, by clustering ambiguous reference sequences, ACT-DB allowed ACT to resolve reads to meaningful multi-species groups, improving resolution without coercing artificial precision. Together, ACT and ACT-DB form a robust long-read amplicon profiling workflow that confidently identifies known species while reducing overclassification and preserving low-abundance and unknown taxa.

IMPORTANCE Microbial communities are frequently characterized by amplicon sequencing of marker genes, such as the bacterial 16S rRNA gene and fungal ITS region. Historically, the standard profiling method has been Illumina sequencing of 200-300 bp amplicons, but improved accuracy of ONT long-read sequencing means it is now possible to sequence amplicons spanning full genes of any size, prompting the need for tools optimized for long amplicons. Here, we describe the ACT bioinformatic pipeline for assigning taxonomy to amplicons of any length. We evaluated ACT performance using full-length 16S amplicon data relative to that of two commonly used pipelines. Additionally, we developed a sequence ambiguity-aware ACT database (ACT-DB) of 16S rRNA sequences to further improve classification accuracy and resolution.

Competing Interest Statement The authors have declared no competing interest.

Footnotes This version has been revised to update the funding acknowledgments.

https://github.com/Halverson-lab/Amplicon_Consensus_Taxonomy
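
The abstract says that ACT combines output from Emu, Sintax, and LACA but does not state the rule used to reconcile them. As a purely illustrative sketch, and not the authors' algorithm, the snippet below shows one simple way per-read calls from three classifiers could be merged: a majority vote that falls back to "unclassified" when fewer than two tools agree. The function name `consensus_taxon`, the `min_votes` threshold, and the example labels are all assumptions made for illustration; the linked repository is the place to look for the actual logic.

```python
from collections import Counter
from typing import Optional


def consensus_taxon(calls: dict[str, Optional[str]], min_votes: int = 2) -> str:
    """Return the label that at least `min_votes` classifiers agree on.

    `calls` maps a classifier name to its label for one read, or None when
    that classifier made no call. This voting rule is hypothetical and is
    not taken from the ACT paper.
    """
    # Count only the classifiers that actually produced a label.
    votes = Counter(label for label in calls.values() if label)
    if not votes:
        return "unclassified"
    label, count = votes.most_common(1)[0]
    return label if count >= min_votes else "unclassified"


# Hypothetical per-read calls; the labels are made up for the example.
read_calls = {
    "emu": "Pseudomonas fluorescens",
    "sintax": "Pseudomonas fluorescens",
    "laca": None,
}
print(consensus_taxon(read_calls))
```

A threshold of two out of three is only one possible choice; a real pipeline might instead weight tools differently or back off to a higher taxonomic rank when species-level calls disagree.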



Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00