Anomaly detection in metabarcoding amplicon reads using an LSTM-CNN deep neural network ensemble (MetAnoDe)

preprint OA: closed
This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Metabarcoding has emerged as a critical tool in ecology and other scientific disciplines, facilitating species identification in diverse samples for biodiversity monitoring, community and microbiome analysis, dietary studies, and understanding species interactions. However, challenges arise from errors and artifacts introduced during laboratory processes such as PCR and sequencing. Manual inspection is impractical given the vast number of sequences, so rapid algorithms are needed to clean the data. Thorough bioinformatic data cleanup can reduce such errors by removing low-quality sequences or sequences that alignments classify as non-fitting. In practice, however, some anomalous sequences evade detection, while normal sequences may be mistakenly removed. Deep neural networks (DNNs) offer a promising solution by recognizing complex DNA sequence patterns. In this study I present new software, MetAnoDe (Metabarcoding Anomaly Detection), featuring novel deep-learning LSTM and CNN models that can be applied independently or combined as an ensemble model. MetAnoDe employs an alignment-free approach that complements existing tools, enhancing data cleanup efficiency. Here, the three models were trained for bacterial 16S-V4 and plant ITS2 markers, and the trained models can be readily reused in other studies. Cross-validation and real-world data testing demonstrate high accuracy. When integrated well into a pipeline, MetAnoDe can also reduce overall runtime and works alongside current alignment-based methods. It is further adaptable to other markers through the software's automated model training capability. In conclusion, MetAnoDe enhances metabarcoding by efficiently identifying anomalous sequences. Integrating DNNs with traditional approaches improves biodiversity estimates by reducing the inclusion of non-target sequences, yielding more accurate and comprehensive results.

DOI: https://doi.org/10.32942/X2792N
Subjects: Life Sciences
Keywords: machine learning, microbiome, metabarcoding, 16S, ITS2, outlier detection, convolutional neural network, long short-term memory, recurrent neural networks
Published: 2025-03-20 07:06
License: CC-BY Attribution-NonCommercial 4.0 International
Data and Code Availability Statement: https://github.com/chiras/MetAnoDe
Language: English
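
The abstract describes an alignment-free approach in which separate LSTM and CNN models score reads and can be combined as an ensemble. The sketch below illustrates that general idea only, and it is not MetAnoDe's implementation: the one-hot encoding, the averaging ("soft voting") rule, the 0.5 threshold, and all function names are assumptions made for illustration. The per-model anomaly probabilities are passed in as plain numbers, standing in for the outputs of trained networks.

```python
from typing import List, Sequence

ALPHABET = "ACGT"  # assumed encoding order; hypothetical, not taken from MetAnoDe


def one_hot(seq: str, length: int) -> List[List[float]]:
    """Encode a read as a length x 4 matrix without any alignment.

    Bases outside ACGT (e.g. N) and padding positions become all-zero rows.
    Reads longer than `length` are truncated.
    """
    rows = [
        [1.0 if base == letter else 0.0 for letter in ALPHABET]
        for base in seq.upper()[:length]
    ]
    rows.extend([0.0] * len(ALPHABET) for _ in range(length - len(rows)))
    return rows


def ensemble_score(model_scores: Sequence[float]) -> float:
    """Average the anomaly probabilities of several models (soft voting)."""
    if not model_scores:
        raise ValueError("at least one model score is required")
    return sum(model_scores) / len(model_scores)


def is_anomalous(model_scores: Sequence[float], threshold: float = 0.5) -> bool:
    """Flag a read when the averaged score reaches the (assumed) threshold."""
    return ensemble_score(model_scores) >= threshold
```

In this sketch, a read scored 0.9 by one model and 0.3 by the other would be flagged, because the average reaches the threshold. A real pipeline would feed the encoded matrix to each trained network to obtain those scores and would choose the threshold from validation data.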


Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00