This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.
Metabarcoding has emerged as a critical tool in ecology and other scientific disciplines, facilitating species identification in diverse samples for biodiversity monitoring, community and microbiome analysis, dietary studies, and the study of species interactions. However, errors and artifacts introduced during laboratory processes such as PCR and sequencing pose challenges. Manual inspection is impractical given the vast number of sequences, so rapid algorithms are needed to clean the data. Thorough bioinformatic cleanup can reduce such errors by removing low-quality sequences or sequences that alignments classify as non-fitting. In practice, however, some anomalous sequences evade detection, while some normal sequences are mistakenly removed. Deep neural networks (DNNs) offer a promising solution because they can recognize complex patterns in DNA sequences. In this study I present MetAnoDe (Metabarcoding Anomaly Detection), a new software tool featuring novel deep-learning LSTM and CNN models that can be applied independently or combined as an ensemble model. MetAnoDe uses an alignment-free approach that complements existing tools and makes data cleanup more efficient. Here, the three models were trained for the bacterial 16S-V4 and plant ITS2 markers and can be readily reused in other studies. Cross-validation and tests on real-world data demonstrate high accuracy. Optimal integration into pipelines can also reduce overall runtime, working synergistically with current alignment-based methods. Thanks to its automated model training, the software can further be adapted to other markers. In conclusion, MetAnoDe enhances metabarcoding by efficiently identifying anomalous sequences. Integrating DNNs with traditional approaches improves biodiversity estimates by reducing the inclusion of non-target sequences, yielding more accurate and comprehensive results.
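To make the idea of alignment-free classification and ensembling more concrete, here is a minimal sketch under stated assumptions; it is not MetAnoDe's actual code. It assumes that each read is one-hot encoded on its own (a common input representation for CNN and LSTM sequence models, with no reference alignment involved) and that the ensemble combines per-model anomaly probabilities by simple averaging (soft voting). The function names `one_hot_encode` and `ensemble_flag`, the fixed input length, and the 0.5 threshold are all hypothetical choices for illustration.

```python
from typing import List, Sequence

BASES = "ACGT"


def one_hot_encode(seq: str, length: int) -> List[List[int]]:
    """Encode a DNA read as a length x 4 one-hot matrix (hypothetical helper).

    The read is uppercased, truncated to `length`, and right-padded with
    all-zero rows. Non-ACGT symbols (e.g. N) also become all-zero rows.
    Each read is encoded on its own: no alignment to a reference is needed.
    """
    matrix = [[1 if base == b else 0 for b in BASES] for base in seq.upper()[:length]]
    # Pad short reads so every model input has the same shape.
    matrix.extend([[0, 0, 0, 0] for _ in range(length - len(matrix))])
    return matrix


def ensemble_flag(probabilities: Sequence[float], threshold: float = 0.5) -> bool:
    """Soft-voting ensemble (assumed scheme): average the anomaly
    probabilities from several models (e.g. an LSTM and a CNN) and flag
    the read as anomalous if the mean exceeds `threshold`."""
    if not probabilities:
        raise ValueError("need at least one model score")
    mean = sum(probabilities) / len(probabilities)
    return mean > threshold
```

For example, `one_hot_encode("ACGN", 6)` yields three informative rows followed by three all-zero rows (one for the `N`, two for padding), and `ensemble_flag([0.9, 0.4])` returns `True` because the mean probability of 0.65 exceeds 0.5. Averaging is only one way to combine models; weighted voting or a learned combiner would be equally plausible alternatives.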
https://doi.org/10.32942/X2792N
Subjects: Life Sciences
Keywords: machine learning, microbiome, metabarcoding, 16S, ITS2, outlier detection, convolutional neural network, long short-term memory, recurrent neural networks
Published: 2025-03-20 07:06
License: CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial 4.0 International)
Data and Code Availability Statement: https://github.com/chiras/MetAnoDe
Language: English