This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.
Pynnotate is a Python-based tool for the automated retrieval, parsing, and extraction of annotated gene sequences from GenBank records. It addresses common challenges in working with GenBank data: inconsistent gene nomenclature, redundant sequences, and the need for standardised gene extraction across multiple taxa. Pynnotate offers both a graphical user interface and a command-line interface, making it accessible to users with varying levels of bioinformatics experience. Sequences can be retrieved from manually defined accession numbers or from NCBI query terms, and three filtering modes are available: unconstrained (all sequences), strict (one sequence per species, prioritising gene completeness), and flexible (multiple sequences per species when they contribute different genes). Key features include synonym resolution for gene names, customisable sequence headers, metadata tracking, and automated extraction of genes into separate files. Built-in synonym dictionaries cover animal and plant mitochondrial DNA, chloroplast DNA, and ribosomal DNA, and users can also supply custom synonym dictionaries. The tool generates structured output, including FASTA files, metadata matrices, and detailed logs, which facilitates integration with downstream analyses. Designed for speed and scalability, pynnotate handles large datasets efficiently, allowing quick retrieval and extraction of annotated sequences across multiple taxa. Finally, pynnotate is a valuable resource for both research and education, particularly for educators conducting bioinformatics analyses with students who have limited command-line experience.
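The three filtering modes and the synonym resolution step described above can be illustrated with a minimal Python sketch. This is not pynnotate's actual code or API: the function names (`canonical_gene`, `filter_records`), the small synonym table, the record layout, and the accession-like identifiers are all hypothetical, and "gene completeness" is assumed here to mean the number of distinct recognised genes annotated on a record.

```python
from collections import defaultdict

# Hypothetical synonym table (canonical name -> known aliases, lower case).
# pynnotate ships its own dictionaries; these entries are illustrative only.
SYNONYMS = {
    "COI": {"coi", "cox1", "co1", "cytochrome c oxidase subunit i"},
    "CYTB": {"cytb", "cob", "cytochrome b"},
    "ND2": {"nd2", "nad2", "nadh dehydrogenase subunit 2"},
}
_LOOKUP = {alias: canon for canon, aliases in SYNONYMS.items() for alias in aliases}


def canonical_gene(name):
    """Map a raw annotation to its canonical gene name, or None if unknown."""
    return _LOOKUP.get(name.strip().lower())


def filter_records(records, mode="unconstrained"):
    """Return the accessions kept under the chosen filtering mode.

    records: list of (accession, species, [raw gene names]) tuples.
    Assumption: completeness = number of distinct recognised genes.
    """
    if mode not in {"unconstrained", "strict", "flexible"}:
        raise ValueError(f"unknown mode: {mode!r}")

    # Resolve synonyms first so that 'COX1' and 'CO1' count as the same gene.
    resolved = [
        (acc, species, {g for g in map(canonical_gene, genes) if g})
        for acc, species, genes in records
    ]
    if mode == "unconstrained":
        return [acc for acc, _, _ in resolved]

    by_species = defaultdict(list)
    for record in resolved:
        by_species[record[1]].append(record)

    kept = []
    for species_records in by_species.values():
        # Most complete record first; the stable sort keeps input order on ties.
        ranked = sorted(species_records, key=lambda r: len(r[2]), reverse=True)
        if mode == "strict":
            kept.append(ranked[0][0])
        else:  # flexible: keep a record only if it adds a gene not yet covered
            covered = set()
            for acc, _, genes in ranked:
                if genes - covered:
                    kept.append(acc)
                    covered |= genes
    return kept


# Made-up records for illustration; these are not real GenBank accessions.
records = [
    ("AAA001", "Genus alpha", ["COX1", "cob"]),
    ("AAA002", "Genus alpha", ["CO1"]),
    ("AAA003", "Genus alpha", ["NAD2"]),
    ("BBB001", "Genus beta", ["cytochrome b"]),
]
strict_kept = filter_records(records, "strict")
flexible_kept = filter_records(records, "flexible")
```

In this sketch, strict mode keeps one record per species, while flexible mode also keeps `AAA003` because it contributes a gene that the most complete record for that species lacks; `AAA002` is dropped in both modes because its only gene is a synonym of one already covered.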
DOI: https://doi.org/10.32942/X2294V
Subjects: Bioinformatics, Ecology and Evolutionary Biology, Evolution
Keywords: bioinformatics, comparative genomics, feature extraction, molecular evolution, phylogenetics, Python, sequence annotation
Published: 2026-02-26 10:24
Last Updated: 2026-02-26 10:24
License: CC BY Attribution 4.0 International
Conflict of interest statement:
None.
Data and Code Availability Statement:
The ‘pynnotate’ public repository is available at https://github.com/fernandacaron/pynnotate.
Language:
English