Abstract
Background
Synthetic biology involves combining different DNA fragments, each containing functional biological parts, to address specific problems. Fundamental gene-function research often requires cloning and propagating DNA fragments, such as those from the iGEM Parts Registry or Addgene, typically distributed as circular plasmids. Addgene’s repository alone offers around 150,000 plasmids. To ensure data integrity, cryptographic checksums can be calculated for the sequences. Each sequence has a unique checksum, making checksums useful for validation and quick lookups of associated annotations. For example, the SEGUID checksum uniquely identifies protein sequences with a 27-character string.
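The 27-character length follows from the encoding: a SHA-1 digest is 20 bytes, which Base64 renders as 28 characters including one trailing `=` pad, leaving 27 once the pad is dropped. A minimal sketch of a SEGUID-style checksum is given below. The function name `seguid_like` is hypothetical, and the upper-casing step is an assumption about input normalisation rather than a statement of the reference implementation.

```python
import base64
import hashlib


def seguid_like(sequence: str) -> str:
    """Return a SEGUID-style checksum: SHA-1 digest, Base64, padding removed."""
    # Assumption: the sequence is normalised to upper case before hashing.
    digest = hashlib.sha1(sequence.upper().encode("ascii")).digest()  # 20 bytes
    # 20 bytes encode to 28 Base64 characters, the last being '=' padding.
    return base64.b64encode(digest).decode("ascii").rstrip("=")


if __name__ == "__main__":
    print(seguid_like("ACGT"))
```

Because the checksum depends only on the sequence, it can serve as a lookup key for annotations stored elsewhere.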
Objectives
The original SEGUID, while effective for protein sequences and single-stranded DNA (ssDNA), is not suitable for circular DNA, which has no natural starting position, or for double-stranded DNA (dsDNA), which comprises two separate sequences. The challenge is how to uniquely represent linear dsDNA, circular ssDNA, and circular dsDNA. To meet these needs, we propose SEGUID v2, which extends the original SEGUID to handle these additional types of sequences.
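Both challenges can be approached by reducing every equivalent representation to one canonical string before hashing. The sketch below illustrates the idea for a blunt circular dsDNA described by one strand: it picks the lexicographically smallest rotation, and then the smaller of the two strands. All function names are hypothetical, and this is one possible canonicalisation under those assumptions, not necessarily the exact procedure SEGUID v2 specifies.

```python
# Illustrative DNA alphabet only; ambiguity codes are not handled here.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def smallest_rotation(seq: str) -> str:
    """Return the lexicographically smallest rotation of a circular sequence."""
    if not seq:
        return seq
    # Quadratic in len(seq); adequate for a sketch, not for large plasmids.
    return min(seq[i:] + seq[:i] for i in range(len(seq)))


def canonical_circular_ds(seq: str) -> str:
    """Canonical string for a blunt circular dsDNA given either strand."""
    return min(smallest_rotation(seq),
               smallest_rotation(reverse_complement(seq)))


if __name__ == "__main__":
    original = "GATTACA"
    rotated = original[3:] + original[:3]  # same circle, different start
    print(canonical_circular_ds(original) == canonical_circular_ds(rotated))
```

Hashing the canonical string instead of the raw input then yields the same checksum regardless of where the circular sequence was cut or which strand was recorded.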
Conclusions
SEGUID v2 produces orientation- and rotation-invariant checksums for single-stranded, double-stranded, possibly staggered, linear, and circular DNA and RNA sequences. Customizable alphabets allow for other types of sequences. In contrast to the original SEGUID, which uses Base64, SEGUID v2 uses Base64url to encode the SHA-1 hash. This ensures SEGUID v2 checksums can be used as-is in filenames, regardless of platform, and in URLs, with minimal friction.
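The practical difference between the two encodings is small: Base64url replaces `+` with `-` and `/` with `_`, the two characters that cause trouble in file paths and URLs. The sketch below encodes the same SHA-1 digest both ways using Python's standard library. The helper names are hypothetical, and stripping the trailing `=` pad in both variants is an assumption made to keep the outputs comparable.

```python
import base64
import hashlib


def sha1_base64(sequence: str) -> str:
    """SHA-1 digest in standard Base64 (may contain '+' and '/')."""
    digest = hashlib.sha1(sequence.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def sha1_base64url(sequence: str) -> str:
    """SHA-1 digest in Base64url ('-' and '_' replace '+' and '/')."""
    digest = hashlib.sha1(sequence.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


if __name__ == "__main__":
    for seq in ("ACGT", "GATTACA"):
        print(seq, sha1_base64(seq), sha1_base64url(seq))
```

A checksum produced with the first encoding can be converted to the second by character substitution alone, with no need to rehash the sequence.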
Availability
SEGUID v2 is readily available for major programming languages, distributed under the MIT license. The JavaScript package seguid is available on npm, the Python package seguid on PyPI, the R package seguid on CRAN, and a Tcl script on GitHub. These tools, along with documentation, examples, and an online SEGUID Calculator, can be found at https://www.seguid.org.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
* Spelling and grammar corrections
* Minor rephrasing of sentences
* Accession ID corrections
* Added minor discussions on SEGUID stability, sequence updates, and ApE adoption