SEGUID v2: Extending SEGUID checksums for circular, linear, single- and double-stranded biological sequences

preprint OA: closed

Abstract

Background

Synthetic biology involves combining different DNA fragments, each containing functional biological parts, to address specific problems. Fundamental gene-function research often requires cloning and propagating DNA fragments, such as those from the iGEM Parts Registry or Addgene, typically distributed as circular plasmids. Addgene’s repository alone offers around 150,000 plasmids. To ensure data integrity, cryptographic checksums can be calculated for the sequences. Each sequence has a unique checksum, making checksums useful for validation and quick lookups of associated annotations. For example, the SEGUID checksum uniquely identifies protein sequences with a 27-character string.
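As a rough illustration of where the 27 characters come from: a SHA-1 digest is 20 bytes, which Base64 encodes as 28 characters including one trailing "=" padding character, so dropping the padding leaves 27. The sketch below assumes the sequence is upper-cased before hashing and that the padding is stripped; the function name `seguid_v1_sketch` and the example peptide are hypothetical, and this is not the reference implementation.

```python
import base64
import hashlib


def seguid_v1_sketch(sequence: str) -> str:
    """Return a SEGUID-style checksum: SHA-1, Base64, padding removed.

    Assumption: the sequence is upper-cased before hashing.
    """
    digest = hashlib.sha1(sequence.upper().encode("ascii")).digest()  # 20 bytes
    encoded = base64.b64encode(digest).decode("ascii")  # 28 characters, padded
    return encoded.rstrip("=")  # strip the single "=" pad character


# Hypothetical peptide, used only to show the checksum length.
checksum = seguid_v1_sketch("MKTAYIAKQR")
print(len(checksum))
```

Because the checksum depends only on the sequence text, two records holding the same sequence produce the same string, which is what makes it usable as a lookup key.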

Objectives

The original SEGUID, while effective for protein sequences and single-stranded DNA (ssDNA), is not suitable for circular DNA, which has no natural starting position, nor for double-stranded DNA (dsDNA), in which two separate sequences are present. The challenge is to represent linear dsDNA, circular ssDNA, and circular dsDNA uniquely. To meet these needs, we propose SEGUID v2, which extends the original SEGUID to handle these additional types of sequences.
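One common way to address both problems is to reduce every equivalent representation of a molecule to a single canonical string before hashing. The sketch below picks the lexicographically smallest rotation for circular sequences and the smaller of a strand and its reverse complement for double-stranded ones. It is an illustration under stated assumptions, not the SEGUID v2 specification: it covers only blunt-ended DNA over the alphabet ACGT, and all function names are hypothetical.

```python
# Complement table for plain DNA (assumption: alphabet is exactly A, C, G, T).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand read in its own 5'-to-3' direction."""
    return seq.translate(_COMPLEMENT)[::-1]


def smallest_rotation(seq: str) -> str:
    """Lexicographically smallest rotation (brute force, O(n^2), for clarity)."""
    if not seq:
        return seq
    return min(seq[i:] + seq[:i] for i in range(len(seq)))


def canonical_circular_ss(seq: str) -> str:
    """Same result regardless of where the circle is opened."""
    return smallest_rotation(seq)


def canonical_linear_ds(watson: str) -> str:
    """Same result regardless of which strand is given (blunt ends only)."""
    return min(watson, reverse_complement(watson))


def canonical_circular_ds(watson: str) -> str:
    """Same result regardless of strand and of starting position."""
    return min(
        smallest_rotation(watson),
        smallest_rotation(reverse_complement(watson)),
    )


# "TTACAGA" is "GATTACA" opened two positions later on the same circle.
print(canonical_circular_ss("GATTACA") == canonical_circular_ss("TTACAGA"))
```

Hashing the canonical string instead of the raw input then yields one checksum per molecule rather than one per representation.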

Conclusions

SEGUID v2 produces orientation- and rotation-invariant checksums for single-stranded, double-stranded, possibly staggered, linear, and circular DNA and RNA sequences. Customizable alphabets allow for other types of sequences. In contrast to the original SEGUID, which uses Base64, SEGUID v2 uses Base64url to encode the SHA-1 hash. This ensures SEGUID v2 checksums can be used as-is in filenames, regardless of platform, and in URLs, with minimal friction.

Availability

SEGUID v2 is readily available for major programming languages, distributed under the MIT license. JavaScript package seguid is available on npm, Python package seguid on PyPI, R package seguid on CRAN, and a Tcl script on GitHub. These tools, along with documentation, examples, and an online SEGUID Calculator, can be found at https://www.seguid.org.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

* Spell and grammar corrections
* Minor rephrasing of sentences
* Accession IDs corrections
* Added minor discussions on SEGUID stability, sequence updates, ApE adoption
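The switch from Base64 to Base64url described in the Conclusions can be illustrated with the Python standard library. Standard Base64 may emit "+" and "/", the latter being a path separator on most platforms, whereas Base64url substitutes "-" and "_". The sketch below encodes the same SHA-1 digest both ways; the helper names are hypothetical, and stripping the "=" padding is an assumption made here to keep the output at 27 characters.

```python
import base64
import hashlib


def sha1_base64(text: str) -> str:
    """SHA-1 digest in standard Base64 (alphabet includes '+' and '/')."""
    digest = hashlib.sha1(text.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def sha1_base64url(text: str) -> str:
    """SHA-1 digest in Base64url ('-' and '_' replace '+' and '/')."""
    digest = hashlib.sha1(text.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


sequence = "GATTACA"  # arbitrary example input
print(sha1_base64(sequence))
print(sha1_base64url(sequence))
```

The two encodings carry the same 160 bits and differ only in those two alphabet characters, so converting between them is a plain character substitution.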

Citation neighborhood (no data yet)

We don't have any in-corpus citations linked to this paper yet. This is a recent paper (2024) — citers typically take a year or two to land, and the OpenAlex reference graph may still be filling in.

Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00