A Novel Approach for Accurate Sequence Assembly Using de Bruijn graphs

preprint OA: closed
Abstract

Sequence assembly methods are valuable for reconstructing genomes from shorter read fragments. Modern nucleic acid sequencing instruments produce quality scores associated with each reported base; however, these quality scores are not generally used as a core part of sequence assembly or alignment algorithms. Here, we leverage weighted de Bruijn graphs as graphical probability models representing the relative abundances and qualities of k-mers within FASTQ-encoded observations. We then use these weighted de Bruijn graphs to identify alternate, higher-likelihood candidate sequences compared to the original observations, which are known to contain errors. By improving the original observations with these resampled paths, iteratively across increasing k-lengths, we can use this expectation-maximization approach to “polish” read sets from any sequencing technology according to the mutual information shared in the reads. We use this polishing approach to probabilistically correct simulated short- and long-read datasets with lower coverage and higher error rates than some algorithms can assemble satisfactorily. We find that this approach corrects sequencing errors at rates sufficient to produce error-free and nearly error-free de Bruijn assembly graphs for simulated read-set challenges.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

Contributing authors: athammack{at}lbl.gov; euan{at}stanford.edu
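To make the idea of a quality-weighted de Bruijn graph concrete, here is a minimal sketch. The abstract does not give the weighting scheme, so everything here is an assumption made for illustration and is not the authors' implementation. The sketch assumes Phred+33 quality encoding. It weights each k-mer observation by the product of its per-base probabilities of being correct, and weights each edge by the same product over the k+1 bases it spans. The function names `phred_to_prob_correct` and `weighted_debruijn` are hypothetical.

```python
from collections import defaultdict


def phred_to_prob_correct(qual_char, offset=33):
    """Convert one FASTQ quality character to P(base is correct).

    Assumes Phred+33 encoding: Q = ord(char) - 33, P(error) = 10^(-Q/10).
    """
    q = ord(qual_char) - offset
    return 1.0 - 10 ** (-q / 10)


def span_weight(qual, start, length):
    """Product of per-base correctness probabilities over qual[start:start+length]."""
    w = 1.0
    for c in qual[start:start + length]:
        w *= phred_to_prob_correct(c)
    return w


def weighted_debruijn(reads, k):
    """Build quality-weighted k-mer (node) and k-mer-to-k-mer (edge) weights.

    reads: iterable of (sequence, quality_string) pairs of equal length.
    Returns two dicts: node weights keyed by k-mer, and edge weights keyed
    by (k-mer, next k-mer), where consecutive k-mers overlap by k-1 bases.
    """
    node_weight = defaultdict(float)
    edge_weight = defaultdict(float)
    for seq, qual in reads:
        n_kmers = len(seq) - k + 1
        for i in range(n_kmers):
            # Each observation adds its (assumed) probability of being error-free.
            node_weight[seq[i:i + k]] += span_weight(qual, i, k)
            if i + 1 < n_kmers:
                edge = (seq[i:i + k], seq[i + 1:i + k + 1])
                edge_weight[edge] += span_weight(qual, i, k + 1)
    return dict(node_weight), dict(edge_weight)


# Two toy reads that disagree at the last base; the second read's last
# base has a very low quality score ('#' is Q2, 'I' is Q40).
reads = [("ACGT", "IIII"), ("ACGA", "III#")]
nodes, edges = weighted_debruijn(reads, k=3)
for kmer, weight in sorted(nodes.items()):
    print(kmer, round(weight, 4))
```

In this toy input the k-mer ending in the low-quality base receives far less weight than its high-quality alternative. That is the kind of signal a likelihood-based path resampling step could exploit. How the paper actually combines abundance and quality is not stated in the abstract.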


Year: 2024
