Improved Mutation Detection in Duplex Sequencing Data with Sample-Specific Error Profiles

preprint OA: closed CC-BY-NC-4.0
ABSTRACT Duplex sequencing enables highly accurate detection of rare somatic mutations, but existing variant callers often rely on protocol-specific heuristics that limit sensitivity, reproducibility, and cross-study comparability. We present DupCaller, a probabilistic variant caller that builds sample-specific error profiles and applies a strand-aware statistical model for mutation detection. Across 50 synthetic datasets, DupCaller identified 1.25-fold more single-base substitutions (SBSs) and 1.41-fold more indels than a state-of-the-art method, while exhibiting equal or better precision. In three duplex-sequenced cell lines treated with aristolochic acid, it recovered expected mutational signatures while detecting 3.5-fold more SBSs and 2.8-fold more indels. In 93 tissue samples (including neurons, cord blood, sperm, saliva, and blood), DupCaller showed consistent gains, detecting 1.21- to 2.7-fold more mutations. Sensitivity scaled with sample duplication rate, yielding approximately 1.5-fold more mutations under optimal conditions and over 3-fold more in low-duplication samples where other tools falter. These results establish DupCaller as a robust and scalable solution for somatic mutation profiling in duplex sequencing across diverse biological and technical contexts.

Competing Interest Statement L.B.A. is a co-founder, CSO, scientific advisory member, and consultant for io9 (now Acurion), has equity and receives income. The terms of this arrangement have been reviewed and approved by the University of California, San Diego in accordance with its conflict of interest policies. L.B.A. is also a compensated member of the scientific advisory board of Inocras. L.B.A.'s spouse is an employee of Hologic, Inc. L.B.A. declares U.S. provisional applications filed with UCSD with serial numbers: 63/269,033; 63/289,601; 63/483,237; 63/412,835; 63/492,348; and 63/366,392 as well as a European patent application with application number EP25305077.7. L.B.A. and S.P.N. also declare provisional patent application PCT/US2023/010679. L.B.A. is also an inventor on US Patent 10,776,718 for source identification by non-negative matrix factorization. All other authors declare that they have no competing interests.

