Enabling Megascale Microbiome Analysis with DartUniFrac

preprint OA: closed CC-BY-4.0
Abstract

We introduce DartUniFrac, a new algorithm, together with a near-optimal GPU-accelerated implementation that is up to three orders of magnitude faster than the state of the art and scales to millions of samples (pairwise) and billions of taxa. DartUniFrac connects UniFrac with weighted Jaccard similarity and exploits sketching algorithms for fast computation. We benchmark DartUniFrac against exact UniFrac implementations and show that its results are statistically indistinguishable from theirs on real-world microbiome and metagenomic datasets.

Competing Interest Statement

Rob Knight is a scientific advisory board member of, and consultant for, BiomeSense, Inc.; he has equity and receives income. He is a scientific advisory board member of and has equity in GenCirq. He has equity in and acts as a consultant for Cybele. He is a Vice President and board member of Microbiota Vault, Inc. He is a board member of the N=1 IBS advisory board and receives income. He is a Senior Visiting Fellow of the HKUST Jockey Club Institute for Advanced Study. The terms of these arrangements have been reviewed and approved by the University of California, San Diego, in accordance with its conflict-of-interest policies. D.M. is a consultant for and has equity in BiomeSense, Inc. The terms of these arrangements have been reviewed and approved by the University of California, San Diego, in accordance with its conflict-of-interest policies.
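The link the abstract draws between UniFrac and weighted Jaccard similarity can be illustrated with a small sketch. It is not the authors' implementation. It only checks, on a hypothetical four-branch tree, the standard identity that unweighted UniFrac (unique branch length divided by total observed branch length) equals one minus the weighted Jaccard similarity of branch-length-weighted presence vectors. The branch names, lengths, and sample sets are invented for illustration, and the sketching step that approximates this quantity at scale is not shown.

```python
def weighted_jaccard(x, y):
    """Weighted Jaccard similarity: sum of minima over sum of maxima."""
    keys = set(x) | set(y)
    num = sum(min(x.get(k, 0.0), y.get(k, 0.0)) for k in keys)
    den = sum(max(x.get(k, 0.0), y.get(k, 0.0)) for k in keys)
    return num / den if den > 0 else 1.0


def unweighted_unifrac(branches_a, branches_b, length):
    """Unique branch length divided by total observed branch length."""
    observed = branches_a | branches_b   # branches seen in either sample
    unique = branches_a ^ branches_b     # branches seen in exactly one sample
    total = sum(length[b] for b in observed)
    return sum(length[b] for b in unique) / total if total > 0 else 0.0


# Hypothetical toy tree: branch id -> branch length (assumed values).
length = {"b1": 0.5, "b2": 0.25, "b3": 1.0, "b4": 0.75}

# Branches on the path from the root to any taxon present in each sample (assumed).
sample_a = {"b1", "b2", "b3"}
sample_b = {"b2", "b3", "b4"}

# Presence vectors weighted by branch length.
vec_a = {b: length[b] for b in sample_a}
vec_b = {b: length[b] for b in sample_b}

d_unifrac = unweighted_unifrac(sample_a, sample_b, length)
d_jaccard = 1.0 - weighted_jaccard(vec_a, vec_b)
print(d_unifrac, d_jaccard)
```

Because the two quantities coincide, any estimator of weighted Jaccard similarity also estimates this form of UniFrac, which is presumably what makes sketching applicable.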




Publication year: 2026

Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00
unpaywall
last seen: 2026-05-23T02:00:01.238055+00:00
License: CC-BY-4.0