Comparative metagenomics using pan-metagenomic graphs

preprint OA: closed
Abstract Identifying microbial genomic factors underlying human phenotypes is a key goal of microbiome research. Sequence graphs are a highly effective tool for genome comparisons because they enable high-resolution de novo analyses that capture and contextualize complex genomic variation. However, applying sequence graphs to complex microbial communities remains challenging due to the scale and complexity of metagenomic data. Existing multi-sample sequence graphs used in these settings are highly complex, computationally expensive, less accurate than single-sample alternatives, and often involve arbitrary coarse-graining. Here, we present copangraph, a multi-sample, sequence-graph-based analysis framework for comprehensive comparisons of genomic variation across metagenomes. Copangraph uses a novel homology-based graph, which provides both a non-arbitrary, evolutionarily motivated grouping of sequences into the same node and flexibility in the scale of variation represented by the graph. Its construction relies on hybrid coassembly, a new coassembly approach in which single-sample graphs are first constructed separately and then merged to create a multi-sample graph. We also present an algorithm that uses paired-end reads to improve detection of contiguous genomic regions, increasing accuracy. Our results demonstrate that copangraph captures sequence and variant information more accurately than alternative methods, provides graphs that are more suitable for comparative analysis than de Bruijn graphs, and is computationally tractable. We show that copangraph reflects meaningful metagenomic variation across diverse scenarios. Importantly, it enables significantly better performance than other metagenomic representations when predicting the gut colonization trajectories of vancomycin-resistant Enterococcus. Our results underscore the value of our multi-sample, graph-based framework for comparative metagenomic analyses.
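The merge step described in the abstract (single-sample graphs built separately, then combined so that related sequences share a node) can be illustrated with a toy sketch. This is not copangraph's algorithm: the `identity` function is a crude stand-in for a real homology test, and `merge_sample_graphs`, the `threshold` parameter, and the input layout are all hypothetical names chosen for illustration. The threshold loosely mirrors the idea that the scale of variation collapsed into one node is adjustable.

```python
from itertools import combinations

def identity(a: str, b: str) -> float:
    """Toy similarity: matching positions divided by the longer length.
    An assumption for illustration, not a real homology measure."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))

class UnionFind:
    """Minimal disjoint-set structure used to group similar nodes."""
    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def merge_sample_graphs(samples, threshold):
    """Merge per-sample graphs into one multi-sample graph.

    samples: {sample: {"nodes": {node_id: sequence}, "edges": [(u, v), ...]}}
    Returns (groups, edges): groups maps a representative (sample, node_id)
    key to its members; edges connect representatives.
    """
    keys = [(s, n) for s, g in samples.items() for n in g["nodes"]]
    seq = {(s, n): samples[s]["nodes"][n] for s, n in keys}
    uf = UnionFind(keys)
    # Group any two nodes whose sequences pass the toy similarity threshold.
    for a, b in combinations(keys, 2):
        if identity(seq[a], seq[b]) >= threshold:
            uf.union(a, b)
    groups = {}
    for k in keys:
        groups.setdefault(uf.find(k), []).append(k)
    # Project each sample's edges onto the merged nodes.
    edges = set()
    for s, g in samples.items():
        for u, v in g["edges"]:
            ru, rv = uf.find((s, u)), uf.find((s, v))
            if ru != rv:
                edges.add((ru, rv))
    return groups, edges

# Two hypothetical samples that share one near-identical sequence.
samples = {
    "A": {"nodes": {1: "ACGTACGT", 2: "TTTTGGGG"}, "edges": [(1, 2)]},
    "B": {"nodes": {1: "ACGTACGA", 2: "CCCCAAAA"}, "edges": [(1, 2)]},
}
groups, edges = merge_sample_graphs(samples, threshold=0.8)
```

With this input, the first node of each sample ends up in one merged node, while the two sample-specific sequences stay separate and appear as alternative neighbors of the shared node. Lowering `threshold` collapses more variation into fewer nodes; raising it keeps more variants distinct.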
Competing Interest Statement The authors have declared no competing interest.


This is a recent paper (2025).

Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00