CAFT: A Compositional Log-Linear Model for Microbiome Data with Zero Cells

preprint OA: closed

Abstract

Background

Differential abundance analysis is fundamental to microbiome research and provides valuable insights into host-microbe interactions. However, microbiome data are compositional, highly sparse (with many zero counts), and influenced by differential experimental biases across taxa. Standard statistical methods often overlook these features. Many approaches analyze relative abundances without accounting for compositionality or rely on pseudocounts, potentially leading to spurious associations and inadequate false discovery rate (FDR) control.

Methods

We introduce a novel framework for differential abundance analysis of microbiome data: the Compositional Accelerated Failure Time (CAFT) model. CAFT handles zero read counts by treating them as censored observations that fall below a detection limit. This approach is inherently resistant to multiplicative technical bias, eliminates the need for pseudocounts, and addresses compositional bias through appropriately constructed score tests.
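
To make the censoring idea concrete, here is a minimal sketch of a generic left-censored log-normal accelerated failure time likelihood for a single taxon. It is not the CAFT implementation: the abstract does not give the error distribution, the detection limit, or the form of the score tests, so the normal errors, the `limit` argument, the single covariate, and all function names below are assumptions made for illustration. Positive counts contribute a density term on the log scale, while zero counts contribute the probability of falling below the limit, so no pseudocount is needed.

```python
import math


def norm_logpdf(z):
    """Log density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_logcdf(z):
    """Log CDF of the standard normal; adequate for moderate z only."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def censored_loglik(counts, x, limit, beta0, beta1, sigma):
    """Log-likelihood for log(count) = beta0 + beta1 * x + sigma * eps.

    Assumptions (not taken from the paper): eps is standard normal and
    a zero count means the true value lies below `limit`.
    """
    total = 0.0
    for y, xi in zip(counts, x):
        mu = beta0 + beta1 * xi
        if y > 0:
            # Observed count: density of log(y) under the model.
            z = (math.log(y) - mu) / sigma
            total += norm_logpdf(z) - math.log(sigma)
        else:
            # Zero count: left-censored, contributes P(log Y < log limit).
            z = (math.log(limit) - mu) / sigma
            total += norm_logcdf(z)
    return total


# Toy data (made up): three controls (x = 0) and three cases (x = 1).
toy_counts = [0, 12, 30, 0, 85, 140]
toy_x = [0, 0, 0, 1, 1, 1]
toy_ll = censored_loglik(toy_counts, toy_x, limit=1.0,
                         beta0=3.0, beta1=0.5, sigma=1.2)
```

In a log-linear model of this kind, a taxon-specific multiplicative factor c appears as an additive log c on the log scale, where an intercept such as `beta0` can absorb it while the covariate effect `beta1` is unchanged. That is one plausible reading of the abstract's robustness claim, not a statement of how CAFT is derived.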

Results

Extensive simulations show that CAFT outperforms competing compositional differential abundance methods, including LOCOM, LinDA, ANCOM-BC2 and its robust variant, and LDM-clr, by offering more robust type I error and FDR control with or without technical bias. Additionally, we applied CAFT to microbiome data on inflammatory bowel disease (IBD) and the upper respiratory tract (URT) to identify differentially abundant gut microbial taxa between IBD patients and healthy controls, as well as URT taxa distinguishing smokers from non-smokers.
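
For readers less familiar with what FDR control means operationally, the following sketch applies the Benjamini-Hochberg step-up procedure to a list of per-taxon p-values. The abstract does not say which FDR procedure CAFT or the compared methods use, so Benjamini-Hochberg is an assumption chosen because it is the most common default; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans, True where a hypothesis is rejected.

    Benjamini-Hochberg step-up rule: find the largest rank k with
    p_(k) <= k * alpha / m and reject the k smallest p-values.
    """
    m = len(pvalues)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Made-up p-values for eight taxa, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
example_calls = benjamini_hochberg(example_p, alpha=0.05)
```

With these made-up values only the two smallest p-values pass their rank-specific thresholds, so two taxa would be flagged at a nominal FDR of 5%.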

Conclusion

We present CAFT, a powerful, robust, and efficient approach for compositional differential abundance analysis. CAFT controls type I error and the FDR while demonstrating enhanced statistical power. These capabilities make CAFT a useful tool for compositional microbiome data analysis.

Availability and implementation

The R package and vignette are available at https://github.com/mli171/CAFT.

Competing Interest Statement

The authors have declared no competing interest.


Publication year: 2025

Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00