A Realistic Simulation Framework for Evaluating Microbiome Normalization in Sample Stratification and Differential Abundance

doi:10.64898/2026.01.13.699216

2026 · preprint

Abstract

Background

Normalization is a critical yet often poorly understood step in microbiome studies. Suboptimal approaches may lead to inaccurate conclusions in downstream analyses of microbial communities. Currently, there is no benchmarking framework that evaluates how normalization affects both sample stratification and differential abundance simultaneously across taxonomic levels. In this paper, we propose a simulation pipeline, based on real data and multivariate exploratory data analysis, that provides a structured and reproducible assessment of normalization methods.
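To make the idea of a known ground truth concrete, here is a minimal sketch that simulates two groups from a single vector of taxon proportions standing in for proportions estimated from a real dataset. It is not the authors' pipeline: the function name `simulate_counts`, the multinomial read model, and the single multiplicative fold change are illustrative assumptions. The spiked taxa form the ground truth, and the per-sample library sizes in `depths` introduce the coverage variation that normalization is meant to remove.

```python
import random


def simulate_counts(base_props, de_taxa, fold_change, depths, seed=0):
    """Hypothetical two-group simulator with a known set of differentially abundant taxa.

    base_props:  taxon proportions, assumed to be estimated from a real dataset
    de_taxa:     indices of taxa multiplied by `fold_change` in group B (the ground truth)
    depths:      library size (total reads) of each simulated sample in each group
    Returns (group_a, group_b): lists of per-sample count vectors.
    """
    rng = random.Random(seed)
    n_taxa = len(base_props)

    # Group B: scale the chosen taxa, then renormalize so proportions sum to 1.
    scaled = [p * fold_change if i in de_taxa else p for i, p in enumerate(base_props)]
    total = sum(scaled)
    props_b = [p / total for p in scaled]

    def draw(props, depth):
        # Multinomial read sampling: each of `depth` reads is assigned to a taxon
        # with probability equal to that taxon's proportion.
        counts = [0] * n_taxa
        for taxon in rng.choices(range(n_taxa), weights=props, k=depth):
            counts[taxon] += 1
        return counts

    group_a = [draw(base_props, d) for d in depths]
    group_b = [draw(props_b, d) for d in depths]
    return group_a, group_b


if __name__ == "__main__":
    a, b = simulate_counts([0.5, 0.3, 0.15, 0.05], de_taxa={3}, fold_change=4.0,
                           depths=[1000, 5000, 20000])
    print([sum(s) for s in a])  # [1000, 5000, 20000]
    print(a[2][3], b[2][3])     # taxon 3 is enriched in group B
```

Because group B is renormalized after spiking, the unspiked taxa lose relative abundance even though, in this model, only the spiked taxa were scaled. That compositional shift, together with the unequal library sizes, is exactly what a normalization method has to separate from the true effect.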

Results

Normalization methods differed in accuracy across taxonomic levels and sequencing depths. In our case study, at the phylum level, edgeR-TMM and rarefaction improved accuracy by reducing coverage-related variation while preserving biological structure. In contrast, at the genus level, normalization yielded a smaller overall improvement, reflecting the weaker influence of sequencing-depth variability in this scenario, and edgeR-TMM again provided the most accurate estimate of the biological effect. Multivariate visualizations supported these observations, highlighting both sample-level and taxon-level differences among methods. However, ordination-based summaries are not sufficient for differential abundance inference and can be misleading, which motivates the use of a simulation environment with known ground truth.
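The two methods highlighted above can be sketched in plain Python. This is a simplified illustration, not the edgeR implementation or the authors' code: `rarefy` subsamples reads without replacement to a common depth (samples below that depth would normally be discarded, so here they raise an error), and `tmm_factor` follows the trimmed mean of M-values idea of trimming 30% of log-ratios (M) and 5% of average log-abundances (A), then taking a precision-weighted mean of the remaining M values. Unlike edgeR's `calcNormFactors`, this sketch breaks rank ties by index, simply drops taxa with a zero count in either sample, and lets the caller pick the reference sample; the helper names are hypothetical.

```python
import math
import random


def rarefy(counts, depth, seed=0):
    """Subsample one sample's taxon counts to `depth` reads without replacement."""
    if depth > sum(counts):
        raise ValueError("depth exceeds library size; rarefaction would drop this sample")
    rng = random.Random(seed)
    # Expand counts into one label per read, then draw `depth` reads without replacement.
    pool = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    out = [0] * len(counts)
    for taxon in rng.sample(pool, depth):
        out[taxon] += 1
    return out


def _central_ranks(values, trim):
    """Indices whose rank falls inside the central (1 - 2*trim) fraction of `values`."""
    n = len(values)
    lo = math.floor(n * trim)
    order = sorted(range(n), key=lambda i: values[i])  # ties broken by index
    return set(order[lo:n - lo])


def tmm_factor(sample, ref, logratio_trim=0.3, sum_trim=0.05):
    """Simplified TMM scaling factor of `sample` relative to `ref`."""
    n_k, n_r = sum(sample), sum(ref)
    m_vals, a_vals, weights = [], [], []
    for y_k, y_r in zip(sample, ref):
        if y_k == 0 or y_r == 0:
            continue  # M and A are undefined when either count is zero
        var = (n_k - y_k) / (n_k * y_k) + (n_r - y_r) / (n_r * y_r)
        if var == 0:
            continue  # a taxon holding every read in both samples carries no information
        m_vals.append(math.log2((y_k / n_k) / (y_r / n_r)))        # log-ratio M
        a_vals.append(0.5 * math.log2((y_k / n_k) * (y_r / n_r)))  # mean log-abundance A
        weights.append(1.0 / var)  # inverse of the approximate variance of M
    kept = _central_ranks(m_vals, logratio_trim) & _central_ranks(a_vals, sum_trim)
    if not kept:
        return 1.0
    mean_m = sum(weights[i] * m_vals[i] for i in kept) / sum(weights[i] for i in kept)
    return 2.0 ** mean_m


def tmm_factors(samples, ref_index=0):
    """TMM factors for all samples, rescaled so that their product is 1."""
    raw = [tmm_factor(s, samples[ref_index]) for s in samples]
    log_mean = sum(math.log(f) for f in raw) / len(raw)
    return [f / math.exp(log_mean) for f in raw]
```

For example, with a reference of ten taxa at 100 reads each and a sample that is identical except that the last taxon has 1000 reads, `tmm_factor` returns 1000/1900 ≈ 0.526: the outlying taxon is trimmed, so the sample's effective library size (1900 × 0.526 ≈ 1000) matches the reference instead of being inflated by one dominant taxon. Rarefaction, by contrast, equalizes depth by discarding reads rather than rescaling them.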

Conclusions

Normalization performance varied with sequencing depth, sparsity, taxonomic resolution, and dataset size. Thus, no single normalization method can be expected to be optimal across all conditions. Our proposed simulation and analysis framework offers a reproducible and interpretable platform for evaluating existing and new normalization approaches in microbiome research for specific case studies.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

This version includes a minor correction in the Introduction. In Table 1, McKnight et al. 2019 [34] was previously described as a differential abundance study; this has been corrected to reflect that the study focused on beta-diversity (clustering). No other content, analyses, results, or conclusions were changed.
