DeepBioSim: Efficient and Versatile Methods for Microbiome Data Simulation with Minimal Statistical Assumptions

preprint OA: closed

Abstract

Background

The human microbiome profoundly influences health and disease. Robust computational and statistical tools for identifying causal microbe–disease links are therefore critical to uncovering the mechanistic basis of these associations. Yet benchmarking such tools remains difficult: microbiome datasets are sparse, high-dimensional, and contain complex dependencies, and no gold-standard reference set exists. Realistic simulated data with embedded ground truth are essential for fair evaluation of analytical tools. Current simulators often impose strong assumptions, require hard-to-obtain auxiliary information, or fail to scale to large, high-dimensional datasets.

Results

We introduce DeepBioSim, a DEEP-learning framework for BIOlogical SIMulation of microbiome data. DeepBioSim uses variational autoencoders (VAEs) to generate realistic microbiome datasets by sampling directly from the latent distribution of metagenomic or metatranscriptomic count data.
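The generation step described above, sampling from a latent distribution and decoding to counts, can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the standard-normal latent prior, the single linear decoder layer with softmax output, the multinomial count model, and the names `make_decoder`, `decode`, and `simulate_sample`. The decoder weights are random placeholders, whereas a real VAE would learn them from observed count data.

```python
import math
import random


def make_decoder(latent_dim, n_taxa, rng):
    """Return placeholder decoder weights (random, NOT trained)."""
    weights = [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
               for _ in range(n_taxa)]
    bias = [rng.gauss(0.0, 1.0) for _ in range(n_taxa)]
    return weights, bias


def decode(z, decoder):
    """Map a latent vector to taxon proportions (linear layer + softmax)."""
    weights, bias = decoder
    logits = [sum(w * zi for w, zi in zip(row, z)) + b
              for row, b in zip(weights, bias)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_sample(decoder, latent_dim, depth, rng):
    """Draw z ~ N(0, I), decode to proportions, then draw multinomial counts."""
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    proportions = decode(z, decoder)
    counts = [0] * len(proportions)
    # Assign each of the `depth` reads to a taxon according to the proportions.
    for taxon in rng.choices(range(len(proportions)),
                             weights=proportions, k=depth):
        counts[taxon] += 1
    return counts


rng = random.Random(0)  # fixed seed so the sketch is reproducible
decoder = make_decoder(latent_dim=4, n_taxa=10, rng=rng)
synthetic = [simulate_sample(decoder, latent_dim=4, depth=1000, rng=rng)
             for _ in range(5)]
```

Each row of `synthetic` is one simulated sample of 10 taxon counts summing to the chosen sequencing depth. In a trained model the decoder would typically be a multi-layer network, and the count likelihood might differ from the multinomial used here.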

Conclusions

The approach is fast, accurate, and scalable, generating highly realistic synthetic microbiome datasets without extensive hyper-parameter tuning or phylogenetic input. Tests on human RNA-seq data confirm the versatility of DeepBioSim, showing that it can also reliably simulate single-organism omics profiles.

Competing Interest Statement

The authors have declared no competing interest.


This is a recent paper (2025).
