Integrative Microbiome Profiling of Colorectal Cancer Across South Asian and Western Cohorts Using Interpretable Machine Learning

preprint OA: closed
Abstract

The incidence of colorectal cancer (CRC) has been rising steadily in South Asian countries relative to Western countries. Microbiome dysbiosis has been strongly associated with CRC development, with diet and demographic factors playing an important role in shaping gut microbial composition. Since most CRC cases are diagnosed at advanced stages, early detection remains critical for improving patient outcomes. Machine learning (ML) approaches provide a promising strategy to identify predictive microbial biomarkers for early CRC detection. In this study, we analyzed publicly available 16S rRNA datasets comprising CRC patients and controls from South Asian cohorts (India and Sri Lanka) and compared them with a Western cohort (USA). Our findings revealed a higher relative abundance of Prevotella in the South Asian cohorts, whereas Bacteroides predominated in the USA cohort. Across all three datasets, Fusobacterium, Escherichia–Shigella, and Akkermansia were consistently elevated in CRC cases. We evaluated multiple ML algorithms, including Decision Tree, Random Forest, AdaBoost, LogitBoost, XGBoost, Support Vector Machine (SVM), and k-Nearest Neighbors (KNN), and proposed a stacked ensemble model to differentiate CRC cases from controls. The stacked ensemble model achieved higher accuracy than the base models. To improve model interpretability and transparency, SHAP (SHapley Additive exPlanations) analysis was performed to identify the key taxa influencing model predictions. These results underscore the utility of ML-based microbiome analysis for identifying robust CRC-associated microbial signatures, with potential application in developing generalizable early detection tools.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

Included grant number in the acknowledgements.
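
Illustrative note on the SHAP step: SHAP attributes a model's prediction for one sample to its input features using Shapley values. The sketch below computes exact Shapley values by enumerating every feature ordering, which is feasible only for a handful of features; the SHAP library uses faster approximations and model-specific algorithms. The scoring function `toy_score`, its coefficients, and the abundance values are hypothetical and chosen only for illustration. They are not the authors' model or data.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance, relative to a baseline input.

    predict  : function mapping a dict of feature values to a number
    instance : dict of feature values for the sample being explained
    baseline : dict of reference values (same keys as instance)
    """
    features = list(instance)
    totals = {name: 0.0 for name in features}
    # Average each feature's marginal contribution over all orderings.
    for order in permutations(features):
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]  # switch this feature "on"
            value = predict(current)
            totals[name] += value - previous
            previous = value
    n_orders = factorial(len(features))
    return {name: total / n_orders for name, total in totals.items()}


def toy_score(x):
    # Hypothetical CRC risk score with one interaction term (assumption).
    return (
        2.0 * x["Fusobacterium"]
        + 1.0 * x["Escherichia-Shigella"]
        + 4.0 * x["Fusobacterium"] * x["Akkermansia"]
    )


# Hypothetical relative abundances for a single sample (assumption).
baseline = {"Fusobacterium": 0.0, "Escherichia-Shigella": 0.0, "Akkermansia": 0.0}
sample = {"Fusobacterium": 0.5, "Escherichia-Shigella": 0.25, "Akkermansia": 0.25}

phi = shapley_values(toy_score, sample, baseline)
for taxon, contribution in phi.items():
    print(f"{taxon}: {contribution:.3f}")
```

The attributions sum to the difference between the score for the sample and the score for the baseline, which is the property that makes Shapley-based explanations additive. The interaction term is split evenly between the two taxa involved.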

Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00