Assessing the adoption of Causal Language and Methodology in Human Microbiome Studies: A Methodological Review

doi:10.21203/rs.3.rs-8681336/v1


2026

preprint OA: closed


Research Square

Research Article

Albina Tskhay, Alibek Moldakozhayev, Cristina Longo, Roxana Behruzi, and 4 more

This is a preprint; it has not been peer reviewed by a journal. This work is licensed under a CC BY 4.0 License. Status: Posted (Version 1).

Abstract

Background

Microbiome research seeks to clarify how microbial communities influence human health. Although etiological research paradigms are evolving in the biomedical sciences, many microbiome studies continue to rely on association-based methods that detect statistical patterns but are limited in identifying the causal mechanisms needed to inform clinical or public health interventions.
This methodological review evaluates the extent to which modern causal inference approaches have been adopted in human microbiome studies and identifies persistent challenges to their broader implementation.

Methods

We systematically reviewed human microbiome studies published between 2019 and 2024 that examined links between the microbiome and health outcomes, or between exposures/interventions and microbiome composition, across ten high-impact journals identified using the Scimago Journal and Country Ranking. Eligible studies were retrieved from PubMed using a predefined search strategy. Two reviewers independently screened titles, abstracts and full texts and extracted data on study design, sampling, analytical framework, confounding control, effect size reporting, and the use of causal language. Analyses were performed using standardized extraction templates.

Results

Across 205 included studies, adoption of causal inference approaches in microbiome research remains limited. Only 15% of studies used designs or analytical strategies capable of approximating causal effects: 12% were randomized controlled trials and 3% were observational studies employing formal causal inference methods. Longitudinal designs were common (45%). However, 30% of studies did not address confounding, and more than 40% did not report intervention-relevant or clinically actionable effect sizes. Studies making stronger causal claims were also more likely to propose intervention-relevant recommendations, regardless of the underlying study design.

Conclusion

The limited use of rigorous causal inference approaches remains a key barrier to producing actionable evidence in microbiome research. Greater adoption of principled confounding control, improved use of mediation and effect-modification frameworks, and more consistent reporting of interpretable effect sizes are necessary to strengthen causal claims.
Advancing methodological standards and promoting interdisciplinary collaboration will be essential for translating microbiome findings into clinically meaningful insights.

Subject areas: General Microbiology, Epidemiology

Keywords: microbiome, methodological review, analysis strategies, causal inference, observational studies, confounding control, study design, effect estimation

BACKGROUND

Over the last two decades, microbiome research has gained increasing attention due to its potential to reveal complex interactions between microbial communities and human health (1–7). Despite advances in statistical and computational methods, challenges in addressing etiological questions remain pervasive in microbiome research (8). Randomized controlled trials are often infeasible, so the field relies heavily on observational data analyzed through association-based approaches that require additional assumptions to support causal interpretation (9). Central among these is the need for an adequate representation of the data-generating mechanisms induced by the study design, research design, sampling frameworks, and measurement approaches (8, 10).

Microbiome composition is influenced by a complex web of host and environmental factors, including diet, environment, genetics, and host physiology (11, 12). The interdependence of these variables creates challenges in disentangling causal relationships from spurious associations using observational studies (8). For example, distinguishing the effect of a specific microbial taxon from that of underlying host characteristics requires that key identifiability assumptions hold. Modern causal inference theory formalizes these assumptions and provides guidance on the conditions under which causal effects can be estimated from observational data (13–16).
In particular, three assumptions are fundamental: consistency, which links observed outcomes to potential outcomes under the exposure actually received; exchangeability, which requires that, conditional on measured covariates, exposure assignment is independent of potential outcomes; and positivity, which ensures that all relevant exposure levels occur with nonzero probability within strata of the covariates.

Intrinsic properties of microbiome data can lead to violations of these key assumptions. In particular, high dimensionality and underdetermination make it difficult to measure or adjust for all relevant confounders, threatening exchangeability. Overdispersion and sparsity, including a large proportion of zeros, can complicate the definition and consistent measurement of treatment or exposures, affecting consistency. Time-varying confounding further challenges positivity and exchangeability, as confounders that are themselves affected by prior treatment or outcomes can change over time and influence subsequent treatment assignment (17–19).

These challenges are compounded by limitations in study design and measurement. Many microbiome studies are cross-sectional, limiting temporal inferences and, consequently, causal interpretation unless strong external assumptions are met (10, 20). Even longitudinal studies are affected by heterogeneity in sampling, sequencing protocols, and batch effects, which threaten consistency (21). The microbiome’s dynamic nature, changing over time and in response to environmental stimuli, further complicates the establishment of well-defined and generalizable causal relationships, thereby increasing the need for more complex study designs to capture these dynamics.

In contrast, related fields such as epidemiology and pharmacoepidemiology have successfully adopted causal inference frameworks, emphasizing rigorous study design, careful confounding control, well-defined estimands, and transparent reporting (22).
The relative underutilization of such frameworks in microbiome research limits the ability to generate actionable insights and leaves unanswered critical questions about whether interventions on the microbiome could improve health outcomes.

Motivated by these limitations, this review systematically assesses the adoption of causal inference methods in human microbiome research. By analysing recent observational studies, published in leading journals, that used 16S rRNA or metagenomic sequencing to examine links between the microbiome and health outcomes, or between exposures/interventions and microbiome composition, we aim to provide a comprehensive overview of the current state of causal inference in the field and outline directions for future methodological development.

METHODS

The study’s objectives, design, and methods were registered with the Open Science Framework (OSF), https://osf.io/bfn2q (23).

Search

To identify publications that best represent the microbiome research domain, we selected the nine journals with the highest publication output in the Microbiology domain between 2019 and 2024 (excluding journals publishing only literature reviews). These were identified from the Scimago Journal and Country Ranking, which ranks journals based on the weighted number of citations received by a journal's publications (24). Scientific Reports was selected as the tenth journal because it was identified as the non-microbiome-specific journal with the highest number of microbiome-related publications in the same reference period (25). Relevant studies published between 2019 and 2024 from each journal were identified through a PubMed search using a pre-specified search strategy (Supplementary material). Only studies that primarily used human gut microbiome data obtained by 16S rRNA or metagenomic shotgun sequencing and that examined links between the microbiome and health outcomes, or between exposures/interventions and microbiome composition, were included.
These platforms were selected because they are the most widely used and methodologically mature approaches for characterizing gut microbial communities, providing sufficient taxonomic and/or functional resolution to support comparative analyses and causal inference–oriented study designs (26). Literature reviews, descriptive studies, pooled analyses of published studies, studies in animals, and studies that investigated viruses or eukaryotic organisms were excluded. The date range was chosen to cover the period in which the cost of 16S rRNA sequencing and newer technologies made larger sample sizes, and hence studies of associations with phenotypes, more feasible (27).

Screening

Two reviewers (A.T. and A.M.) independently screened titles and abstracts, followed by full-text assessment when necessary. Screening was conducted separately within each journal with the aim of identifying up to 25 eligible articles per journal for data extraction, for a target total of 250 articles. We deemed this number sufficient to quantify the proportion of articles exhibiting a given characteristic: 250 articles would yield a 95% CI half-width of less than 7 percentage points. For journals with more than 100 eligible records, title and abstract screening was capped at 100 articles, whereas for journals with fewer records, all identified articles were screened. Articles meeting the predefined eligibility criteria were advanced to full-text review until the journal-specific cap was reached.

Data extraction

Two rounds of data extraction were conducted on the included studies to minimize the risk of errors. Data were extracted on the following pre-defined aspects: study design, inferential framework, sample characteristics, data characteristics, statistical methods, mediation analysis and claims, confounding control and justification, number of citations (in Google Scholar), and reporting and action recommendation sentences.
A detailed summary of all extraction variables can be found in Table S1. Relevant data were systematically identified by analyzing the title, abstract, and conclusion sections of each article. Special attention was given to the outcomes and results emphasized by the authors, reflecting the central focus of each report. For each study, we then identified domains most pertinent to the reported findings and systematically extracted relevant details. This process included the following study specifics:

General Information

We recorded general information for each study, including the name(s) of the author(s), the study title, and the journal or report from which the data were extracted. For each study, the publication year and citation count were recorded. Citation data were retrieved from Google Scholar between October and December 2024.

Characteristics of Included Studies

The study design was classified into the following categories: randomized controlled trial, non-randomized trial, prospective cohort study, retrospective cohort study, case-control study, cross-sectional study, or nested case-control study. Based on the research questions, methodology, and conclusions, the inferential framework employed by each study was categorized as one of the following: statistical (hypothesis) testing, Bayesian inference, causal inference, supervised machine learning (e.g., prediction modeling or classification), or unsupervised learning (e.g., cluster analysis).
The inferential framework was classified as causal inference if the study met at least one of the following predefined criteria: (i) the study was a randomized controlled trial; (ii) the authors explicitly stated a causal objective (i.e., to determine the causal effect of X on Y) or identified the analysis as causal; or (iii) the study was observational but employed established causal inference methods aimed at satisfying identifiability assumptions, such as explicit control for confounding using propensity score–based approaches (e.g., matching, weighting, stratification), causal mediation analysis, or other counterfactual-based estimation strategies. We extracted the total number of participants in each study and recorded the number of groups along with their respective sample sizes. The sampling method was recorded as either random or non-random sampling.

Exposure and Outcome Variables

The type of primary outcome was classified as binary, continuous, or categorical. Similarly, the type of primary exposure was categorized as binary, continuous, or categorical. Microbiome-related variables were further categorized according to their role in the presumed causal structure: exposure, outcome, mediator, or moderator.

Microbiome Measures

We recorded the primary microbiome measures used to generate the main conclusions, including alpha diversity, beta diversity, relative abundance, or absolute abundance.

Statistical Models

We documented the statistical models used in each study to perform inferential analysis.
Possible methods included nonparametric tests (e.g., Wilcoxon signed-rank test, Mann-Whitney U test, Kruskal-Wallis test), correlation measures (e.g., Spearman’s or Pearson’s correlation coefficients), hypothesis testing (e.g., t-tests, analysis of variance), regression models (e.g., logistic regression, linear regression, multinomial logistic regression, mixed-effects models), machine learning methods (e.g., random forests, support vector machines, deep learning), and microbiome-specific models (e.g., Linear Discriminant Analysis Effect Size (LEfSe), Analysis of Composition of Microbiomes (ANCOM), Multivariate Association with Linear Models (MaAsLin), Partial Least Squares Discriminant Analysis (PLS-DA)).

Confounding Control

We examined whether confounding control was implemented and, if so, documented the strategy used. Strategies included covariate adjustment, inverse probability weighting (IPW), matching, restriction, standardization, or stratification. If no confounding control strategy was specified, this was noted. We also assessed whether the choice of confounding control was justified, categorizing the rationale as data-driven (e.g., difference testing or other methods), based on directed acyclic graphs (DAGs), informed by expert input, grounded in literature, or not addressed.

Multiple Comparisons

We assessed whether studies addressed the issue of multiple comparisons, which refers to the inflation of type I error resulting from conducting multiple statistical hypothesis tests. Specifically, we recorded whether any correction for multiple testing was reported (e.g., Bonferroni adjustment, false discovery rate control) and documented this as “yes” if any such method was applied, or “no” if the issue was not addressed.

Mediation and Modification Analyses

For each study, we examined whether mediation mechanisms were claimed and noted the corresponding sentence or paragraph and any mediation analysis conducted.
Similarly, we recorded whether effect modification was reported and which types of analyses were conducted.

Assessment of Causal Language

We extracted the effect size measures reported in the studies, along with sentences used to report and interpret these measures. Where action recommendations or future research recommendations were provided, we recorded these as "yes" or "no" and captured the corresponding sentence(s). Statements were selected in accordance with the criteria described in reference (28). We evaluated the use of causal versus associational language using two complementary approaches. First, a human reviewer (AT) assessed the strength of causal claims using the framework in (23), which defines causal language, action recommendations, and their grading (none, weak, moderate, strong). Second, a Large Language Model (LLM), ChatGPT based on the GPT-4 architecture (29), was used to assess both the strength of causal claims and the strength of practical recommendations. A standardized query was applied to all sentences (Supplementary material), and the model was blinded to other characteristics of the studies to minimize bias. This analysis was conducted in January 2025.

Software

For abstract and full-text screening and data extraction, we used Covidence, a web-based platform that streamlines systematic review workflows (30). The extraction templates within Covidence were customized to include the sections described above to ensure consistency and comprehensiveness in data collection. Descriptive graphs and tables were generated using R statistical software (version 4.3.0), which facilitated the analysis and visualization of the data (31).

RESULTS

Study selection and included sample

A total of 1,864 records were identified across the ten journals included in this review (Fig. 1). Standard systematic review guidelines such as PRISMA were not fully applicable to this methodological review (32); therefore, Fig.
1 presents a PRISMA-style flow diagram to transparently document the article selection process. At the title and abstract stage, 679 records were screened, and 259 articles were assessed at the full-text level. Of these, 205 studies met the predefined methodological inclusion criteria and were included in the final analysis. As screening was conducted using a journal-stratified sampling strategy with a maximum of 25 included studies per journal, the number of screened and included articles varied across journals, reflecting differences in publication volume and eligibility rather than differential exclusion at later stages of review.

Figure 1. This figure summarizes the study selection process across ten journals. Screening was conducted sequentially within each journal until either the prespecified target of 25 eligible publications was reached or no additional publications were available for screening. Consequently, the number of records screened and studies included varies by journal. In total, data were extracted from 205 publications.

Because the objective of this review was to characterize the landscape of methodological approaches rather than to exhaustively synthesize all available evidence, a journal-stratified sampling strategy was applied with a maximum of 25 included studies per journal. The flow diagram therefore represents a purposeful screening process rather than complete enumeration.

Study designs, sample sizes, and group structure

The publication and methodological characteristics of the 205 included studies are summarized in Table 1. The included studies comprised primarily prospective cohort studies (28.8%, n = 59), cross-sectional studies (26.8%, n = 55), and case–control studies (23.9%, n = 49), with fewer randomized controlled trials (12.2%, n = 25), pre–post studies (4.4%, n = 9), nested case–control studies (2.4%, n = 5), and retrospective cohort studies (1.5%, n = 3).

Table 1. Distribution of study characteristics across 205 included studies.
| Study characteristics | Number (out of 205) | Percentage (%) |
|---|---|---|
| Journal | | |
| Cell Host & Microbe | 25 | 12.2 |
| Gut Microbes | 25 | 12.2 |
| Microbiome | 25 | 12.2 |
| npj Biofilms and Microbiomes | 25 | 12.2 |
| Scientific Reports | 25 | 12.2 |
| mSystems | 24 | 11.7 |
| mSphere | 18 | 8.8 |
| mBio | 18 | 8.8 |
| Nature Microbiology | 15 | 7.3 |
| ISME | 5 | 2.4 |
| Year of publication | | |
| 2019 | 16 | 7.8 |
| 2020 | 36 | 17.6 |
| 2021 | 39 | 19.1 |
| 2022 | 43 | 21.1 |
| 2023 | 39 | 19.1 |
| 2024 | 31 | 15.2 |
| Study design | | |
| Prospective cohort study | 59 | 28.8 |
| Cross-sectional study | 55 | 26.8 |
| Case-control study | 49 | 23.9 |
| Randomized controlled trial | 25 | 12.2 |
| Pre-post study | 9 | 4.4 |
| Nested case-control study | 5 | 2.4 |
| Retrospective cohort study | 3 | 1.5 |
| Inferential framework | | |
| Statistical (hypothesis) testing | 160 | 78 |
| Causal inference | 31 | 15.1 |
| Supervised machine learning | 8 | 3.9 |
| Unsupervised machine learning | 4 | 2 |
| Bayesian inference | 2 | 1 |
| Role of microbiome studied | | |
| Exposure | 77 | 37.6 |
| Outcome | 75 | 36.6 |
| Mediator | 46 | 22.4 |
| Moderator/Effect modifier | 7 | 3.4 |
| Confounding control strategy | | |
| Adjustment | 101 | 49.3 |
| None | 71 | 34.6 |
| Group-level matching | 27 | 13.2 |
| Stratification | 4 | 2 |
| Restriction | 2 | 1 |
| Confounding control justification | | |
| None | 71 | 34.6 |
| Difference testing | 46 | 22.4 |
| Data-driven | 10 | 4.9 |
| Literature-driven | 10 | 4.9 |
| DAG | 2 | 1 |
| Multiple comparison control | | |
| Yes | 161 | 78.5 |
| Mediation mechanism claimed | | |
| Yes | 115 | 56.1 |
| Moderation mechanism claimed | | |
| Yes | 50 | 24.4 |
| Effect size measure | | |
| p-value | 51 | 24.9 |
| Correlation coefficient | 44 | 21.5 |
| Regression coefficient | 24 | 11.7 |
| LEfSe | 21 | 10.2 |
| Log-fold change | 20 | 9.8 |
| Coefficient of determination | 14 | 6.8 |
| Area under the curve | 11 | 5.4 |
| Odds ratio | 9 | 4.4 |
| Other | 11 | 5.4 |
| Future research recommended | | |
| Yes | 178 | 86.8 |
| Practical recommendations given | | |
| Yes | 105 | 51.2 |

This table summarizes key methodological and reporting features of the included studies, including publication year, study design, analytical approach, role of the microbiome in the causal structure, confounding control strategies, outcome modelling practices, and reporting of effect sizes. Counts and percentages reflect the number of studies exhibiting each characteristic out of the total sample (N = 205).
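The percentages in Table 1 are proportions out of N = 205, and the Methods set a precision target of a 95% CI half-width under 7 percentage points for 250 articles. The sketch below is an illustration, not the authors' calculation: it assumes a normal-approximation (Wald) interval and the worst case p = 0.5, neither of which the source specifies, and the function name `wald_half_width` is hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_half_width(p: float, n: int, z: float = Z_95) -> float:
    """Half-width of a normal-approximation (Wald) CI for a proportion."""
    if not 0.0 <= p <= 1.0 or n <= 0:
        raise ValueError("need 0 <= p <= 1 and n > 0")
    return z * math.sqrt(p * (1.0 - p) / n)


# Worst case (p = 0.5) gives the widest interval for a given n.
planned = wald_half_width(0.5, 250)   # planned sample of 250 articles
achieved = wald_half_width(0.5, 205)  # 205 studies actually included

# One observed proportion from Table 1: 31 of 205 studies were
# classified under the causal inference framework.
causal = wald_half_width(31 / 205, 205)

print(f"planned (n=250):  +/- {100 * planned:.1f} percentage points")
print(f"achieved (n=205): +/- {100 * achieved:.1f} percentage points")
print(f"causal inference share: +/- {100 * causal:.1f} percentage points")
```

Under these assumptions the worst-case half-width stays below 7 percentage points for both 250 and 205 studies. A Wald interval is a rough choice for proportions close to 0 or 1; a Wilson interval is a common alternative.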
The average number of human participants across studies was 2331 (range 8–422417), with a median sample size of 91. Case-control studies showed the greatest heterogeneity in sample size (median = 98; range = 20–422417), driven by a small number of very large studies. Prospective cohort and cross-sectional studies were moderately sized (medians = 81 and 216, respectively), while randomized controlled trials and pre-post studies tended to include smaller cohorts (medians = 56 and 20, respectively). More than half of the studies (53.2%, n = 109) compared two study groups. An additional 15.1% of studies (n = 31) focused on a single study group, typically within longitudinal designs or subgroup analyses. Among single-group studies, the median sample size was 43 (range: 8–1054), whereas two-group studies had a median sample size of 82 (range: 9–3890). One study analyzed 16 distinct cohorts, representing the largest number of groups included in a single analysis.

Conceptual role of the microbiome in study designs

Across the included studies, the microbiome was most frequently conceptualized as an exposure variable (37.6%), followed closely by its role as an outcome (36.6%) (Table 1). In 22.4% and 3.4% of studies, the microbiome was analyzed as a mediating variable and effect modifier, respectively.

Variable types and operationalization across microbiome roles

To examine how conceptual roles translated into analytical practice, we assessed the types of variables used to represent exposures and outcomes when the microbiome was studied as an exposure, outcome, mediator, or effect modifier (Fig. 2). When the microbiome was analyzed as an exposure (n = 77), continuous microbiome measures dominated (74/77, 96.1%). The most common configuration paired binary outcomes with continuous microbiome exposures (38/77, 49.4%), followed by categorical outcomes with continuous microbiome exposures (27/77, 35.1%) and continuous outcomes with continuous microbiome exposures (9/77, 11.7%).
Binary or discrete microbiome exposures were rare, together accounting for only 3.9% of studies. In studies treating the microbiome as an outcome (n = 75), two main patterns emerged. Continuous microbiome outcomes were most frequently paired with binary exposures (38/75, 50.7%) or discrete exposures (32/75, 42.7%). Continuous exposures were uncommon (5/75, 6.7%), mostly appearing in continuous–continuous configurations (4/75, 5.3%). When the microbiome functioned as a mediating variable (n = 46), the most frequent configurations involved continuous outcomes. Continuous outcomes paired with binary exposures accounted for 30.4% of studies (14/46), while continuous outcomes paired with discrete exposures represented 23.9% (11/46). Other combinations were less common and more evenly distributed, with no single alternative pattern exceeding 13% of studies. Finally, in the smaller subset of studies examining the microbiome as a moderator (n = 7), patterns were heterogeneous. Discrete or binary exposures predominated (5/7, 71.4%), most commonly paired with categorical or continuous outcomes (each 2/7, 28.6%). All other exposure–outcome combinations occurred once or not at all.

Figure 2. Each heatmap depicts the frequency of combinations of exposure and outcome variable types (binary, categorical, continuous) used when the microbiome is studied as an exposure, outcome, mediator, or effect modifier.

Statistical methods used in microbiome analyses

The overall landscape of statistical methods employed across the included studies is shown in Fig. 3a. In total, 51 distinct analytical methods were identified and grouped into ten broad categories: non-parametric tests, parametric tests, compositional analyses, correlation-based methods, linear and generalized linear models, machine learning approaches, causal inference methods, survival analysis, Bayesian methods, and other approaches. When grouped by microbiome role and metric, methodological heterogeneity varied considerably.
In studies with the microbiome as an exposure (Fig. 3b), alpha diversity analyses (n = 10) were dominated by non-parametric tests (7/10, 70%), with the remaining analyses using a mix of parametric, compositional, and survival methods. Beta diversity was assessed exclusively with non-parametric methods (2/2). Relative abundance studies (n = 125) employed a wide range of approaches: non-parametric tests were most frequent (28/125, 22.4%), followed by compositional analyses (24/125, 19.2%), parametric tests (15/125, 12.0%), generalized linear models and machine learning (each 12/125, 9.6%), correlation analyses (11/125, 8.8%), causal inference methods (2/125, 1.6%), survival analysis (1/125, 0.8%), and other methods (20/125, 16.0%).

Figure 3. Panels summarize the number of studies applying each method type when analyzing exposures (n = 77), outcomes (n = 75), alpha diversity (n = 41), beta diversity (n = 16), and relative abundance (n = 314). Bars represent counts of studies using non-parametric, compositional, parametric, generalized linear, machine-learning, correlation-based, causal-inference, survival, Bayesian, and other analytical approaches.

In studies with the microbiome as an outcome variable (Fig. 3c), alpha diversity analyses (n = 20) were primarily non-parametric (11/20, 55%), with parametric tests (4/20, 20%), generalized linear models (2/20, 10%), and other methods (3/20, 15%). Beta diversity comparisons (n = 11) relied mostly on non-parametric approaches (9/11, 81.8%), with the remaining studies using parametric and other methods (2/11, 18.2%). Relative abundance studies (n = 106) showed high heterogeneity: non-parametric tests were most common (35/106, 33.0%), followed by compositional analyses (21/106, 19.8%), parametric tests (13/106, 12.3%), generalized linear models (12/106, 11.3%), other methods (14/106, 13.2%), machine learning (6/106, 5.7%), and correlation analyses (5/106, 4.7%).
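Relative-abundance analyses such as those above usually involve one test per taxon, which is where the multiple-comparison corrections recorded during data extraction (for example, false discovery rate control) apply. As a minimal sketch, the Benjamini-Hochberg step-up adjustment can be written in a few lines. The p-values are invented for illustration and the function name is hypothetical; the source does not say which FDR procedure individual studies used.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-taxon p-values (not taken from any reviewed study).
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)

for p, q in zip(raw, adj):
    print(f"raw p = {p:.3f} -> BH-adjusted = {q:.4f}")
```

Each adjusted value is at least as large as its raw p-value, so a taxon that looks significant in isolation may no longer pass the threshold once the number of tests is taken into account.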
Reported effect size measures and uncertainty

Across the included studies, reporting practices showed the predominance of difference-testing approaches. The p-value, alone or along with other measures, was the most commonly reported measure overall, appearing in 25% of studies (Fig. 4). Correlation coefficients (21.5%) and regression coefficients (11.7%) were the next most frequently reported effect size measures. Only 10.2% of studies (n = 21) reported confidence intervals for effect size estimates, limiting the interpretability and comparability of reported findings.

Figure 4. This faceted bar chart illustrates the distribution of commonly reported effect size measures across different microbiome roles.

The choice of effect measures varied by the conceptual role of the microbiome. In microbiome-as-exposure studies, correlation coefficients were most frequently reported (16/77, 20.8%), followed by linear discriminant analysis effect size (LEfSe) (11/77, 14.3%), area under the curve or AUC (9/77, 11.7%), regression coefficients (9/77, 11.7%), and log fold-change (7/77, 9.1%), while hazard ratios were rarely reported (2/77, 2.6%). In microbiome-as-outcome studies, correlation coefficients (16/75, 21.3%) and regression coefficients (10/75, 13.3%) were most commonly estimated, followed by the coefficient of determination (7/75, 9.3%), log fold-change (6/75, 8.0%), and LEfSe (5/75, 6.7%), with fewer instances of the odds ratio (2/75), fold-change (1/75), or incidence ratio (1/75). In microbiome-as-mediator studies, correlation coefficients (11/46, 23.9%), the log fold-change (6/46, 13.0%), regression coefficients (5/46, 10.9%), and LEfSe (5/46, 10.9%) were primarily reported, with rarer instances of the F-statistic (1/46), hazard ratio (1/46), and Mendelian randomization effect size (1/46).
Moderator studies were less consistent, with a few reporting AUC (2/7), coefficient of determination (2/7), correlation coefficient (1/7), and LEfSe (1/7), and none reporting regression or risk ratios.

Confounding Control in Observational Studies

Among the 171 observational studies (excluding randomized controlled trials and pre-post designs), 31.6% (n = 54) did not apply any form of confounding control (Fig. 5). Of these, 70.4% (n = 38) provided no justification for the absence of confounding control, while 27.8% (n = 15) relied on difference testing to assess covariate balance between groups. Among studies that implemented confounding control, adjustment was the most common strategy (52.0%, n = 89), followed by group-level matching (14.0%, n = 24). Restriction and stratification were rarely used, each appearing in 1.2% of studies (not shown in the Figure). Among studies that adjusted for confounding, 52.8% (n = 47) did not justify their selection of covariates, and 31.5% (n = 28) relied on significance testing to guide covariate inclusion.

Mediation and Mechanistic Claims

More than half of the included studies (53.2%, n = 109) reported conducting mediation analyses, and 56.1% (n = 115) explicitly claimed a mediation mechanism. Overall, 55% of studies treating the microbiome as an exposure and 33.3% of studies where the microbiome was an outcome made claims about mediation mechanisms. Mediation claims were most common in studies examining the microbiome as a mediator (38.3%, n = 44) or as an exposure (37.4%, n = 43). The most frequently used method to support mediation claims was Spearman’s correlation (39.1%), followed by Pearson’s correlation (7.0%). A small proportion of studies supported mediation claims using additional laboratory/animal experiments (6.1%), while an equal proportion (6.1%) claimed causal mechanisms without providing analytical evidence.
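To make concrete why the absence of confounding control reported above matters, the sketch below compares a crude risk difference with two of the strategies listed in the Methods, standardization and inverse probability weighting (IPW), for a single binary confounder. The counts are invented for illustration and are not data from the review; all names are hypothetical. Within each stratum the exposed and unexposed risks are equal by construction, so any crude difference is due to confounding alone.

```python
from collections import namedtuple

Cell = namedtuple("Cell", "stratum exposed n events")

# Hypothetical counts (illustration only): exposure is more common in
# stratum 1, where baseline risk is also higher.
cells = [
    Cell(stratum=0, exposed=1, n=20, events=2),
    Cell(stratum=0, exposed=0, n=80, events=8),
    Cell(stratum=1, exposed=1, n=80, events=40),
    Cell(stratum=1, exposed=0, n=20, events=10),
]


def stratum_size(cells, stratum):
    return sum(c.n for c in cells if c.stratum == stratum)


def crude_risk(cells, exposed):
    sel = [c for c in cells if c.exposed == exposed]
    return sum(c.events for c in sel) / sum(c.n for c in sel)


def standardized_risk(cells, exposed):
    """Stratum-specific risks averaged over the whole-sample stratum mix."""
    total = sum(c.n for c in cells)
    risk = 0.0
    for c in cells:
        if c.exposed == exposed:
            risk += (stratum_size(cells, c.stratum) / total) * (c.events / c.n)
    return risk


def ipw_risk(cells, exposed):
    """Weight each person by 1 / P(observed exposure | stratum)."""
    num = den = 0.0
    for c in cells:
        if c.exposed == exposed:
            weight = stratum_size(cells, c.stratum) / c.n  # needs c.n > 0
            num += weight * c.events
            den += weight * c.n
    return num / den


crude_rd = crude_risk(cells, 1) - crude_risk(cells, 0)
std_rd = standardized_risk(cells, 1) - standardized_risk(cells, 0)
ipw_rd = ipw_risk(cells, 1) - ipw_risk(cells, 0)

print(f"crude risk difference:        {crude_rd:.2f}")
print(f"standardized risk difference: {std_rd:.2f}")
print(f"IPW risk difference:          {ipw_rd:.2f}")
```

Both adjusted estimates rely on every stratum containing exposed and unexposed participants (positivity) and on the stratifying variable capturing all confounding (exchangeability); with a single measured binary confounder they coincide, which is not guaranteed in more realistic, high-dimensional settings.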
Strength of Causal Claims and Practical Recommendations

In the assessment of the strength of causal claims, overall agreement between the human reviewer’s and the LLM’s assessments was high, with the LLM tending to assign slightly more conservative ratings (Figs. 6a and 6b). The human reviewer identified 34 studies as making strong causal claims, whereas GPT identified 18. Conversely, 115 studies were rated as making no or weak causal claims by the human reviewer, compared with 136 by GPT.

Regarding practical recommendations, 51.2% of studies (n = 105) provided recommendations for implementation. Case-control studies reported causal recommendations in 25 of 49 studies (51.0%). Randomized controlled trials provided recommendations in 15 of 25 studies (60.0%), and prospective cohort studies did so in 33 of 59 (55.9%). Nested case-control studies provided recommendations in 4 of 5 studies, and pre-post studies in 4 of 9. Cross-sectional studies were less likely to lead to such recommendations, with 23 of 55 reporting recommendations (41.8%), and retrospective cohort studies rarely did so, with only 1 of 3 reporting recommendations. Among studies that provided recommendations, most issued weak or moderate recommendations, and no study issued a strong recommendation (Figs. 6c and 6d). Studies with weaker causal claims tended to issue weaker recommendations, although a small number of studies issued recommendations despite making no explicit causal claims.

Finally, the strength of causal claims varied by the conceptual role of the microbiome (Fig. 6e). Studies treating the microbiome as an exposure most commonly made no or weak causal claims, whereas moderate and strong claims were more frequent when the microbiome was examined as a mediator or outcome. Studies considering the microbiome as an effect modifier tended toward weak or moderate claims, although the small number of such studies limits interpretation.
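The design-specific proportions reported above can be recomputed directly from the stated counts. The following minimal sketch does so and attaches an approximate 95% interval to each proportion. The use of the Wilson score interval and the names `counts`, `wilson_interval`, and `summarize` are illustrative assumptions; they are not part of the review's reported analysis.

```python
from math import sqrt

# Studies providing practical recommendations, by design, transcribed from
# the paragraph above as (with recommendations, total studies of that design).
counts = {
    "case-control": (25, 49),
    "randomized controlled trial": (15, 25),
    "prospective cohort": (33, 59),
    "nested case-control": (4, 5),
    "pre-post": (4, 9),
    "cross-sectional": (23, 55),
    "retrospective cohort": (1, 3),
}


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n.

    z = 1.96 gives an approximate 95% interval (an assumed choice here).
    """
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def summarize(counts):
    """Return {design: (percent, lower percent, upper percent)}."""
    rows = {}
    for design, (k, n) in counts.items():
        lo, hi = wilson_interval(k, n)
        rows[design] = (
            round(100 * k / n, 1),
            round(100 * lo, 1),
            round(100 * hi, 1),
        )
    return rows


summary = summarize(counts)
for design, (pct, lo, hi) in summary.items():
    print(f"{design}: {pct}% (approx. 95% CI {lo} to {hi})")

# Cross-check against the overall figure reported in the text.
total_k = sum(k for k, _ in counts.values())
total_n = sum(n for _, n in counts.values())
print(f"overall: {total_k}/{total_n} = {100 * total_k / total_n:.1f}%")
```

The intervals for the smallest strata (for example, 1 of 3 retrospective cohort studies) are very wide, which is why comparisons between designs with few studies should be read cautiously.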
DISCUSSION

This methodological review demonstrates that, despite rapid growth and methodological innovation in microbiome research, rigorous causal inference approaches remain infrequently applied. Most studies continue to rely on association-based analyses, with limited attention to confounding control, counterfactual reasoning, and clinically interpretable effect estimation.

Study design and temporal structure

One notable pattern observed in the reviewed literature is a shift toward more temporally structured study designs. In a prior systematic review by Bardenhorst et al. (10), covering studies published in 2018–2019, only 27% employed longitudinal designs. In contrast, 45% of studies in the present review used longitudinal or prospective designs. This change is consistent with increased recognition of temporality as a key component of causal interpretation. However, while improved temporal ordering strengthens causal plausibility, it does not, on its own, address other core challenges related to confounding, measurement, or effect interpretation (33, 34).

Microbiome role, variable representation, and causal consistency

A particularly informative finding concerns how variable types are distributed according to the microbiome’s conceptual role in the causal model. When treated as an exposure, microbiome features were overwhelmingly represented as continuous measures, such as relative abundances or diversity indices. Although ecologically intuitive, this dominance of broad, continuously defined exposures complicates causal interpretation. Continuous microbiome measures often lack a clear intervention analogue, making it difficult to define well-specified causal estimands or to satisfy the consistency assumption, which requires that exposure levels correspond to coherent interventions (35). Moreover, the limited use of causal frameworks explicitly designed to handle continuous, high-dimensional exposures further constrains interpretability.
As a result, even when statistical associations are robust, the corresponding causal questions often remain unclear, and effect estimates may be difficult to compare across studies or translate into meaningful biological or clinical interpretations.

Confounding control and variable selection

A recurring limitation across studies was the inconsistent handling and reporting of confounding. Although demographic, behavioral, and clinical covariates were frequently collected, few studies clearly articulated the rationale for confounder selection or employed formal tools such as directed acyclic graphs (DAGs) to justify adjustment strategies. Design-based approaches, including matching or restriction, were also rarely reported. In many cases, confounder selection appeared to rely on difference testing or data-driven significance thresholds, as recommended by some current guidelines, including the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement (36). This approach is problematic, as it may fail to identify variables that strongly influence the outcome but are weakly associated with the exposure, leaving residual confounding unaddressed (37). A fixed, predefined set of confounders should not be expected across microbiome studies, because the confounder set should ideally be tailored to the specific research question, design, and population. However, as shown by Vujkovic-Cvijin et al. (38), factors such as sex, BMI, diet, alcohol consumption, and bowel movement quality explain substantial variation in gut microbiota composition and may confound disease associations if not appropriately accounted for. The inconsistent treatment of such variables reflects a broader challenge in aligning statistical practices with causal reasoning in microbiome research.

Effect and uncertainty measures reporting and interpretability

Another prominent pattern concerns the limited reporting of effect and uncertainty measures.
Many studies relied primarily on p-values to convey results, while fewer reported effect estimates, and only a small fraction provided confidence intervals. This practice restricts interpretability, as statistical significance alone does not convey the magnitude, direction, or precision of an effect. As noted in the clinical and epidemiologic literature (39, 40), small and clinically irrelevant effects may achieve statistical significance in large samples, whereas meaningful effects may go undetected in smaller studies. Without effect sizes and confidence intervals, it is difficult to assess the practical relevance of reported findings or to compare results across studies. The observed variability in reported effect measures further limits synthesis, particularly when microbiome features are analyzed on different scales or treated differently across causal roles.

Mediation analyses and reliance on correlations

Mediation analyses represent another area where current practices impose substantial limitations on causal interpretation. Spearman’s correlation was frequently used to explore relationships among exposures, microbiome features, and outcomes, often as a proxy for mediation. While correlation analyses can reveal associations suggestive of potential pathways, they do not establish causal mechanisms. Several challenges arise in this context. First, correlation measures do not distinguish cause from effect or account for exposure–mediator interactions. Second, microbiome-wide correlation analyses implicitly assume independence among microbial features, an assumption violated by compositional constraints and microbial interdependence. Third, correlation-based approaches typically do not control for confounders influencing both the mediator and the outcome, increasing the risk of spurious associations (41, 42). As a result, while correlation analyses may be useful for exploratory purposes, they offer limited insight into causal mediation.
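The third challenge can be made concrete with a small simulation. The sketch below is a hypothetical illustration only: a single host factor `c` drives both a microbiome feature `m` and an outcome `y`, while `m` has no effect on `y`. The data-generating model, the variable names, and the helper functions `pearson` and `residuals` are assumptions introduced here and are not drawn from any reviewed study. The raw correlation between `m` and `y` is clearly positive, whereas the correlation between their residuals after regressing each on `c` is close to zero.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def residuals(y, x):
    """Residuals of y after a simple least-squares regression on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000

# Hypothetical data-generating model (an assumption for illustration):
c = [rng.gauss(0, 1) for _ in range(n)]   # host factor acting as confounder
m = [ci + rng.gauss(0, 1) for ci in c]    # microbiome feature, driven by c
y = [ci + rng.gauss(0, 1) for ci in c]    # outcome, driven by c but not by m

raw = pearson(m, y)                                   # ignores the confounder
adjusted = pearson(residuals(m, c), residuals(y, c))  # partials out c

print(f"raw correlation of m and y:       {raw:.3f}")
print(f"correlation after adjusting for c: {adjusted:.3f}")
```

Under this assumed model the theoretical raw correlation is 0.5 and the partial correlation is 0, so an analysis based on the raw correlation alone would suggest a pathway through `m` that does not exist. The linear adjustment used here works only because the simulated relationships are linear and the confounder is measured.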
More formal causal mediation frameworks, such as those developed by VanderWeele (43, 44), explicitly define direct and indirect effects under counterfactual assumptions and can, in principle, accommodate treatment–microbiome interactions. The limited adoption of such frameworks in the reviewed literature underscores a gap between the complexity of the scientific questions being posed and the analytical tools commonly applied.

Strength of causal claims and reported recommendations

Importantly, this review highlights that relatively few microbiome studies make strong explicit causal claims or offer practical recommendations. Studies that did so were more commonly those based on stronger designs, such as randomized controlled trials, pre–post interventions, or longitudinal cohort studies. This pattern suggests that, despite substantial analytical effort across the field, much of the published microbiome literature does not yet translate into intervention-relevant conclusions. The limited prevalence of causal claims and actionable recommendations underscores the need for broader adoption of causal inference frameworks in microbiome research. Without clearly articulated causal questions and corresponding analytical strategies, large volumes of data may yield primarily associative findings with restricted downstream utility. More systematic use of causal inference techniques has the potential to improve research efficiency by enabling clearer interpretation of results, better prioritization of follow-up studies, and more direct relevance to clinical or public health decision-making.

Implications for the field

Taken together, these findings highlight the slow adoption of causal inference methods and reporting practices in microbiome research.
These limitations, ranging from variable definition and confounding control to the reporting of effect measures and effect sizes and to mediation analysis, do not diminish the value of existing studies but rather delineate the boundaries of what current practices can support. Causal inference frameworks offer a conceptual structure that may help address these challenges, but their potential lies in clarifying assumptions, estimands, and interpretability rather than in providing prescriptive solutions. This is the first methodological review to summarize the methodological landscape, informing future efforts to align research questions, analytical strategies, and causal claims in microbiome science.

Limitations

This study has several limitations. First, our review was restricted to 205 research articles published in 10 journals. While these journals are highly regarded in the microbiome field, the selection includes only one broad-topic journal, potentially limiting the generalizability of our findings. Standards for causal inference and reporting practices may vary across journals, especially in broader biomedical or interdisciplinary domains. By focusing on the top journals within the microbiome domain, our aim was not to provide an exhaustive survey but to identify prevailing trends, commonly used frameworks, and key methodological challenges in current causal inference practices. Second, stratifying by microbiome role or microbiome measures further reduced the number of studies available for analysis. This limitation is particularly relevant for studies examining the microbiome as a moderator, of which there were only seven, preventing strong conclusions about trends in this specific subgroup. Third, many microbiome studies employ multiple analytical approaches rather than a single method. To facilitate mapping of current methodologies, we extracted data on up to two main analyses per study.
Nonetheless, some additional analyses were not included, meaning the overview provided here does not fully capture the complete methodological landscape. Fourth, microbiome research is inherently complex, with study questions, microbiome measures, and analytical techniques varying widely. In this review, we stratified studies only by the microbiome’s conceptual role and the type of microbiome measure. Further stratifications could provide a more nuanced understanding of patterns and trends in current practices but were beyond the scope of this analysis. Finally, the assessment of causal language and the strength of recommendations involved a degree of subjectivity. Although we employed a standardized rubric and resolved discrepancies through consensus discussions, some level of interpretive ambiguity is unavoidable.

CONCLUSION

As microbiome research matures, there is increasing interest in moving beyond descriptive associations toward robust, interpretable causal claims. This review highlights both encouraging developments and areas of concern in the field’s current methodological practices. The rise in longitudinal and interventional designs is a promising step toward satisfying the fundamental requirement of temporality. Yet this shift has not been consistently matched by appropriate analytic frameworks or transparent reporting. Moreover, meaningful effect estimation remains limited. By identifying current patterns and gaps, this review highlights the pressing need for methodological innovations and interdisciplinary collaborations between microbiome researchers, statisticians, and epidemiologists to strengthen causal inference practices in the field. Addressing these challenges is essential to unlocking the full potential of microbiome studies to inform precision medicine, public health interventions, and our broader understanding of human health and disease.
Declarations Conflicts of interest No conflicts of interest declared Funding We acknowledge the Fonds de Recherche du Québec-Santé (FRQ-S) for the doctoral award to Albina Tskhay and Alibek Moldakozhayev. FRQ-S, did not play any role in the design, analysis or interpretation of results. Authors’ contributions Albina Tskhay designed the data extraction template; screened abstracts and full-text articles; extracted and analysed the data; interpreted the results; and drafted the manuscript. Alibek Moldakozhayev screened abstracts and full-text articles, extracted data, interpreted the results, and critically revised the manuscript for important intellectual content. Cristina Longo co-supervised Albina Tskhay, contributed to the design of the data extraction template, interpreted the results, and critically revised the manuscript for important intellectual content. Roxana Behruzi, Celia Greenwood, Stan Kubow contributed to the interpretation of the results and critically revised the manuscript for important intellectual content. Vadim N. Gladyshev supervised Alibek Moldakozhayev and critically revised the manuscript for important intellectual content. Tibor Schusted supervised Albina Tskhay, contributed to the design of the data extraction template, interpreted the results, and critically revised the manuscript for important intellectual content. References Helmink BA, Khan MAW, Hermann A, Gopalakrishnan V, Wargo JA (2019) The microbiome, cancer, and cancer therapy. Nat Med 25(3):377–388 Hsiao EY, McBride SW, Hsien S, Sharon G, Hyde ER, McCue T et al (2013) Microbiota modulate behavioral and physiological abnormalities associated with neurodevelopmental disorders. Cell 155(7):1451–1463 Li B, Selmi C, Tang R, Gershwin ME, Ma X (2018) The microbiome and autoimmunity: a paradigm from the gut–liver axis. Cell Mol Immunol 15(6):595–609 MacQueen G, Surette M, Moayyedi P (2017) The gut microbiota and psychiatric illness. 
J Psychiatry Neurosci 42(2):75–77 Roy Sarkar S, Banerjee S (2019) Gut microbiota in neurodegenerative disorders. J Neuroimmunol 328:98–104 Wang P-X, Deng X-R, Zhang C-H, Yuan H-J (2020) Gut microbiota and metabolic syndrome. Chin Med J 133(7):808–816 Alibek K, Farmer S, Tskhay A, Moldakozhayev A, Alibek K, Isakov T (2019) The Role of Infection, Inflammation and Genetic Alterations in ASD Etiopathogenesis: A Review. J Neurol Psychiatr Disord 2:105 Relman DA (2020) Thinking about the microbiome as a causal factor in human health and disease: philosophical and experimental considerations. Curr Opin Microbiol 54:119–126 Lin H, Peddada SD (2020) Analysis of microbial compositions: a review of normalization and differential abundance analysis. npj Biofilms Microbiomes 6(1):60 Bardenhorst SK, Berger T, Klawonn F, Vital M, Karch A, Rübsamen N (2021) Syst Rev Curr Pract mSystems 6(1). 10.1128/msystems.01154-20 . Data Analysis Strategies for Microbiome Studies in Human Populations—a Xia Y, Sun J, Chen D-G (2018) What Are Microbiome Data? Springer Singapore, pp 29–41 Lutz KC, Jiang S, Neugent ML, De Nisco NJ, Zhan X, Li Q (2022) A Survey of Statistical Methods for Microbiome Data Analysis. Front Appl Math Stat. ;8 Nearing JT, Douglas GM, Hayes MG, MacDonald J, Desai DK, Allward N et al (2022) Microbiome differential abundance methods produce different results across 38 datasets. Nat Commun 13(1):342 VanderWeele TJ, Hernán MA (2013) Causal Inference Under Multiple Versions of Treatment. J Causal Inference 1(1):1–20 Lanza ST, Moore JE, Butera NM (2013) Drawing causal inferences using propensity scores: a practical guide for community psychologists. Am J Community Psychol 52(3–4):380–392 Lousdal ML (2018) An introduction to instrumental variable assumptions, validation and estimation. Emerg Themes Epidemiol 15(1):1 Xia Y, Sun J, Chen D-G (2018) What are microbiome data? Statistical analysis of microbiome data with R: Springer; pp. 
29–41 Lutz KC, Jiang S, Neugent ML, De Nisco NJ, Zhan X, Li Q (2022) A survey of statistical methods for microbiome data analysis. Front Appl Math Stat 8:884810 Wozniak H, Gaïa N, Lazarevic V, Le Terrier C, Beckmann TS, Balzani E et al (2024) Early reduction in gut microbiota diversity in critically ill patients is associated with mortality. Ann Intensiv Care 14(1):174 Barnett TA, Koushik A, Schuster T (2023) Invited Commentary: Cross-Sectional Studies and Causal Inference-It's Complicated. Am J Epidemiol 192(4):517–519 Debelius J, Song SJ, Vazquez-Baeza Y, Xu ZZ, Gonzalez A, Knight R (2016) Tiny microbes, enormous impacts: what matters in gut microbiome studies? Genome Biol 17(1):217 Fischbach MA (2018) Microbiome: focus on causation and mechanism. Cell 174(4):785–790 Tskhay A, Moldakozhayev A, Longo C, Schuster T (2024) Methodological Evaluation in Microbiome Research: Approaches Employed to Study Microbiome as an Exposure and Outcome Variable. Systematic Review Pre-Registration SCImago SJR — SCImago Journal & Country Rank [Portal] n.d [Available from: http://www.scimagojr.com Huang Z, Liu K, Ma W, Li D, Mo T, Liu Q (2022) The gut microbiome in human health and disease—Where are we and where are we going? A bibliometric analysis. Front Microbiol. ;13 Knight R, Vrbanac A, Taylor BC, Aksenov A, Callewaert C, Debelius J et al (2018) Best practices for analysing microbiomes. Nat Rev Microbiol 16(7):410–422 Minich JJ, Humphrey G, Benitez RAS, Sanders J, Swafford A, Allen EE, Knight R (2018) High-Throughput Miniaturized 16S rRNA Amplicon Library Preparation Reduces Costs while Preserving Microbiome Integrity. mSystems 3(6). 10.1128/msystems.00166 – 18 Haber NA, Wieten SE, Rohrer JM, Arah OA, Tennant PWG, Stuart EA et al (2022) Causal and Associational Language in Observational Health Research: A Systematic Evaluation. 
Am J Epidemiol 191(12):2084–2097 OpenAI CGPT May 13 ed2024 Innovation VH Covidence systematic review software Melbourne, Australia de Micheaux PL, Drouilhet R, Liquet B (2013) The R software. Fundamentals of Programming and Statistical Analysis. :978-1 Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD et al (2021) The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 372:n71 Haine D, Dohoo I, Dufour S (2018) Selection and misclassification biases in longitudinal studies. Front veterinary Sci 5:99 Laird NM (1988) Missing data in longitudinal studies. Stat Med 7(1–2):305–315 Chatton A, Rohrer JM (2024) The Causal Cookbook: Recipes for Propensity Scores, G-Computation, and Doubly Robust Standardization. Adv Methods Practices Psychol Sci 7(1):25152459241236149 von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP (2007) The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Ann Intern Med 147(8):573–577 Sourial N, Vedel I, Le Berre M, Schuster T (2019) Testing group differences for confounder selection in nonrandomized studies: flawed practice. CMAJ 191(43):E1189–e93 Vujkovic-Cvijin I, Sklar J, Jiang L, Natarajan L, Knight R, Belkaid Y (2020) Host variables confound gut microbiota studies of human disease. Nature 587(7834):448–454 Sullivan GM, Feinn R (2012) Using Effect Size—or Why the P Value Is Not Enough. J Graduate Med Educ 4(3):279–282 Gardner MJ, Altman DG (1986) Confidence intervals rather than P values: estimation rather than hypothesis testing. Br Med J (Clin Res Ed) 292(6522):746–750 Janse RJ, Hoekstra T, Jager KJ, Zoccali C, Tripepi G, Dekker FW, van Diepen M (2021) Conducting correlation analysis: important limitations and pitfalls. Clin Kidney J 14(11):2332–2337 Gloor GB, Macklaim JM, Pawlowsky-Glahn V, Egozcue JJ (2017) Microbiome Datasets Are Compositional: And This Is Not Optional. 
Front Microbiol 8:2224 VanderWeele TJ (2013) A three-way decomposition of a total effect into direct, indirect, and interactive effects. Epidemiology 24(2):224–232 VanderWeele TJ (2014) A unification of mediation and interaction: a 4-way decomposition. Epidemiology 25(5):749–761 Additional Declarations The authors declare no competing interests. Cite Share Download PDF Status: Posted Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-8681336","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":579504420,"identity":"452c8891-f633-4660-90b7-2e7f7a95a50e","order_by":0,"name":"Albina Tskhay","email":"","orcid":"","institution":"McGill University","correspondingAuthor":false,"prefix":"","firstName":"Albina","middleName":"","lastName":"Tskhay","suffix":""},{"id":579504421,"identity":"d0edeab9-6eb8-47d0-88d6-fd188f0dda5d","order_by":1,"name":"Alibek Moldakozhayev","email":"","orcid":"","institution":"Harvard Medical 
School","correspondingAuthor":false,"prefix":"","firstName":"Alibek","middleName":"","lastName":"Moldakozhayev","suffix":""},{"id":579504422,"identity":"47026fcb-55f2-44df-ae7d-0d34ebfb5b40","order_by":2,"name":"Cristina Longo","email":"","orcid":"","institution":"University of Montreal","correspondingAuthor":false,"prefix":"","firstName":"Cristina","middleName":"","lastName":"Longo","suffix":""},{"id":579504423,"identity":"59cb25ae-5cf3-4c9a-99a0-d9ffd44bc9f1","order_by":3,"name":"Roxana Behruzi","email":"","orcid":"","institution":"McGill University","correspondingAuthor":false,"prefix":"","firstName":"Roxana","middleName":"","lastName":"Behruzi","suffix":""},{"id":579504424,"identity":"a84f56ab-23fd-4e22-b830-c02ccf824106","order_by":4,"name":"Celia Greenwood","email":"","orcid":"","institution":"McGill University","correspondingAuthor":false,"prefix":"","firstName":"Celia","middleName":"","lastName":"Greenwood","suffix":""},{"id":579504425,"identity":"fc419e8c-a5b5-4fe9-9a13-3f0ed7fcfceb","order_by":5,"name":"Stan Kubow","email":"","orcid":"","institution":"McGill University","correspondingAuthor":false,"prefix":"","firstName":"Stan","middleName":"","lastName":"Kubow","suffix":""},{"id":579504426,"identity":"e2df4573-fb34-4ae3-8afb-4b346099621b","order_by":6,"name":"Vadim N Gladyshev","email":"","orcid":"","institution":"Harvard Medical School","correspondingAuthor":false,"prefix":"","firstName":"Vadim","middleName":"N","lastName":"Gladyshev","suffix":""},{"id":579504427,"identity":"268b4eab-fa32-4b1e-b44b-88c512dac1b0","order_by":7,"name":"Tibor 
Schuster","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA+ElEQVRIiWNgGAWjYDACZh4IbQChbICYsfHAAzw6eJC0MDYwMKSBtDQcSMCnhQFVy2EwB68We3beg48rGOrkzSXSnz/4uOd8NH//YpAtdvK4HcaXbHiG4bDhzhk5ho0znt3OnXHjIUhLsmEDbr+YSTYAXWJwI4exmefA7dwNEgdBWg4wEtBSB9SS/rD5z4FzcC32BLQwA7UkGDYzHDiQu4G/EawlEaeWw0C/NBgcNtxw5o3hzJ4DyUC/gALZIDkZlxb2/rMHHzZU1MkbHE9/8OHHAbvc/v7jDx98qLCzxaUFAgyQORIJ6CIEAf8BkpSPglEwCkbB8AcA4eNfvHF4FjoAAAAASUVORK5CYII=","orcid":"","institution":"McGill University","correspondingAuthor":true,"prefix":"","firstName":"Tibor","middleName":"","lastName":"Schuster","suffix":""}],"badges":[],"createdAt":"2026-01-23 16:57:20","currentVersionCode":1,"declarations":{"humanSubjects":false,"vertebrateSubjects":false,"conflictsOfInterestStatement":false,"humanSubjectEthicalGuidelines":false,"humanSubjectConsent":false,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":false},"doi":"10.21203/rs.3.rs-8681336/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-8681336/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":102177981,"identity":"0ee3c397-f9e6-475a-b90e-a83352a1c976","added_by":"auto","created_at":"2026-02-09 06:50:34","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":116204,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003ePRISMA flow diagram of study identification, screening, and inclusion\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-8681336/v1/a39f322733a6d3fbb4c65fe3.png"},{"id":102296557,"identity":"eb4f7227-eaca-4ce3-aa61-bdb685a0b8ef","added_by":"auto","created_at":"2026-02-10 10:20:07","extension":"png","order_by":2,"title":"Figure 
2","display":"","copyAsset":false,"role":"figure","size":91092,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eVariable Type Combinations in Microbiome Research Across Causal Roles.\u003cbr\u003e\n \u003c/strong\u003eEach heatmap depicts the frequency of combinations of exposure and outcome variable types (binary, categorical, continuous) used when the microbiome is studied as an exposure, outcome, mediator, or effect modifier.\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-8681336/v1/229e5b5e113fdbbba0e8eb5e.png"},{"id":102296684,"identity":"cb6e46b1-153d-4bea-8b8a-9d950e102dc9","added_by":"auto","created_at":"2026-02-10 10:20:42","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":309836,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eFrequency of Each Method Category Used in Studies to Arrive to The Main Conclusions Grouped by (A) Category and (B) Primary Microbiome Measures.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003ePanels summarize the number of studies applying each method type when analyzing exposures (n = 77), outcomes (n = 75), alpha diversity (n = 41), beta diversity (n = 16), and relative abundance (n = 314). 
Bars represent counts of studies using non-parametric, compositional, parametric, generalized linear, machine-learning, correlation-based, causal-inference, survival, Bayesian, and other analytical approaches.\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-8681336/v1/5f0df1c9e980845b3b8a4a6f.png"},{"id":102296760,"identity":"304490bf-4449-4071-a813-c4a864960611","added_by":"auto","created_at":"2026-02-10 10:21:24","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":272042,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eEffect Size Measures Used in Microbiome Research by Variable Role.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis faceted bar chart illustrates the distribution of commonly reported effect size measures across different microbiome roles.\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-8681336/v1/8c79e2b280e8847a78665478.png"},{"id":102177986,"identity":"7df9d4a4-32a1-4115-8a14-b8d7739ad435","added_by":"auto","created_at":"2026-02-09 06:50:34","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":152210,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eDistribution of Justifications for Confounding Control Methods in Epidemiologic Studies. \u003c/strong\u003eThe percentages shown on the y-axis are calculated relative to each confounding control method on the x-axis. 
Dots represent the upper and lower confidence intervals (CIs) for each justification category within a confounding control method.\u003c/p\u003e","description":"","filename":"5.png","url":"https://assets-eu.researchsquare.com/files/rs-8681336/v1/8676747dec49cb3b9042495f.png"},{"id":102298664,"identity":"ec661758-ac59-4884-a6f8-134468bfec22","added_by":"auto","created_at":"2026-02-10 10:57:19","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":621124,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eAnalysis of Reporting Claims’ and Practical Recommendations’ Strength.\u003c/strong\u003e (A-B) Comparison of the strength of causal claims and action recommendations as assessed by a human reviewer and GPT. Overall agreement was high, with the LLM ratings being slightly more conservative. Most discrepancies were within one level on the 4-point scale (none, weak, moderate, strong). (C-D) Strength of Causal Claims compared against Strength of Recommendations for gradings done by human reviewer and GPT. \"N/A\" indicates no recommendations were provided. Other categories include “none” (recommendations not based on causal reasoning), and “weak,” “moderate,” or “strong” based on their implied causal strength. (E) Strength of Causal Claims by Microbiome Role. 
Each facet represents a different microbiome role (exposure, mediator, outcome, or effect modifier), with within-category percentages showing the distribution of causal claim strengths.\u003c/p\u003e","description":"","filename":"6.png","url":"https://assets-eu.researchsquare.com/files/rs-8681336/v1/eacbc77da8d29f493416cebb.png"},{"id":102397440,"identity":"87e76d7d-a5fb-41c2-abca-54a2813b34aa","added_by":"auto","created_at":"2026-02-11 10:16:57","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2691938,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8681336/v1/d303cb06-4d26-4c09-9cec-0919ed9c5668.pdf"}],"financialInterests":"The authors declare no competing interests.","formattedTitle":"\u003cp\u003e\u003cstrong\u003eAssessing the adoption of Causal Language and Methodology in Human Microbiome Studies: A Methodological Review\u003c/strong\u003e\u003c/p\u003e","fulltext":[{"header":"BACKGROUND","content":"\u003cp\u003eOver the last two decades, microbiome research has gained increasing attention due to its potential to reveal complex interactions between microbial communities and human health (\u003cspan additionalcitationids=\"CR2 CR3 CR4 CR5 CR6\" citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e). Despite advances in statistical and computational methods, challenges in addressing etiological questions remain pervasive in microbiome research (\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e). Randomized controlled trials are often infeasible, so the field relies heavily on observational data analyzed through association-based approaches that require additional assumptions to support causal interpretation (\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e). 
Central among these is the need for an adequate representation of the data-generating mechanisms induced by the study design, research design, sampling frameworks, and measurement approaches (\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eMicrobiome composition is influenced by a complex web of host and environmental factors, including diet, environment, genetics, and host physiology (\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e, \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e). The interdependence of these variables creates challenges in disentangling causal relationships from spurious associations using observational studies (\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e). For example, distinguishing the effect of a specific microbial taxon from that of underlying host characteristics requires that key identifiability assumptions hold. Modern causal inference theory formalizes these assumptions and provides guidance on the conditions under which causal effects can be estimated from observational data (\u003cspan additionalcitationids=\"CR14 CR15\" citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e). In particular, three assumptions are fundamental: consistency, which links observed outcomes to potential outcomes under the exposure actually received; exchangeability, which requires that, conditional on measured covariates, exposure assignment is independent of potential outcomes; and positivity, which ensures that all relevant exposure levels occur with nonzero probability within strata of the covariates.\u003c/p\u003e \u003cp\u003eIntrinsic properties of microbiome data can lead to violations of these key assumptions. 
In particular, high dimensionality and underdetermination make it difficult to measure or adjust for all relevant confounders, threatening exchangeability. Overdispersion and sparsity, including a large proportion of zeros, can complicate the definition and consistent measurement of treatment or exposures, affecting consistency. Time-varying confounding further challenges positivity and exchangeability, as confounders that are themselves affected by prior treatment or outcomes can change over time and influence subsequent treatment assignment (\u003cspan additionalcitationids=\"CR18\" citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThese challenges are compounded by limitations in study design and measurement. Many microbiome studies are cross-sectional, limiting temporal inferences and, consequently, causal interpretation - unless strong external assumptions are met (\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e, \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e). Even longitudinal studies are affected by heterogeneity in sampling, sequencing protocols, and batch effects, which threaten consistency (\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e). 
The microbiome\u0026rsquo;s dynamic nature, changing over time and in response to environmental stimuli, further complicates the establishment of well-defined and generalizable causal relationships, thereby increasing the need for more complex study designs to capture these dynamics.\u003c/p\u003e \u003cp\u003eIn contrast, related fields such as epidemiology and pharmacoepidemiology have successfully adopted causal inference frameworks, emphasizing rigorous study design, careful confounding control, well-defined estimands, and transparent reporting (\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e). The relative underutilization of such frameworks in microbiome research limits the ability to generate actionable insights and leaves unanswered critical questions about whether interventions on the microbiome could improve health outcomes.\u003c/p\u003e \u003cp\u003e Motivated by these limitations, this review systematically assesses the adoption of causal inference methods in human microbiome research. 
By analysing recent observational studies, published in leading journals, that used 16S rRNA or metagenomic sequencing to examine links between the microbiome and health outcomes, or between exposures/interventions and microbiome composition, we aim to provide a comprehensive overview of the current state of causal inference in the field and outline directions for future methodological development.\u003c/p\u003e"},{"header":"METHODS","content":"\u003cp\u003eThe study\u0026rsquo;s objectives, design, and methods were registered with the Open Science Framework (OSF) \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://osf.io/bfn2q\u003c/span\u003e\u003cspan address=\"https://osf.io/bfn2q\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e).\u003c/p\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eSearch\u003c/h2\u003e \u003cp\u003eTo identify publications that best represent the microbiome research domain, we selected the nine journals with the highest publication output in the Microbiology domain between 2019 and 2024 (excluding journals publishing only literature reviews). These journals were identified from the Scimago Journal and Country Ranking, which ranks journals based on the weighted number of citations received by a journal's publications (\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e). Scientific Reports was selected as the tenth journal, as the non-microbiome-specific journal identified as having the highest number of microbiome-related publications in the same reference period (\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eRelevant studies published between 2019 and 2024 from each journal were identified through a PubMed search using a pre-specified search strategy (Supplementary material). 
Only studies that primarily used human gut microbiome data obtained by 16S rRNA or metagenomic shotgun sequencing and that examined links between the microbiome and health outcomes, or between exposures/interventions and microbiome composition, were included. These platforms were selected because they are the most widely used and methodologically mature approaches for characterizing gut microbial communities, providing sufficient taxonomic and/or functional resolution to support comparative analyses and causal inference\u0026ndash;oriented study designs (\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e). Literature reviews, descriptive studies, pooled analyses of published studies, studies in animals, and studies that investigated viruses or eukaryotic organisms were excluded. The date range was chosen to cover studies appearing since the cost of 16S rRNA sequencing and newer technologies made it more feasible to study larger sample sizes and hence associations with phenotypes (\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eScreening\u003c/h3\u003e\n\u003cp\u003eTwo reviewers (A.T. and A.M.) independently screened titles and abstracts, followed by full-text assessment when necessary. Screening was conducted separately within each journal with the aim of identifying up to 25 eligible articles per journal for data extraction, for a target total of 250 articles. We deemed this number sufficient to quantify the proportion of articles exhibiting a given characteristic: a sample of 250 articles would allow us to obtain a 95% CI half-width of less than 7 percentage points. For journals with more than 100 eligible records, title and abstract screening was capped at 100 articles, whereas for journals with fewer records, all identified articles were screened. 
Articles meeting the predefined eligibility criteria were advanced to full-text review until the journal-specific cap was reached.\u003c/p\u003e\n\u003ch3\u003eData extraction\u003c/h3\u003e\n\u003cp\u003eTwo rounds of data extraction were conducted on the included studies to minimize the risk of errors. Data was extracted on the following pre-defined aspects: study design, inferential framework, sample characteristics, data characteristics, statistical methods, mediation analysis and claims, confounding control and justification, number of citations (in Google Scholar), and reporting and action recommendation sentences. A detailed summary of all extraction variables can be found in Table S1.\u003c/p\u003e \u003cp\u003eRelevant data were systematically identified by analyzing the title, abstract, and conclusion sections of each article. Special attention was given to the outcomes and results emphasized by the authors, reflecting the central focus of each report. For each study, we then identified domains most pertinent to the reported findings and systematically extracted relevant details. This process included the following study specifics:\u003c/p\u003e\n\u003ch3\u003eGeneral Information\u003c/h3\u003e\n\u003cp\u003eWe recorded general information for each study, including the name(s) of the author(s), the study title, and the journal or report from which the data were extracted. For each study, the publication year and citation count were recorded. 
Citation data were retrieved from Google Scholar between October and December 2024.\u003c/p\u003e\n\u003ch3\u003eCharacteristics of Included Studies\u003c/h3\u003e\n\u003cp\u003eThe study design was classified into the following categories: randomized controlled trial, non-randomized trial, prospective cohort study, retrospective cohort study, case-control study, cross-sectional study, or nested case-control study. Based on the research questions, methodology, and conclusions, the inferential framework employed by each study was categorized as one of the following: statistical (hypothesis) testing, Bayesian inference, causal inference, supervised machine learning (e.g., prediction modeling or classification), or unsupervised learning (e.g., cluster analysis). The inferential framework was classified as causal inference if the study met at least one of the following predefined criteria: (i) the study was a randomized controlled trial; (ii) the authors explicitly stated a causal objective (i.e., to determine the causal effect of X on Y) or identified the analysis as causal; or (iii) the study was observational but employed established causal inference methods aimed at satisfying identifiability assumptions, such as explicit control for confounding using propensity score\u0026ndash;based approaches (e.g., matching, weighting, stratification), causal mediation analysis, or other counterfactual-based estimation strategies.\u003c/p\u003e \u003cp\u003e We extracted the total number of participants in each study and recorded the number of groups along with their respective sample sizes. The sampling method was recorded as either random or non-random sampling.\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eExposure and Outcome Variables\u003c/h2\u003e \u003cp\u003eThe type of primary outcome was classified as binary, continuous, or categorical. Similarly, the type of primary exposure was categorized as binary, continuous, or categorical. 
Microbiome-related variables were further categorized according to their role in the presumed causal structure - as exposure, outcome, mediator, or moderator.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eMicrobiome Measures\u003c/h3\u003e\n\u003cp\u003e We recorded the primary microbiome measures used to generate the main conclusions, including alpha diversity, beta diversity, relative abundance, or absolute abundance.\u003c/p\u003e\n\u003ch3\u003eStatistical Models\u003c/h3\u003e\n\u003cp\u003eWe documented the statistical models used in each study to perform inferential analysis. Possible methods included nonparametric tests (e.g., Wilcoxon signed-rank test, Mann-Whitney U test, Kruskal-Wallis test), correlation measures (e.g., Spearman\u0026rsquo;s or Pearson\u0026rsquo;s correlation coefficients), hypothesis testing (e.g., t-tests, analysis of variance), regression models (e.g., logistic regression, linear regression, multinomial logistic regression, mixed-effects models), machine learning methods (e.g., random forests, support vector machines, deep learning), and microbiome-specific models (e.g., LEfSe (Linear Discriminant Analysis Effect Size), ANCOM (Analysis of Composition of Microbiomes), MaAsLin (Multivariate Association with Linear Models), PLS-DA (Partial Least Squares Discriminant Analysis)).\u003c/p\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eConfounding Control\u003c/h2\u003e \u003cp\u003eWe examined whether confounding control was implemented and, if so, documented the strategy used. Strategies included covariate adjustment, inverse probability weighting (IPW), matching, restriction, standardization, or stratification. If no confounding control strategy was specified, this was noted. 
We also assessed whether the choice of confounding control was justified, categorizing the rationale as data-driven (e.g., difference testing or other methods), based on directed acyclic graphs (DAGs), informed by expert input, grounded in literature, or not addressed.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eMultiple Comparisons\u003c/h2\u003e \u003cp\u003eWe assessed whether studies addressed the issue of multiple comparisons, which refers to the inflation of type I error resulting from conducting multiple statistical hypothesis tests. Specifically, we recorded whether any correction for multiple testing was reported (e.g., Bonferroni adjustment, false discovery rate control) and documented this as \u0026ldquo;yes\u0026rdquo; if any such method was applied, or \u0026ldquo;no\u0026rdquo; if the issue was not addressed.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eMediation and Modification Analyses\u003c/h2\u003e \u003cp\u003eFor each study, we examined whether mediation mechanisms were claimed and noted the corresponding sentence or paragraph and any mediation analysis conducted. Similarly, we recorded whether effect modification was reported and which types of analyses were conducted.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003eAssessment of Causal Language\u003c/h2\u003e \u003cp\u003eWe extracted the effect size measures reported in the studies, along with sentences used to report and interpret these measures. Where action recommendations or future research recommendations were provided, we recorded these as \"yes\" or \"no\" and captured the corresponding sentence(s). 
Statements were selected in accordance with the criteria described in reference (\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eWe evaluated the use of causal versus associational language using two complementary approaches. First, a human reviewer (AT) assessed the strength of causal claims using the framework in (\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e), which defines causal language, action recommendations, and their grading (none, weak, moderate, strong). Second, a Large Language Model (LLM), ChatGPT based on the GPT-4 architecture (\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e), was used to assess both the strength of causal claims and the strength of practical recommendations. A standardized query was applied to all sentences (Supplementary material), and the model was blinded to other characteristics of the studies to minimize bias. This analysis was conducted in January 2025.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eSoftware\u003c/h2\u003e \u003cp\u003eFor abstract and full-text screening and data extraction, we used Covidence, a web-based platform that streamlines systematic review workflows (\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e). The extraction templates within Covidence were customized to include the sections described above to ensure consistency and comprehensiveness in data collection. 
Descriptive graphs and tables were generated using R statistical software (version 4.3.0), which facilitated the analysis and visualization of the data (\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e"},{"header":"RESULTS","content":"\u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003eStudy selection and included sample\u003c/h2\u003e \u003cp\u003eA total of 1,864 records were identified across the ten journals included in this review (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Standard systematic review guidelines such as PRISMA were not fully applicable to this methodological review (\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e); therefore, Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e presents a PRISMA-style flow diagram to transparently document the article selection process. In total, 679 records underwent title and abstract screening, and 259 articles were assessed at the full-text level. Of these, 205 studies met the predefined methodological inclusion criteria and were included in the final analysis. As screening was conducted using a journal-stratified sampling strategy with a maximum of 25 included studies per journal, the number of screened and included articles varied across journals, reflecting differences in publication volume and eligibility rather than differential exclusion at later stages of review.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThis figure summarizes the study selection process across ten journals. Screening was conducted sequentially within each journal until either the prespecified target of 25 eligible publications was reached or no additional publications were available for screening. Consequently, the number of records screened and studies included varies by journal. In total, data were extracted from 205 publications. 
Because the objective of this review was to characterize the landscape of methodological approaches rather than to exhaustively synthesize all available evidence, a journal-stratified sampling strategy was applied with a maximum of 25 included studies per journal. The flow diagram therefore represents a purposeful screening process rather than complete enumeration.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003eStudy designs, sample sizes, and group structure\u003c/h2\u003e \u003cp\u003eThe publication and methodological characteristics of the 205 included studies are summarized in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. The included studies comprised primarily prospective cohort studies (28.8%, n\u0026thinsp;=\u0026thinsp;59), cross-sectional studies (26.8%, n\u0026thinsp;=\u0026thinsp;55), and case\u0026ndash;control studies (23.9%, n\u0026thinsp;=\u0026thinsp;49), with fewer randomized controlled trials (12.2%, n\u0026thinsp;=\u0026thinsp;25), pre\u0026ndash;post studies (4.4%, n\u0026thinsp;=\u0026thinsp;9), nested case\u0026ndash;control studies (2.4%, n\u0026thinsp;=\u0026thinsp;5), and retrospective cohort studies (1.5%, n\u0026thinsp;=\u0026thinsp;3).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eDistribution of study characteristics across 205 included studies.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv 
align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eStudy characteristics\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNumber (out of 205)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003ePercentage (%)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eJournal\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eCell Host \u0026amp; Microbe\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e12.2\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eGut Microbes\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e12.2\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eMicrobiome\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e12.2\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd 
align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003enpj Biofilms and Microbiomes\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e12.2\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eScientific Reports\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e12.2\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003emSystems\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e11.7\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003emSphere\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e18\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e8.8\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003e\u003cem\u003emBio\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e18\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e8.8\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eNature Microbiology\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e15\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e7.3\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eISME\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e2.4\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eYear of publication\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2019\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e7.8\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2020\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e36\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e17.6\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2021\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e39\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e19.1\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2022\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e43\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e21.1\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2023\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e39\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e19.1\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2024\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e31\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e15.2\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eStudy design\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eProspective cohort 
study\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e59\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e28.8\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCross-sectional study\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e55\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e26.8\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCase-control study\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e49\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e23.9\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRandomized controlled trial\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e12.2\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePre-post study\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e4.4\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e 
\u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNested case-control study\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e2.4\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRetrospective cohort study\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e1.5\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eInferential framework\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eStatistical (hypothesis) testing\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e160\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e78\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCausal inference\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e31\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e15.1\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSupervised machine learning\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e3.9\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eUnsupervised machine learning\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e2\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBayesian inference\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e1\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRole of microbiome studied\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eExposure\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e77\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e37.6\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eOutcome\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e75\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e36.6\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e 
\u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMediator\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e46\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e22.4\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eModerator/ Effect modifier\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e3.4\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConfounding control strategy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAdjustment\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e101\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e49.3\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNone\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e71\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e34.6\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eGroup-level matching\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c3\"\u003e \u003cp\u003e27\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e13.2\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eStratification\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e2\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRestriction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e1\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConfounding control justification\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNone\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e71\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e34.6\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDifference testing\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e46\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e22.4\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eData-driven\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e4.9\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLiterature-driven\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e4.9\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDAG\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e1\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMultiple comparison control\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e161\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e78.5\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMediation mechanism claimed\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e 
\u003cp\u003e115\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e56.1\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModeration mechanism claimed\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e24.4\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEffect size measure\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ep-value\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e51\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e24.9\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCorrelation coefficient\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e44\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e21.5\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRegression coefficient\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e11.7\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e 
\u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLEfSe\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e21\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e10.2\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLog-fold change\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e20\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e9.8\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCoefficient of determination\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e6.8\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eArea under the curve\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e11\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e5.4\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eOdds ratio\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e9\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e4.4\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eOther\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e11\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e5.4\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFuture research recommended\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e178\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e86.8\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePractical recommendations given\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e105\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003e51.2\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"4\"\u003eThis table summarizes key methodological and reporting features of the included studies, including publication year, study design, analytical approach, role of the microbiome in the causal structure, confounding control strategies, outcome modelling practices, and reporting of effect sizes. 
Counts and percentages reflect the number of studies exhibiting each characteristic out of the total sample (N\u0026thinsp;=\u0026thinsp;205).\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e\u003cp\u003eThe average number of human participants across studies was 2331 (range 8\u0026ndash;422417), with a median sample size of 91. Case-control studies showed the greatest heterogeneity in sample size (median\u0026thinsp;=\u0026thinsp;98; range\u0026thinsp;=\u0026thinsp;20\u0026ndash;422417), driven by a small number of very large studies. Prospective cohort and cross-sectional studies were moderately sized (medians\u0026thinsp;=\u0026thinsp;81 and 216, respectively), while randomized controlled trials and pre-post studies tended to include smaller cohorts (medians\u0026thinsp;=\u0026thinsp;56 and 20, respectively).\u003c/p\u003e \u003cp\u003eMore than half of the studies (53.2%, n\u0026thinsp;=\u0026thinsp;109) compared two study groups. An additional 15.1% of studies (n\u0026thinsp;=\u0026thinsp;31) focused on a single study group, typically within longitudinal designs or subgroup analyses. Among single-group studies, the median sample size was 43 (range: 8\u0026ndash;1054), whereas two-group studies had a median sample size of 82 (range: 9\u0026ndash;3890). One study analyzed 16 distinct cohorts, representing the largest number of groups included in a single analysis.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003eConceptual role of the microbiome in study designs\u003c/h2\u003e \u003cp\u003eAcross the included studies, the microbiome was most frequently conceptualized as an exposure variable (37.6%), followed closely by its role as an outcome (36.6%) (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). 
In 22.4% and 3.4% of studies, the microbiome was analyzed as a mediating variable and effect modifier, respectively.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec20\" class=\"Section2\"\u003e \u003ch2\u003eVariable types and operationalization across microbiome roles\u003c/h2\u003e \u003cp\u003eTo examine how conceptual roles translated into analytical practice, we assessed the types of variables used to represent exposures and outcomes when the microbiome was studied as an exposure, outcome, mediator, or effect modifier (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eWhen the microbiome was analyzed as an exposure (n\u0026thinsp;=\u0026thinsp;77), continuous microbiome measures dominated (74/77, 96.1%). The most common configuration paired binary outcomes with continuous microbiome exposures (38/77, 49.4%), followed by categorical outcomes with continuous microbiome exposures (27/77, 35.1%) and continuous outcomes with continuous microbiome exposures (9/77, 11.7%). Binary or discrete microbiome exposures were rare, together accounting for only 3.9% of studies.\u003c/p\u003e \u003cp\u003eIn studies treating the microbiome as an outcome (n\u0026thinsp;=\u0026thinsp;75), two main patterns emerged. Continuous microbiome outcomes were most frequently paired with binary exposures (38/75, 50.7%) or discrete exposures (32/75, 42.7%). Continuous exposures were uncommon (5/75, 6.7%), mostly appearing in continuous\u0026ndash;continuous configurations (4/75, 5.3%).\u003c/p\u003e \u003cp\u003eWhen the microbiome functioned as a mediating variable (n\u0026thinsp;=\u0026thinsp;46), the most frequent configurations involved continuous outcomes. Continuous outcomes paired with binary exposures accounted for 30.4% of studies (14/46), while continuous outcomes paired with discrete exposures represented 23.9% (11/46). 
Other combinations were less common and more evenly distributed, with no single alternative pattern exceeding 13% of studies.\u003c/p\u003e \u003cp\u003eFinally, in the smaller subset of studies examining the microbiome as a moderator (n\u0026thinsp;=\u0026thinsp;7), patterns were heterogeneous. Discrete or binary exposures predominated (5/7, 71.4%), most commonly paired with categorical or continuous outcomes (each 2/7, 28.6%). All other exposure\u0026ndash;outcome combinations occurred once or not at all.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eEach heatmap depicts the frequency of combinations of exposure and outcome variable types (binary, categorical, continuous) used when the microbiome is studied as an exposure, outcome, mediator, or effect modifier.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003eStatistical methods used in microbiome analyses\u003c/h2\u003e \u003cp\u003eThe overall landscape of statistical methods employed across the included studies is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ea. In total, 51 distinct analytical methods were identified and grouped into ten broad categories, including non-parametric tests, parametric tests, compositional analyses, correlation-based methods, linear and generalized linear models, machine learning approaches, causal inference methods, survival analysis, Bayesian methods, and other approaches.\u003c/p\u003e \u003cp\u003eWhen grouped by microbiome role and metric, methodological heterogeneity varied considerably. In studies with the microbiome as an exposure (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eb), alpha diversity analyses (n\u0026thinsp;=\u0026thinsp;10) were dominated by non-parametric tests (7/10, 70%), with the remaining analyses using a mix of parametric, compositional, and survival methods. 
Beta diversity was assessed exclusively with non-parametric methods (2/2). Relative abundance studies (n\u0026thinsp;=\u0026thinsp;125) employed a wide range of approaches: non-parametric tests were most frequent (28/125, 22.4%), followed by compositional analyses (24/125, 19.2%), parametric tests (15/125, 12.0%), generalized linear models and machine learning (each 12/125, 9.6%), correlation analyses (11/125, 8.8%), causal inference methods (2/125, 1.6%), survival analysis (1/125, 0.8%), and other methods (20/125, 16.0%).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003ePanels summarize the number of studies applying each method type when analyzing exposures (n\u0026thinsp;=\u0026thinsp;77), outcomes (n\u0026thinsp;=\u0026thinsp;75), alpha diversity (n\u0026thinsp;=\u0026thinsp;41), beta diversity (n\u0026thinsp;=\u0026thinsp;16), and relative abundance (n\u0026thinsp;=\u0026thinsp;314). Bars represent counts of studies using non-parametric, compositional, parametric, generalized linear, machine-learning, correlation-based, causal-inference, survival, Bayesian, and other analytical approaches.\u003c/p\u003e \u003cp\u003eIn studies with the microbiome as an outcome variable (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ec), alpha diversity analyses (n\u0026thinsp;=\u0026thinsp;20) were primarily non-parametric (11/20, 55%), with parametric tests (4/20, 20%), generalized linear models (2/20, 10%), and other methods (3/20, 15%). Beta diversity comparisons (n\u0026thinsp;=\u0026thinsp;11) relied mostly on non-parametric approaches (9/11, 81.8%), with the remaining studies using parametric and other methods (2/11, 18.2%). 
Relative abundance studies (n\u0026thinsp;=\u0026thinsp;106) showed high heterogeneity: non-parametric tests were most common (35/106, 33.0%), followed by compositional analyses (21/106, 19.8%), parametric tests (13/106, 12.3%), generalized linear models (12/106, 11.3%), other methods (14/106, 13.2%), machine learning (6/106, 5.7%), and correlation analyses (5/106, 4.7%).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec22\" class=\"Section2\"\u003e \u003ch2\u003eReported effect size measures and uncertainty\u003c/h2\u003e \u003cp\u003eAcross the included studies, reporting practices were dominated by difference-testing approaches. The p-value, alone or alongside other measures, was the most commonly reported measure overall, appearing in 25% of studies (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e). Correlation coefficients (21.5%) and regression coefficients (11.7%) were the next most frequently reported effect size measures. Only 10.2% of studies (n\u0026thinsp;=\u0026thinsp;21) reported confidence intervals for effect size estimates, limiting the interpretability and comparability of reported findings.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThis faceted bar chart illustrates the distribution of commonly reported effect size measures across different microbiome roles.\u003c/p\u003e \u003cp\u003eThe choice of effect measures varied by the conceptual role of the microbiome. In microbiome-as-exposure studies, correlation coefficients were most frequently reported (16/77, 20.8%), followed by linear discriminant analysis effect size (LEfSe) (11/77, 14.3%), area under the curve or AUC (9/77, 11.7%), regression coefficients (9/77, 11.7%), and log fold-change (7/77, 9.1%), while hazard ratios were rarely reported (2/77, 2.6%). 
In microbiome-as-outcome studies, correlation coefficients (16/75, 21.3%) and regression coefficients (10/75, 13.3%) were most commonly estimated, followed by the coefficient of determination (7/75, 9.3%), log fold-change (6/75, 8.0%), and LEfSe (5/75, 6.7%), with fewer instances of the odds ratio (2/75), fold-change (1/75), or incidence ratio (1/75). In microbiome-as-mediator studies, correlation coefficients (11/46, 23.9%), the log fold-change (6/46, 13.0%), regression coefficients (5/46, 10.9%), and LEfSe (5/46, 10.9%) were primarily reported, with rarer instances of the F-statistic (1/46), hazard ratio (1/46), and Mendelian randomization effect size (1/46). Moderator studies were less consistent, with only a few reporting AUC (2/7), coefficient of determination (2/7), correlation coefficient (1/7), or LEfSe (1/7), and no studies reporting regression coefficients or risk ratios.\u003c/p\u003e \u003cdiv id=\"Sec23\" class=\"Section3\"\u003e \u003ch2\u003eConfounding Control in Observational Studies\u003c/h2\u003e \u003cp\u003eAmong the 171 observational studies (excluding randomized controlled trials and pre-post designs), 31.6% (n\u0026thinsp;=\u0026thinsp;54) did not apply any form of confounding control (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e). Of these, 70.4% (n\u0026thinsp;=\u0026thinsp;38) provided no justification for the absence of confounding control, while 27.8% (n\u0026thinsp;=\u0026thinsp;15) relied on difference testing to assess covariate balance between groups.\u003c/p\u003e \u003cp\u003eAmong studies that implemented confounding control, adjustment was the most common strategy (52.0%, n\u0026thinsp;=\u0026thinsp;89), followed by group-level matching (14.0%, n\u0026thinsp;=\u0026thinsp;24). Restriction and stratification were rarely used, each appearing in 1.2% of studies (not shown in the Figure). 
Among studies that adjusted for confounding, 52.8% (n\u0026thinsp;=\u0026thinsp;47) did not justify their selection of covariates, and 31.5% (n\u0026thinsp;=\u0026thinsp;28) relied on significance testing to guide covariate inclusion.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec24\" class=\"Section2\"\u003e \u003ch2\u003eMediation and Mechanistic Claims\u003c/h2\u003e \u003cp\u003eMore than half of the included studies (53.2%, n\u0026thinsp;=\u0026thinsp;109) reported conducting mediation analyses, and 56.1% (n\u0026thinsp;=\u0026thinsp;115) explicitly claimed a mediation mechanism. Overall, 55% of studies treating the microbiome as an exposure and 33.3% of studies in which the microbiome was an outcome made claims about mediation mechanisms. Mediation claims were most common in studies examining the microbiome as a mediator (38.3%, n\u0026thinsp;=\u0026thinsp;44) or as an exposure (37.4%, n\u0026thinsp;=\u0026thinsp;43).\u003c/p\u003e \u003cp\u003eThe most frequently used method to support mediation claims was Spearman\u0026rsquo;s correlation (39.1%), followed by Pearson\u0026rsquo;s correlation (7.0%). A small proportion of studies supported mediation claims using additional laboratory/animal experiments (6.1%), while an equal proportion (6.1%) claimed causal mechanisms without providing analytical evidence.\u003c/p\u003e \u003cdiv id=\"Sec25\" class=\"Section3\"\u003e \u003ch2\u003eStrength of Causal Claims and Practical Recommendations\u003c/h2\u003e \u003cp\u003eIn the assessment of the strength of causal claims, overall agreement between the human and LLM assessments was high, with the LLM tending to assign slightly more conservative ratings (Figs.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003ea and \u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eb). 
The human reviewer identified 34 studies as making strong causal claims, whereas GPT identified 18. Conversely, 115 studies were rated as making no or weak causal claims by the human reviewer, compared with 136 by GPT.\u003c/p\u003e \u003cp\u003eRegarding practical recommendations, 51.2% of studies (n\u0026thinsp;=\u0026thinsp;105) provided recommendations for implementation. Case-control studies reported causal recommendations in 25 of 49 studies (51.0%). Randomized controlled trials provided recommendations in 15 of 25 studies (60%) and prospective cohort studies did so in 33 of 59 (55.9%). Nested case-control studies provided recommendations in 4 of 5 studies. Pre-post studies reported recommendations in 4 of 9 studies. Cross-sectional studies were less likely to lead to such recommendations, with 23 of 55 reporting recommendations (41.8%), and retrospective cohort studies rarely did so, with only 1 of 3 reporting recommendations.\u003c/p\u003e \u003cp\u003eAmong studies that provided recommendations, most issued weak or moderate recommendations, and no study issued a strong recommendation (Figs.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003ec and \u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003ed). Studies with weaker causal claims tended to issue weaker recommendations, although a small number of studies issued recommendations despite making no explicit causal claims.\u003c/p\u003e \u003cp\u003eFinally, the strength of causal claims varied by the conceptual role of the microbiome (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003ee). Studies treating the microbiome as an exposure most commonly made no or weak causal claims, whereas moderate and strong claims were more frequent when the microbiome was examined as a mediator or outcome. 
Studies considering the microbiome as an effect modifier tended toward weak or moderate claims, although the small number of such studies limits interpretation.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"DISCUSSION","content":"\u003cp\u003e This methodological review demonstrates that, despite rapid growth and methodological innovation in microbiome research, rigorous causal inference approaches remain infrequently applied. Most studies continue to rely on association-based analyses, with limited attention to confounding control, counterfactual reasoning, and clinically interpretable effect estimation.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cdiv id=\"Sec27\" class=\"Section2\"\u003e \u003ch2\u003eStudy design and temporal structure\u003c/h2\u003e \u003cp\u003eOne notable pattern observed in the reviewed literature is a shift toward more temporally structured study designs. In a prior systematic review by Bardenhorst et al. (\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e), covering studies published in 2018\u0026ndash;2019, only 27% employed longitudinal designs. In contrast, 45% of studies in the present review used longitudinal or prospective designs. This change is consistent with increased recognition of temporality as a key component of causal interpretation. 
However, while improved temporal ordering strengthens causal plausibility, it does not, on its own, address other core challenges related to confounding, measurement, or effect interpretation (\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e, \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec28\" class=\"Section2\"\u003e \u003ch2\u003eMicrobiome role, variable representation, and causal consistency\u003c/h2\u003e \u003cp\u003eA particularly informative finding concerns how variable types are distributed according to the microbiome\u0026rsquo;s conceptual role in the causal model. When treated as an exposure, microbiome features were overwhelmingly represented as continuous measures, such as relative abundances or diversity indices. Although ecologically intuitive, this dominance of broad, continuously defined exposures complicates causal interpretation. Continuous microbiome measures often lack a clear intervention analogue, making it difficult to define well-specified causal estimands or to satisfy the consistency assumption, which requires that exposure levels correspond to coherent interventions (\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e). Moreover, the limited use of causal frameworks explicitly designed to handle continuous, high-dimensional exposures further constrains interpretability. As a result, even when statistical associations are robust, the corresponding causal questions often remain unclear, and effect estimates may be difficult to compare across studies or translate into meaningful biological or clinical interpretations.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec29\" class=\"Section2\"\u003e \u003ch2\u003eConfounding control and variable selection\u003c/h2\u003e \u003cp\u003eA recurring limitation across studies was the inconsistent handling and reporting of confounding. 
Although demographic, behavioral, and clinical covariates were frequently collected, few studies clearly articulated the rationale for confounder selection or employed formal tools such as directed acyclic graphs (DAGs) to justify adjustment strategies. Design-based approaches, including matching or restriction, were also rarely reported.\u003c/p\u003e \u003cp\u003eIn many cases, confounder selection appeared to rely on difference testing or data-driven significance thresholds as recommended by some current guidelines including the STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) (\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e). This approach is problematic, as it may fail to identify variables that strongly influence the outcome but are weakly associated with the exposure, leaving residual confounding unaddressed (\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e). It is not expected that a fixed predefined set of confounders would be accounted for in microbiome studies, because ideally the confounder set is to be tailored to the specific research question, design, and population. However, as shown by Vujkovic-Cvijin et al. (\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e), factors such as sex, BMI, diet, alcohol consumption, and bowel movement quality explain substantial variation in gut microbiota composition and may confound disease associations if not appropriately accounted for. The inconsistent treatment of such variables reflects a broader challenge in aligning statistical practices with causal reasoning in microbiome research.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eEffect and uncertainty measures reporting and interpretability\u003c/h3\u003e\n\u003cp\u003eAnother prominent pattern concerns the limited reporting of effect and uncertainty measures. 
Many studies relied on p-values alone to convey results, while fewer reported effect estimates, and only a small fraction provided confidence intervals. This practice restricts interpretability, as statistical significance alone does not convey the magnitude, direction, or precision of an effect.\u003c/p\u003e \u003cp\u003eAs noted in the clinical and epidemiologic literature (\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e, \u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e), small and clinically irrelevant effects may achieve statistical significance in large samples, whereas meaningful effects may go undetected in smaller studies. Without effect sizes and confidence intervals, it is difficult to assess the practical relevance of reported findings or to compare results across studies. The observed variability in reported effect measures further limits synthesis, particularly when microbiome features are analyzed on different scales or treated differently across causal roles.\u003c/p\u003e \u003cdiv id=\"Sec31\" class=\"Section2\"\u003e \u003ch2\u003eMediation analyses and reliance on correlations\u003c/h2\u003e \u003cp\u003eMediation analyses represent another area where current practices impose substantial limitations on causal interpretation. Spearman\u0026rsquo;s correlation was frequently used to explore relationships among exposures, microbiome features, and outcomes, often as a proxy for mediation. While correlation analyses can reveal associations suggestive of potential pathways, they do not establish causal mechanisms.\u003c/p\u003e \u003cp\u003eSeveral challenges arise in this context. First, correlation measures do not distinguish cause from effect or account for exposure\u0026ndash;mediator interactions. Second, microbiome-wide correlation analyses implicitly assume independence among microbial features, an assumption violated by compositional constraints and microbial interdependence.
Third, correlation-based approaches typically do not control for confounders influencing both the mediator and the outcome, increasing the risk of spurious associations (\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e, \u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eAs a result, while correlation analyses may be useful for exploratory purposes, they offer limited insight into causal mediation. More formal causal mediation frameworks, such as those developed by VanderWeele (\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e, \u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e), explicitly define direct and indirect effects under counterfactual assumptions and can, in principle, accommodate treatment\u0026ndash;microbiome interactions. The limited adoption of such frameworks in the reviewed literature underscores a gap between the complexity of the scientific questions being posed and the analytical tools commonly applied.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec32\" class=\"Section2\"\u003e \u003ch2\u003eStrength of causal claims and reported recommendations\u003c/h2\u003e \u003cp\u003e Importantly, this review highlights that relatively few microbiome studies make strong explicit causal claims or offer practical recommendations. Studies that did so were more commonly those based on stronger designs, such as randomized controlled trials, pre\u0026ndash;post interventions, or longitudinal cohort studies. This pattern suggests that, despite substantial analytical effort across the field, much of the published microbiome literature does not yet translate into intervention-relevant conclusions.\u003c/p\u003e \u003cp\u003eThe limited prevalence of causal claims and actionable recommendations underscores the need for broader adoption of causal inference frameworks in microbiome research.
Without clearly articulated causal questions and corresponding analytical strategies, large volumes of data may yield primarily associative findings with restricted downstream utility. More systematic use of causal inference techniques has the potential to improve research efficiency by enabling clearer interpretation of results, better prioritization of follow-up studies, and more direct relevance to clinical or public health decision-making.\u003c/p\u003e \u003cdiv id=\"Sec33\" class=\"Section3\"\u003e \u003ch2\u003eImplications for the field\u003c/h2\u003e \u003cp\u003eTaken together, these findings highlight the slow adoption of causal inference methods and reporting practices in microbiome research. These limitations, ranging from variable definition and confounding control to the reporting of effect measures and effect sizes and the conduct of mediation analysis, do not diminish the value of existing studies but rather delineate the boundaries of what current practices can support. Causal inference frameworks offer a conceptual structure that may help address these challenges, but their potential lies in clarifying assumptions, estimands, and interpretability rather than in providing prescriptive solutions.\u003c/p\u003e \u003cp\u003eThis is the first methodological review to summarize the methodological landscape, informing future efforts to align research questions, analytical strategies, and causal claims in microbiome science.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec34\" class=\"Section3\"\u003e \u003ch2\u003eLimitations\u003c/h2\u003e \u003cp\u003eThis study has several limitations. First, our review was restricted to 205 research articles published in 10 journals. While these journals are highly regarded in the microbiome field, the selection includes only one broad-topic journal, potentially limiting the generalizability of our findings.
Standards for causal inference and reporting practices may vary across journals, especially in broader biomedical or interdisciplinary domains. By focusing on the top journals within the microbiome domain, our aim was not to provide an exhaustive survey but to identify prevailing trends, commonly used frameworks, and key methodological challenges in current causal inference practices. Second, stratifying by microbiome role or microbiome measures further reduced the number of studies available for analysis. This limitation is particularly relevant for studies examining the microbiome as a moderator, which were limited to only seven studies, preventing strong conclusions about trends in this specific subgroup. Third, many microbiome studies employ multiple analytical approaches rather than a single method. To facilitate mapping of current methodologies, we extracted data on up to two main analyses per study. Nonetheless, some additional analyses were not included, meaning the overview provided here does not fully capture the complete methodological landscape. Fourth, microbiome research is inherently complex, with study questions, microbiome measures, and analytical techniques varying widely. In this review, we stratified studies only by the microbiome\u0026rsquo;s conceptual role and the type of microbiome measure. Further stratifications could provide a more nuanced understanding of patterns and trends in current practices but were beyond the scope of this analysis. Finally, the assessment of causal language and the strength of recommendations involved a degree of subjectivity. 
Although we employed a standardized rubric and resolved discrepancies through consensus discussions, some level of interpretive ambiguity is unavoidable.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"CONCLUSION","content":"\u003cp\u003eAs microbiome research matures, there is increasing interest in moving beyond descriptive associations toward robust, interpretable causal claims. This review highlights both encouraging developments and areas of concern in the field\u0026rsquo;s current methodological practices. The rise in longitudinal and interventional designs is a promising step toward satisfying the fundamental requirement of temporality. Yet, this shift has not been consistently matched by appropriate analytic frameworks or transparent reporting. Moreover, meaningful effect estimation remains limited.\u003c/p\u003e \u003cp\u003e By identifying current patterns and gaps, this review highlights the pressing need for methodological innovations and interdisciplinary collaborations between microbiome researchers, statisticians, and epidemiologists to strengthen causal inference practices in the field. Addressing these challenges is essential to unlocking the full potential of microbiome studies to inform precision medicine, public health interventions, and our broader understanding of human health and disease.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e \u003ch2\u003eConflicts of interest\u003c/h2\u003e \u003cp\u003eNo conflicts of interest declared\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eFunding\u003c/h2\u003e \u003cp\u003eWe acknowledge the Fonds de Recherche du Qu\u0026eacute;bec-Sant\u0026eacute; (FRQ-S) for the doctoral award to Albina Tskhay and Alibek Moldakozhayev. 
FRQ-S did not play any role in the design, analysis, or interpretation of results.\u003c/p\u003e\u003ch2\u003eAuthors\u0026rsquo; contributions\u003c/h2\u003e \u003cp\u003eAlbina Tskhay designed the data extraction template; screened abstracts and full-text articles; extracted and analysed the data; interpreted the results; and drafted the manuscript. Alibek Moldakozhayev screened abstracts and full-text articles, extracted data, interpreted the results, and critically revised the manuscript for important intellectual content. Cristina Longo co-supervised Albina Tskhay, contributed to the design of the data extraction template, interpreted the results, and critically revised the manuscript for important intellectual content. Roxana Behruzi, Celia Greenwood, and Stan Kubow contributed to the interpretation of the results and critically revised the manuscript for important intellectual content. Vadim N. Gladyshev supervised Alibek Moldakozhayev and critically revised the manuscript for important intellectual content. Tibor Schuster supervised Albina Tskhay, contributed to the design of the data extraction template, interpreted the results, and critically revised the manuscript for important intellectual content.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eHelmink BA, Khan MAW, Hermann A, Gopalakrishnan V, Wargo JA (2019) The microbiome, cancer, and cancer therapy. Nat Med 25(3):377\u0026ndash;388\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHsiao EY, McBride SW, Hsien S, Sharon G, Hyde ER, McCue T et al (2013) Microbiota modulate behavioral and physiological abnormalities associated with neurodevelopmental disorders. Cell 155(7):1451\u0026ndash;1463\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi B, Selmi C, Tang R, Gershwin ME, Ma X (2018) The microbiome and autoimmunity: a paradigm from the gut\u0026ndash;liver axis.
Cell Mol Immunol 15(6):595\u0026ndash;609\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMacQueen G, Surette M, Moayyedi P (2017) The gut microbiota and psychiatric illness. J Psychiatry Neurosci 42(2):75\u0026ndash;77\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRoy Sarkar S, Banerjee S (2019) Gut microbiota in neurodegenerative disorders. J Neuroimmunol 328:98\u0026ndash;104\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang P-X, Deng X-R, Zhang C-H, Yuan H-J (2020) Gut microbiota and metabolic syndrome. Chin Med J 133(7):808\u0026ndash;816\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAlibek K, Farmer S, Tskhay A, Moldakozhayev A, Alibek K, Isakov T (2019) The Role of Infection, Inflammation and Genetic Alterations in ASD Etiopathogenesis: A Review. J Neurol Psychiatr Disord 2:105\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRelman DA (2020) Thinking about the microbiome as a causal factor in human health and disease: philosophical and experimental considerations. Curr Opin Microbiol 54:119\u0026ndash;126\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLin H, Peddada SD (2020) Analysis of microbial compositions: a review of normalization and differential abundance analysis. npj Biofilms Microbiomes 6(1):60\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBardenhorst SK, Berger T, Klawonn F, Vital M, Karch A, R\u0026uuml;bsamen N (2021) Data Analysis Strategies for Microbiome Studies in Human Populations\u0026mdash;a Systematic Review of Current Practice. mSystems 6(1). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1128/msystems.01154-20\u003c/span\u003e\u003cspan address=\"10.1128/msystems.01154-20\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXia Y, Sun J, Chen D-G (2018) What Are Microbiome Data?
Springer Singapore, pp 29\u0026ndash;41\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLutz KC, Jiang S, Neugent ML, De Nisco NJ, Zhan X, Li Q (2022) A Survey of Statistical Methods for Microbiome Data Analysis. Front Appl Math Stat 8:884810\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNearing JT, Douglas GM, Hayes MG, MacDonald J, Desai DK, Allward N et al (2022) Microbiome differential abundance methods produce different results across 38 datasets. Nat Commun 13(1):342\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVanderWeele TJ, Hern\u0026aacute;n MA (2013) Causal Inference Under Multiple Versions of Treatment. J Causal Inference 1(1):1\u0026ndash;20\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLanza ST, Moore JE, Butera NM (2013) Drawing causal inferences using propensity scores: a practical guide for community psychologists. Am J Community Psychol 52(3\u0026ndash;4):380\u0026ndash;392\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLousdal ML (2018) An introduction to instrumental variable assumptions, validation and estimation. Emerg Themes Epidemiol 15(1):1\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXia Y, Sun J, Chen D-G (2018) What are microbiome data? Statistical analysis of microbiome data with R: Springer; pp. 29\u0026ndash;41\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLutz KC, Jiang S, Neugent ML, De Nisco NJ, Zhan X, Li Q (2022) A survey of statistical methods for microbiome data analysis. Front Appl Math Stat 8:884810\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWozniak H, Ga\u0026iuml;a N, Lazarevic V, Le Terrier C, Beckmann TS, Balzani E et al (2024) Early reduction in gut microbiota diversity in critically ill patients is associated with mortality.
Ann Intensive Care 14(1):174\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBarnett TA, Koushik A, Schuster T (2023) Invited Commentary: Cross-Sectional Studies and Causal Inference-It's Complicated. Am J Epidemiol 192(4):517\u0026ndash;519\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDebelius J, Song SJ, Vazquez-Baeza Y, Xu ZZ, Gonzalez A, Knight R (2016) Tiny microbes, enormous impacts: what matters in gut microbiome studies? Genome Biol 17(1):217\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFischbach MA (2018) Microbiome: focus on causation and mechanism. Cell 174(4):785\u0026ndash;790\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTskhay A, Moldakozhayev A, Longo C, Schuster T (2024) Methodological Evaluation in Microbiome Research: Approaches Employed to Study Microbiome as an Exposure and Outcome Variable. Systematic Review Pre-Registration\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSCImago SJR \u0026mdash; SCImago Journal \u0026amp; Country Rank [Portal] n.d [Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://www.scimagojr.com\u003c/span\u003e\u003cspan address=\"http://www.scimagojr.com\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHuang Z, Liu K, Ma W, Li D, Mo T, Liu Q (2022) The gut microbiome in human health and disease\u0026mdash;Where are we and where are we going? A bibliometric analysis. Front Microbiol 13\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKnight R, Vrbanac A, Taylor BC, Aksenov A, Callewaert C, Debelius J et al (2018) Best practices for analysing microbiomes.
Nat Rev Microbiol 16(7):410\u0026ndash;422\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMinich JJ, Humphrey G, Benitez RAS, Sanders J, Swafford A, Allen EE, Knight R (2018) High-Throughput Miniaturized 16S rRNA Amplicon Library Preparation Reduces Costs while Preserving Microbiome Integrity. mSystems 3(6). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1128/msystems.00166-18\u003c/span\u003e\u003cspan address=\"10.1128/msystems.00166-18\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHaber NA, Wieten SE, Rohrer JM, Arah OA, Tennant PWG, Stuart EA et al (2022) Causal and Associational Language in Observational Health Research: A Systematic Evaluation. Am J Epidemiol 191(12):2084\u0026ndash;2097\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOpenAI CGPT May 13 ed2024\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eInnovation VH Covidence systematic review software Melbourne, Australia\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ede Micheaux PL, Drouilhet R, Liquet B (2013) The R software. Fundamentals of Programming and Statistical Analysis. :978-1\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePage MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD et al (2021) The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 372:n71\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHaine D, Dohoo I, Dufour S (2018) Selection and misclassification biases in longitudinal studies. Front Vet Sci 5:99\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLaird NM (1988) Missing data in longitudinal studies.
Stat Med 7(1\u0026ndash;2):305\u0026ndash;315\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChatton A, Rohrer JM (2024) The Causal Cookbook: Recipes for Propensity Scores, G-Computation, and Doubly Robust Standardization. Adv Methods Practices Psychol Sci 7(1):25152459241236149\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003evon Elm E, Altman DG, Egger M, Pocock SJ, G\u0026oslash;tzsche PC, Vandenbroucke JP (2007) The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Ann Intern Med 147(8):573\u0026ndash;577\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSourial N, Vedel I, Le Berre M, Schuster T (2019) Testing group differences for confounder selection in nonrandomized studies: flawed practice. CMAJ 191(43):E1189\u0026ndash;e93\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVujkovic-Cvijin I, Sklar J, Jiang L, Natarajan L, Knight R, Belkaid Y (2020) Host variables confound gut microbiota studies of human disease. Nature 587(7834):448\u0026ndash;454\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSullivan GM, Feinn R (2012) Using Effect Size\u0026mdash;or Why the P Value Is Not Enough. J Graduate Med Educ 4(3):279\u0026ndash;282\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGardner MJ, Altman DG (1986) Confidence intervals rather than P values: estimation rather than hypothesis testing. Br Med J (Clin Res Ed) 292(6522):746\u0026ndash;750\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJanse RJ, Hoekstra T, Jager KJ, Zoccali C, Tripepi G, Dekker FW, van Diepen M (2021) Conducting correlation analysis: important limitations and pitfalls. Clin Kidney J 14(11):2332\u0026ndash;2337\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGloor GB, Macklaim JM, Pawlowsky-Glahn V, Egozcue JJ (2017) Microbiome Datasets Are Compositional: And This Is Not Optional. 
Front Microbiol 8:2224\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVanderWeele TJ (2013) A three-way decomposition of a total effect into direct, indirect, and interactive effects. Epidemiology 24(2):224\u0026ndash;232\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVanderWeele TJ (2014) A unification of mediation and interaction: a 4-way decomposition. Epidemiology 25(5):749\u0026ndash;761\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":true,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"McGill University","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"microbiome, methodological review, analysis strategies, causal inference, observational studies, confounding control, study design, effect estimation","lastPublishedDoi":"10.21203/rs.3.rs-8681336/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8681336/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBackground\u003c/h2\u003e \u003cp\u003eMicrobiome research seeks to clarify how microbial communities influence human health. 
Although etiological research paradigms are evolving in the bio-medical sciences, many microbiome studies continue to rely on association-based methods that detect statistical patterns but are limited identifying causal mechanisms needed to inform clinical or public health interventions. This methodological review evaluates the extent to which modern causal inference approaches have been adopted in human microbiome studies and identifies persistent challenges to their broader implementation.\u003c/p\u003e\u003ch2\u003eMethods\u003c/h2\u003e \u003cp\u003eWe systematically reviewed human microbiome studies published between 2019 and 2024 that examined links between the microbiome and health outcomes, or between exposures/interventions and microbiome composition, across ten high-impact journals identified using the Scimago Journal and Country Ranking. Eligible studies were retrieved from PubMed using a predefined search strategy. Two reviewers independently screened titles, abstracts and full texts and extracted data on study design, sampling, analytical framework, confounding control, effect size reporting, and the use of causal language. Analyses were performed using standardized extraction templates.\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e \u003cp\u003eAcross 205 included studies, adoption of causal inference approaches in microbiome research remains limited. Only 15% of studies used designs or analytical strategies capable of approximating causal effects\u0026mdash;12% were randomized controlled trials and 3% were observational studies employing formal causal inference methods. Longitudinal designs were common (45%). However, 30% of studies did not address confounding, and more than 40% did not report intervention-relevant or clinically actionable effect sizes. 
Studies making stronger causal claims were also more likely to propose intervention-relevant recommendations, regardless of the underlying study design.\u003c/p\u003e\u003ch2\u003eConclusion\u003c/h2\u003e \u003cp\u003eThe limited use of rigorous causal inference approaches remains a key barrier to producing actionable evidence in microbiome research. Greater adoption of principled confounding control, improved use of mediation and effect-modification frameworks, and more consistent reporting of interpretable effect sizes are necessary to strengthen causal claims. Advancing methodological standards and promoting interdisciplinary collaboration will be essential for translating microbiome findings into clinically meaningful insights.\u003c/p\u003e","manuscriptTitle":"Assessing the adoption of Causal Language and Methodology in Human Microbiome Studies: A Methodological Review","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-02-09 06:50:27","doi":"10.21203/rs.3.rs-8681336/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"72135ba1-0061-4526-841b-c692e93f1ff5","owner":[],"postedDate":"February 9th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":61661971,"name":"General Microbiology"},{"id":61661972,"name":"Epidemiology"}],"tags":[],"updatedAt":"2026-02-10T16:31:46+00:00","versionOfRecord":[],"versionCreatedAt":"2026-02-09 
06:50:27","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8681336","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8681336","identity":"rs-8681336","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

