Leveraging Bayesian Networks for Consensus Network Construction and Multi-Method Feature Selection to Decode Disease Prediction

preprint OA: closed CC-BY-4.0

Abstract

Constructing reliable microbiome co-occurrence networks and identifying disease-associated taxa remain major challenges in microbiome research because different inference algorithms produce different results. To address this variability, we present CMIMN, a novel R package that uses a Bayesian network framework based on conditional mutual information to infer robust microbial interaction networks. To further enhance reliability, we construct a consensus microbiome network by integrating results from CMIMN and three widely used methods: SPIEC-EASI, SPRING, and SPARCC. This consensus approach, which overlays and weights edges shared across methods, reduces inconsistencies and provides a more biologically meaningful view of microbial relationships. In addition, we introduce a multi-method framework for identifying disease-associated microbial taxa by combining machine learning (ML) and network-based feature selection. Our ML pipeline applies several distinct algorithms and identifies key taxa based on their consistent importance across models. Complementing this, we employ two network-based strategies: one prioritizes taxa based on centrality differences between ‘clean tubers’ and ‘scab-infected tubers’ networks, and the other ranks nodes with a composite score built from integrated network metrics. Our results show that CMIMN achieves high robustness in network inference, and that the consensus network further improves stability and interpretability. The multi-method feature selection framework enhances confidence in identifying biologically relevant taxa linked to potato common scab disease. Notably, we identify Bacteroidota, WPS-2, and Proteobacteria at the Phylum level; Actinobacteria, AD3, Bacilli, Anaerolineae, and Ktedonobacteria at the Class level; and C0119, Defluviicoccales, Bacteroidales, and Ktedonobacterales at the Order level as key taxa associated with disease status.

Keywords

Bayesian networks, Microbiome co-occurrence network inference, Soil microbial ecology, Multi-method feature selection.

∗Corresponding author: [email protected]

bioRxiv preprint doi: https://doi.org/10.1101/2025.04.07.647660; this version posted April 13, 2025. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

1 Introduction

Potatoes, as the world’s fourth most essential crop, play a vital role in addressing global food security. However, soil-borne diseases like potato common scab, caused by Streptomyces scabies and related pathogens, significantly threaten potato yield and quality. This disease results in economic losses due to the rejection of tubers with pitted, corky lesions and has broader implications for food security. While some growers and consultants may claim that fumigation with broad-spectrum fungicides controls common scab symptoms, evidence for its effectiveness remains inconsistent, and fumigation is often costly and environmentally restrictive [1]. In practice, genetic resistance (or tolerance) in potato cultivars has emerged as a more effective and sustainable strategy for managing the disease [2].

A promising complementary approach involves exploiting the potential of naturally disease-suppressive soils, which harbor specific microbial communities that suppress pathogens and reduce disease outbreaks. Research has shown that suppressive soils often harbor distinct microbial communities with a higher abundance of antagonistic taxa, such as non-pathogenic Streptomyces spp. and Bacillus [3, 4]. These insights underscore the importance of investigating microbial interactions in suppressive soils to guide environmentally sustainable disease control practices. Understanding the composition of microbial communities and the environmental factors shaping this composition is crucial for comprehending biological processes [5–10] and predicting plant phenotypic variations related to plant health and crop production [11–14].

To fully explore the complex interactions between microbes and their environment, we need robust computational approaches that can accurately represent microbial communities. Graphical structures like networks offer a strong mathematical framework for examining organismal relationships across various interactions, including those observed in food webs, plant-plant interactions, plant-animal associations, and their applications in detecting networks within gene regulation and protein-protein interaction systems [15]. Networks provide a formal yet intuitive representation of complex systems, where species are represented as nodes and their interactions are represented as edges [16]. Although network analysis is widely used in microbiome studies, its application to soil microbial communities has emerged more recently, marked by growing interest in co-occurrence analysis [17]. However, the complexity of soil presents unique challenges in constructing and interpreting network models, stemming from the need to account for the inter- and intra-variability of samples, which results from the intrinsic heterogeneity of soil conditions [10].

In recent years, the field of microbiology has witnessed significant advancements in network analysis techniques. Wagg et al. [17] pioneered the application of network theory to understand complex microbial interactions in various soil microbiome systems. Their findings highlight the potential of community network complexity in influencing ecosystem functions, suggesting that microbial interactions play a crucial role in soil health and resilience. However, their results also indicate that simpler diversity metrics, such as species richness, may explain a substantial proportion of variation in ecosystem functionality.
This underscores the need to combine network metrics with traditional diversity measures to obtain a comprehensive understanding of microbial community functions. Moreover, network analysis in microbiome studies requires substantial methodological improvements, as demonstrated by Guseva et al. [15], who showed that different network-construction algorithms can significantly impact the inferred structure of microbial networks in soil ecosystems. This variability highlights the importance of carefully considering methodological choices when applying network analysis to soil microbial datasets, enabling more robust and hypothesis-driven research. In particular, inconsistencies across network inference methods pose a major challenge, limiting the reliability of microbiome-based discoveries.

Working with microbiome data is inherently challenging due to its sparse, high-dimensional, and compositional nature [18]. Numerous methods exist to infer the structure of microbiome networks, but the results often exhibit minimal overlap across different approaches, reflecting the variability and complexity of the data [15]. Furthermore, the absence of a universally accepted gold standard for network evaluation complicates efforts to assess and validate inferred networks. Thus, there is an urgent need for methodological advancements that can enhance the reliability of microbiome network inference and provide biologically meaningful insights.

In addition to constructing reliable networks, identifying important disease-related taxa is a key aspect of microbiome studies, particularly in the context of disease management [19]. Such insights can provide farmers with actionable tools to better understand and control the microbial ecosystems influencing crop health and productivity. Developing robust methods to identify taxa associated with disease resistance or susceptibility is essential for translating microbiome research into practical agricultural solutions. By identifying microbial indicators of plant health, researchers can advance both scientific understanding and practical applications, offering sustainable strategies to mitigate disease impacts and enhance agricultural outcomes.

This study builds upon prior research that explored associations between soil properties and biological phenotypes using machine learning models, including random forests and Bayesian neural networks [18]. Extending this work, we adopt a network-based perspective, specifically leveraging Bayesian networks, to investigate microbial relationships within the soil microbiome. Our approach addresses two primary challenges in microbiome analysis: constructing reliable co-occurrence networks and identifying microbial taxa associated with potato common scab disease. To enhance the robustness of disease-associated OTU identification, we introduce a comprehensive multi-method framework that combines machine learning-based feature selection with network-based strategies. This integrative design ensures that candidate taxa are supported by both predictive modeling and ecological network context, thereby increasing confidence in their biological relevance.

The main contributions of this study are as follows:

1) We present a novel Bayesian network-based algorithm, implemented in the R package CMIMN;
2) We propose a consensus network approach that integrates results from CMIMN, SPIEC-EASI, SPRING, and SPARCC, improving the reliability of inferred microbial associations; and
3) We develop a dual feature selection strategy that incorporates both machine learning outputs and network centrality metrics to identify key disease-associated taxa.
Importantly, the methodology introduced here is broadly applicable and can be adapted for analyzing disease-related microbiome datasets across both agricultural and clinical domains.

2 Materials and methods

Figure 1 shows a graphical representation of our pipeline with four major panels. Each panel represents a specific step:

• Data Preparation: Microbiome abundance data is filtered to retain operational taxonomic units (OTUs) present in at least 15 samples, ensuring that low-prevalence taxa do not introduce noise. The resulting filtered data matrix serves as the foundation for downstream analyses.
• Constructing Microbiome Networks: Microbiome networks are constructed using four methods: SE_glasso, SPRING, SPARCC, and the proposed CMIMN. These networks represent microbial taxa as nodes and their co-occurrence relationships as edges. To enhance reliability, a consensus microbiome network is constructed by integrating results from all four methods. Edge weights indicate the level of agreement among methods, with a weight of 4 representing relationships confirmed by all methods and 0 indicating no agreement.
• Feature Selection (machine learning (ML)): The filtered data is normalized using three methods (centered log-ratio (CLR), log, and total sum scaling (TSS)) and subjected to different ML-based feature selection methods. Each OTU is assigned a "TOTAL" score based on how frequently it is selected as important across the seven strategies. OTUs with high "TOTAL" scores are considered key features for further analysis.
• Feature Selection (network-based): Networks are separately constructed for ‘scab-infected’ and ‘clean tuber’ samples using SE_glasso, SPRING, SPARCC, and CMIMN. Topological features are computed for each node in the networks. Two distinct strategies, differential centrality analysis and weighted scoring of OTUs, are applied to identify important OTUs based on network structure.

The Venn diagram at the bottom illustrates the overlap between OTUs identified by ML-based methods and network-based feature selection strategies. This overlap highlights the robustness of the selected OTUs as key contributors to disease resistance. This workflow integrates statistical rigor and biological relevance, ensuring that the identified OTUs are reliable targets for further investigation and potential microbial intervention strategies.

2.1 Data preparation

In this study, we focus on the soil microbiome (matrix of abundances) at three taxonomic levels, Phylum, Class, and Order, from soil samples acquired from potato fields in Wisconsin and Minnesota. We concentrated on these three taxonomic levels due to their balance of interpretability and feature dimensionality, allowing for a meaningful analysis of microbial community structure and its association with disease. The dataset consists of microbial community data of pre-planting soils and the corresponding disease levels in the plants at harvest. Overall, we collected 256 soil samples, 108 of which were taken from 36 commercial fields in Minnesota, and 148 of which were taken from 50 fields in Wisconsin. This extensive dataset provides a comprehensive representation of soil microbial communities across two major potato-growing regions in the Upper Midwest. DNA was extracted from the pre-planting soils and analyzed for microbial community data following the method in [18].
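The prevalence filter named in the Data Preparation step (retain OTUs present in at least 15 samples) can be illustrated with a minimal sketch. It assumes that "present" means a non-zero read count and that samples are rows and OTUs are columns; the function name `filter_by_prevalence` and the toy data are hypothetical and are not taken from the authors' code.

```python
def filter_by_prevalence(counts, otu_names, min_samples=15):
    """Keep OTUs (columns) with a non-zero count in at least `min_samples` samples.

    Assumption: an OTU is 'present' in a sample when its read count is > 0.
    """
    n_otus = len(otu_names)
    # Number of samples (rows) in which each OTU has a non-zero count.
    prevalence = [sum(1 for row in counts if row[j] > 0) for j in range(n_otus)]
    keep = [j for j in range(n_otus) if prevalence[j] >= min_samples]
    filtered = [[row[j] for j in keep] for row in counts]
    return filtered, [otu_names[j] for j in keep]


# Toy example: 4 samples x 3 OTUs, with the cutoff lowered to 2 for illustration.
toy_counts = [[5, 0, 1], [3, 0, 0], [8, 2, 0], [1, 0, 4]]
toy_filtered, toy_kept = filter_by_prevalence(
    toy_counts, ["otu_a", "otu_b", "otu_c"], min_samples=2
)
```

In the toy matrix, `otu_b` appears in only one sample and is dropped, while the other two columns are kept in their original order.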
Bacterial and fungal DNA was sequenced to capture a diverse range of microbial taxa, enabling us to investigate their interactions and potential roles in disease resistance or susceptibility. At harvest, potatoes were hand-harvested from a one-meter hill (usually 3-4 plants) at each sampling location. Tubers from one plant were visually evaluated for the presence of pitted scab lesions, a sign of serious common scab disease, because such tubers would be excluded from marketable yield. This binary disease label, 0 for healthy and 1 for diseased, serves as the target variable in our analyses, linking microbial community features to agricultural outcomes.

The input data is a matrix of non-negative read counts generated by a sequencing procedure and filtered to include only OTUs that appear in at least 15 samples [18]. This filtering ensures a focus on microbial taxa with sufficient prevalence to contribute meaningfully to statistical and network-based analyses, reducing noise from rare taxa. Table S1 displays the number of features (OTUs) before and after filtering for different taxonomic levels. To enhance reproducibility and facilitate further research, the raw sequencing data and preprocessing scripts are available upon request or through the project repository.

2.2 Constructing microbiome networks

In the “Conditional mutual information algorithm for constructing microbiome networks (CMIMN)” section, we introduce a novel Bayesian Network-based approach, CMIMN, which leverages Conditional Mutual Information (CMI) to infer microbial associations in a more robust and scalable manner. However, despite advancements in microbiome network inference, no single method consistently produces reliable results due to inherent differences in statistical assumptions, data preprocessing, and sparsity constraints. Each inference method captures different aspects of microbial interactions, often leading to inconsistencies across constructed networks.
Therefore, in the “Consensus network construction” section, we present a consensus microbiome network that integrates results from multiple inference methods to enhance reliability and mitigate method-specific biases.

[Figure 1 panel text. Data Preparation: filter out OTUs present in fewer than 15 samples. Construct microbiome network: CMIMN (on log-transformed filtered data), SE_glasso, SPRING, and SPARCC, combined into a consensus network with edge weights up to 4. Feature Selection (ML): inputs are CLR, filtered (original), log, and TSS data; methods are SelectKBest (KBest), logistic regression (LR), decision tree (DT), Gradient Boosting (GB), Random Forest (RF), and Mutual Information (Mutual); each OTU receives a "TOTAL" value counting how often it is selected as important under the seven criteria, and OTUs with a "TOTAL" value higher than 4 are selected. Feature Selection (Network-based): networks are built with the four methods for clean tuber and scab-infected tuber samples; centrality metrics are Eigenvector, Degree, PageRank, Closeness, and Betweenness; Strategy 1 (Differential Centrality Analysis) compares centrality metrics between the two networks for each algorithm and takes the top 20% of OTUs with the most substantial differences; Strategy 2 (Weighted Scoring for OTUs) computes a weighted score per OTU from centrality metrics and selects the top 20%. Final step: overlap between ML-selected and network-based-selected OTUs.]

Figure 1: Workflow of the microbiome analysis pipeline for identifying key microbial drivers of disease resistance. Using potato common scab as an example, this pipeline consists of five main steps and is generally applicable to any microbiome dataset: (1) Data Preparation: Raw microbiome data is preprocessed to retain OTUs present in at least 15 samples, ensuring that low-prevalence taxa do not introduce noise. The resulting filtered data matrix serves as the foundation for downstream analysis, focusing on taxonomic levels such as Phylum, Class, and Order. (2) Construct microbiome network: Microbiome networks are constructed using four inference methods: SE_glasso, SPRING, SPARCC, and the proposed CMIMN.
These networks represent microbial taxa as nodes and their interactions as edges. To improve reliability, a consensus microbiome network is constructed by integrating results from all four methods. Edge weights indicate the level of agreement among methods, with a weight of 4 representing relationships confirmed by all methods and 0 indicating no agreement. (3) Feature Selection (ML): The filtered data undergoes normalization using three methods (CLR, log, and TSS) before applying ML-based feature selection methods. A "TOTAL" score is assigned to each OTU based on its selection frequency across ML methods, identifying key taxa strongly associated with disease outcomes. (4) Feature Selection (Network-Based): Microbiome networks are separately constructed for ‘scab-infected’ and ‘clean tuber’ samples. Two strategies are applied to identify key OTUs based on network structure: (i) Differential Centrality Analysis, and (ii) Weighted Scoring of OTUs. Final OTU Selection (Identifying Overlap): The last step identifies the overlap between OTUs selected by ML-based and network-based approaches, ensuring robust and reliable feature selection for downstream microbiome analysis.

2.2.1 Conditional mutual information algorithm for constructing microbiome networks (CMIMN)

We outline the methodology behind the CMIMN algorithm, a novel approach for constructing microbiome networks. First, we introduce the foundational concepts of Mutual Information (MI) and Conditional Mutual Information (CMI), which are key components of the CMIMN framework. Next, we provide an overview of Bayesian Networks, their structure, and their applicability to microbiome research. Finally, we describe the detailed steps of the CMIMN algorithm, highlighting its dynamic thresholding and order-independent features that address the unique challenges of microbiome datasets.

Mutual information and conditional mutual information: MI and CMI have proven effective for detecting relationships between variables because they can measure nonlinear dependencies [20].
MI and CMI between the variables X and Y, given the vector of variables Z, are defined as follows [21]:

$$MI(X, Y) = \int_{\mathbb{R}} \int_{\mathbb{R}} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \, dx \, dy, \tag{1}$$

$$CMI(X, Y \mid Z) = \int_{\mathbb{R}^p} \int_{\mathbb{R}} \int_{\mathbb{R}} p(x, y, z) \log \frac{p(x, y \mid z)}{p(x \mid z)\, p(y \mid z)} \, dx \, dy \, dz, \tag{2}$$

where p is the dimension of vector Z, and p(x, y), p(x) and p(y) represent the joint distribution of X and Y, the marginal distribution of X, and the marginal distribution of Y, respectively. p(x, y, z), p(x, y|z), p(x|z) and p(y|z) indicate the joint distribution of X, Y and Z, the conditional density of X and Y given Z, the conditional density of X given Z, and the conditional density of Y given Z, respectively.

Under the assumption that the data follow a Gaussian distribution, MI for two continuous variables X and Y can be calculated as [16, 22, 23]:

$$MI(X, Y) = \frac{1}{2} \log \frac{\sigma_X^2 \, \sigma_Y^2}{|C(X, Y)|}, \qquad |C(X, Y)| = \sigma_X^2 \sigma_Y^2 - \sigma_{XY}^2, \tag{3}$$

where σ²_X, σ²_Y and σ_XY indicate the variance of X, the variance of Y, and the covariance between X and Y, respectively, and |C(X, Y)| is the determinant of the covariance matrix of X and Y. When X and Y are independent, σ_XY = 0 and therefore MI(X, Y) = 0. Similarly, CMI(X, Y|Z) is defined as:

$$CMI(X, Y \mid Z) = \frac{1}{2} \log \frac{|C(X, Z)| \, |C(Y, Z)|}{|C(Z)| \, |C(X, Y, Z)|}, \tag{4}$$

where C is the covariance matrix and |.| is the determinant of matrix C. C(X, Y) and C(X, Y, Z) denote the covariance matrix of variables X and Y and of variables X, Y, and Z, respectively. When X and Y are conditionally independent given Z, then CMI(X, Y|Z) = 0. These measures form the backbone of many network inference methods, including Bayesian Networks (BNs), which are particularly suited for capturing complex dependencies in microbiome datasets.
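To make the Gaussian formulas (3) and (4) concrete, the following sketch computes both quantities from sample covariance determinants. It is an illustration under stated assumptions, not the CMIMN implementation: the bivariate case is taken in its determinant form, |C(X, Y)| = σ²_X σ²_Y − σ²_XY, and the helper names (`covariance_matrix`, `det`, `gaussian_mi`, `gaussian_cmi`) and the toy vectors are hypothetical.

```python
import math


def covariance_matrix(columns):
    """Sample covariance matrix (n - 1 denominator) of equally long variables."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    return [[sum((a[k] - ma) * (b[k] - mb) for k in range(n)) / (n - 1)
             for b, mb in zip(columns, means)]
            for a, ma in zip(columns, means)]


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    size, d = len(m), 1.0
    for i in range(size):
        pivot = max(range(i, size), key=lambda r: abs(m[r][i]))
        if m[pivot][i] == 0:
            return 0.0
        if pivot != i:
            m[i], m[pivot] = m[pivot], m[i]
            d = -d  # a row swap flips the sign of the determinant
        d *= m[i][i]
        for r in range(i + 1, size):
            factor = m[r][i] / m[i][i]
            for c in range(i, size):
                m[r][c] -= factor * m[i][c]
    return d


def gaussian_mi(x, y):
    """Eq. (3): 0.5 * log(var(X) * var(Y) / |C(X, Y)|)."""
    c = covariance_matrix([x, y])
    return 0.5 * math.log(c[0][0] * c[1][1] / det(c))


def gaussian_cmi(x, y, zs):
    """Eq. (4); `zs` is a list of conditioning variables (the vector Z)."""
    num = det(covariance_matrix([x] + zs)) * det(covariance_matrix([y] + zs))
    den = det(covariance_matrix(zs)) * det(covariance_matrix([x, y] + zs))
    return 0.5 * math.log(num / den)


# Toy vectors (illustrative values only); they must not be collinear,
# otherwise a determinant is zero and the logarithm is undefined.
x = [1.0, 2.5, 2.0, 4.5, 5.0, 5.5]
y = [0.5, 2.0, 3.5, 3.0, 5.5, 6.5]
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
mi_xy = gaussian_mi(x, y)
cmi_xy_given_z = gaussian_cmi(x, y, [z])
```

For two variables, this determinant form equals −½ log(1 − ρ²), where ρ is the Pearson correlation, which gives a quick sanity check on any implementation.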
Below, we provide an overview of BNs and their applications.

Overview of Bayesian networks: BNs are probabilistic graphical models that represent complex relationships among variables using directed acyclic graphs (DAGs). Each node in a BN represents a random variable, while the directed edges capture conditional dependencies between them. BNs have been extensively applied in various biological network analyses, such as gene regulatory networks [24–26], but their use in microbiome research remains limited. There are three main approaches for learning the structure of BNs:

• Constraint-Based Methods: These are based on conditional independence tests to infer the network structure [22, 24, 25, 27–31].
• Score-Based Methods: These involve optimizing a scoring function to search among candidate network structures [32–34].
• Hybrid Methods: These combine elements of both constraint-based and score-based approaches to leverage their respective strengths [16, 35–39].

Among these, the PC algorithm and its derivatives, such as Fast Causal Inference, Really Fast Causal Inference, and PCA-CMI [22, 29, 30, 40–42], are prominent constraint-based methods. Despite their widespread use, these methods have notable limitations:

1- Order Dependence: The results can vary depending on the sequence in which the nodes are processed [43].
2- Static Threshold Dependency: Using fixed thresholds for conditional independence tests often leads to false positives or false negatives, reducing the reliability of inferred networks [24].

CMIMN algorithm: We propose the CMIMN algorithm to overcome the challenges posed by microbiome data, providing an order-independent, dynamically thresholded, and sparsity-controlled framework for microbiome network construction:

1) Order independence: Traditional PC-based methods, such as PCA-CMI, are susceptible to order dependence, where the sequence in which nodes are processed affects the inferred network structure.
This occurs because, in these methods, the tests for conditional independence and edge removal are performed simultaneously during each iteration, making the results highly sensitive to the order of node traversal. In contrast, CMIMN eliminates this dependency by decoupling these steps. Specifically, for each step of the algorithm, CMIMN begins by fixing the set of potential separators for every edge (X, Y ). This set is determined as the intersection of the neighbors of X and Y in the current graph. By defining the separators upfront, the algorithm ensures that all configurations are consistently evaluated, regardless of the order in which nodes or edges are processed. Once the potential separators are fixed, the algorithm proceeds to calculate the independence measures (e.g., CMI) for each edge using the predefined separator sets. Edges that fail the independence test, based on the dynamically determined thresholds, are then removed. This sequential separation of tasks — fixing separators, calculating independence measures, and then removing edges — ensures that the outcome of each step is independent of the traversal order of the nodes. 2) Dynamic thresholds: In traditional PC-based methods an edge between two nodes is removed if the independence measure (e.g., MI or CMI) falls below a predefined static threshold, θ, usually 0.05. However, this fixed-threshold approach is inherently rigid and can lead to significant issues. Specifically, static thresholds are often poorly calibrated to the scale and variability of different datasets, resulting in false positives (retaining spurious edges) or false negatives (removing meaningful edges). To overcome these limitations, CMIMN employs a quantile-based dynamic threshold approach. Instead of using a single, static threshold, thresholds are adaptively determined based on the statistical properties of the dataset. 
For example, in each iteration, thresholds are set using specific quantiles (e.g., the 70th percentile) of the computed MI or CMI values. This ensures that the threshold dynamically adjusts to the distribution of independence measures, accommodating variability in the data’s scale, density, and characteristics.

3) Sparsity control: Network sparsity is a crucial factor in microbiome studies, as overly dense networks can obscure biologically meaningful interactions, while overly sparse networks may omit key relationships. In traditional PC-based methods, selecting a single static threshold does not allow for precise control over network sparsity. However, CMIMN addresses this challenge by offering precise control over network sparsity through quantile-based criteria for edge removal. By dynamically tuning the sparsity threshold, researchers can specify the desired percentage of edges to retain, ensuring that the resulting network retains significant edges representing meaningful microbial interactions while reducing noise and redundancy.

The steps of the CMIMN algorithm are outlined below:

Step 0: Initialization: Generate a complete network with the number of nodes equal to the number of taxa.
Step 1: Calculate MI of Order 0: Compute MI values for each pair of nodes.
Step 2: Remove Edges: Remove edges for which MI values are smaller than θ1, the threshold for the MI test. The resulting network at this stage is denoted by S0.
Step 3: Calculate CMI of Order 1: Calculate CMI(X, Y|Z) where Z belongs to the set V_XY = ADJ(X) ∩ ADJ(Y) in S0. Here, ADJ(X) represents the set of nodes that are adjacent to X.
We consider ‘paths of length 2 between X and Y’ to mean that Z is a common neighbor of both X and Y. Thus, a path from X to Y via Z consists of two edges: one connecting X to Z and another connecting Z to Y. This configuration is used to assess the indirect interactions between X and Y mediated by Z, focusing on how Z influences the dependency between X and Y.
Step 4: Remove Edges: Define CMI70(X, Y|Z) as the 70th percentile of all CMI(X, Y|Z) values. If CMI70(X, Y|Z) < θ2 (the threshold for the CMI test), remove the edge between X and Y. The resulting skeleton at this stage is denoted by S1.
Final Outcome: The resulting network S1 is a fully undirected skeleton.

The primary challenge in using BN methods to infer microbiome networks lies in the normalization of count datasets for the BN algorithm. To address this challenge, we apply a logarithmic transformation to the count data to stabilize variance, reduce skewness, and address compositional constraints inherent in microbiome datasets. This transformation ensures that MI and CMI operate on a continuous, normalized scale, enhancing their reliability. Without this step, applying MI or CMI directly to raw counts would yield biased results due to the data’s non-normalized and highly variable nature. The CMIMN algorithm is implemented in R and publicly available at https://github.com/solislemuslab/CMIMN.

2.2.2 Constructing a consensus network

We apply three state-of-the-art network methods to the soil microbiome data: 1) SParse InversE Covariance Estimation for Ecological Association Inference (SPIEC-EASI) [44], using the graphical lasso option (referred to as SE_glasso); 2) SPRING: Semi-Parametric Rank-based approach for Inference in Graphical Models [45]; and 3) SPARCC: Sparse Correlations for Compositional Data [46]. The input of these methods is an abundance matrix and the output is an undirected network in which nodes represent OTUs and edges correspond to interactions between them.
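As an aside on the CMIMN procedure of Section 2.2.1, Steps 0 to 4 can be sketched in a few lines. This is a rough illustration, not the R package's code, and it rests on several assumptions that the text does not spell out: MI and CMI use the Gaussian form written through (partial) correlations, −½ log(1 − ρ²), which is equivalent to the determinant formulas for a single conditioning variable; θ1 and θ2 are taken as empirical quantiles of the MI and CMI values; each edge is summarised by the 70th percentile of its CMI values over common neighbours; and edges without common neighbours are kept. All function and parameter names are hypothetical.

```python
import math
from itertools import combinations


def corr(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def info(r):
    """Gaussian MI (or order-1 CMI) from a correlation (or partial correlation)."""
    return -0.5 * math.log(1.0 - min(r * r, 1.0 - 1e-12))


def quantile(values, q):
    """Nearest-rank empirical quantile (an illustrative choice)."""
    s = sorted(values)
    return s[min(len(s) - 1, int(q * len(s)))]


def cmimn_sketch(data, q_mi=0.7, q_cmi=0.7):
    """data: dict mapping taxon name -> list of (log-transformed) abundances."""
    names = sorted(data)
    # Steps 0-1: complete graph; order-0 MI for every pair of nodes.
    r = {frozenset(p): corr(data[p[0]], data[p[1]]) for p in combinations(names, 2)}
    mi0 = {e: info(v) for e, v in r.items()}
    # Step 2: quantile-based theta1; the surviving edges form S0.
    theta1 = quantile(mi0.values(), q_mi)
    s0 = {e for e, v in mi0.items() if v >= theta1}
    adj = {n: {m for e in s0 if n in e for m in e if m != n} for n in names}
    # Step 3: separators are fixed from S0 for ALL edges before any removal,
    # so the result does not depend on the order in which edges are visited.
    edge_stat, all_cmis = {}, []
    for e in s0:
        x, y = sorted(e)
        cmis = []
        for z in adj[x] & adj[y]:
            rxz, ryz = r[frozenset((x, z))], r[frozenset((y, z))]
            denom = math.sqrt(max((1 - rxz ** 2) * (1 - ryz ** 2), 1e-12))
            cmis.append(info((r[e] - rxz * ryz) / denom))
        if cmis:  # assumption: edges without a common neighbour are kept
            edge_stat[e] = quantile(cmis, 0.7)
            all_cmis.extend(cmis)
    if not all_cmis:
        return s0
    # Step 4: quantile-based theta2; remove edges whose summary falls below it.
    theta2 = quantile(all_cmis, q_cmi)
    return {e for e in s0 if e not in edge_stat or edge_stat[e] >= theta2}


# Toy chain a -> b -> c: a and c are related only through b.
toy = {
    "a": [1, 1, 1, 1, -1, -1, -1, -1],
    "b": [1.5, 1.5, 0.5, 0.5, -0.5, -0.5, -1.5, -1.5],
    "c": [2.0, 1.0, 1.0, 0.0, 0.0, -1.0, -1.0, -2.0],
}
skeleton = cmimn_sketch(toy, q_mi=0.0, q_cmi=0.5)
```

In this toy chain the a-c edge survives the MI step but is removed in Step 4, because the dependence between a and c vanishes once b is conditioned on. The quantile parameters play the role of the sparsity control described in the text: higher quantiles remove more edges.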
First, SE_glasso [44] estimates sparse inverse covariance matrices to infer ecological associations in microbial communities. The approach is designed to address the challenges of compositional data and high dimensionality commonly encountered in microbiome studies. By accurately modeling microbial interactions and leveraging graphical lasso regularization, SE_glasso uncovers significant ecological relationships between different taxa. Second, the SPRING [45] algorithm is a powerful method for inferring associations in complex biological networks that combines the advantages of both parametric and non-parametric approaches to construct a co-occurrence network from abundance data, commonly encountered in microbiome studies. By transforming the abundance values into ranks and utilizing rank-based statistical tests, SPRING overcomes the challenges of compositional data and improves robustness against outliers and extreme values. This algorithm effectively identifies significant interactions between different taxa, providing valuable insights into the underlying structure and ecological relationships within microbial communities. Third, SPARCC [46] is a powerful algorithm used to analyze microbial communities and infer associations between different taxa. Specifically designed for compositional data, which represents the relative abundances of microbial taxa, SPARCC addresses the challenges of dealing with non-negative and constrained data. By regularizing the correlation matrix through a bootstrap procedure and using sparsity-inducing techniques, SPARCC efficiently estimates sparse correlations between taxa, revealing significant co-occurrence patterns and potential ecological interactions within the microbial community. The algorithm’s ability to handle compositional data makes it a valuable tool for investigating complex microbial ecosystems and unraveling the underlying relationships between taxa. 
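Each of these methods, like CMIMN, yields an undirected edge set over the same OTUs. A minimal sketch of how four such edge sets can be overlaid into a weighted consensus, where an edge's weight is the number of methods that report it (0 to 4), is given below. The function names and the toy edge lists are hypothetical; running the individual inference methods is outside the scope of this sketch.

```python
from collections import Counter


def consensus_network(edge_sets):
    """edge_sets: dict mapping method name -> iterable of undirected (u, v) edges.

    Returns a Counter mapping frozenset({u, v}) -> number of methods reporting it.
    """
    weights = Counter()
    for edges in edge_sets.values():
        # Normalise direction and de-duplicate so each method votes once per edge.
        for edge in {frozenset(e) for e in edges}:
            weights[edge] += 1
    return weights


def threshold_edges(weights, min_weight):
    """Keep edges supported by at least `min_weight` methods."""
    return {e for e, w in weights.items() if w >= min_weight}


# Toy edge lists for four methods (illustrative only).
toy_methods = {
    "CMIMN": [("a", "b"), ("b", "c")],
    "SE_glasso": [("b", "a"), ("c", "d")],
    "SPRING": [("a", "b"), ("b", "c")],
    "SPARCC": [("a", "b")],
}
toy_weights = consensus_network(toy_methods)
strict = threshold_edges(toy_weights, 4)   # edges found by all four methods
lenient = threshold_edges(toy_weights, 2)  # edges found by at least two methods
```

Edges that no method reports are simply absent from the counter, which returns 0 when queried for them. Raising `min_weight` trades coverage for confidence, which is one way to control the sparsity of the consensus network.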
Building reliable microbiome networks is difficult because different algorithms produce varying results. Each method has its own strengths and weaknesses, which can lead to inconsistent interpretations of microbial interactions. To solve this, we combine the results from four methods to create a consensus network: CMIMN, SE_glasso, SPRING, and SPARCC. The consensus network is represented as a weighted adjacency matrix, where each edge (connection between two nodes) is assigned a weight ranging from 0 (not identified by any algorithm) to 4 (identified by all four algorithms). This weight reflects the level of agreement among the algorithms regarding the presence of the edge. The consensus network is constructed by first generating individual networks using each of the four methods. These networks are then overlaid, and the weight of each edge in the consensus network is computed as the sum of the binary indicators (presence/absence) of that edge across all four networks. The resulting weighted network not only highlights the most reliable edges (those with higher weights) but also provides a comprehensive representation of microbial interactions.

The weighted consensus network has four main advantages: robustness, because integrating multiple algorithms reduces the impact of biases or errors associated with any single method; stability, because the weighted approach provides a holistic view of microbial interactions, capturing edges that are consistently supported across algorithms; interpretability, because the weight values offer a straightforward measure of edge confidence, allowing researchers to focus on highly reliable interactions for downstream analysis; and sparsity control, because selecting different thresholds for the edge weights (e.g., retaining only edges with weights ≥ 1, ≥ 2, ≥ 3, or = 4) lets researchers control the sparsity of the network to match their analytical goals. For instance, lower thresholds (e.g., ≥ 1) result in denser networks that include more potential interactions, while higher thresholds (e.g., = 4) produce sparser networks focused only on the most reliable interactions identified by all algorithms.

This weighted consensus network serves as a stable foundation for exploring microbiome interactions and identifying key microbial taxa and their relationships within complex ecosystems. By incorporating agreement across multiple methods, it offers a more reliable and nuanced perspective on the underlying microbial community structure.

bioRxiv preprint, doi: https://doi.org/10.1101/2025.04.07.647660; this version posted April 13, 2025. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

2.3 Multi-method approach to identify key microbial drivers of disease resistance

Feature selection is a crucial step in data analysis, involving the identification of significant features or covariates that possess high predictive power. In the context of high-dimensional data, such as microbiome datasets, feature selection becomes indispensable to extract relevant information and reduce computational complexity. In particular, when studying diseases, it becomes imperative to identify important OTUs that are strongly associated with the disease’s onset or progression. By identifying these key OTUs, we gain essential insights into potential driver pathogens or beneficial microbes. Subsequently, controlling the abundance or activity of these crucial OTUs can pave the way for novel disease interventions and management strategies, opening up avenues for precision medicine and tailored therapies.
To identify key OTUs associated with disease outcomes, we employ a two-pronged feature selection approach: (1) machine learning-based methods and (2) network-based methods. By integrating these complementary approaches, we ensure a comprehensive and biologically meaningful selection of microbial taxa associated with disease outcomes.

2.3.1 Using machine learning-based methods

To identify important OTUs, we first prepare four versions of the filtered microbiome data: centered log-ratio (CLR) transformed, raw filtered (untransformed), logarithmically transformed (log), and total sum scaling (TSS) normalized. These normalization methods help account for compositional constraints and improve the reliability of machine learning (ML)-based feature selection. We then apply six ML-based feature selection strategies implemented in the scikit-learn library [47] in Python: (1) the “SelectKBest” method, which selects the k features with the highest analysis of variance F-value scores, (2) selection of the top k features based on the mutual information statistic, (3) Recursive Feature Elimination (RFE) with logistic regression, (4) RFE with a decision tree, (5) RFE with gradient boosting, and (6) RFE with Random Forest (RF). Additionally, we introduce a seventh method that includes OTUs in the model if their maximum value falls within the top 30% of the dataset. After running all seven strategies, we assign a value ("TOTAL") to each OTU equal to the number of strategies that select it as an important feature. Specifically, an OTU that is selected as important by all seven strategies has a "TOTAL" value of 7. Subsequently, we sort the OTUs by the "TOTAL" column and select as important those OTUs whose "TOTAL" value is higher than a defined threshold.
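The "TOTAL" tally described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name `tally_votes`, the strategy labels, the OTU names, and the threshold value are all hypothetical, and the per-strategy selections (which in practice would come from the scikit-learn selectors) are supplied here as plain lists.

```python
from collections import Counter

def tally_votes(selections, threshold):
    """Count how many strategies selected each OTU ("TOTAL") and keep
    the OTUs whose count is strictly higher than the threshold."""
    total = Counter()
    for selected in selections.values():
        total.update(set(selected))  # each strategy votes at most once per OTU
    # Sort by descending TOTAL, breaking ties alphabetically for determinism.
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    important = [otu for otu, votes in ranked if votes > threshold]
    return dict(ranked), important

# Hypothetical output of the seven strategies (placeholder OTU names).
selections = {
    "anova_f": ["otu_a", "otu_b", "otu_c"],
    "mutual_info": ["otu_a", "otu_c"],
    "rfe_logistic": ["otu_a", "otu_b"],
    "rfe_tree": ["otu_a", "otu_d"],
    "rfe_gboost": ["otu_a", "otu_c"],
    "rfe_forest": ["otu_a", "otu_c"],
    "top30_max": ["otu_a", "otu_b"],
}
totals, important = tally_votes(selections, threshold=3)
print(totals)     # TOTAL value per OTU
print(important)  # OTUs with TOTAL above the (assumed) threshold
```

The threshold of 3 is chosen only for the example; the text leaves the actual cut-off as "a defined threshold".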
Together, these strategies identify the most influential OTUs in our feature selection analysis and inform the subsequent stages of the study.

2.3.2 Using network-based methods

While machine learning-based methods identify statistically relevant OTUs, they do not capture microbial interactions that may play a crucial role in disease resistance. To address this limitation, we employ a network-based feature selection strategy that compares microbial co-occurrence patterns between diseased and healthy samples. We construct microbial interaction networks using four well-established methods (SPARCC, SE_glasso, SPRING, and CMIMN) based on samples from two classes: one representing samples without the disease (‘clean tubers’) and the other with the disease (‘scab-infected tubers’). We then apply two complementary network-based feature selection strategies.

Strategy 1: differential centrality analysis. This approach analyzes five centrality metrics for each OTU: (1) Degree (connectivity within the network), (2) Betweenness (importance in connecting other taxa), (3) Closeness (proximity to all other taxa), (4) Eigenvector centrality (influence based on connected neighbors), and (5) PageRank (importance based on link structure). We rank OTUs based on the difference in their centrality measures between the ‘clean tubers’ and ‘scab-infected tubers’ networks. The top 20% of OTUs showing the largest differences are selected as key taxa. The final taxa are considered important if they are selected by all four network inference methods.

Strategy 2: weighted scoring of OTUs based on network topology.
This method assigns a weighted score to each OTU based on its network properties using the following formula:

$$\mathrm{Score}_i^j = w_1 \times DE_i^j + w_2 \times EV_i^j + w_3 \times PR_i^j + w_4 \times CL_i^j + w_5 \times BE_i^j \tag{5}$$

where i denotes the OTU being evaluated and j represents the network inference method used for constructing the microbiome network. Here, DE, EV, PR, CL, and BE represent the Degree, Eigenvector, PageRank, Closeness, and Betweenness centrality measures, respectively. The weights assigned to the centrality measures reflect their relative importance in capturing biologically meaningful insights from the network structure. To ensure that the OTUs identified as significant have greater overlap with Strategy 1, we set the weights as follows: w1 = 0.1, w2 = 0.1, w3 = 0.1, w4 = 0.2, w5 = 0.5, with Betweenness (50%) given the highest weight due to its role in structuring the microbial community. The top 20% of OTUs with the highest scores are considered key players in microbial interactions related to disease resistance.

Unifying machine learning and network-based approaches for reliable microbiome feature selection: To improve the reliability of microbiome feature selection, we integrate both machine learning and network-based strategies. Specifically, we compare the OTUs identified by the two network-based strategies with those selected through multiple machine learning algorithms. The final set of selected OTUs consists of taxa consistently prioritized across these complementary approaches, thereby increasing confidence in their biological relevance.
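As a concrete illustration of Equation (5) and the top-20% cut in Strategy 2, the following Python sketch computes the composite score for one network. It is a sketch under stated assumptions: the centrality values are hypothetical and assumed to be already computed and scaled to [0, 1] (the text does not state how, or whether, the measures are normalized), the helper names are invented for this example, and rounding the 20% cut up to a whole OTU is our choice.

```python
import math

# Weights w1..w5 from the text: Degree, Eigenvector, PageRank, Closeness, Betweenness.
WEIGHTS = {
    "degree": 0.1,
    "eigenvector": 0.1,
    "pagerank": 0.1,
    "closeness": 0.2,
    "betweenness": 0.5,
}

def composite_scores(centrality):
    """Equation (5): weighted sum of the five centrality measures per OTU."""
    return {
        otu: sum(WEIGHTS[m] * values[m] for m in WEIGHTS)
        for otu, values in centrality.items()
    }

def top_fraction(scores, fraction=0.2):
    """Return the highest-scoring OTUs; the cut is rounded up (an assumption)."""
    k = max(1, math.ceil(fraction * len(scores)))
    ranked = sorted(scores, key=lambda otu: (-scores[otu], otu))
    return ranked[:k]

# Hypothetical centrality values for one inference method and one network.
centrality = {
    "otu_a": {"degree": 0.8, "eigenvector": 0.9, "pagerank": 0.3, "closeness": 0.6, "betweenness": 0.1},
    "otu_b": {"degree": 0.2, "eigenvector": 0.1, "pagerank": 0.1, "closeness": 0.5, "betweenness": 0.9},
    "otu_c": {"degree": 0.5, "eigenvector": 0.5, "pagerank": 0.2, "closeness": 0.4, "betweenness": 0.3},
    "otu_d": {"degree": 0.1, "eigenvector": 0.1, "pagerank": 0.1, "closeness": 0.2, "betweenness": 0.0},
    "otu_e": {"degree": 0.4, "eigenvector": 0.3, "pagerank": 0.3, "closeness": 0.3, "betweenness": 0.2},
}
scores = composite_scores(centrality)
key_otus = top_fraction(scores, fraction=0.2)
print(scores)
print(key_otus)
```

In this toy input, `otu_b` outranks the better-connected `otu_a` because Betweenness carries half of the total weight, which is the behavior the chosen weights are meant to produce.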
This integrative strategy combines the predictive power of machine learning with the structural insights derived from microbial interaction networks, resulting in a robust and interpretable set of microbial features. The selected OTUs represent strong candidates for microbial drivers of disease resistance and may inform the development of microbiome-targeted interventions aimed at enhancing crop resilience and promoting sustainable agricultural practices.

3 Results

3.1 Robustness study of different algorithms for learning the microbiome network

To assess the robustness of the CMIMN algorithm in learning the microbiome network, we conducted a comprehensive analysis. In the initial step, we constructed the network using all samples. Subsequently, we randomly selected 70% of the samples and generated 50 distinct datasets derived exclusively from this subset. On each of these 50 datasets, we executed the algorithm independently to construct separate networks. For each method, we compared the network constructed from each of the 50 generated datasets to the corresponding network generated from the complete set of samples, using the F-score. This comparison assesses how consistently each algorithm reconstructs microbial associations across different subsets of the data. The F-score is a widely used metric that balances precision and recall, providing a single measure of a method’s accuracy in detecting true microbial interactions while minimizing false positives and false negatives. A higher F-score indicates better network reconstruction accuracy and reliability. To visually represent these comparisons, we created box plots of the F-score values obtained from each iteration. This allowed us not only to assess overall performance but also to identify variations in performance across taxonomic levels (Phylum, Class, and Order).
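The edge-level F-score used in this comparison can be sketched as follows. This is a minimal illustration that treats the full-sample network as the reference and a subsample network as the prediction; the function names and the toy edge lists are hypothetical, and edges are treated as undirected pairs.

```python
def edge_set(edges):
    """Represent undirected edges so that (A, B) and (B, A) compare equal."""
    return {frozenset(edge) for edge in edges}

def f_score(reference_edges, inferred_edges):
    """Harmonic mean of precision and recall of inferred edges vs. the reference."""
    reference, inferred = edge_set(reference_edges), edge_set(inferred_edges)
    true_positives = len(reference & inferred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(inferred)   # share of inferred edges in the reference
    recall = true_positives / len(reference)     # share of reference edges recovered
    return 2 * precision * recall / (precision + recall)

# Hypothetical networks: one from all samples, one from a 70% subsample.
full_network = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
subsample_network = [("B", "A"), ("B", "C"), ("C", "E")]
score = f_score(full_network, subsample_network)
print(round(score, 4))
```

Repeating this for each of the 50 subsample networks of a method yields the distribution summarized by one box in the box plots.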
Furthermore, we extended this evaluation to all four network construction methods: CMIMN, SPRING, SE_glasso, and SPARCC. The comparison was conducted at each taxonomic level (Phylum, Class, and Order), resulting in a comprehensive assessment of each method’s robustness under different conditions. Figure S1 presents box plots of the F-scores obtained by the different methods at each taxonomic level. Our algorithm, CMIMN, exhibits superior performance, as indicated by the narrower range of its box plots at all three taxonomic levels, demonstrating its robustness. Notably, among all algorithms, SE_glasso shows the least favorable results at the Class level.

3.2 Minimal Overlap Across Network Inference Methods Highlights the Need for Using a Consensus Approach

We constructed microbiome networks using four different inference methods (SE_glasso, SPRING, SPARCC, and CMIMN) at the Phylum, Class, and Order levels. Despite using the same dataset, the resulting networks exhibited minimal overlap, highlighting the high variability in microbial interaction patterns inferred by different methods. Figure 2 shows the Venn diagrams of common edges inferred by different methods at different taxonomic levels: only 24 common edges at the Phylum level, 80 at the Class level, and 522 at the Order level. These results indicate that network structures can vary significantly depending on the inference method used, which raises concerns about the reliability of conclusions drawn from any single approach. Tables S2, S3, and S4 present network metrics for the four different methods at the Phylum, Class, and Order levels.
The substantial differences in network topology metrics underscore the inherent differences in algorithmic assumptions and their impact on inferred microbial interactions. This variation highlights the importance of employing a consensus-based approach to enhance network robustness, reduce algorithm-specific biases, and improve biological interpretability.

Tables S5, S6, and S7 present the top nodes in microbial networks based on topological measures at the Phylum, Class, and Order levels. These networks were constructed using the SE_glasso, SPRING, SPARCC, and CMIMN methods, and the analysis encompasses data from all samples, without distinguishing between diseased and healthy conditions. Interestingly, certain OTUs were consistently identified as highly connected nodes across all four network construction methods, reinforcing their biological significance. According to these tables, Acidobacteriae was identified as important at the Class level by all algorithms and network metrics, while C0119 was consistently identified at the Order level. The recurrence of these taxa across multiple network inference methods suggests their potential ecological importance and role in microbial community stability.
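The overlap counts in this section and the consensus weights defined in Section 2.2.2 both come from overlaying the four edge lists. The Python sketch below is an illustration under assumptions rather than the CMIMN package code: the function names and the toy edge lists are hypothetical, edges are undirected, and each method contributes a binary presence indicator per edge.

```python
from collections import Counter

def consensus_weights(networks):
    """Weight of each edge = number of methods that inferred it (1 to 4 here)."""
    weights = Counter()
    for edges in networks.values():
        # frozenset makes (A, B) and (B, A) the same edge; the set removes repeats
        # so each method adds at most 1 per edge.
        weights.update({frozenset(edge) for edge in edges})
    return weights

def threshold_edges(weights, min_weight):
    """Keep edges supported by at least `min_weight` methods."""
    return {edge for edge, weight in weights.items() if weight >= min_weight}

# Hypothetical edge lists from the four inference methods.
networks = {
    "CMIMN": [("A", "B"), ("B", "C"), ("C", "D")],
    "SE_glasso": [("A", "B"), ("C", "B")],
    "SPRING": [("A", "B"), ("B", "C"), ("A", "D")],
    "SPARCC": [("B", "A"), ("C", "D")],
}
weights = consensus_weights(networks)
all_four = threshold_edges(weights, 4)       # edges common to every method
at_least_three = threshold_edges(weights, 3)
any_method = threshold_edges(weights, 1)     # densest consensus network
print(len(all_four), len(at_least_three), len(any_method))
```

Edges absent from every method simply do not appear in the counter, which corresponds to weight 0 in the adjacency-matrix description; raising `min_weight` trades density for agreement.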
Figure 2: Venn diagrams illustrating the overlap of common edges in microbiome networks constructed using four different inference methods (SE_glasso, SPRING, SPARCC, and CMIMN) at different taxonomic levels based on all samples. (a) Phylum level: 24 common edges among all methods (panel labels: SPARCC (148), SE_glasso (60), SPRING (172), CMIMN (136)). (b) Class level: 80 common edges (panel labels: SPARCC (880), SE_glasso (228), SPRING (732), CMIMN (856)). (c) Order level: 522 common edges (panel labels: SPARCC (3400), SE_glasso (2170), SPRING (2060), CMIMN (3574)).

3.3 Consensus microbiome network: enhancing reliability through integration

Figures 3, S2, and S3 visualize the microbiome networks at the Phylum, Class, and Order levels, respectively. Part (a) of each figure represents the ‘clean tubers’ network, which is characterized by a denser and more connected microbial community, with more diverse interactions. Part (b) shows the ‘scab-infected tubers’ network, highlighting nodes and edges unique to diseased samples. Part (c) illustrates the common interactions between the two networks. Nodes represent OTUs and are color-coded: purple for common OTUs shared between the ‘clean tubers’ and ‘scab-infected tubers’ networks, blue for OTUs unique to the ‘clean tubers’ network, and green for OTUs unique to the ‘scab-infected tubers’ network. Node size reflects connectivity (degree), while edges are distinguished as dashed (confirmed by three methods) or solid (confirmed by all four methods). At the Class and Order levels, due to the density of edges, only solid edges (confirmed by all four methods) are reported for clarity.

In both the ‘clean tubers’ and ‘scab-infected tubers’ microbiome networks, we identified microbial associations consistently supported by all four network inference methods.
Many of these associations have also been independently reported as ecologically meaningful in soil ecosystems. The intersections of microbial associations between the ‘clean tubers’ and ‘scab-infected tubers’ networks are summarized in Tables S8, S9, and S10, corresponding to the Phylum, Class, and Order levels, respectively. Each table includes three columns: the first lists associations unique to the ‘clean tubers’ network, the second shows associations exclusive to the ‘scab-infected tubers’ network, and the third highlights shared associations (those present in both networks), representing conserved or stable microbial interactions across conditions. Due to the large number of associations observed at the Class and Order levels, we report the complete set of edges confirmed by all four methods in Supplementary Tables S9 and S10. Here, we focus on selected Phylum-level associations that are most frequently supported by existing literature.

In Table S8, for the ‘clean tubers’ network, we observed the interaction Planctomycetota–Patescibacteria, which likely reflects syntrophic or symbiotic interactions, as both phyla possess reduced genomes and are known to co-occur in structured soil aggregates [48]. The edge Methylomirabilota–WPS-2 may indicate shared adaptation to oligotrophic or co-contaminated soil conditions, where both phyla are often involved in carbon and nitrogen cycling under stress [49]. Additionally, Gemmatimonadota–Proteobacteria and Gemmatimonadota–Acidobacteriota were found only in the clean network; these phyla are often associated with nutrient cycling and stable soil conditions, suggesting cooperative metabolic roles in healthy tuber-associated soils.

Figure 3: The microbiome network at the Phylum taxonomic level. Part (a) represents the ‘clean tubers’ network, part (b) displays the ‘scab-infected tubers’ network, and part (c) shows the common interactions between them. Nodes represent Operational Taxonomic Units (OTUs) and are color-coded: purple for common OTUs shared between the ‘clean tubers’ and ‘scab-infected tubers’ networks, blue for OTUs unique to the ‘clean tubers’ network, and green for OTUs unique to the ‘scab-infected tubers’ network. Node size indicates degree of connectivity. Edges are categorized as dashed lines (confirmed by three methods) or solid lines (confirmed by all four methods).

In the ‘scab-infected tubers’ network, the Actinobacteriota–Gemmatimonadota interaction was unique and mirrors findings from boreal forest soils, where both phyla jointly contributed to the transformation of dissolved organic matter during freeze–thaw cycles, pointing to potential functional synergy under stress [50].

In both the ‘clean tubers’ and ‘scab-infected tubers’ networks, the edges Actinobacteriota–Proteobacteria and Proteobacteria–Acidobacteriota were consistently present, suggesting stable and ecologically relevant relationships across conditions. The co-occurrence of Actinobacteriota and Proteobacteria has been reported in sandy and layered soils, where both phyla are dominant and likely contribute complementary roles in organic matter degradation and nutrient cycling [51]. Meanwhile, Proteobacteria and Acidobacteriota are among the most abundant phyla in forest and agricultural soils and are known to occupy distinct but co-existing niches, with Proteobacteria favoring copiotrophic (nutrient-rich) environments and Acidobacteriota preferring oligotrophic, acidic soils; this indicates a functional partitioning that supports broad microbial diversity and resilience [52].
3.4 Feature selection using a multi-method approach

3.4.1 Machine learning-based feature selection

Tables S11, S12, and S13 summarize the results of machine learning (ML)-based feature selection at the Phylum, Class, and Order levels, respectively. Each table includes five columns: four corresponding to different data normalization strategies applied prior to ML analysis, and a fifth representing their intersection. The first column (ML_CLR) reports OTUs selected from count-filtered data normalized using the CLR transformation. The second column (ML_Original) shows OTUs selected from raw count-filtered data without transformation. The third column (ML_Log) includes OTUs identified from log-transformed data, and the fourth column (ML_TSS) presents selections based on data normalized using TSS. The final column (ML_Intersection) lists OTUs consistently identified as important across all four normalization methods, highlighting microbial taxa that are robust to normalization choice. According to these tables, the OTUs selected under the different normalization methods overlap substantially.

3.4.2 Network-Based Selection: Strategy 1 (Differential Centrality Analysis)

Tables S14, S15, and S16 provide lists of selected OTUs at the Phylum, Class, and Order levels, respectively, based on Strategy 1: Differential Centrality Analysis. OTUs were selected according to two criteria: (1) those exhibiting the largest differences in centrality values between the ‘clean tubers’ and ‘scab-infected tubers’ networks, and (2) those consistently identified across all four network inference methods (SE_glasso, SPRING, SPARCC, and CMIMN), reinforcing their biological relevance. The first column of these tables shows the OTU name, while the second column (Features) indicates the centrality measure(s) responsible for the OTU’s selection.
These taxa represent microbial features whose connectivity patterns consistently differ between the ‘clean tubers’ and ‘scab-infected tubers’ networks, suggesting a potential role in disease dynamics.

3.4.3 Network-Based Selection: Strategy 2 (Composite Scoring Approach)

First, we constructed two distinct microbiome networks: one for ‘clean tubers’ and one for ‘scab-infected tubers’. Networks were generated using four inference methods: SE_glasso, SPRING, SPARCC, and CMIMN.

Next, we applied a composite scoring system that integrates multiple centrality metrics into a single weighted score for each OTU, as defined in Equation (5). This score was calculated separately for each of the four network inference methods. The top 20% of OTUs with the highest scores were considered significant.

To evaluate the agreement between network-based selection (Strategy 2) and ML-based selection, we examined the overlap between the top 20% highest-scoring OTUs and those selected by ML-based methods. Tables S17, S18, and S19 summarize these overlaps at the Phylum, Class, and Order levels, respectively. In these tables, the left section corresponds to results from the ‘clean tubers’ network, while the right section presents the results from the ‘scab-infected tubers’ network. Each section contains five columns. The first column lists the OTUs selected by Strategy 2. Columns 2–5 indicate whether the same OTUs were also identified by ML methods under four normalization strategies: CLR transformation, untransformed (raw) data, log-transformation, and TSS normalization.
A value of “1” in these columns denotes agreement between Strategy 2 and the corresponding ML method for that normalization, while “0” indicates the OTU was uniquely selected by the network-based approach. This design enables a direct comparison of method overlap under different data preprocessing conditions.

Figures S4 and S5 provide a visual representation of these overlaps, illustrating the average agreement between ML-based and network-based selection methods under different conditions. For ‘clean tubers’, at the Phylum level, CMIMN and SPARCC demonstrated slightly better agreement with ML-based selection across different normalization strategies. At the Order level, agreement was highest overall, particularly under CLR and TSS normalization. For ‘scab-infected tubers’, CMIMN generally exhibited higher agreement with ML-based selection at the Phylum level, while SPARCC showed better alignment at the Class level in certain cases. At the Order level, both CMIMN and SPARCC consistently achieved strong agreement with ML-based methods. These findings indicate that CMIMN and SPARCC consistently align more closely with ML-based feature selection methods, particularly at the Order level and under CLR and TSS normalization. This underscores their robustness and reliability in identifying important OTUs across different experimental conditions.

Tables S20, S21, and S22 present the overlap between Strategy 2 and the ML approaches at the Phylum, Class, and Order levels, focusing specifically on OTUs that were commonly identified in both the ‘clean tubers’ and ‘scab-infected tubers’ networks for each method. The left sections of these tables report overlaps for the CMIMN and SE_glasso methods. The first column lists the microbial taxa consistently identified across both networks using each respective method. Columns 2 through 5 indicate whether these taxa were also selected by ML methods under different normalization strategies.
A value of “1” denotes agreement between the ML and network-based methods, while “0” indicates no overlap. Similarly, the right sections of the tables summarize the results for the SPARCC and SPRING methods, following the same structure.

Finally, Tables S23 and S24 illustrate the overlap among important OTUs identified by all four algorithms (CMIMN, SPARCC, SE_glasso, SPRING) for the ‘clean tubers’ and ‘scab-infected tubers’ networks, respectively, based on Strategy 2.

3.5 Overall OTUs identified by all methods as key drivers of disease

Table 1 summarizes the key OTUs identified through different selection strategies, including Machine Learning-based feature selection (ML), Network-Based Selection: Strategy 1 (Differential Centrality Analysis) (Strategy 1), and Network-Based Selection: Strategy 2 (Composite Scoring Approach) (Strategy 2), at the Phylum, Class, and Order levels.

At the Phylum level, we found no overlap between the taxa selected by ML-based and network-based methods, highlighting the complementary nature of these approaches. To provide a comprehensive view, we report the most consistently selected taxa within each category.

ML-based selected taxa: Firmicutes, identified through ML-based feature selection, comprise 5.5% of the bacterial community and promote plant growth and disease suppression, especially Bacillus spp., which enhance root colonization and pathogen inhibition through antimicrobial metabolites [53]. Cyanobacteria (0.9%) also emerged as important ML-selected taxa, contributing to nitrogen fixation, biofilm formation, and soil structure improvement, benefiting microbial community stability [54]. Less abundant phyla such as Armatimonadota (0.2%) may also be directly related to the disease, as a negative relationship between the abundance of these phyla and the scab-suppressive ability of soil has been observed in one of our studies (unpublished data).
The precise implications of NB1-j (0.1%) in disease progression are still unclear, but its involvement in nitrogen cycling and interactions with microalgae suggest potential indirect influences [55].

Network-based selected taxa (intersection of both strategies): Bacteroidota, WPS.2, and Proteobacteria were consistently identified across both network-based strategies, indicating a strong and robust association with disease status. Bacteroidota (5.5%) are involved in nutrient cycling and pathogen competition, both of which contribute to disease suppression [56]. WPS.2, though less prevalent (0.3%), showed a negative relationship with suppressive soil capacity in our prior (unpublished) observations. Proteobacteria represent a taxonomically diverse group containing both beneficial (e.g., Rhizobium) and pathogenic (e.g., Pseudomonas syringae, Ralstonia solanacearum) members, reflecting their complex role in disease ecology [57].

| Level | ML | Strategy 1 | Strategy 2 | Intersection |
|---|---|---|---|---|
| Phylum | Firmicutes | Bacteroidota | Bacteroidota | ▲ Bacteroidota |
| | Cyanobacteria | WPS.2 | WPS.2 | ▲ WPS.2 |
| | Armatimonadota | Proteobacteria | Proteobacteria | ▲ Proteobacteria |
| | NB1.j | | | |
| Class | Bacilli | Desulfotobacteriia | Ktedonobacteria | ▲ Actinobacteria |
| | Ktedonobacteria | Actinobacteria | Vicinamibacteria | ▲ AD3 |
| | Cyanobacteriia | Syntrophobacteria | Actinobacteria | • Bacilli |
| | Saccharimonadia | AD3 | Gammaproteobacteria | • Anaerolineae |
| | Planctomycetes | | Alphaproteobacteria | • Ktedonobacteria |
| | Ignavibacteria | | Acidobacteriae | |
| | Dehalococcoidia | | Anaerolineae | |
| | Anaerolineae | | AD3 | |
| | MB.A2.108 | | Blastocatellia | |
| | Chthononomadetes | | Bacilli | |
| Order | Saccharimonadales | C0119 | C0119 | * C0119 |
| | Bacillales | Defluviicoccales | Rhizobiales | ■ Defluviicoccales |
| | C0119 | Bacteroidales | Chitinophagales | ■ Bacteroidales |
| | Subgroup.2 | Kryptoniales | Ktedonobacterales | • Ktedonobacterales |
| | Xanthomonadales | B12.WMSP1 | Microtrichales | |

Remaining Order-level entries, in extracted order (their column alignment is not recoverable from the extracted text): Acidobacteriales, Desulfotobacteriales, Chloroplast, Alicyclobacillales, Paenibacillales, Acetobacterales, Pseudomonadales, Anaerolineales, Elsterales, Bacteroidales, Ktedonobacterales.

Table 1: Important taxa at the Phylum, Class, and Order levels across three different strategies. The symbols indicate intersections between the strategies: * for taxa appearing in ML, Strategy 1, and Strategy 2; ■ for taxa appearing in ML and Strategy 1; • for taxa appearing in ML and Strategy 2; ▲ for taxa appearing in Strategy 1 and Strategy 2.

At the Class level, Network-based intersection: Actinobacteria and AD3 were selected by both network-based strategies, indicating strong structural importance in microbial networks associated with disease. Actinobacteria are key contributors to soil suppressiveness against plant pathogens. Notably, non-pathogenic Streptomyces spp.
produce antibiotics that inhibit soil-borne pathogens, including Streptomyces scabies, the causative agent of common scab disease [58]. Although less well characterized, Although less well-characterized, AD3 was identified as a robust Class-level taxon across both network-based strategies. This group has been associated with degraded or polluted soils and reduced organic matter content, suggesting its presence may indicate shifts in microbial community structure linked to soil stress and disease vulnerability [59]. ML and network-based Strategy 2 intersection: Bacilli, Anaerolineae, and Ktedonobacteria were jointly identified by ML-based feature selection and network-based Strategy 2, suggesting these taxa are both predictive and structurally central in the disease-associated microbiome. Bacilli (notably Bacillus spp.) are widely recognized for their role in plant protection and disease suppression, particularly through Bacillus spp., which produce lipopeptides and hydrolytic enzymes that enhance root colonization and pathogen inhibition [60]. Ktedonobacteria exhibit complex morphologies and genomic features, leading to speculation that they may be a valuable microbial resource for novel compounds [61]. Anaerolineae, frequently found in low-oxygen soil habitats, play a crucial role in carbon degradation processes, including the breakdown of plant-derived compounds [62,63]. This activity can modify the soil environment, potentially suppressing plant diseases through nutrient competition or the production of inhibitory substances. 12 .CC-BY 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted April 13, 2025. 
At the Order level:

Confirmed by all ML methods and both network-based strategies: C0119 was the only taxon consistently identified by all machine learning models and both network-based strategies, highlighting its strong and stable association with disease-relevant microbial networks. Although taxonomically unclassified, recent studies have shown that C0119 is a dominant order in biochar-amended soils, environments known to support improved microbial diversity, carbon cycling, and root-associated community stability [64]. Its consistent emergence across diverse analytical methods suggests an ecologically meaningful role in shaping disease-conducive or suppressive soil environments.

Selected by ML and Network Strategy 1: Defluviicoccales, while often linked to anaerobic degradation, have been observed in disease-prone soils, where they may contribute to microbial shifts influencing pathogen persistence [65]. Bacteroidales are involved in organic matter degradation and nutrient cycling. Some members have been associated with pathogen suppression via competitive exclusion and enhancement of soil nutrient availability, contributing indirectly to disease resistance [66].

Selected by ML and Network Strategy 2: Ktedonobacterales have been associated with disease suppression due to their potential for producing antimicrobial compounds and their metabolic similarity to antibiotic-producing Actinomycetes [61].
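The intersection symbols used in Table 1 follow a simple set rule, which the minimal sketch below illustrates. The function name `intersection_labels` is hypothetical and is not part of CMIMN; the Phylum-level lists are read from Table 1, and placing NB1.j in the ML column is an assumption based on the extracted column order.

```python
# Hypothetical helper (not part of CMIMN): label each taxon by which
# selection strategies agree on it, using the symbols from Table 1.
def intersection_labels(ml, s1, s2):
    """Return {taxon: symbol} for taxa selected by at least two strategies."""
    ml, s1, s2 = set(ml), set(s1), set(s2)
    labels = {}
    for taxon in sorted(ml | s1 | s2):
        in_ml, in_s1, in_s2 = taxon in ml, taxon in s1, taxon in s2
        if in_ml and in_s1 and in_s2:
            labels[taxon] = "*"   # ML, Strategy 1, and Strategy 2
        elif in_ml and in_s1:
            labels[taxon] = "■"   # ML and Strategy 1
        elif in_ml and in_s2:
            labels[taxon] = "•"   # ML and Strategy 2
        elif in_s1 and in_s2:
            labels[taxon] = "▲"   # Strategy 1 and Strategy 2
    return labels


# Phylum-level selections as read from Table 1 (NB1.j column is assumed).
ml_phylum = ["Firmicutes", "Cyanobacteria", "Armatimonadota", "NB1.j"]
s1_phylum = ["Bacteroidota", "WPS.2", "Proteobacteria"]
s2_phylum = ["Bacteroidota", "WPS.2", "Proteobacteria"]

phylum_labels = intersection_labels(ml_phylum, s1_phylum, s2_phylum)
print(phylum_labels)
```

Taxa selected by only one strategy receive no symbol, which matches the Intersection column listing only taxa supported by at least two strategies.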

Conclusion

This study introduces a comprehensive framework for robust microbiome network inference and the identification of disease-associated microbial taxa, specifically in the context of potato common scab. We developed CMIMN, a novel Bayesian network algorithm based on conditional mutual information, which exhibited superior robustness and interpretability across taxonomic levels. Recognizing the limitations of individual network inference methods, we integrated CMIMN with three widely used approaches (SPIEC-EASI, SPRING, and SPARCC) to construct consensus microbiome networks. These consensus networks captured biologically meaningful co-occurrence patterns while reducing algorithm-specific variability, thereby enhancing confidence in the inferred microbial interactions.

To identify taxa relevant to disease, we implemented a multi-method feature selection framework combining different machine learning algorithms with two network-based strategies. The machine learning component provided predictive stability by selecting features that consistently appeared across models, while the network-based methods leveraged centrality metrics and topological differences between networks to capture taxa important to microbial structure and community dynamics. This integrative approach enabled the detection of microbial taxa with both statistical significance and ecological relevance.

Our results revealed clear distinctions in microbial community structure between clean and scab-infected tubers. At the Phylum level, Bacteroidota, WPS-2, and Proteobacteria were identified through both network-based strategies, while Firmicutes and Cyanobacteria were highlighted by machine learning models. Interactions such as Actinobacteriota–Proteobacteria and Planctomycetota–Patescibacteria were found to be consistently supported by all four network inference methods and corroborated by existing soil studies, reinforcing their ecological relevance.
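The consensus step summarized above, keeping edges that several inference methods agree on, can be sketched as follows. This is a minimal illustration and not the CMIMN implementation: the function name `consensus_edges`, the weighting by fraction of supporting methods, and the toy edge lists are all assumptions. The Firmicutes and Bacteroidota edge in particular is invented to show a partially supported edge.

```python
from collections import Counter


def consensus_edges(edge_lists, min_support):
    """Return {edge: weight} for undirected edges found by >= min_support methods.

    The weight is the fraction of methods reporting the edge, an assumed
    weighting scheme chosen only for illustration.
    """
    support = Counter()
    for edges in edge_lists.values():
        # frozenset treats (a, b) and (b, a) as the same undirected edge;
        # the set comprehension counts each edge at most once per method.
        support.update({frozenset(edge) for edge in edges})
    n_methods = len(edge_lists)
    return {
        tuple(sorted(edge)): count / n_methods
        for edge, count in support.items()
        if count >= min_support
    }


# Toy edge lists (hypothetical, not results from the paper).
toy_edges = {
    "CMIMN": [("Actinobacteriota", "Proteobacteria"),
              ("Planctomycetota", "Patescibacteria"),
              ("Firmicutes", "Bacteroidota")],
    "SPIEC-EASI": [("Proteobacteria", "Actinobacteriota"),
                   ("Planctomycetota", "Patescibacteria")],
    "SPRING": [("Actinobacteriota", "Proteobacteria"),
               ("Patescibacteria", "Planctomycetota"),
               ("Firmicutes", "Bacteroidota")],
    "SPARCC": [("Actinobacteriota", "Proteobacteria"),
               ("Planctomycetota", "Patescibacteria")],
}

strict = consensus_edges(toy_edges, min_support=4)   # all four methods agree
relaxed = consensus_edges(toy_edges, min_support=2)  # at least two methods agree
print(strict)
print(relaxed)
```

Setting `min_support` to the number of methods reproduces the "confirmed by all four methods" criterion used for the reported interactions, while lower values trade specificity for coverage.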
At the Class level, Actinobacteria, AD3, Bacilli, Anaerolineae, and Ktedonobacteria were identified either by both network-based strategies or by the intersection of ML and a network-based strategy. These classes are associated with key ecological functions such as carbon degradation, nutrient cycling, antimicrobial production, and disease suppression. At the Order level, C0119 was the only taxon confirmed by all machine learning models and both network-based strategies, highlighting its potential as a robust indicator of disease status. Other important orders included Bacteroidales, Defluviicoccales, and Ktedonobacterales, identified by at least two independent methods.

The topological analysis of microbial networks further revealed differences in connectivity and interaction density between clean and diseased tuber microbiomes. Clean tuber networks exhibited higher overall connectivity, suggesting a more stable and cooperative microbial community. In contrast, disease-associated networks were more fragmented and featured shifts in taxa centrality, indicating structural reorganization in response to pathogen pressure. Several interactions identified in these networks, such as Actinobacteriota–Gemmatimonadota and Methylomirabilota–WPS-2, have also been observed in previous studies investigating soil response to stress or contamination.

The concept of disease-suppressive soil suggests fundamental differences in the microbiological environment between healthy and disease-conducive soils; meanwhile, soil microorganisms have been proposed as bioindicators of general soil health [67]. Because the microbiomes in this study were extracted from pre-planting soils of potato fields, the selected microbial features that distinguish healthy from disease-conducive soils appear to exist long before disease emergence.
Despite the variations driven by geography, management, and climate legacy in this large-scale survey, the microbial signal was strong for soils that tend to produce disease-free tubers. This suggests promising utility of the soil microbiome in inferring indicators of soil health and predicting potato scab disease.

Altogether, our integrative approach provides a scalable and interpretable framework for microbiome network analysis and biomarker discovery. By combining Bayesian inference, consensus-based network construction, and multi-method feature selection, we bridge predictive modeling with ecological insight. These findings not only improve our understanding of microbial community dynamics in disease contexts but also establish a foundation for microbiome-informed strategies in plant health management and sustainable agriculture.

As a next step, our broader vision is to develop an interactive Shiny application that enables biologists to upload their microbiome and disease data to identify reliable taxa-disease associations and uncover robust co-occurrence relationships, making advanced analysis tools more accessible and actionable for the biological research community. In addition, our approach extracts biologically meaningful information by comparing network structures between clean tubers and scab-infected tubers: Strategy 1 focuses on differences in centrality measures across the two networks, while Strategy 2 analyzes each network independently to highlight key taxa. To further enhance this comparative analysis, we are interested in applying the Microbiome Network Alignment (MiNAA) algorithm [68], which aligns microbial networks across conditions, allowing us to extract deeper biological insights about shifts in microbial interactions associated with disease.

Consent for publication

Not applicable.

Availability of data and materials

The 16S and ITS amplicon sequencing data associated with this study are publicly available at the NCBI Short Read Archive under the BioProject PRJNA1135141. The R package of CMIMN and all R code for this paper are available in the GitHub repository, https://github.com/solislemuslab/CMIMN.

Competing interests

The authors declare that they have no competing interests.

Funding

This work was supported by the National Science Foundation (DEB-2144367 to CSL). The work was also supported by USDA Specialty Crop Multi-State Grant Program award SCMP1701.

Authors’ contributions

CSL and RL developed the idea. RL and SS collected the data. RA led all statistical analyses, from data preprocessing to fitting of machine learning models, and summarized the results by creating the figures. RA wrote the initial complete draft of the manuscript. SS, RL, and CSL contributed to the interpretation, editing, and revision of the manuscript. All authors read and approved the final manuscript.

Acknowledgements

References

[1] Husein Ajwa, William J Ntow, Ruijun Qin, and Suduan Gao. Properties of soil fumigants and their fate in the environment. In Hayes’ Handbook of Pesticide Toxicology, pages 315–330. Elsevier, 2010.
[2] Sarah Braun, Amanda Gevens, Amy Charkowski, Christina Allen, and Shelley Jansky. Potato common scab: A review of the causal pathogens, management practices, varietal resistance screening methods, and host resistance. American Journal of Potato Research, 94:283–296, 2017.
[3] Linda L Kinkel, Daniel C Schlatter, Matthew G Bakker, and Brett E Arenz. Streptomyces competition and co-evolution in relation to plant disease suppression. Research in microbiology, 163(8):490–499, 2012.
[4] Noah Rosenzweig, James M Tiedje, John F Quensen III, Qingxiao Meng, and Jianjun J Hao. Microbial communities associated with potato common scab-suppressive soil determined by pyrosequencing analyses. Plant disease, 96(5):718–725, 2012.
[5] Noah Fierer, Christian L Lauber, Kelly S Ramirez, Jesse Zaneveld, Mark A Bradford, and Rob Knight. Comparative metagenomic, phylogenetic and physiological analyses of soil microbial communities across nitrogen gradients. The ISME Journal, 6(5):1007–1017, 2012.
[6] Thea Whitman, Rachel Neurath, Adele Perera, Ilexis Chu-Jacoby, Daliang Ning, Jizhong Zhou, Peter Nico, Jennifer Pett-Ridge, and Mary Firestone. Microbial community assembly differs across minerals in a rhizosphere microcosm. Environmental Microbiology, 20(12):4444–4460, 2018.
[7] Anna M. Cates, Michael J. Braus, Thea L. Whitman, and Randall D. Jackson. Separate drivers for microbial carbon mineralization and physical protection of carbon. Soil Biology and Biochemistry, 133:72–82, 2019.
[8] Christina Kranz and Thea Whitman. Short communication: Surface charring from prescribed burning has minimal effects on soil bacterial community composition two weeks post-fire in jack pine barrens. Applied Soil Ecology, 144:134–138, 2019.
[9] Thea Whitman, Ellen Whitman, Jamie Woolet, Mike D. Flannigan, Dan K. Thompson, and Marc-André Parisien. Soil bacterial and fungal response to wildfires in the canadian boreal forest across a burn severity gradient. Soil Biology and Biochemistry, 138:107571, 2019.
[10] Alex Carr, Christian Diener, Nitin S Baliga, and Sean M Gibbons. Use and abuse of correlation analyses in microbial ecology. The ISME journal, 13(11):2647–2655, 2019.
[11] Cassandra Allsup and Richard Lankau. Migration of soil microbes may promote tree seedling tolerance to drying conditions. Ecology, 100:e02729, 04 2019.
[12] R. A. Rioux, C. M. Stephens, and J. P. Kerns. Factors affecting pathogenicity of the turfgrass dollar spot pathogen in natural and model hosts. bioRxiv, page 630582, 01 2019.
[13] Richard Lankau, Isabelle George, and Max Miao. Crop health optimized by microbial diversity across phylogenetic scales. Submitted, 2020.
[14] Emily W Lankau, Diane Xue, Rachel Chrisensen, Amanda J Gevens, and Richard A Lankau. Management and soil conditions influence common scab severity on potato tubers via indirect effects on soil microbial communities. Phytopathology™, 2020/02/27 2020.
[15] Ksenia Guseva, Sean Darcy, Eva Simon, Lauren V Alteio, Alicia Montesinos-Navarro, and Christina Kaiser. From diversity to complexity: Microbial networks in soils. Soil Biology and Biochemistry, 169:108604, 2022.
[16] Rosa Aghdam, Mojtaba Ganjali, Xiujun Zhang, and Changiz Eslahchi. CN: a consensus algorithm for inferring gene regulatory networks using the sorder algorithm and conditional mutual information test. Molecular BioSystems, 11(3):942–949, 2015.
[17] Cameron Wagg, Klaus Schlaeppi, Samiran Banerjee, Eiko E Kuramae, and Marcel GA van der Heijden. Fungal-bacterial diversity and microbiome complexity predict ecosystem functioning. Nature communications, 10(1):4841, 2019.
[18] Rosa Aghdam, Xudong Tang, Shan Shan, Richard Lankau, and Claudia Solís-Lemus. Human limits in machine learning: prediction of potato yield and disease using soil microbiome data. BMC bioinformatics, 25:366, 2024.
[19] Evan Gorstein, Rosa Aghdam, and Claudia Solís-Lemus. Highdimmixedmodels.jl: Robust high-dimensional mixed-effects models across omics data. PLOS Computational Biology, 21(1):e1012143, 2025.
[20] Helena Brunel, Joan-Josep Gallardo-Chacón, Alfonso Buil, Montserrat Vallverdú, José Manuel Soria, Pere Caminal, and Alexandre Perera. Miss: a non-linear methodology based on mutual information for genetic association studies in both population and sib-pairs analysis. Bioinformatics, 26(15):1811–1818, 2010.
[21] Gökmen Altay and Frank Emmert-Streib. Revealing differences in gene network inference algorithms on the network level by ensemble methods. Bioinformatics, 26(14):1738–1744, 2010.
[22] Xiujun Zhang, Xing-Ming Zhao, Kun He, Le Lu, Yongwei Cao, Jingdong Liu, Jin-Kao Hao, Zhi-Ping Liu, and Luonan Chen. Inferring gene regulatory networks from gene expression data by path consistency algorithm based on conditional mutual information. Bioinformatics, 28(1):98–104, January 2012.
[23] Xiujun Zhang, Juan Zhao, Jin-Kao Hao, Xing-Ming Zhao, and Luonan Chen. Conditional mutual inclusive information enables accurate quantification of associations in gene regulatory networks. Nucleic acids research, 43(5):e31–e31, 2015.
[24] Sayyed Hadi Mahmoodi, Rosa Aghdam, and Changiz Eslahchi. An order independent algorithm for inferring gene regulatory network using quantile value for conditional independence tests. Scientific reports, 11(1):7605, 2021.
[25] Rosa Aghdam, Mojtaba Ganjali, and Changiz Eslahchi. IPCA-CMI: an algorithm for inferring gene regulatory networks based on a combination of pca-cmi and mit score. PloS one, 9(4):e92600, 2014.
[26] Parisa Niloofar, Rosa Aghdam, and Changiz Eslahchi. Gaem: Genetic algorithm based expectation-maximization for inferring gene regulatory networks from incomplete data. Computers in Biology and Medicine, 183:109238, 2024.
[27] Diego Colombo, Marloes H. Maathuis, Markus Kalisch, and Thomas S. Richardson. Learning high-dimensional directed acyclic graphs with latent and selection variables. Ann. Statist., 40(1):294–321, 02 2012.
[28] Judea Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, New York, NY, USA, 2nd edition, 2009.
[29] P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. MIT press, 2nd edition, 2000.
[30] Rosa Aghdam, Mojtaba Ganjali, Parisa Niloofar, and Changiz Eslahchi. Inferring gene regulatory networks by an order independent algorithm using incomplete data sets. Journal of Applied Statistics, 43(5):893–913, 2016.
[31] Seyed Amir Malekpour, Maryam Shahdoust, Rosa Aghdam, and Mehdi Sadeghi. wplogicnet: logic gate and structure inference in gene regulatory networks. Bioinformatics, 39(2):btad072, 2023.
[32] Luis M. de Campos. A scoring function for learning bayesian networks based on mutual information and conditional independence tests. Journal of Machine Learning Research, 7:2149–2187, 2006.
[33] Eli Faulkner. K2ga: Heuristically guided evolution of bayesian network structures from data. In CIDM, pages 18–25. IEEE, 2007.
[34] Seiya Imoto, Takao Goto, and Satoru Miyano. Estimation of genetic networks and functional structures between genes by using bayesian networks and nonparametric regression. In Russ B. Altman, A. Keith Dunker, Lawrence Hunter, and Teri E. Klein, editors, Pacific Symposium on Biocomputing, pages 175–186, 2002.
[35] S. Acid and L. M. Campos. A hybrid methodology for learning belief networks: Benedict. International Journal of Approximate Reasoning, 27(3):235–262, 2001.
[36] D. M. Chickering, D. Geiger, and D. Heckerman. Learning Bayesian networks: Search methods and experimental results. In Preliminary papers of the 5th International Workshop on Artificial Intelligence and Statistics, pages 112–128, 1995.
[37] Markus Kalisch, Martin Mächler, Diego Colombo, Marloes H. Maathuis, and Peter Bühlmann. Causal inference using graphical models with the r package pcalg. Journal of Statistical Software, 47(11):1–26, 5 2012.
[38] Marloes H. Maathuis, Markus Kalisch, and Peter Bühlmann. Estimating high-dimensional intervention effects from observational data. Ann. Statist, 37(6A):3133–3164, 2009.
[39] Ioannis Tsamardinos, Laura E. Brown, and Constantin F. Aliferis. The max-min hill-climbing bayesian network structure learning algorithm. Machine Learning, 65(1):31–78, 2006.
[40] Peter Spirtes, Christopher Meek, and Thomas Richardson. Causal inference in the presence of latent variables and selection bias. In Proceedings of the Eleventh conference on Uncertainty in artificial intelligence, pages 499–506. Morgan Kaufmann Publishers Inc., 1995.
[41] Peter Spirtes. An anytime algorithm for causal inference. In Proc. of the Eighth International Workshop on Artificial Intelligence and Statistics, pages 213–221. Citeseer, 2001.
[42] Jiji Zhang. On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artificial Intelligence, 172(16):1873–1896, 2008.
[43] Rosa Aghdam, Vahid Rezaei Tabar, and Hamid Pezeshk. Some node ordering methods for the k2 algorithm. Computational Intelligence, 35(1):42–58, 2019.
[44] Zachary D Kurtz, Christian L Müller, Emily R Miraldi, Dan R Littman, Martin J Blaser, and Richard A Bonneau. Sparse and compositionally robust inference of microbial ecological networks. PLoS computational biology, 11(5):e1004226, 2015.
[45] Grace Yoon, Irina Gaynanova, and Christian L Müller. Microbial networks in spring-semi-parametric rank-based correlation and partial correlation estimation for quantitative microbiome data. Frontiers in genetics, 10:516, 2019.
[46] Jonathan Friedman and Eric J Alm. Inferring correlation networks from genomic survey data. PLoS computational biology, 8(9):e1002687, 2012.
[47] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[48] Cale O Seymour, Marike Palmer, Eric D Becraft, Ramunas Stepanauskas, Ariel D Friel, Frederik Schulz, Tanja Woyke, Emiley Eloe-Fadrosh, Dengxun Lai, Jian-Yu Jiao, et al. Hyperactive nanobacteria with host-dependent traits pervade omnitrophota. Nature Microbiology, 8(4):727–744, 2023.
[49] Ying Zhang, Fanghan Qian, and Yanyu Bao. Variations of microbiota and metabolites in rhizosphere soil of carmona microphylla at the co-contaminated site with polycyclic aromatic hydrocarbons and heavy metals. Ecotoxicology and Environmental Safety, 290:117734, 2025.
[50] Yan Yang, Jing Geng, Shulan Cheng, Huajun Fang, Yifan Guo, Yuna Li, Yi Zhou, Fangying Shi, and Karen Vancampenhout. Linking soil microbial community to the chemical composition of dissolved organic matter in a boreal forest during freeze–thaw cycles. Geoderma, 431:116359, 2023.
[51] Zhen Guo, Haiou Zhang, Juan Li, Tianqing Chen, Huanyuan Wang, and Yang Zhang. Distribution of soil microorganisms in different complex soil layers in mu us sandy land. PLoS One, 18(4):e0283341, 2023.
[52] Hua Wei, Changhui Peng, Bin Yang, Hanxiong Song, Quan Li, Lin Jiang, Gang Wei, Kefeng Wang, Hui Wang, Shirong Liu, et al. Contrasting soil bacterial community, diversity, and function in two forests in china. Frontiers in Microbiology, 9:1693, 2018.
[53] Roeland L Berendsen, Corné MJ Pieterse, and Peter AHM Bakker. The rhizosphere microbiome and plant health. Trends in plant science, 17(8):478–486, 2012.
[54] Hongli He, Runyu Miao, Lilong Huang, Hongshan Jiang, and Yunqing Cheng. Vegetative cells may perform nitrogen fixation function under nitrogen deprivation in anabaena sp. strain pcc 7120 based on genome-wide differential expression analysis. PLoS One, 16(3):e0248155, 2021.
[55] BLD Uthpala Pushpakumara, Kshitij Tandon, Anusuya Willis, and Heroen Verbruggen. Unravelling microalgal-bacterial interactions in aquatic ecosystems through 16s rrna gene-based co-occurrence networks. Scientific Reports, 13(1):2743, 2023.
[56] Rodrigo Mendes, Marco Kruijt, Irene De Bruijn, Ester Dekkers, Menno Van Der Voort, Johannes HM Schneider, Yvette M Piceno, Todd Z DeSantis, Gary L Andersen, Peter AHM Bakker, et al. Deciphering the rhizosphere microbiome for disease-suppressive bacteria. Science, 332(6033):1097–1100, 2011.
[57] Stéphane Genin and Timothy P Denny. Pathogenomics of the ralstonia solanacearum species complex. Annual review of phytopathology, 50(1):67–89, 2012.
[58] Marzieh Ebrahimi-Zarandi, Roohallah Saberi Riseh, and Mika T Tarkka. Actinobacteria as effective biocontrol agents against plant pathogens, an overview on their role in eliciting plant defense. Microorganisms, 10(9):1739, 2022.
[59] Gang Wang, Ying Ren, Xuanjiao Bai, Yuying Su, and Jianping Han. Contributions of beneficial microorganisms in soil remediation and quality improvement of medicinal plants. Plants, 11(23):3200, 2022.
[60] Djordje Fira, Ivica Dimkić, Tanja Berić, Jelena Lozo, and Slaviša Stanković. Biological control of plant pathogens by bacillus species. Journal of biotechnology, 285:44–55, 2018.
[61] Shuhei Yabe, Yasuteru Sakai, Keietsu Abe, and Akira Yokota. Diversity of ktedonobacteria with actinomycetes-like morphology in terrestrial environments. Microbes and environments, 32(1):61–70, 2017.
[62] Paige E Payne, Loren N Knobbe, Patricia Chanton, Julian Zaugg, Behzad Mortazavi, and Olivia U Mason. Uncovering novel functions of the enigmatic, abundant, and active anaerolineae in a salt marsh ecosystem. Msystems, 10(1):e01162–24, 2025.
[63] Yuqin Liang, Liang Wei, Shuang Wang, Can Hu, Mouliang Xiao, Zhenke Zhu, Yangwu Deng, Xiaohong Wu, Yakov Kuzyakov, Jianping Chen, et al. Long-term fertilization suppresses rice pathogens by microbial volatile compounds. Journal of environmental management, 336:117722, 2023.
[64] Zhiqiang Tang, Liying Zhang, Na He, Diankai Gong, Hong Gao, Zuobin Ma, Liang Fu, Mingzhu Zhao, Hui Wang, Changhua Wang, et al. Soil bacterial community as impacted by addition of rice straw and biochar. Scientific Reports, 11(1):22185, 2021.
[65] Hazel A Barton, Juan G Giarrizzo, Paula Suarez, Charles E Robertson, Mark J Broering, Eric D Banks, Parag A Vaishampayan, and Kasthisuri Venkateswaran. Microbial diversity in a venezuelan orthoquartzite cave is dominated by the chloroflexi (class ktedonobacterales) and thaumarchaeota group i.1c. Frontiers in microbiology, 5:615, 2014.
[66] Ian DEA Lidbury, Chiara Borsetto, Andrew RJ Murphy, Andrew Bottrill, Alexandra ME Jones, Gary D Bending, John P Hammond, Yin Chen, Elizabeth MH Wellington, and David J Scanlan. Niche-adaptation in plant-associated bacteroidetes favours specialisation in organic phosphorus mineralisation. The ISME Journal, 15(4):1040–1055, 2021.
[67] Marketa Sagova-Mareckova, Marek Omelka, and Jan Kopecky. The golden goal of soil management: disease-suppressive soils. Phytopathology®, 113(4):741–752, 2023.
[68] Reed Nelson, Rosa Aghdam, and Claudia Solis-Lemus. Minaa: Microbiome network alignment algorithm. Journal of Open Source Software, 9(96):5448, 2024.

Supplementary Material: Leveraging Bayesian Networks for Consensus Network Construction and Multi-Method Feature Selection to Decode Disease Prediction

Rosa Aghdam
Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI

Shan Shan
Department of Plant Pathology, Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI

Richard Lankau
Department of Plant Pathology, Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI

Claudia Solís-Lemus∗
Department of Plant Pathology, Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI

∗Corresponding author: [email protected]

List of Tables

1 Taxonomic level (first column), number of OTUs in the original dataset (second column), and number of OTUs remaining after filtering out those that appear in fewer than 15 samples (third column).
2 Network metrics for microbiome networks constructed using four different methods based on all samples at the Phylum level.
3 Network metrics for microbiome networks constructed using four different methods based on all samples at the Class level.
4 Network metrics for microbiome networks constructed using four different methods based on all samples at the Order level.
5 Commonly identified important OTUs based on topological features in microbiome networks constructed from all samples at the Phylum level.
6 Commonly identified important OTUs based on topological features in microbiome networks constructed from all samples at the Class level.
7 Commonly identified important OTUs based on topological features in microbiome networks constructed from all samples at the Order level.
8 Phylum-level microbial associations identified in ‘clean tuber’ and ‘scab-infected tubers’ microbiome networks. Only interactions confirmed by all four inference methods are shown.
9 Class-level microbial associations identified in ‘clean tuber’ and ‘scab-infected tubers’ microbiome networks. Only interactions confirmed by all four inference methods are shown.
10 Order-level microbial associations identified in ‘clean tuber’ and ‘scab-infected tubers’ microbiome networks. Only interactions confirmed by all four inference methods are shown.
11 Important OTUs identified using Multi Machine Learning (ML) methods at the Phylum level.
12 Important OTUs identified using Multi Machine Learning (ML) methods at the Class level.
13 Important OTUs identified using Multi Machine Learning (ML) methods at the Order level.
14 Important OTUs identified as key features in response to pitted scab at the Phylum level using Strategy 1: Differential Centrality Analysis.
15 Important OTUs identified as key features in response to pitted scab at the Class level using Strategy 1: Differential Centrality Analysis.
16 Important OTUs identified as key features in response to pitted scab at the Order level using Strategy 1: Differential Centrality Analysis.
17 Selection of key Operational Taxonomic Units (OTUs) at the Phylum level using network-based feature selection (Strategy 2).
18 Selection of key Operational Taxonomic Units (OTUs) at the Class level using network-based feature selection (Strategy 2).
19 Selection of key Operational Taxonomic Units (OTUs) at the Order level using network-based feature selection (Strategy 2).
20 Selection of Operational Taxonomic Units (OTUs) at the Phylum level in both networks of ‘Clean Tubers’ and ‘Scab-Infected Tubers’ using Network-Based Method (Strategy 2) and Machine Learning (ML) methods.
21 Selection of Operational Taxonomic Units (OTUs) at the Class level in both networks of ‘Clean Tubers’ and ‘Scab-Infected Tubers’ using Network-Based Method (Strategy 2) and Machine Learning (ML) methods.
22 Selection of Operational Taxonomic Units (OTUs) at the Order level in both networks of ‘Clean Tubers’ and ‘Scab-Infected Tubers’ using Network-Based Method (Strategy 2) and Machine Learning (ML) methods.
23 Operational Taxonomic Units (OTUs) of significance in the ‘Clean Tubers’ network, selected by all four algorithms: CMIMN, SPARCC, SE_glasso, and SPRING.
24 Operational Taxonomic Units (OTUs) of significance in the ‘scab-infected tubers’ network, selected by all four algorithms: CMIMN, SPARCC, SE_glasso, and SPRING.

List of Figures

1 Box plots illustrating F-scores obtained from the robustness analysis of different microbiome network construction methods.
2 Microbiome network at the Class taxonomic level.
3 Microbiome network at the Order taxonomic level.
4 Overlap between machine learning methods based on different normalized data sets and network-based approaches for the ‘clean tubers’ network.
5 Overlap between machine learning methods based on different normalized data sets and network-based approaches for the ‘scab-infected tubers’ network.

Table 1: Taxonomic level (first column), number of OTUs in the original dataset (second column), and number of OTUs remaining after filtering out those that appear in fewer than 15 samples (third column).

Level     # OTUs    # OTUs after filtering
Phylum    57        42
Class     152       108
Order     378       224

[Figure 1 image: x-axis "Algorithm" (CMIMN, SE_glasso, SPRING, SPARCC); legend "Level" (phylum, class, order).]

Figure 1: Box plots illustrating F-scores obtained from the robustness analysis of different microbiome network construction methods. The evaluation involved constructing networks from 50 distinct datasets, each generated by randomly selecting 70% of the samples from the full dataset. Performance was assessed across different taxonomic levels, including Phylum, Class, and Order. Four network inference algorithms (CMIMN, SPRING, SE_glasso, and SPARCC) were compared. Our results demonstrate that CMIMN exhibits superior robustness, as indicated by consistently higher and more stable F-scores across all taxonomic levels. Notably, SE_glasso shows the least favorable performance at the Class level, with greater variability and lower F-scores.
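The subsampling protocol in the Figure 1 caption (50 repeats, 70% of samples each) can be sketched as below. This is a minimal illustration under stated assumptions: the network inferred from the full dataset is taken as the reference for the F-score, which the caption does not state explicitly, and `toy_infer_edges` is a hypothetical correlation-threshold stand-in for CMIMN, SPRING, SE_glasso, or SPARCC. The toy data are synthetic.

```python
import random
from itertools import combinations


def f_score(reference, inferred):
    """F1 score of an inferred edge set against a reference edge set.

    Returns 0.0 when there are no shared edges (including two empty sets).
    """
    true_pos = len(reference & inferred)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(inferred)
    recall = true_pos / len(reference)
    return 2 * precision * recall / (precision + recall)


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def toy_infer_edges(samples, threshold=0.6):
    """Hypothetical placeholder: link taxa whose |Pearson r| exceeds threshold."""
    n_taxa = len(samples[0])
    cols = [[row[j] for row in samples] for j in range(n_taxa)]
    return {
        (i, j)
        for i, j in combinations(range(n_taxa), 2)
        if abs(pearson(cols[i], cols[j])) > threshold
    }


def robustness_scores(samples, infer, n_repeats=50, fraction=0.7, seed=0):
    """F-scores of networks from random subsamples against the full-data network."""
    rng = random.Random(seed)
    reference = infer(samples)
    size = int(round(fraction * len(samples)))
    return [
        f_score(reference, infer(rng.sample(samples, size)))
        for _ in range(n_repeats)
    ]


# Synthetic data: 100 samples, 4 taxa, with taxa 0 and 1 strongly correlated.
data_rng = random.Random(1)
toy_samples = []
for _ in range(100):
    base = data_rng.gauss(0, 1)
    toy_samples.append(
        [base, base + data_rng.gauss(0, 0.1), data_rng.gauss(0, 1), data_rng.gauss(0, 1)]
    )

scores = robustness_scores(toy_samples, toy_infer_edges)
print(min(scores), max(scores))
```

Running the same loop once per inference method and per taxonomic level would yield the per-method score distributions that a box plot such as Figure 1 summarizes.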
; https://doi.org/10.1101/2025.04.07.647660doi: bioRxiv preprint Soil BN ‘clean tubers’ network ‘scab-infected tubers’ network common interactions Figure 2: This figure illustrates the microbiome network at the Class taxonomic level. Part (a) represents the ‘clean tubers’ network, part (b) displays the ‘scab-infected tubers’ network, and part (c) shows the common interactions between them. Nodes correspond to Operational Taxonomic Units (OTUs) and are color-coded: purple for OTUs shared between ‘clean tubers’ and ‘scab-infected tubers’ networks, blue for OTUs unique to the ‘clean tubers’ network, and green for OTUs unique to the ‘scab-infected tubers’ network. Edges are shown by solid lines which confirmed by all four method). 4 .CC-BY 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted April 13, 2025. ; https://doi.org/10.1101/2025.04.07.647660doi: bioRxiv preprint Soil BN ‘clean tubers’ network ‘scab-infected tubers’ network common interactions Figure 3: This figure showcases the microbiome network at the Order taxonomic level. Part (a) represents the ‘clean tubers’ network, part (b) displays the ‘scab-infected tubers’ network, and part (c) shows the common interactions between them. Nodes correspond to Operational Taxonomic Units (OTUs) and are color-coded: purple for OTUs shared between ‘clean tubers’ and ‘scab-infected tubers’ networks, blue for OTUs unique to the ‘clean tubers’ network, and green for OTUs unique to the ‘scab-infected tubers’ network. The top 20% of nodes with the highest centrality scores resulted by Equation (5) are labeled in dark purple. Node size reflects their degree of connectivity. Edges are shown as solid lines that confirmed by all four methods. 
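The edge filtering described in the captions of Figures 2 and 3 (keep only links confirmed by all four methods, then separate links unique to each condition from the common ones) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the taxa labels A to D, and the toy edge lists are assumptions made for the example, and edges are assumed to be undirected.

```python
def undirected(edges):
    """Normalise an edge list so that (a, b) and (b, a) count as one link."""
    return {frozenset(edge) for edge in edges if edge[0] != edge[1]}


def consensus_edges(edges_by_method):
    """Return the links reported by every method in the dictionary."""
    edge_sets = [undirected(edges) for edges in edges_by_method.values()]
    return set.intersection(*edge_sets)


def split_by_condition(clean, scab):
    """Partition consensus links into clean-only, scab-only, and common."""
    return {
        "clean_only": clean - scab,
        "scab_only": scab - clean,
        "common": clean & scab,
    }


# Toy edge lists with placeholder taxa A-D (hypothetical, not study data).
clean_methods = {
    "CMIMN": [("A", "B"), ("B", "C")],
    "SPARCC": [("B", "A"), ("C", "B"), ("A", "C")],
    "SE_glasso": [("A", "B"), ("B", "C")],
    "SPRING": [("A", "B"), ("C", "B")],
}
scab_methods = {
    "CMIMN": [("A", "B"), ("C", "D")],
    "SPARCC": [("A", "B"), ("D", "C")],
    "SE_glasso": [("B", "A"), ("C", "D")],
    "SPRING": [("A", "B"), ("C", "D"), ("B", "C")],
}

clean_consensus = consensus_edges(clean_methods)
scab_consensus = consensus_edges(scab_methods)
parts = split_by_condition(clean_consensus, scab_consensus)

for name, links in parts.items():
    print(name, sorted("-".join(sorted(link)) for link in links))
```

Storing each link as a `frozenset` makes the comparison independent of the order in which a method reports the two taxa, which is the property the intersection relies on.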
Table 2: Network metrics for microbiome networks constructed using four different methods based on all samples at the Phylum level.

Method      num_edges  average_path_length  transitivity  mean_degree  mean_distance  modularity
SPARCC      148        1.917                0.506         3.895        1.917          0.238
SE_glasso   60         2.322                0.429         1.579        2.322          0.351
SPRING      172        2.658                0.272         4.526        2.658          0.456
CMIMN       136        3.06                 0.636         3.579        3.06           0.372

Table 2 summarizes network metrics for different methods applied at the Phylum level of microbiome analysis. Notable variations in network characteristics are observed across these methods. SPRING constructs a network with 172 edges, indicating a relatively complex network structure, while SE_glasso results in a sparser network with only 60 edges. SPARCC and CMIMN fall in between, with 148 and 136 edges, respectively. In terms of average path length, SPARCC exhibits the shortest paths on average (1.917), suggesting high connectivity, whereas CMIMN has the longest average path length (3.06), indicating less direct connections. Transitivity, a measure of clustering, is highest for CMIMN (0.636), implying a higher level of clustering in its network. The mean_degree and mean_distance columns offer insights into network density and structure; SPARCC and SPRING have the highest mean degrees (3.895 and 4.526, respectively), suggesting denser networks, but their mean distances differ (1.917 for SPARCC and 2.658 for SPRING), so only the SPARCC network is also compact. Finally, SPRING exhibits the highest modularity value (0.456), indicating a potential modular structure within its network. These metrics collectively illustrate that different methods yield networks with distinct structural characteristics, highlighting the importance of selecting the method that aligns with specific research objectives and hypotheses.

Table 3: Network metrics for microbiome networks constructed using four different methods based on all samples at the Class level.

Method      num_edges  average_path_length  transitivity  mean_degree  mean_distance  modularity
SPARCC      880        2.249                0.597         8.889        2.249          0.195
SE_glasso   228        2.388                0.562         2.303        2.388          0.333
SPRING      732        2.697                0.208         7.394        2.697          0.409
CMIMN       856        2.402                0.603         8.646        2.402          0.467

Table 3 provides network metrics for various methods applied at the Class level of microbiome analysis. Notable differences in network characteristics are evident across these methods. SPARCC yields a network with 880 edges, indicating a relatively complex and highly connected structure. SE_glasso, in contrast, results in a sparser network with 228 edges, suggesting fewer connections between Class-level components. SPRING's network falls in between with 732 edges, while CMIMN constructs a network with 856 edges. When examining the average path length, SPARCC exhibits the shortest paths on average (2.249), implying higher overall connectivity. The other three methods have longer average path lengths (2.388 for SE_glasso, 2.402 for CMIMN, and 2.697 for SPRING), indicating relatively less direct connections. Transitivity, a measure of network clustering, is highest for CMIMN (0.603), signifying a substantial degree of clustering within its network. Mean degree and mean distance provide insights into the density and overall structure of the networks; SPARCC and CMIMN have the highest mean degrees (8.889 and 8.646, respectively), suggesting denser networks. SPARCC also has the shortest mean distance (2.249), whereas the mean distance of CMIMN (2.402) is close to that of SE_glasso (2.388). Finally, in terms of modularity, CMIMN exhibits the highest value (0.467), indicating a potential modular structure within its network.

Table 4: Network metrics for microbiome networks constructed using four different methods based on all samples at the Order level.

Method      num_edges  average_path_length  transitivity  mean_degree  mean_distance  modularity
SPARCC      3400       2.091                0.647         17.989       2.091          0.212
SE_glasso   2170       2.414                0.503         11.481       2.414          0.324
SPRING      2060       2.595                0.175         10.899       2.595          0.399
CMIMN       3574       2.391                0.626         18.91        2.391          0.426

Table 4 presents network metrics for different methods applied at the Order level of microbiome analysis. Notable variations in network characteristics are observed across these methods. SPARCC produces a network with 3,400 edges, indicating a complex and highly connected structure. SE_glasso results in a network with 2,170 edges, a somewhat sparser network compared to SPARCC. SPRING yields the sparsest network with 2,060 edges, while CMIMN constructs the most connected network with 3,574 edges. Examining average path length, SPARCC exhibits the shortest paths on average (2.091), indicating a high degree of connectivity. SE_glasso and SPRING have the longest average path lengths (2.414 and 2.595, respectively), implying less direct connections. Transitivity, a measure of network clustering, is highest for SPARCC (0.647), signifying a substantial level of clustering within its network. Mean degree and mean distance provide insights into network density and structure; SPARCC and CMIMN have higher mean degrees (17.989 and 18.91, respectively) and shorter mean distances (2.091 and 2.391, respectively), suggesting dense and compact networks. Finally, in terms of modularity, CMIMN exhibits the highest value (0.426), indicating a potential modular structure within its network.
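To make the metrics reported in Tables 2 to 4 concrete, the following minimal sketch computes three of them (mean degree, transitivity, and average path length) on a toy undirected graph using only the Python standard library. It is an illustration under stated assumptions, not the code used in the study: the function names and the toy graph are hypothetical, mean degree is computed with the usual undirected convention 2E/N (the tables may follow a different edge-counting convention), and modularity is omitted because it depends on a community-detection step that is not described here.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def mean_degree(adj):
    """Average number of neighbours per node (2E/N for a simple graph)."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def transitivity(adj):
    """Global clustering: closed triplets divided by all connected triplets."""
    closed = 0
    triplets = 0
    for nbrs in adj.values():
        k = len(nbrs)
        triplets += k * (k - 1) // 2
        ordered = sorted(nbrs)
        for i, u in enumerate(ordered):
            for v in ordered[i + 1:]:
                if v in adj[u]:
                    closed += 1  # each triangle is counted once per corner
    return closed / triplets if triplets else 0.0


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs."""
    total = 0
    pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source node
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Toy graph (hypothetical): a triangle a-b-c with one pendant node d.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
adj = build_adjacency(toy_edges)
md = mean_degree(adj)
tr = transitivity(adj)
apl = average_path_length(adj)
print(md, round(tr, 3), round(apl, 3))
```

On the toy graph, one triangle among five connected triplets gives a transitivity of 0.6, and the six node pairs have a total shortest-path length of 8, which shows how a single pendant node lowers clustering and lengthens paths.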
These metrics collectively demonstrate that different methods yield networks with distinct structural characteristics at the Order level, emphasizing the importance of method selection based on specific research objectives and hypotheses.

Across Tables 2 to 4, which present network metrics for the three taxonomic levels, we observed two consistent trends:
• For average_path_length, mean_distance, and modularity, SPARCC produced the lowest values at every taxonomic level; its networks therefore have the shortest paths and the weakest modular structure among the four methods.
• For transitivity, SPRING yielded the lowest values at every taxonomic level; its networks are therefore the least clustered.

Table 5: Commonly identified important OTUs based on topological features in microbiome networks constructed from all samples at the Phylum level. Microbiome networks were constructed using four different inference methods: SE_glasso, SPRING, SPARCC, and CMIMN. For each network, key topological features (Degree, Betweenness, Closeness, Eigenvector Centrality, and PageRank) were computed to assess the importance of individual OTUs. The top 20% of OTUs based on each centrality measure were selected as important. An OTU was included in this table if it consistently ranked in the top 20% across all four network inference methods based on at least one topological feature. For example, Proteobacteria was ranked in the top 20% of networks generated by all four methods based on Degree, Closeness, and PageRank.

Important OTUs     Features
Proteobacteria     Degree, Closeness, PageRank
Acidobacteriota    Degree, Closeness, PageRank
WPS.2              Betweenness, Closeness
NB1.j              PageRank

Table 6: Commonly identified important OTUs based on topological features in microbiome networks constructed from all samples at the Class level. Microbiome networks were constructed using four different inference methods: SE_glasso, SPRING, SPARCC, and CMIMN. For each network, key topological features (Degree, Betweenness, Closeness, Eigenvector Centrality, and PageRank) were computed to assess the importance of individual OTUs. The top 20% of OTUs based on each centrality measure were selected as important. An OTU was included in this table if it consistently ranked in the top 20% across all four network inference methods based on at least one topological feature. For example, Acidobacteriae was consistently ranked in the top 20% of networks generated by all four methods based on all topological features.

OTUs                   Topology Measures
Acidobacteriae         Degree, Betweenness, Closeness, Eigenvector Centrality, PageRank
Alphaproteobacteria    Degree, Betweenness, Closeness, PageRank
Anaerolineae           Degree, Betweenness, Closeness
Ignavibacteria         Degree, Eigenvector Centrality, PageRank
Ktedonobacteria        Betweenness
Gammaproteobacteria    Betweenness, PageRank
MB.A2.108              Closeness
OM190                  Closeness
Syntrophobacteria      Eigenvector Centrality
Kryptonia              Eigenvector Centrality
Desulfobulbia          Eigenvector Centrality

Table 7: Commonly identified important OTUs based on topological features in microbiome networks constructed from all samples at the Order level. Microbiome networks were constructed using four different inference methods: SE_glasso, SPRING, SPARCC, and CMIMN. For each network, key topological features (Degree, Betweenness, Closeness, Eigenvector Centrality, and PageRank) were computed to assess the importance of individual OTUs. The top 20% of OTUs based on each centrality measure were selected as important. An OTU was included in this table if it consistently ranked in the top 20% across all four network inference methods based on at least one topological feature. For example, C0119 was consistently ranked in the top 20% of networks generated by all four methods based on all topological features.

OTUs                            Topology Measures
C0119                           Degree, Betweenness, Closeness, Eigenvector Centrality, PageRank
Sphingomonadales                Degree, Closeness, PageRank
Gammaproteobacteria.Incertae    Degree, Betweenness, Closeness, PageRank
Defluviicoccales                Degree, PageRank
Microtrichales                  Degree, PageRank
Gemmatimonadales                Betweenness
Propionibacteriales             Betweenness
B10.SB3A                        Betweenness
Erysipelotrichales              Closeness
Reyranellales                   Closeness

Table 8: Phylum-level microbial associations identified in the ‘clean tubers’ and ‘scab-infected tubers’ microbiome networks. Only interactions confirmed by all four inference methods are shown.
‘clean tubers’ Network Links ‘scab-infected tubers’ Network Links Common Links Chloroflexi–Acidobacteriota Actinobacteriota–Gemmatimonadota Actinobacteriota–Proteobacteria Chloroflexi–Planctomycetota Actinobacteriota–Acidobacteriota Proteobacteria–Acidobacteriota Gemmatimonadota–Acidobacteriota Gemmatimonadota–Acidobacteriota Gemmatimonadota–Proteobacteria Proteobacteria–Firmicutes Planctomycetota–Patescibacteria Proteobacteria–Myxococcota Methylomirabilota–WPS.2 Bacteroidota–Patescibacteria Armatimonadota–WPS.2 NB1.j–MBNT15 Proteobacteria–Bacteroidota Spirochaetota–MBNT15 Desulfobacterota–MBNT15 Verrucomicrobiota–Bacteroidota Myxococcota–Bacteroidota Table 9: Class-level microbial associations identified in ‘clean tuber’ and ‘scab-infected tubers’ microbiome networks. Only interactions confirmed by all four inference methods are shown. ‘clean tuber’ Network Links ‘scab-infected tubers’ Network Links Common Links AD3–Kapabacteria AD3–Chthonomonadetes AD3–Acidobacteriae AD3–Verrucomicrobiae AD3–Ktedonobacteria Acidobacteriae–Chthonomonadetes Acidobacteriae–Armatimonadia AD3–Syntrophobacteria Acidobacteriae–Gemmatimonadetes Acidobacteriae–Lineage.IIa Actinobacteria–Gemmatimonadetes Acidobacteriae–Ktedonobacteria Acidobacteriae–OLB14 Alphaproteobacteria–Blastocatellia Alphaproteobacteria–Gammaproteobacteria Actinobacteria–Alphaproteobacteria Aminicenantia–Desulfobulbia Alphaproteobacteria–Thermoleophilia Actinobacteria–Bacilli Aminicenantia–Thermodesulfovibrionia Bacteroidia–Gammaproteobacteria Actinobacteria–Bacteroidia Anaerolineae–Dehalococcoidia Chthonomonadetes–Ktedonobacteria Alphaproteobacteria–Bacteroidia Anaerolineae–MB.A2.108 Gemmatimonadetes–TK10 Alphaproteobacteria–Gemmatimonadetes Bacilli–Blastocatellia Alphaproteobacteria–Saccharimonadia Bacteroidia–Verrucomicrobiae Alphaproteobacteria–TK10 Chloroflexia–TK10 Alphaproteobacteria–Vampirivibrionia Coriobacteriia–Desulfobulbia Anaerolineae–Phycisphaerae Coriobacteriia–Syntrophia 
Anaerolineae–Thermoanaerobaculia Coriobacteriia–Syntrophobacteria Anaerolineae–Vicinamibacteria Desulfitobacteriia–Desulfobaccia Babeliae–Chlamydiae Desulfitobacteriia–Desulfobulbia Bacilli–Gemmatimonadetes Desulfitobacteriia–Syntrophobacteria Blastocatellia–SHA.26 Desulfobaccia–Desulfobulbia Blastocatellia–Thermoanaerobaculia Desulfobaccia–Kryptonia Blastocatellia–Vicinamibacteria Desulfobaccia–Syntrophobacteria Cyanobacteriia–Planctomycetes Desulfobaccia–Thermodesulfovibrionia Gammaproteobacteria–JG30.KF.CM66 Desulfobacteria–Kryptonia Gammaproteobacteria–Saccharimonadia Desulfobacteria–Syntrophobacteria Gammaproteobacteria–Vampirivibrionia Desulfobacteria–Thermodesulfovibrionia Gitt.GS.136–MB.A2.108 Desulfobulbia–Syntrophia Gitt.GS.136–Methylomirabilia Desulfobulbia–Syntrophobacteria Gitt.GS.136–SHA.26 Desulfobulbia–Thermodesulfovibrionia Gitt.GS.136–Vicinamibacteria Gammaproteobacteria–Polyangia Holophagae–Thermoanaerobaculia Gammaproteobacteria–Thermoleophilia JG30.KF.CM66–Ktedonobacteria Kryptonia–Syntrophobacteria Kapabacteria–Lineage.IIa Spirochaetia–Syntrophia Ktedonobacteria–Planctomycetes Spirochaetia–Thermodesulfovibrionia Ktedonobacteria–Saccharimonadia Syntrophia–Thermodesulfovibrionia Lineage.IIa–Vampirivibrionia MB.A2.108–Methylomirabilia MB.A2.108–P2.11E MB.A2.108–Subgroup.25 MB.A2.108–Vicinamibacteria Methylomirabilia–SHA.26 Methylomirabilia–Subgroup.25 Planctomycetes–Verrucomicrobiae Subgroup.22–Subgroup.25 Subgroup.22–Subgroup.5 Subgroup.22–Vicinamibacteria Subgroup.25–bacteriap25 Subgroup.5–Vicinamibacteria

Table 10: Order-level microbial associations identified in ‘clean tuber’ and ‘scab-infected tubers’ microbiome networks. Only interactions confirmed by all four inference methods are shown.
Clean Network Links Disease Network Links Common Links Acetobacterales–Armatimonadales AKIW659–Aminicenantales Acetobacterales–C0119 Acetobacterales–B12.WMSP1 AKIW659–Bacteroidales Acidobacteriales–Bryobacterales Acetobacterales–Bryobacterales AKIW659–Defluviicoccales Acidobacteriales–Elsterales Acetobacterales–Elsterales AKIW659–Syntrophales Acidobacteriales–Ktedonobacterales Acetobacterales–Frankiales AKIW659–Syntrophobacterales Actinomarinales–PAUC26f Acetobacterales–Isosphaerales Acetobacterales–Acidobacteriales Anaerolineales–SBR1031 Acetobacterales–Saccharimonadales Acetobacterales–Ktedonobacterales B12.WMSP1–Elev.1554 Acidobacteriales–Chthonomonadales Acetobacterales–Reyranellales Bacillales–Paenibacillales Acidobacteriales–Gaiellales Acidiferrobacterales–Desulfobaccales Blastocatellales–Chloroflexales Acidobacteriales–Micropepsales Acidiferrobacterales–Kryptoniales Bryobacterales–Micropepsales Acidobacteriales–Subgroup.2 Acidiferrobacterales–Methylomirabilales Bryobacterales–Solibacterales Acidobacteriales–Xanthomonadales Acidobacteriales–Ardenticatenales Burkholderiales–Pedosphaerales Actinomarinales–PLTA13 Acidobacteriales–C0119 Burkholderiales–Rhizobiales Azospirillales–Chloroflexales Acidobacteriales–Solibacterales Catenulisporales–Elev.1554 Azospirillales–Propionibacteriales Actinomarinales–Ardenticatenales Chitinophagales–Sphingomonadales Azospirillales–Pyrinomonadales Actinomarinales–Candidatus.Adlerbacteria Elev.1554–Ktedonobacterales 
Azospirillales–Rhodobacterales Actinomarinales–Ignavibacteriales Elsterales–Frankiales Azospirillales–Subgroup.2 Actinomarinales–Rhodothermales Erysipelotrichales–Peptostreptococcales.Tissierellales B10.SB3A–B12.WMSP1 Aminicenantales–Desulfobaccales Gaiellales–IMCC26256 B10.SB3A–Candidatus.Liptonbacteria Aminicenantales–Desulfobulbales Ignavibacteriales–SJA.15 B10.SB3A–Catenulisporales Aminicenantales–SJA.15 Micrococcales–Sphingomonadales B10.SB3A–Diplorickettsiales Aminicenantales–Syntrophales Microtrichales–Subgroup.17 B10.SB3A–Elev.1554 Aminicenantales–Syntrophobacterales PLTA13–Subgroup.17 B10.SB3A–Gemmatales Anaerolineales–Kryptoniales Rhizobiales–Solirubrobacterales B10.SB3A–Isosphaerales Anaerolineales–S085 B10.SB3A–Ktedonobacterales Anaerolineales–Subgroup.17 B10.SB3A–Micropepsales B12.WMSP1–Subgroup.13 B10.SB3A–Nannocystales Bacillales–Chitinophagales B10.SB3A–Subgroup.2 Bacillales–Sphingomonadales B10.SB3A–X24.Nov Bacteroidales–Candidatus.Moranbacteria B12.WMSP1–CCD24 Bacteroidales–Defluviicoccales B12.WMSP1–Catenulisporales Bacteroidales–Desulfobaccales B12.WMSP1–Ktedonobacterales Bacteroidales–Desulfobulbales B12.WMSP1–Microtrichales Bacteroidales–OPB41 B12.WMSP1–Steroidobacterales Bacteroidales–SJA.15 B12.WMSP1–Subgroup.17 Bacteroidales–Spirochaetales B12.WMSP1–WD260 Bacteroidales–Syntrophobacterales Bacillales–Gemmatimonadales Bryobacterales–C0119 Blastocatellales–Subgroup.7 Bryobacterales–Chthonomonadales Blastocatellales–Thermoanaerobaculales C0119–Candidatus.Adlerbacteria Bryobacterales–Candidatus.Liptonbacteria C0119–Chthonomonadales Bryobacterales–Elsterales C0119–Elsterales Bryobacterales–Gammaproteobacteria.Incertae.Sedis C0119–Gammaproteobacteria.Incertae.Sedis Bryobacterales–Gemmatimonadales C0119–Ktedonobacterales Burkholderiales–Caulobacterales C0119–Micropepsales Burkholderiales–Haliangiales C0119–PAUC26f Burkholderiales–Sphingobacteriales C0119–Sphingomonadales C0119–Gemmatimonadales Candidatus.Adlerbacteria–Desulfitobacteriales 
C0119–PLTA13 Candidatus.Adlerbacteria–Kryptoniales C0119–Solibacterales Candidatus.Adlerbacteria–PAUC26f CCD24–Caldilineales Candidatus.Moranbacteria–Desulfobaccales CCD24–Catenulisporales Candidatus.Moranbacteria–Desulfobulbales CCD24–RBG.13.54.9 Candidatus.Moranbacteria–Ignavibacteriales CCD24–Rokubacteriales Candidatus.Moranbacteria–Ktedonobacterales CCD24–Steroidobacterales Candidatus.Moranbacteria–Syntrophobacterales CCD24–Subgroup.17 Candidatus.Yanofskybacteria–Kryptoniales CCD24–Vicinamibacterales Candidatus.Yanofskybacteria–PLTA13 Caldilineales–Catenulisporales Candidatus.Yanofskybacteria–Subgroup.17 Caldilineales–Microtrichales Chitinophagales–Chthoniobacterales Caldilineales–Propionibacteriales Chitinophagales–Sphingobacteriales Candidatus.Jorgensenbacteria–Candidatus.Kaiserbacteria Cytophagales–Polyangiales Candidatus.Jorgensenbacteria–Candidatus.Liptonbacteria Defluviicoccales–Desulfitobacteriales Candidatus.Jorgensenbacteria–KF.JG30.C25 Defluviicoccales–Kryptoniales Catenulisporales–Elsterales Defluviicoccales–PAUC26f Catenulisporales–Kapabacteriales Defluviicoccales–PLTA13 Catenulisporales–Ktedonobacterales Defluviicoccales–S085 Catenulisporales–WD260 Defluviicoccales–SJA.15 Caulobacterales–Rhizobiales Desulfitobacteriales–Desulfobaccales Caulobacterales–Sphingomonadales Desulfitobacteriales–Kryptoniales Caulobacterales–Xanthomonadales Desulfobaccales–Desulfobulbales Chitinophagales–Micrococcales Desulfobaccales–Ignavibacteriales Chitinophagales–Rhizobiales Desulfobaccales–Kryptoniales 
Chloroflexales–Kallotenuales Desulfobaccales–SJA.15 Chloroflexales–Propionibacteriales Desulfobaccales–Syntrophales Chloroflexales–Pyrinomonadales Desulfobaccales–Syntrophobacterales Chloroflexales–RBG.13.54.9 Desulfobulbales–Kryptoniales Chthoniobacterales–Solibacterales Desulfobulbales–SJA.15 Chthonomonadales–Elsterales Desulfobulbales–Syntrophales Chthonomonadales–PLTA13 Desulfobulbales–Syntrophobacterales Cyanobacteriales–Leptolyngbyales Elsterales–Gemmatimonadales DS.100–X24.Nov Elsterales–Ktedonobacterales Diplorickettsiales–Xanthomonadales Elsterales–Micropepsales Elev.1554–Gammaproteobacteria.Incertae.Sedis Elsterales–Reyranellales Elev.1554–Gemmatales Frankiales–Gemmatimonadales Elev.1554–Subgroup.13 Gammaproteobacteria.Incertae.Sedis–Paenibacillales Elev.1554–WD260 Gemmatimonadales–Kallotenuales Elsterales–Gammaproteobacteria.Incertae.Sedis Ignavibacteriales–JG36.TzT.191 Elsterales–Subgroup.13 Ignavibacteriales–Kryptoniales FFCH16263–Nitrospirales Ignavibacteriales–Ktedonobacterales FFCH16263–Subgroup.2 Ignavibacteriales–Syntrophobacterales FFCH16263–Vicinamibacterales Kryptoniales–Syntrophobacterales Frankiales–Gaiellales Ktedonobacterales–Rickettsiales Gammaproteobacteria.Incertae.Sedis–Xanthomonadales Ktedonobacterales–Syntrophobacterales Gemmatales–Isosphaerales Methylomirabilales–OPB41 Gemmatimonadales–Paenibacillales Methylomirabilales–SJA.15 Haliangiales–Pedosphaerales Methylomirabilales–Syntrophales IMCC26256–Subgroup.2 OPB41–Syntrophales Isosphaerales–Ktedonobacterales OPB41–Syntrophobacterales KF.JG30.C25–Subgroup.13 PAUC26f–PLTA13 Ktedonobacterales–Micropepsales PAUC26f–SJA.15 Ktedonobacterales–PAUC26f PLTA13–Syntrophobacterales Ktedonobacterales–Subgroup.2 Paenibacillales–Rickettsiales Leptolyngbyales–Oxyphotobacteria.Incertae.Sedis Paenibacillales–Sphingomonadales Micromonosporales–Propionibacteriales Paenibacillales–Xanthomonadales Micromonosporales–Rhizobiales Propionibacteriales–Thermomicrobiales Micropepsales–Subgroup.2 
Rhodothermales–SAR202.clade Micropepsales–Xanthomonadales SJA.15–Syntrophobacterales Microtrichales–Phycisphaerales Sphingomonadales–Xanthomonadales Microtrichales–Rhodobacterales Spirochaetales–Syntrophales Microtrichales–SBR1031 Microtrichales–Thermoanaerobaculales Microtrichales–Vicinamibacterales Nannocystales–PLTA13 Nannocystales–Steroidobacterales Nannocystales–WD260 Nitrospirales–Rokubacteriales Nitrospirales–Vicinamibacterales Pedosphaerales–Thermomicrobiales Phycisphaerales–SBR1031 Propionibacteriales–Rhodobacterales Pyrinomonadales–RBG.13.54.9 Pyrinomonadales–Vicinamibacterales Pyrinomonadales–X24.Nov RBG.13.54.9–Rokubacteriales RBG.13.54.9–SBR1031 RBG.13.54.9–Subgroup.7 Reyranellales–Solibacterales Rhodobacterales–Verrucomicrobiales Rhodobacterales–WD260 Rokubacteriales–Subgroup.17 Rokubacteriales–X24.Nov SBR1031–Steroidobacterales SBR1031–Subgroup.17 SBR1031–Thermoanaerobaculales Saccharimonadales–Xanthomonadales Sphingobacteriales–Xanthomonadales Steroidobacterales–Vicinamibacterales Subgroup.17–Vicinamibacterales Subgroup.17–X24.Nov Subgroup.2–WD260 Subgroup.7–Thermoanaerobaculales Vicinamibacterales–X24.Nov

Table 11: Important OTUs identified using Multi Machine Learning (ML) methods at the Phylum level. Microbiome data were normalized using four different techniques: (1) Centered Log-Ratio (CLR), (2) the original dataset (no transformation), (3) Log transformation, and (4) Total Sum Scaling (TSS).
Seven ML-based feature selection methods were applied to each normalized dataset to identify important OTUs. An OTU was selected if it was identified as important by at least five out of the seven ML methods. The table presents the OTUs identified as important for each normalization method, with separate columns for ML_Clr (CLR-normalized data), ML_Original (original dataset), ML_Log (log-transformed data), and ML_TSS (TSS-normalized data). The final ML_Intersection column lists the OTUs that were consistently selected as important across all normalization methods, highlighting robust microbial taxa that are independent of the normalization approach.
ML_Clr ML_Original ML_Log ML_TSS ML_Intersection Firmicutes Armatimonadota Firmicutes Firmicutes Firmicutes Cyanobacteria Firmicutes Armatimonadota Cyanobacteria Cyanobacteria Methylomirabilota Deinococcota Cyanobacteria Patescibacteria Armatimonadota Armatimonadota Cyanobacteria Deinococcota Armatimonadota NB1.j Deinococcota NB1.j Verrucomicrobiota Bacteroidota WPS.2 Verrucomicrobiota Spirochaetota Spirochaetota Acidobacteriota Patescibacteria NB1.j Methylomirabilota MBNT15 Methylomirabilota Bacteroidota Acidobacteriota NB1.j WPS.2 Nitrospirota Bacteroidota Desulfobacterota Verrucomicrobiota NB1.j

Table 12: Important OTUs identified using Multi Machine Learning (ML) methods at the Class level. Microbiome data were normalized using four different techniques: (1) Centered Log-Ratio (CLR), (2) the original dataset (no transformation), (3) Log transformation, and (4) Total Sum Scaling (TSS).
Seven ML-based feature selection methods were applied to each normalized dataset to identify important OTUs. An OTU was selected if it was identified as important by at least five out of the seven ML methods. The table presents the OTUs identified as important for each normalization method, with separate columns for ML_Clr (CLR-normalized data), ML_Original (original dataset), ML_Log (log-transformed data), and ML_TSS (TSS-normalized data). The final ML_Intersection column lists the OTUs that were consistently selected as important across all normalization methods, highlighting robust microbial taxa that are independent of the normalization approach. ML_Clr ML_Original ML_Log ML_TSS ML_Intersection Bacilli Bacilli Bacilli Vicinamibacteria Bacilli Ktedonobacteria Latescibacteria Ignavibacteria Saccharimonadia Ktedonobacteria Cyanobacteriia Saccharimonadia Verrucomicrobiae Gemmatimonadetes Cyanobacteriia Saccharimonadia Ktedonobacteria Deinococci Gitt.GS.136 Saccharimonadia Planctomycetes Spirochaetia Latescibacteria Oligoflexia Planctomycetes Lineage.IIa Planctomycetes Saccharimonadia Ktedonobacteria Ignavibacteria Vampirivibrionia Kryptonia Ktedonobacteria Anaerolineae Dehalococcoidia Verrucomicrobiae Verrucomicrobiae OLB14 Bacilli Anaerolineae Ignavibacteria Deinococci Kryptonia Acidimicrobiia MB.A2.108 Dehalococcoidia Ignavibacteria MB.A2.108 Planctomycetes Chthonomonadetes Thermoplasmata OLB14 Cyanobacteriia Nitrospiria Kryptonia Deinococci Cyanobacteriia Planctomycetes Methylomirabilia Anaerolineae Dehalococcoidia AD3 Ignavibacteria Pla4.lineage Methylomirabilia Gitt.GS.136 MB.A2.108 MB.A2.108 Chthonomonadetes Dehalococcoidia Acidobacteriae Methylomirabilia MB.A2.108 Anaerolineae TK10 Chthonomonadetes Gitt.GS.136 Armatimonadia Bacteroidia OLB14 Vicinamibacteria Chthonomonadetes Polyangia Bacteroidia Anaerolineae Oligoflexia Thermoleophilia Oligoflexia Vampirivibrionia Clostridia KD4.96 Kryptonia Holophagae Spirochaetia Cyanobacteriia TK10 Chthonomonadetes 
Dehalococcoidia Kryptonia

Table 13: Important OTUs identified using Multi Machine Learning (ML) methods at the Order level. Microbiome data were normalized using four different techniques: (1) Centered Log-Ratio (CLR), (2) the original dataset (no transformation), (3) Log transformation, and (4) Total Sum Scaling (TSS). Seven ML-based feature selection methods were applied to each normalized dataset to identify important OTUs. An OTU was selected if it was identified as important by at least five out of the seven ML methods. The table presents the OTUs identified as important for each normalization method, with separate columns for ML_Clr (CLR-normalized data), ML_Original (original dataset), ML_Log (log-transformed data), and ML_TSS (TSS-normalized data). The final ML_Intersection column lists the OTUs that were consistently selected as important across all normalization methods, highlighting robust microbial taxa that are independent of the normalization approach.
ML_Clr ML_Original ML_Log ML_TSS ML_Intersection Saccharimonadales Anaerolineales Saccharimonadales Sphingomonadales Saccharimonadales Bacillales Acetobacterales Bacillales Saccharimonadales Bacillales C0119 Chloroplast Subgroup.2 SBR1031 C0119 Isosphaerales P AUC26f Solibacterales Kryptoniales Subgroup.2 Subgroup.2 Subgroup.17 Micropepsales V icinamibacterales Xanthomonadales Blastocatellales Bacteroidales Anaerolineales Anaerolineales Acidobacteriales Xanthomonadales Alic yclobacillales Latescibacterales C0119 Chloroplast Acidobacteriales Micropepsales Frankiales Bryobacterales Alic yclobacillales Chloroplast Bacillales Gaiellales Micropepsales P aenibacillales Alicyclobacillales Sphingomonadales Alic yclobacillales Chloroplast Acetobacterales P aenibacillales Defluviicoccales Bacteroidales Subgroup.2 Pseudomonadales Acetobacterales Frankiales B10.SB3A Gaiellales Anaerolineales Pseudomonadales Saccharimonadales Acetobacterales PL TA13 Elsterales Anaerolineales Caulobacterales Chloroplast Bacillales Bacteroidales Elsterales Acidobacteriales Defluviicoccales Microtrichales Ktedonobacterales Oligofle xales Pseudomonadales Propionibacteriales Candidatus.Le vybacteria Sphingomonadales Bacteroidales Bryobacterales Acidobacteriales Pseudomonadales Kineosporiales Deinococcales C0119 Sphingomonadales Rokubacteriales SBR1031 Ktedonobacterales Gaiellales Caulobacterales Acetobacterales Rokubacteriales Sphingomonadales B10.SB3A B12.WMSP1 Clostridiales Frankiales Chlorofle xales Microtrichales SBR1031 P AUC26f Micropepsales Rhodobacterales Chthoniobacterales Chthoniobacterales Elsterales Gaiellales Candidatus.Y anofskybacteria SBR1031 Deinococcales Frankiales PL TA13 KF.JG30.C25 Kineosporiales Pseudomonadales B10.SB3A Defluviicoccales Kineosporiales Ktedonobacterales Obscuribacterales Kineosporiales Obscuribacterales SBR1031 Subgroup.2 Rokubacteriales Ktedonobacterales Rokubacteriales Latescibacterales Subgroup.17 Subgroup.17 Frankiales Chitinophag ales Lineage.IV Lineage.IV 
Rickettsiales B12.WMSP1 PAUC26f Micromonosporales Micropepsales Blastocatellales Spirochaetales Gammaproteobacteria.Incertae Enterobacterales Xanthomonadales Chitinophagales Blastocatellales Micrococcales Syntrophobacterales Kineosporiales Cytophagales Gaiellales Sphingobacteriales Isosphaerales Obscuribacterales PLTA13 SJA.15 Paenibacillales Deinococcales Solibacterales Elsterales Xanthomonadales Chthoniobacterales CCD24 Pseudonocardiales C0119 Chthonomonadales Defluviicoccales Clostridiales Ktedonobacterales Acidobacteriales Desulfitobacteriales Candidatus.Yanofskybacteria Micromonosporales Pseudonocardiales S085 Kryptoniales Kryptoniales Actinomarinales Obscuribacterales Chthonomonadales Chthonomonadales Rickettsiales SJA.28 Thermoanaerobaculales Paenibacillales Paenibacillales PLTA13 Alicyclobacillales Propionibacteriales Bryobacterales Xanthomonadales S085 Rickettsiales SJA.28 PLTA13 Elsterales Chitinophagales Isosphaerales Ardenticatenales Obscuribacterales Bacteroidales Rokubacteriales Gemmatimonadales Solibacterales Nitrospirales Rhizobiales Defluviicoccales CCD24 Solirubrobacterales S085

bioRxiv preprint doi: https://doi.org/10.1101/2025.04.07.647660; this version posted April 13, 2025. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Table 14: Important OTUs identified as key features in response to pitted scab at the Phylum level using Strategy 1: Differential Centrality Analysis. First, microbiome networks were constructed separately for ‘scab-infected tubers’ and ‘clean tubers’ using four inference methods: SE_glasso, SPRING, SPARCC, and CMIMN. Then, centrality metrics (Degree, Betweenness, Closeness, Eigenvector, and PageRank) were calculated for both networks, and the differences in centrality values between diseased and healthy conditions were computed for each method. OTUs ranked in the top 20% based on these centrality differences across all four methods were selected as important. The table presents OTUs that were consistently identified as important across all four inference methods. The Features column indicates the centrality measure that determined the significance of each OTU. These OTUs represent microbial taxa whose network connectivity consistently exhibited significant differences between healthy and diseased conditions across all four methods.

Important OTUs    Topological Features
Bacteroidota      Betweenness
WPS.2             Betweenness
Proteobacteria    Closeness

Table 15: Important OTUs identified as key features in response to pitted scab at the Class level using Strategy 1: Differential Centrality Analysis. First, microbiome networks were constructed separately for ‘scab-infected tubers’ and ‘clean tubers’ using four inference methods: SE_glasso, SPRING, SPARCC, and CMIMN. Then, centrality metrics (Degree, Betweenness, Closeness, Eigenvector, and PageRank) were calculated for both networks, and the differences in centrality values between diseased and healthy conditions were computed for each method. OTUs ranked in the top 20% based on these centrality differences across all four methods were selected as important. The table presents OTUs that were consistently identified as important across all four inference methods. The Features column indicates the centrality measure that determined the significance of each OTU. These OTUs represent microbial taxa whose network connectivity consistently exhibited significant differences between healthy and diseased conditions across all four methods.

Important OTUs         Topological Features
Desulfitobacteriia     Degree, Eigenvector, PageRank
Actinobacteria, AD3    Betweenness, Closeness
Syntrophobacteria      Eigenvector

Table 16: Important OTUs identified as key features in response to pitted scab at the Order level using Strategy 1: Differential Centrality Analysis. First, microbiome networks were constructed separately for ‘scab-infected tubers’ and ‘clean tubers’ using four inference methods: SE_glasso, SPRING, SPARCC, and CMIMN. Then, centrality metrics (Degree, Betweenness, Closeness, Eigenvector, and PageRank) were calculated for both networks, and the differences in centrality values between diseased and healthy conditions were computed for each method. OTUs ranked in the top 20% based on these centrality differences across all four methods were selected as important. The table presents OTUs that were consistently identified as important across all four inference methods. The Features column indicates the centrality measure that determined the significance of each OTU. These OTUs represent microbial taxa whose network connectivity consistently exhibited significant differences between healthy and diseased conditions across all four methods.

Important OTUs          Topological Features
C0119                   Degree, Betweenness, Closeness, Eigenvector, PageRank
Defluviicoccales        Closeness, Eigenvector
Bacteroidales           Closeness, Eigenvector
Kryptoniales            Eigenvector
B12.WMSP1               Eigenvector
Desulfitobacteriales    PageRank
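The Strategy 1 procedure described in the captions of Tables 14 to 16 can be sketched roughly as follows. This is only an illustration in plain Python using a single metric (degree centrality) on toy edge lists. The taxon names, edge lists, method labels, the use of an absolute difference, and the rounding rule for the top 20% cut-off are all assumptions made for the example; the authors' own implementation is not shown in this section.

```python
from math import ceil

def degree_centrality(edges, nodes):
    """Degree centrality of an undirected, unweighted network."""
    neighbours = {n: set() for n in nodes}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    denom = max(len(nodes) - 1, 1)
    return {n: len(neighbours[n]) / denom for n in nodes}

def top_fraction(scores, fraction=0.2):
    """Return the ceil(fraction * n) highest-scoring names (ties broken by name)."""
    k = ceil(fraction * len(scores))
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return set(ranked[:k])

def differential_top(clean_edges, scab_edges, nodes, fraction=0.2):
    """Top nodes by absolute centrality difference (assumed) between the two networks."""
    clean = degree_centrality(clean_edges, nodes)
    scab = degree_centrality(scab_edges, nodes)
    diff = {n: abs(scab[n] - clean[n]) for n in nodes}
    return top_fraction(diff, fraction)

# Toy placeholder taxa and edges (hypothetical, not data from the paper).
nodes = ["taxon_A", "taxon_B", "taxon_C", "taxon_D", "taxon_E"]
networks = {
    "method_1": {
        "clean": [("taxon_A", "taxon_B"), ("taxon_B", "taxon_C"), ("taxon_C", "taxon_D")],
        "scab": [("taxon_A", "taxon_B"), ("taxon_A", "taxon_C"),
                 ("taxon_A", "taxon_D"), ("taxon_A", "taxon_E")],
    },
    "method_2": {
        "clean": [("taxon_A", "taxon_B"), ("taxon_C", "taxon_D"), ("taxon_D", "taxon_E")],
        "scab": [("taxon_A", "taxon_B"), ("taxon_A", "taxon_C"),
                 ("taxon_A", "taxon_D"), ("taxon_D", "taxon_E")],
    },
}

# Select the top 20% per inference method, then keep taxa flagged by every method.
per_method = {
    name: differential_top(net["clean"], net["scab"], nodes)
    for name, net in networks.items()
}
consistent = set.intersection(*per_method.values())
print(consistent)
```

In the paper the same comparison is made for five metrics and four inference methods; extending the sketch would mean repeating `differential_top` once per metric and recording which metric placed each taxon in the top set.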
Table 17: Selection of key Operational Taxonomic Units (OTUs) at the Phylum level in microbiome networks constructed separately for ‘clean tubers’ (left panel) and ‘scab-infected tubers’ (right panel) using network-based feature selection (Strategy 2: Composite Scoring Approach).
Steps for Identifying These OTUs:
1- Network Construction: Microbiome networks were separately built for clean tubers and scab-infected tubers using four inference methods: SE_glasso, SPRING, SPARCC, and CMIMN.
2- Weighted Scoring Within Each Method: A weighted score was assigned to each OTU within each method based on multiple centrality metrics (Degree, Betweenness, Closeness, Eigenvector, and PageRank).
Selection of Important OTUs: The top 20% of OTUs with the highest Score 1 were selected within each individual method.
Table Column Explanations:
First column in each panel: OTUs selected by Strategy 2 for each specific network inference method (i.e., these OTUs are among the top 20% highest-scoring OTUs for that method).
Next four columns (CLR, Original, Log, TSS): Overlap between Strategy 2-selected OTUs and those identified by ML-based feature selection under different normalization approaches. A 1 in a column means that the ML method also identified the OTU as important under that normalization method. A 0 in a column means that the OTU was not selected by the ML method under that normalization method.
Note: This table presents the weighted score for each OTU within each inference method. Unlike later steps in Strategy 2, this table does not include the combined score across all methods.

‘clean tubers’ network clr original log TSS | ‘scab-infected tubers’ network clr original log TSS
CMIMN
Methylomirabilota 1 1 0 1 | Nitrospirota 0 0 0 1
Myxococcota 0 0 0 0 | Desulfobacterota 0 0 0 1
Nitrospirota 0 0 0 1 | Myxococcota 0 0 0 0
Desulfobacterota 0 0 0 1 | Bacteroidota 1 0 1 1
MBNT15 1 0 0 0 | NB1.j 1 1 1 1
Armatimonadota 1 1 1 1 | Patescibacteria 0 1 0 1
Proteobacteria 0 0 0 0 | Proteobacteria 0 0 0 0
WPS.2 1 0 1 0 | Acidobacteriota 1 0 0 1
SPARCC
WPS.2 1 0 1 0 | Firmicutes 1 1 1 1
Methylomirabilota 1 1 0 1 | Desulfobacterota 0 0 0 1
Acidobacteriota 1 0 0 1 | Gemmatimonadota 0 0 0 0
Proteobacteria 0 0 0 0 | Acidobacteriota 1 0 0 1
Patescibacteria 0 1 0 1 | Proteobacteria 0 0 0 0
Gemmatimonadota 0 0 0 0 | Verrucomicrobiota 1 1 1 0
Actinobacteriota 0 0 0 0 | Bacteroidota 1 0 1 1
Planctomycetota 0 0 0 0 | Cyanobacteria 1 1 1 1
SE_glasso
WPS.2 1 0 1 0 | Proteobacteria 0 0 0 0
Acidobacteriota 1 0 0 1 | MBNT15 1 0 0 0
Proteobacteria 0 0 0 0 | Myxococcota 0 0 0 0
Chloroflexi 0 0 0 0 | Bacteroidota 1 0 1 1
Patescibacteria 0 1 0 1 | NB1.j 1 1 1 1
Planctomycetota 0 0 0 0 | Desulfobacterota 0 0 0 1
Gemmatimonadota 0 0 0 0 | Firmicutes 1 1 1 1
Armatimonadota 1 1 1 1 | Actinobacteriota 0 0 0 0
SPRING
WPS.2 1 0 1 0 | Proteobacteria 0 0 0 0
NB1.j 1 1 1 1 | Chloroflexi 0 0 0 0
Patescibacteria 0 1 0 1 | Nitrospirota 0 0 0 1
Actinobacteriota 0 0 0 0 | Bacteroidota 1 0 1 1
Methylomirabilota 1 1 0 1 | Firmicutes 1 1 1 1
Proteobacteria 0 0 0 0 | NB1.j 1 1 1 1
GAL15 0 0 0 0 | RCP2.54 0 0 0 0
MBNT15 1 0 0 0 | Acidobacteriota 1 0 0 1
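The composite scoring step of Strategy 2 can be illustrated with a small sketch. The caption of Table 17 does not state the weights or how the metrics are scaled before they are combined, so the equal weights, the min-max scaling, the rounding rule for the top 20%, and all OTU names and centrality values below are assumptions chosen only to make the example concrete.

```python
from math import ceil

METRICS = ("degree", "betweenness", "closeness", "eigenvector", "pagerank")

def min_max(values):
    """Rescale one metric to [0, 1]; a constant metric maps to 0 for every OTU."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}

def composite_score(centrality, weights=None):
    """Weighted sum of rescaled centrality metrics (equal weights assumed by default)."""
    weights = weights or {m: 1 / len(METRICS) for m in METRICS}
    scaled = {m: min_max(centrality[m]) for m in METRICS}
    otus = centrality[METRICS[0]]
    return {o: sum(weights[m] * scaled[m][o] for m in METRICS) for o in otus}

def select_top(scores, fraction=0.2):
    """Names of the ceil(fraction * n) highest-scoring OTUs (ties broken by name)."""
    k = ceil(fraction * len(scores))
    return sorted(scores, key=lambda o: (-scores[o], o))[:k]

# Hypothetical centrality values for one inference method and one network.
centrality = {
    "degree":      {"otu_1": 0.9, "otu_2": 0.5, "otu_3": 0.1, "otu_4": 0.3, "otu_5": 0.2},
    "betweenness": {"otu_1": 0.8, "otu_2": 0.1, "otu_3": 0.0, "otu_4": 0.4, "otu_5": 0.2},
    "closeness":   {"otu_1": 0.7, "otu_2": 0.6, "otu_3": 0.2, "otu_4": 0.5, "otu_5": 0.3},
    "eigenvector": {"otu_1": 0.9, "otu_2": 0.4, "otu_3": 0.1, "otu_4": 0.2, "otu_5": 0.3},
    "pagerank":    {"otu_1": 0.4, "otu_2": 0.2, "otu_3": 0.1, "otu_4": 0.2, "otu_5": 0.1},
}

scores = composite_score(centrality)
selected = select_top(scores)
print(selected)
```

Running the same two functions once per inference method and per network (clean, scab-infected) would give the per-method OTU lists that the first column of each panel reports.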
Table 18: Selection of key Operational Taxonomic Units (OTUs) at the Class level in microbiome networks constructed separately for ‘clean tubers’ (left panel) and ‘scab-infected tubers’ (right panel) using network-based feature selection (Strategy 2: Composite Scoring Approach).
Steps for Identifying These OTUs:
1- Network Construction: Microbiome networks were separately built for clean tubers and scab-infected tubers using four inference methods: SE_glasso, SPRING, SPARCC, and CMIMN.
2- Weighted Scoring Within Each Method: A weighted score was assigned to each OTU within each method based on multiple centrality metrics (Degree, Betweenness, Closeness, Eigenvector, and PageRank).
Selection of Important OTUs: The top 20% of OTUs with the highest Score 1 were selected within each individual method.
Table Column Explanations:
First column in each panel: OTUs selected by Strategy 2 for each specific network inference method (i.e., these OTUs are among the top 20% highest-scoring OTUs for that method).
Next four columns (CLR, Original, Log, TSS): Overlap between Strategy 2-selected OTUs and those identified by ML-based feature selection under different normalization approaches. A 1 in a column means that the ML method also identified the OTU as important under that normalization method. A 0 in a column means that the OTU was not selected by the ML method under that normalization method.
Note: This table presents the weighted score for each OTU within each inference method. Unlike later steps in Strategy 2, this table does not include the combined score across all methods.

‘clean tubers’ network clr original log TSS | ‘scab-infected tubers’ network clr original log TSS
CMIMN
Ktedonobacteria 1 1 1 1 | Ignavibacteria 1 1 1 1
Ignavibacteria 1 1 1 1 | Parcubacteria 0 0 0 0
OM190 0 0 0 0 | Anaerolineae 1 1 1 1
Acidimicrobiia 0 0 0 1 | Vicinamibacteria 0 1 0 1
Vicinamibacteria 0 1 0 1 | Kryptonia 1 1 1 1
Actinobacteria 0 0 0 0 | Acidimicrobiia 0 0 0 1
Desulfobaccia 0 0 0 0 | Gammaproteobacteria 0 0 0 0
Gammaproteobacteria 0 0 0 0 | Alphaproteobacteria 0 0 0 0
Alphaproteobacteria 0 0 0 0 | AD3 0 0 1 0
Blastocatellia 0 0 0 0 | Acidobacteriae 0 0 0 1
Gitt.GS.136 0 1 1 1 | Nitrospiria 0 0 0 1
Parcubacteria 0 0 0 0 | Microgenomatia 0 0 0 0
Acidobacteriae 0 0 0 1 | Myxococcia 0 0 0 0
Polyangia 0 0 0 1 | BD2.11.terrestrial.group 0 0 0 0
Chloroflexia 0 0 0 0 | Thermoleophilia 0 0 0 1
KD4.96 0 0 0 1 | Gitt.GS.136 0 1 1 1
Thermoleophilia 0 0 0 1 | Blastocatellia 0 0 0 0
MB.A2.108 1 1 1 1 | Holophagae 0 1 0 0
Babeliae 0 0 0 0 | KD4.96 0 0 0 1
Longimicrobia 0 0 0 0 | Bacilli 1 1 1 1
SPARCC
Vicinamibacteria 0 1 0 1 | Actinobacteria 0 0 0 0
Anaerolineae 1 1 1 1 | Bacilli 1 1 1 1
OLB14 1 1 1 0 | AD3 0 0 1 0
Actinobacteria 0 0 0 0 | Acidobacteriae 0 0 0 1
Ktedonobacteria 1 1 1 1 | Anaerolineae 1 1 1 1
Methylomirabilia 1 1 0 1 | Blastocatellia 0 0 0 0
Acidobacteriae 0 0 0 1 | Chloroflexia 0 0 0 0
Bacteroidia 1 0 0 1 | Thermoleophilia 0 0 0 1
Saccharimonadia 1 1 1 1 | Ignavibacteria 1 1 1 1
Chlamydiae 0 0 0 0 | Ktedonobacteria 1 1 1 1
Alphaproteobacteria 0 0 0 0 | Nitrospiria 0 0 0 1
Planctomycetes 1 1 1 1 | Parcubacteria 0 0 0 0
Gammaproteobacteria 0 0 0 0 | Alphaproteobacteria 0 0 0 0
MB.A2.108 1 1 1 1 | Rhodothermia 0 0 0 0
Gemmatimonadetes 0 0 0 1 | Kryptonia 1 1 1 1
Gitt.GS.136 0 1 1 1 | Gemmatimonadetes 0 0 0 1
AD3 0 0 1 0 | Syntrophobacteria 0 0 0 0
Subgroup.25 0 0 0 0 | Thermodesulfovibrionia 0 0 0 0
Blastocatellia 0 0 0 0 | Polyangia 0 0 0 1
Thermoanaerobaculia 0 0 0 0 | Gammaproteobacteria 0 0 0 0
SE_glasso
Ktedonobacteria 1 1 1 1 | AD3 0 0 1 0
Syntrophobacteria 0 0 0 0 | Ktedonobacteria 1 1 1 1
Anaerolineae 1 1 1 1 | Bacilli 1 1 1 1
Acidobacteriae 0 0 0 1 | Desulfobaccia 0 0 0 0
Desulfobaccia 0 0 0 0 | Gammaproteobacteria 0 0 0 0
AD3 0 0 1 0 | Desulfobulbia 0 0 0 0
Vicinamibacteria 0 1 0 1 | Syntrophobacteria 0 0 0 0
Actinobacteria 0 0 0 0 | Thermodesulfovibrionia 0 0 0 0
Thermoleophilia 0 0 0 1 | Alphaproteobacteria 0 0 0 0
Planctomycetes 1 1 1 1 | Ignavibacteria 1 1 1 1
Methylomirabilia 1 1 0 1 | Kryptonia 1 1 1 1
MB.A2.108 1 1 1 1 | Actinobacteria 0 0 0 0
Gemmatimonadetes 0 0 0 1 | Blastocatellia 0 0 0 0
Alphaproteobacteria 0 0 0 0 | Anaerolineae 1 1 1 1
Gammaproteobacteria 0 0 0 0 | Dehalococcoidia 1 1 1 1
Latescibacteria 0 1 1 0 | Verrucomicrobiae 1 1 1 0
Subgroup.25 0 0 0 0 | Acidobacteriae 0 0 0 1
Gitt.GS.136 0 1 1 1 | TK10 1 0 0 1
Saccharimonadia 1 1 1 1 | Syntrophia 0 0 0 0
Desulfobulbia 0 0 0 0 | Gemmatimonadetes 0 0 0 1
SPRING
Actinobacteria 0 0 0 0 | AD3 0 0 1 0
Blastocatellia 0 0 0 0 | Syntrophobacteria 0 0 0 0
Vicinamibacteria 0 1 0 1 | Bacteroidia 1 0 0 1
Acidimicrobiia 0 0 0 1 | Rhodothermia 0 0 0 0
Alphaproteobacteria 0 0 0 0 | Polyangia 0 0 0 1
Acidobacteriae 0 0 0 1 | Nitrospiria 0 0 0 1
Parcubacteria 0 0 0 0 | Alphaproteobacteria 0 0 0 0
Polyangia 0 0 0 1 | Bdellovibrionia 0 0 0 0
Gammaproteobacteria 0 0 0 0 | Cyanobacteriia 1 1 1 1
Rhodothermia 0 0 0 0 | Acidimicrobiia 0 0 0 1
Longimicrobia 0 0 0 0 | Microgenomatia 0 0 0 0
AKAU4049 0 0 0 0 | Kazania 0 0 0 0
Anaerolineae 1 1 1 1 | Parcubacteria 0 0 0 0
Ktedonobacteria 1 1 1 1 | Gammaproteobacteria 0 0 0 0
Kazania 0 0 0 0 | Thermoleophilia 0 0 0 1
Gracilibacteria 0 0 0 0 | Blastocatellia 0 0 0 0
Gemmatimonadetes 0 0 0 1 | OM190 0 0 0 0
Thermoplasmata 1 0 0 0 | Desulfitobacteriia 0 0 0 0
Verrucomicrobiae 1 1 1 0 | Anaerolineae 1 1 1 1
Desulfobulbia 0 0 0 0 | Bacilli 1 1 1 1

Table 19: Selection of key Operational Taxonomic Units (OTUs) at the Order level in microbiome networks constructed separately for ‘clean tubers’ (left panel) and ‘scab-infected tubers’ (right panel) using network-based feature selection (Strategy 2: Composite Scoring Approach).
Steps for Identifying These OTUs:
1- Network Construction: Microbiome networks were separately built for clean tubers and scab-infected tubers using four inference methods: SE_glasso, SPRING, SPARCC, and CMIMN.
2- Weighted Scoring Within Each Method: A weighted score was assigned to each OTU within each method based on multiple centrality metrics (Degree, Betweenness, Closeness, Eigenvector, and PageRank).
Selection of Important OTUs: The top 20% of OTUs with the highest Score 1 were selected within each individual method.
Table Column Explanations:
First column in each panel: OTUs selected by Strategy 2 for each specific network inference method (i.e., these OTUs are among the top 20% highest-scoring OTUs for that method).
Next four columns (CLR, Original, Log, TSS): Overlap between Strategy 2-selected OTUs and those identified by ML-based feature selection under different normalization approaches. A 1 in a column means that the ML method also identified the OTU as important under that normalization method. A 0 in a column means that the OTU was not selected by the ML method under that normalization method.
Note: This table presents the weighted score for each OTU within each inference method. Unlike later steps in Strategy 2, this table does not include the combined score across all methods.

‘clean tubers’ network clr original log TSS | ‘scab-infected tubers’ network clr original log TSS
CMIMN
Ktedonobacterales 1 1 1 1 | Ktedonobacterales 1 1 1 1
C0119 1 1 1 1 | Elev.1554 0 0 0 0
Acidobacteriales 1 1 1 1 | Microtrichales 0 1 0 1
Rhizobiales 0 0 0 1 | Bryobacterales 0 1 1 1
Bacillales 1 1 1 1 | C0119 1 1 1 1
Propionibacteriales 0 1 1 0 | Acetobacterales 1 1 1 1
Micropepsales 1 1 1 1 | Haliangiales 0 0 0 0
PLTA13 1 1 1 1 | Pedosphaerales 0 0 0 0
Chitinophagales 0 1 1 1 | Rickettsiales 1 0 1 1
Actinomarinales 0 0 0 1 | Rhizobiales 0 0 0 1
Subgroup.17 0 1 1 1 | Phycisphaerales 0 0 0 0
Chloroplast 1 1 1 1 | PLTA13 1 1 1 1
Bryobacterales 0 1 1 1 | Chloroplast 1 1 1 1
Rhodobacterales 1 0 0 0 | Subgroup.17 0 1 1 1
B12.WMSP1 0 1 1 0 | Subgroup.2 1 1 1 1
Elev.1554 0 0 0 0 | Sphingobacteriales 0 1 0 0
Chloroflexales 1 0 0 0 | Reyranellales 0 0 0 0
Acetobacterales 1 1 1 1 | Actinomarinales 0 0 0 1
Desulfobaccales 0 0 0 0 | Thermoactinomycetales 0 0 0 0
Elsterales 1 1 1 1 | S085 1 1 0 1
Gaiellales 1 1 1 1 | Erysipelotrichales 0 0 0 0
B10.SB3A 0 1 1 1 | SBR1031 1 1 1 1
Microtrichales 0 1 0 1 | Gaiellales 1 1 1 1
Gemmatimonadales 0 0 0 1 | Myxococcales 0 0 0 0
Vicinamibacterales 0 0 0 1 | Kryptoniales 0 1 1 1
Micrococcales 1 0 0 0 | Opitutales 0 0 0 0
Frankiales 1 1 1 1 | CCD24 1 0 0 1
Subgroup.2 1 1 1 1 | Solirubrobacterales 0 0 0 1
Thermoanaerobaculales 0 0 1 0 | Blastocatellales 1 1 0 1
Anaerolineales 1 1 1 1 | Saccharimonadales 1 1 1 1
Bacteroidales 1 1 1 1 | Micromonosporales 0 0 1 1
CCD24 1 0 0 1 | IMCC26256 0 0 0 0
Verrucomicrobiales 0 0 0 0 | Ardenticatenales 0 0 0 1
Kryptoniales 0 1 1 1 | Desulfitobacteriales 1 0 0 0
Sphingobacteriales 0 1 0 0 | Candidatus.Yanofskybacteria 1 1 0 0
Ignavibacteriales 0 0 0 0 | Catenulisporales 0 0 0 0
Burkholderiales 0 0 0 0 | PAUC26f 0 1 1 1
Babeliales 0 0 0 0 | Streptomycetales 0 0 0 0
SPARCC
C0119 1 1 1 1 | Ktedonobacterales 1 1 1 1
Microtrichales 0 1 0 1 | Bacteroidales 1 1 1 1
Acidobacteriales 1 1 1 1 | Gemmatimonadales 0 0 0 1
Vicinamibacterales 0 0 0 1 | Bryobacterales 0 1 1 1
Elsterales 1 1 1 1 | Microtrichales 0 1 0 1
Chloroplast 1 1 1 1 | Subgroup.17 0 1 1 1
Ktedonobacterales 1 1 1 1 | Thermomicrobiales 0 0 0 0
Bryobacterales 0 1 1 1 | Blastocatellales 1 1 0 1
Xanthomonadales 1 1 1 1 | Sphingomonadales 1 1 1 1
Leptolyngbyales 0 0 0 0 | Micrococcales 1 0 0 0
Elev.1554 0 0 0 0 | Elsterales 1 1 1 1
Micropepsales 1 1 1 1 | Actinomarinales 0 0 0 1
Acetobacterales 1 1 1 1 | Ignavibacteriales 0 0 0 0
Chloroflexales 1 0 0 0 | Reyranellales 0 0 0 0
B12.WMSP1 0 1 1 0 | Anaerolineales 1 1 1 1
Chitinophagales 0 1 1 1 | C0119 1 1 1 1
Sphingomonadales 1 1 1 1 | Solibacterales 1 1 1 0
B10.SB3A 0 1 1 1 | Streptomycetales 0 0 0 0
Frankiales 1 1 1 1 | SJA.15 0 1 0 0
Solibacterales 1 1 1 0 | Frankiales 1 1 1 1
Gaiellales 1 1 1 1 | Chloroflexales 1 0 0 0
X24.Nov 0 0 0 0 | Defluviicoccales 1 1 1 1
SBR1031 1 1 1 1 | Candidatus.Yanofskybacteria 1 1 0 0
Subgroup.2 1 1 1 1 | Chitinophagales 0 1 1 1
Propionibacteriales 0 1 1 0 | Bacillales 1 1 1 1
Paenibacillales 1 1 1 1 | Erysipelotrichales 0 0 0 0
Nitrospirales 0 0 0 1 | Kryptoniales 0 1 1 1
Saccharimonadales 1 1 1 1 | Acidobacteriales 1 1 1 1
Sphingobacteriales 0 1 0 0 | Streptosporangiales 0 0 0 0
Bacillales 1 1 1 1 | Azospirillales 0 0 0 0
CCD24 1 0 0 1 | Acetobacterales 1 1 1 1
Subgroup.17 0 1 1 1 | SBR1031 1 1 1 1
Rhizobiales 0 0 0 1 | Syntrophobacterales 0 1 0 0
Nannocystales 0 0 0 0 | Gammaproteobacteria.Incertae 0 0 0 1
Catenulisporales 0 0 0 0 | CCD24 1 0 0 1
Rokubacteriales 1 1 1 1 | Desulfobaccales 0 0 0 0
Caldilineales 0 0 0 0 | Xanthomonadales 1 1 1 1
Pyrinomonadales 0 0 0 0 | Sphingobacteriales 0 1 0 0
SE_glasso
Ktedonobacterales 1 1 1 1 | Sphingomonadales 1 1 1 1
Acetobacterales 1 1 1 1 | Chitinophagales 0 1 1 1
Elsterales 1 1 1 1 | Ktedonobacterales 1 1 1 1
Microtrichales 0 1 0 1 | Elev.1554 0 0 0 0
Xanthomonadales 1 1 1 1 | Ignavibacteriales 0 0 0 0
Vicinamibacterales 0 0 0 1 | Bacteroidales 1 1 1 1
Desulfobaccales 0 0 0 0 | Pedosphaerales 0 0 0 0
Ignavibacteriales 0 0 0 0 | Subgroup.17 0 1 1 1
Micropepsales 1 1 1 1 | C0119 1 1 1 1
Acidobacteriales 1 1 1 1 | Defluviicoccales 1 1 1 1
SBR1031 1 1 1 1 | PLTA13 1 1 1 1
C0119 1 1 1 1 | Elsterales 1 1 1 1
SJA.15 0 1 0 0 | Burkholderiales 0 0 0 0
Bryobacterales 0 1 1 1 | Syntrophobacterales 0 1 0 0
Sphingomonadales 1 1 1 1 | SJA.15 0 1 0 0
Subgroup.2 1 1 1 1 | Desulfobulbales 0 0 0 0
B10.SB3A 0 1 1 1 | Gemmatimonadales 0 0 0 1
PAUC26f 0 1 1 1 | Anaerolineales 1 1 1 1
Elev.1554 0 0 0 0 | Actinomarinales 0 0 0 1
Rhizobiales 0 0 0 1 | PAUC26f 0 1 1 1
Caldilineales 0 0 0 0 | Rhodothermales 0 0 0 0
B12.WMSP1 0 1 1 0 | Frankiales 1 1 1 1
Propionibacteriales 0 1 1 0 | Candidatus.Yanofskybacteria 1 1 0 0
Solibacterales 1 1 1 0 | Desulfobaccales 0 0 0 0
Caulobacterales 0 1 1 0 | Paenibacillales 1 1 1 1
Actinomarinales 0 0 0 1 | Kryptoniales 0 1 1 1
Chloroflexales 1 0 0 0 | Bryobacterales 0 1 1 1
Subgroup.17 0 1 1 1 | Acidobacteriales 1 1 1 1
Candidatus.Jorgensenbacteria 0 0 0 0 | Methylomirabilales 0 0 0 0
Rhodobacterales 1 0 0 0 | Armatimonadales 0 0 0 0
Chitinophagales 0 1 1 1 | Reyranellales 0 0 0 0
Frankiales 1 1 1 1 | Syntrophales 0 0 0 0
Syntrophobacterales 0 1 0 0 | Acetobacterales 1 1 1 1
Methylomirabilales 0 0 0 0 | Bacillales 1 1 1 1
Gemmatimonadales 0 0 0 1 | Micrococcales 1 0 0 0
Aminicenantales 0 0 0 0 | Microtrichales 0 1 0 1
Desulfobulbales 0 0 0 0 | AKIW659 0 0 0 0
Syntrophales 0 0 0 0 | Ardenticatenales 0 0 0 1
SPRING
JG36.TzT.191 0 0 0 0 | Erysipelotrichales 0 0 0 0
Paludibaculum 0 0 0 0 | Opitutales 0 0 0 0
Caldilineales 0 0 0 0 | Candidatus.Adlerbacteria 0 0 0 0
KF.JG30.C25 1 0 0 0 | Brevibacillales 0 0 0 0
Haliangiales 0 0 0 0 | Elev.16S.1166 0 0 0 0
Gammaproteobacteria.Incertae 0 0 0 1 | Sphingobacteriales 0 1 0 0
JG36.GS.52 0 0 0 0 | Cytophagales 0 0 0 1
Bacteroidales 1 1 1 1 | Flavobacteriales 0 0 0 0
C0119 1 1 1 1 | Enterobacterales 1 0 0 0
Rhodothermales 0 0 0 0 | Haliangiales 0 0 0 0
Erysipelotrichales 0 0 0 0 | Chloroplast 1 1 1 1
Chitinophagales 0 1 1 1 | Xanthomonadales 1 1 1 1
Paenibacillales 1 1 1 1 | WD260 0 0 0 0
PB19 0 0 0 0 | KF.JG30.C25 1 0 0 0
SJA.15 0 1 0 0 | Paracaedibacterales 0 0 0 0
SAR202.clade 0 0 0 0 | Syntrophales 0 0 0 0
Chloroplast 1 1 1 1 | Isosphaerales 1 1 1 0
Thermomicrobiales 0 0 0 0 | PAUC26f 0 1 1 1
Sphingomonadales 1 1 1 1 | Desulfitobacteriales 1 0 0 0
Rhodospirillales 0 0 0 0 | Sphingomonadales 1 1 1 1
MSB.4B10 0 0 0 0 | Steroidobacterales 0 0 0 0
PAUC26f 0 1 1 1 | JG36.GS.52 0 0 0 0
SBR1031 1 1 1 1 | Paludibaculum 0 0 0 0
Entomoplasmatales 0 0 0 0 | PB19 0 0 0 0
S085 1 1 0 1 | SJA.28 0 1 0 1
Isosphaerales 1 1 1 0 | Saccharimonadales 1 1 1 1
Polyangiales 0 0 0 0 | Obscuribacterales 1 1 1 1
Defluviicoccales 1 1 1 1 | Rickettsiales 1 0 1 1
Elev.16S.1166 0 0 0 0 | Fibrobacterales 0 0 0 0
Candidatus.Kaiserbacteria 0 0 0 0 | Chitinophagales 0 1 1 1
Candidatus.Liptonbacteria 0 0 0 0 | Pyrinomonadales 0 0 0 0
Xanthomonadales 1 1 1 1 | Lactobacillales 0 0 0 0
FFCH16263 0 0 0 0 | FCPU453 0 0 0 0
Subgroup.7 0 0 0 0 | Ktedonobacterales 1 1 1 1
Rhodobacterales 1 0 0 0 | Chthoniobacterales 0 1 1 1
Rhizobiales 0 0 0 1 | Oxyphotobacteria.Incertae.Sedis 0 0 0 0
Pedosphaerales 0 0 0 0 | Kineosporiales 1 1 1 1
Blastocatellales 1 1 0 1 | Microtrichales 0 1 0 1

[Figure 4 plot not reproduced. Panels: Phylum, Class, Order. x-axis: Normalization Methods (clr, log, original, TSS). y-axis scale: 0.0 to 0.8. Legend: CMIMN, SE_glasso, SPARCC, SPRING.]
Figure 4: Average overlap between machine learning (ML) methods based on different normalized data sets (clr, original data, log transformation, and TSS transformation) and network-based approaches for the ‘clean tubers’ network at the Phylum, Class, and Order levels.

[Figure 5 plot not reproduced. Panels: Phylum, Class, Order. x-axis: Normalization Methods (clr, log, original, TSS). y-axis scale: 0.0 to 0.8. Legend: CMIMN, SE_glasso, SPARCC, SPRING.]
Figure 5: Average overlap between machine learning (ML) methods based on different normalized data sets (clr, original data, log transformation, and TSS transformation) and network-based approaches for the ‘scab-infected tubers’ network at the Phylum, Class, and Order levels.

Table 20: Selection of Operational Taxonomic Units (OTUs) at the Phylum level in both networks of ‘Clean Tubers’ and ‘Scab-Infected Tubers’ using Network-Based Method (Strategy 2) and Machine Learning (ML) methods. The left part presents results from CMIMN and SE_glasso, while the right part displays results from SPARCC and SPRING algorithms.
The first column represents OTUs selected by the network-based method (Strategy 2), and columns 2 to 5 show the overlap between OTUs selected by Strategy 2 and those chosen by ML methods using different data normalization approaches (clr, original, log, and TSS). A "1" indicates that the ML method selected the respective OTU, while "0" signifies that the ML method did not select the respective OTU.

CMIMN clr original log TSS | SPARCC clr original log TSS
Myxococcota 0 0 0 0 | Acidobacteriota 1 0 0 1
Nitrospirota 0 0 0 1 | Proteobacteria 0 0 0 0
Desulfobacterota 0 0 0 1 | Gemmatimonadota 0 0 0 0
Proteobacteria 0 0 0 0 |
SE_glasso clr original log TSS | SPRING clr original log TSS
Proteobacteria 0 0 0 0 | NB1.j 1 1 1 1

Table 21: Selection of Operational Taxonomic Units (OTUs) at the Class level in both networks of ‘Clean Tubers’ and ‘Scab-Infected Tubers’ using Network-Based Method (Strategy 2) and Machine Learning (ML) methods. The left part presents results from CMIMN and SE_glasso, while the right part displays results from SPARCC and SPRING algorithms. The first column represents OTUs selected by the network-based method (Strategy 2), and columns 2 to 5 show the overlap between OTUs selected by Strategy 2 and those chosen by ML methods using different data normalization approaches (clr, original, log, and TSS). A "1" indicates that the ML method selected the respective OTU, while "0" signifies that the ML method did not select the respective OTU.

CMIMN clr original log TSS | SPARCC clr original log TSS
Ignavibacteria 1 1 1 1 | Anaerolineae 1 1 1 1
Acidimicrobiia 0 0 0 1 | Actinobacteria 0 0 0 0
Vicinamibacteria 0 1 0 1 | Ktedonobacteria 1 1 1 1
Gammaproteobacteria 0 0 0 0 | Acidobacteriae 0 0 0 1
Alphaproteobacteria 0 0 0 0 | Alphaproteobacteria 0 0 0 0
Blastocatellia 0 0 0 0 | Gammaproteobacteria 0 0 0 0
Gitt.GS.136 0 1 1 1 | Gemmatimonadetes 0 0 0 1
Parcubacteria 0 0 0 0 | AD3 0 0 1 0
Acidobacteriae 0 0 0 1 | Blastocatellia 0 0 0 0
KD4.96 0 0 0 1 |
Thermoleophilia 0 0 0 1 |
SE_glasso clr original log TSS | SPRING clr original log TSS
Ktedonobacteria 1 1 1 1 | Blastocatellia 0 0 0 0
Syntrophobacteria 0 0 0 0 | Acidimicrobiia 0 0 0 1
Anaerolineae 1 1 1 1 | Alphaproteobacteria 0 0 0 0
Acidobacteriae 0 0 0 1 | Parcubacteria 0 0 0 0
Desulfobaccia 0 0 0 0 | Polyangia 0 0 0 1
AD3 0 0 1 0 | Gammaproteobacteria 0 0 0 0
Actinobacteria 0 0 0 0 | Rhodothermia 0 0 0 0
Gemmatimonadetes 0 0 0 1 | Anaerolineae 1 1 1 1
Alphaproteobacteria 0 0 0 0 | Kazania 0 0 0 0
Gammaproteobacteria 0 0 0 0 |
Desulfobulbia 0 0 0 0 |

Table 22: Selection of Operational Taxonomic Units (OTUs) at the Order level in both networks of ‘Clean Tubers’ and ‘Scab-Infected Tubers’ using Network-Based Method (Strategy 2) and Machine Learning (ML) methods. The left part presents results from CMIMN and SE_glasso, while the right part displays results from SPARCC and SPRING algorithms. The first column represents OTUs selected by the network-based method (Strategy 2), and columns 2 to 5 show the overlap between OTUs selected by Strategy 2 and those chosen by ML methods using different data normalization approaches (clr, original, log, and TSS). A "1" indicates that the ML method selected the respective OTU, while "0" signifies that the ML method did not select the respective OTU.

CMIMN clr original log TSS | SPARCC clr original log TSS
Ktedonobacterales 1 1 1 1 | C0119 1 1 1 1
C0119 1 1 1 1 | Microtrichales 0 1 0 1
Rhizobiales 0 0 0 1 | Acidobacteriales 1 1 1 1
PLTA13 1 1 1 1 | Elsterales 1 1 1 1
Actinomarinales 0 0 0 1 | Ktedonobacterales 1 1 1 1
Subgroup.17 0 1 1 1 | Bryobacterales 0 1 1 1
Chloroplast 1 1 1 1 | Xanthomonadales 1 1 1 1
Bryobacterales 0 1 1 1 | Acetobacterales 1 1 1 1
Elev.1554 0 0 0 0 | Chloroflexales 1 0 0 0
Acetobacterales 1 1 1 1 | Chitinophagales 0 1 1 1
Gaiellales 1 1 1 1 | Sphingomonadales 1 1 1 1
Microtrichales 0 1 0 1 | Frankiales 1 1 1 1
Subgroup.2 1 1 1 1 | Solibacterales 1 1 1 0
CCD24 1 0 0 1 | SBR1031 1 1 1 1
Kryptoniales 0 1 1 1 | Sphingobacteriales 0 1 0 0
Sphingobacteriales 0 1 0 0 | Bacillales 1 1 1 1
| CCD24 1 0 0 1
| Subgroup.17 0 1 1 1
SE_glasso clr original log TSS | SPRING clr original log TSS
Ktedonobacterales 1 1 1 1 | Paludibaculum 0 0 0 0
Acetobacterales 1 1 1 1 | KF.JG30.C25 1 0 0 0
Elsterales 1 1 1 1 | Haliangiales 0 0 0 0
Microtrichales 0 1 0 1 | JG36.GS.52 0 0 0 0
Desulfobaccales 0 0 0 0 | Erysipelotrichales 0 0 0 0
Ignavibacteriales 0 0 0 0 | Chitinophagales 0 1 1 1
Acidobacteriales 1 1 1 1 | PB19 0 0 0 0
C0119 1 1 1 1 | Chloroplast 1 1 1 1
SJA.15 0 1 0 0 | Sphingomonadales 1 1 1 1
Bryobacterales 0 1 1 1 | PAUC26f 0 1 1 1
Sphingomonadales 1 1 1 1 | Isosphaerales 1 1 1 0
PAUC26f 0 1 1 1 | Elev.16S.1166 0 0 0 0
Elev.1554 0 0 0 0 | Xanthomonadales 1 1 1 1
Actinomarinales 0 0 0 1 |
Subgroup.17 0 1 1 1 |
Chitinophagales 0 1 1 1 |
Frankiales 1 1 1 1 |
Syntrophobacterales 0 1 0 0 |
Methylomirabilales 0 0 0 0 |
Gemmatimonadales 0 0 0 1 |
Desulfobulbales 0 0 0 0 |
Syntrophales 0 0 0 0 |
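The 0/1 overlap columns used in Tables 17 to 22, and an overlap fraction of the kind summarized in Figures 4 and 5, can be derived along the following lines. This is a minimal sketch: the OTU names and ML selections are placeholders, and dividing by the number of network-selected OTUs is an assumption, since this section does not state how the average overlap in the figures was computed.

```python
def overlap_flags(network_selected, ml_selected_by_norm):
    """Per OTU: 1 if the ML pipeline also selected it under that normalization, else 0."""
    return {
        otu: {norm: int(otu in chosen) for norm, chosen in ml_selected_by_norm.items()}
        for otu in network_selected
    }

def overlap_fraction(flags, norm):
    """Share of network-selected OTUs that ML also selected under one normalization."""
    return sum(row[norm] for row in flags.values()) / len(flags)

# Hypothetical selections (placeholders, not data from the paper).
network_selected = ["otu_a", "otu_b", "otu_c", "otu_d"]
ml_selected_by_norm = {
    "clr": {"otu_a"},
    "original": {"otu_a", "otu_x"},
    "log": set(),
    "TSS": {"otu_a", "otu_b", "otu_y"},
}

flags = overlap_flags(network_selected, ml_selected_by_norm)
fractions = {norm: overlap_fraction(flags, norm) for norm in ml_selected_by_norm}
for otu, row in flags.items():
    print(otu, row["clr"], row["original"], row["log"], row["TSS"])
print(fractions)
```

Each printed row has the same shape as a table row above: the OTU name followed by its clr, original, log, and TSS flags.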
Table 23: Operational Taxonomic Units (OTUs) of significance in the ‘Clean Tubers’ network, selected by all four algorithms: CMIMN, SPARCC, SE_glasso, and SPRING.

Level    Important OTUs
Phylum   Proteobacteria, WPS.2
Class    Ktedonobacteria, Vicinamibacteria, Actinobacteria, Gammaproteobacteria, Alphaproteobacteria, Acidobacteriae
Order    C0119, Rhizobiales, Chitinophagales

Table 24: Operational Taxonomic Units (OTUs) of significance in the ‘scab-infected tubers’ network, selected by all four algorithms: CMIMN, SPARCC, SE_glasso, and SPRING.

Level    Important OTUs
Phylum   Bacteroidota, Proteobacteria
Class    Anaerolineae, Gammaproteobacteria, Alphaproteobacteria, AD3, Blastocatellia, Bacilli
Order    Ktedonobacterales, Microtrichales
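The rule stated in the captions of Tables 23 and 24, keeping only taxa that all four algorithms select, amounts to a set intersection. The Python sketch below illustrates that rule and is not the authors' code: the function name `selected_by_all` and the dictionary layout are assumptions made for this example. The input lists are the Strategy 2 Class-level selections from the table that precedes Table 22, with its side-by-side columns taken in reading order. Because that table pools both networks, the output is not expected to reproduce Tables 23 and 24, which are reported per network.

```python
def selected_by_all(selections):
    """Return the taxa that appear in every algorithm's selection list."""
    taxa_sets = [set(taxa) for taxa in selections.values()]
    if not taxa_sets:
        return set()
    return set.intersection(*taxa_sets)


# Strategy 2 selections at the Class level, one list per algorithm
# (assumed column assignment: left/right entries read in row order).
strategy2_class = {
    "CMIMN": [
        "Ignavibacteria", "Acidimicrobiia", "Vicinamibacteria",
        "Gammaproteobacteria", "Alphaproteobacteria", "Blastocatellia",
        "Gitt.GS.136", "Parcubacteria", "Acidobacteriae", "KD4.96",
    ],
    "SPARCC": [
        "Anaerolineae", "Actinobacteria", "Ktedonobacteria",
        "Acidobacteriae", "Alphaproteobacteria", "Gammaproteobacteria",
        "Gemmatimonadetes", "AD3", "Blastocatellia", "Thermoleophilia",
    ],
    "SE_glasso": [
        "Ktedonobacteria", "Syntrophobacteria", "Anaerolineae",
        "Acidobacteriae", "Desulfobaccia", "AD3", "Actinobacteria",
        "Gemmatimonadetes", "Alphaproteobacteria", "Gammaproteobacteria",
    ],
    "SPRING": [
        "Blastocatellia", "Acidimicrobiia", "Alphaproteobacteria",
        "Parcubacteria", "Polyangia", "Gammaproteobacteria",
        "Rhodothermia", "Anaerolineae", "Kazania", "Desulfobulbia",
    ],
}

consensus = selected_by_all(strategy2_class)
print(sorted(consensus))  # ['Alphaproteobacteria', 'Gammaproteobacteria']
```

With these inputs the intersection contains Alphaproteobacteria and Gammaproteobacteria, both of which also appear in the Class rows of Tables 23 and 24.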
