{"paper_id":"245b0bdd-88f8-4285-bb31-4679928eeff4","body_text":"LEVERAGING BAYESIAN NETWORKS FOR CONSENSUS\nNETWORK CONSTRUCTION AND MULTI-METHOD FEATURE\nSELECTION TO DECODE DISEASE PREDICTION\nRosa Aghdam∗1, Shan Shan2, Richard Lankau2, Claudia Solís-Lemus1,2\n1Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI\n2Department of Plant Pathology, University of Wisconsin-Madison, Madison, WI\nAbstract\nConstructing reliable microbiome co-occurrence networks and identifying disease-associated taxa remain major\nchallenges in microbiome research due to variability introduced by different inference algorithms. To overcome these\nlimitations, we present CMIMN, a novel R package that uses a Bayesian network framework based on conditional mutual\ninformation to infer robust microbial interaction networks. To further enhance reliability, we construct a consensus\nmicrobiome network by integrating results from CMIMN and three widely used methods: SPIEC-EASI, SPRING, and\nSPARCC. This consensus approach, which overlays and weights edges shared across methods, reduces inconsistencies\nand provides a more biologically meaningful view of microbial relationships. In addition, we introduce a multi-method\nframework for identifying disease-associated microbial taxa by combining machine learning and network-based feature\nselection. Our ML pipeline applies distinct algorithms and identifies key taxa based on their consistent importance\nacross models. Complementing this, we employ two network-based strategies that prioritize taxa based on centrality\ndifferences between ‘clean tubers’ and ‘scab-infected tubers’ networks, as well as a composite scoring system that ranks\nnodes using integrated network metrics. Our results show that CMIMN achieves high robustness in network inference,\nand that the consensus network further improves stability and interpretability. 
The multi-method feature selection\nframework enhances confidence in identifying biologically relevant taxa linked to potato common scab disease. Notably,\nwe identify Bacteroidota, WPS-2, and Proteobacteria at the Phylum level; Actinobacteria, AD3, Bacilli, Anaerolineae,\nand Ktedonobacteria at the Class level; and C0119, Defluviicoccales, Bacteroidales, and Ktedonobacterales at the\nOrder level as key taxa associated with disease status.\nKeywords: Bayesian networks, Microbiome Co-occurrence network inference, Soil microbial ecology, Multi-method\nFeature selection.\n1 Introduction\nPotatoes, as the world’s fourth most essential crop, play a vital role in addressing global food security. However,\nsoil-borne diseases like potato common scab, caused by Streptomyces scabies and related pathogens, significantly\nthreaten potato yield and quality. This disease results in economic losses due to the rejection of tubers with pitted, corky\nlesions and has broader implications for food security. While some growers and consultants may claim that fumigation\nwith broad-spectrum fungicides controls common scab symptoms, evidence for its effectiveness remains inconsistent,\nand fumigation is often costly and environmentally restrictive [1]. In practice, genetic resistance (or tolerance) in\npotato cultivars has emerged as a more effective and sustainable strategy for managing the disease [2]. A promising\ncomplementary approach involves exploiting the potential of naturally disease-suppressive soils, which harbor specific\nmicrobial communities that suppress pathogens and reduce disease outbreaks. Research has shown that suppressive\nsoils often harbor distinct microbial communities with a higher abundance of antagonistic taxa, such as non-pathogenic\nStreptomyces spp. and Bacillus [3, 4]. 
These insights underscore the importance of investigating microbial interactions\nin suppressive soils to guide environmentally sustainable disease control practices.\n∗Corresponding author: aghdam@wisc.edu\nbioRxiv preprint doi: https://doi.org/10.1101/2025.04.07.647660; this version posted April 13, 2025. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.\nUnderstanding the composition of microbial communities and the environmental factors shaping this composition is\ncrucial for comprehending biological processes [5–10] and predicting plant phenotypic variations related to plant health\nand crop production [11–14]. To fully explore the complex interactions between microbes and their environment, we\nneed robust computational approaches that can accurately represent microbial communities. Graphical structures like\nnetworks offer a strong mathematical framework for examining organismal relationships across various interactions,\nincluding those observed in food webs, plant-plant interactions, plant-animal associations, and their applications in\ndetecting networks within gene regulation and protein-protein interaction systems [15]. Networks provide a formal\nyet intuitive representation of complex systems, where species are represented as nodes, and their interactions are\nrepresented as edges [16]. Although network analysis is widely used in microbiome studies, its application to soil\nmicrobial communities has emerged more recently, marked by growing interest in co-occurrence analysis [17]. 
However,\nthe complexity of soil presents unique challenges in constructing and interpreting network models, stemming from the\nneed to account for the inter- and intra-variability of samples, which results from the intrinsic heterogeneity of soil\nconditions [10].\nIn recent years, the field of microbiology has witnessed significant advancements in network analysis techniques.\nStudies conducted by Wagg et al. [17] have pioneered the application of network theory to understand complex\nmicrobial interactions in various soil microbiome systems. Their findings highlight the potential of community\nnetwork complexity in influencing ecosystem functions, suggesting that microbial interactions play a crucial role in\nsoil health and resilience. However, their results also indicate that simpler diversity metrics, such as species richness,\nmay explain a substantial proportion of variation in ecosystem functionality. This underscores the need to combine\nnetwork metrics with traditional diversity measures to obtain a comprehensive understanding of microbial community\nfunctions. Moreover, network analysis in microbiome studies requires substantial methodological improvements, as\ndemonstrated by Guseva et al. [15], who showed that different network-construction algorithms can significantly impact\nthe inferred structure of microbial networks in soil ecosystems. This variability highlights the importance of carefully\nconsidering methodological choices when applying network analysis to soil microbial datasets, enabling more robust\nand hypothesis-driven research. In particular, inconsistencies across network inference methods pose a major challenge,\nlimiting the reliability of microbiome-based discoveries.\nWorking with microbiome data is inherently challenging due to its sparse, high-dimensional, and compositional\nnature [18]. 
Numerous methods exist to infer the structure of microbiome networks, but the results often exhibit\nminimal overlap across different approaches, reflecting the variability and complexity of the data [15]. Furthermore,\nthe absence of a universally accepted gold standard for network evaluation complicates efforts to assess and validate\ninferred networks. Thus, there is an urgent need for methodological advancements that can enhance the reliability of\nmicrobiome network inference and provide biologically meaningful insights.\nIn addition to constructing reliable networks, identifying important disease-related taxa is a key aspect of microbiome\nstudies, particularly in the context of disease management [19]. Such insights can provide farmers with actionable tools\nto better understand and control the microbial ecosystems influencing crop health and productivity. Developing robust\nmethods to identify taxa associated with disease resistance or susceptibility is essential for translating microbiome\nresearch into practical agricultural solutions. By identifying microbial indicators of plant health, researchers can\nadvance both scientific understanding and practical applications, offering sustainable strategies to mitigate disease\nimpacts and enhance agricultural outcomes.\nThis study builds upon prior research that explored associations between soil properties and biological phenotypes using\nmachine learning models, including random forests and Bayesian neural networks [18]. Extending this work, we adopt\na network-based perspective—specifically leveraging Bayesian networks—to investigate microbial relationships within\nthe soil microbiome. Our approach addresses two primary challenges in microbiome analysis: constructing reliable\nco-occurrence networks and identifying microbial taxa associated with potato common scab disease. 
To enhance the\nrobustness of disease-associated OTU identification, we introduce a comprehensive multi-method framework that\ncombines machine learning-based feature selection with network-based strategies. This integrative design ensures\nthat candidate taxa are supported by both predictive modeling and ecological network context, thereby increasing\nconfidence in their biological relevance. The main contributions of this study are as follows: 1) We present a novel\nBayesian network-based algorithm, implemented in the R package CMIMN; 2) We propose a consensus network approach\nthat integrates results from CMIMN, SPIEC-EASI, SPRING, and SPARCC, improving the reliability of inferred microbial\nassociations; and 3) We develop a dual feature selection strategy that incorporates both machine learning outputs and\nnetwork centrality metrics to identify key disease-associated taxa. Importantly, the methodology introduced here is\nbroadly applicable and can be adapted for analyzing disease-related microbiome datasets across both agricultural and\nclinical domains.\n2 Materials and methods\nFigure 1 shows a graphical representation of our pipeline with four major panels. Each panel represents a specific step:\n• Data Preparation: Microbiome abundance data is filtered to retain operational taxonomic units (OTUs)\npresent in at least 15 samples, ensuring that low-prevalence taxa do not introduce noise. The resulting filtered\ndata matrix serves as the foundation for downstream analyses.\n• Constructing Microbiome Networks: Microbiome networks are constructed using four methods:\nSE_glasso, SPRING, SPARCC, and the proposed CMIMN. 
These networks represent microbial taxa\nas nodes and their co-occurrence relationships as edges. To enhance reliability, a consensus microbiome\nnetwork is constructed by integrating results from all four methods. Edge weights indicate the level of\nagreement among methods, with a weight of 4 representing relationships confirmed by all methods and 0\nindicating no agreement.\n• Feature Selection (machine learning (ML)): The filtered data is normalized using three methods (CLR, log,\nand TSS) and subjected to different ML-based feature selection methods. Each OTU is assigned a \"TOTAL\"\nscore equal to the number of times it is selected as important across the seven strategies. OTUs with\nhigh \"TOTAL\" scores are considered key features for further analysis.\n• Feature Selection (network-based): Networks are separately constructed for ‘scab-infected’ and ‘clean tuber’\nsamples using SE_glasso, SPRING, SPARCC, and CMIMN. Topological features are computed for each node in\nthe networks. Two distinct strategies, differential centrality analysis and weighted scoring of OTUs, are\napplied to identify important OTUs based on network structure.\nThe Venn diagram at the bottom illustrates the overlap between OTUs identified by ML-based methods and network-based\nfeature selection strategies. This overlap highlights the robustness of the selected OTUs as key contributors to\ndisease resistance. This workflow integrates statistical rigor and biological relevance, ensuring that the identified OTUs\nare reliable targets for further investigation and potential microbial intervention strategies.\n2.1 Data preparation\nIn this study, we focus on the soil microbiome (matrix of abundances) at several taxonomic levels, including\nPhylum, Class, and Order, from soil samples acquired from potato fields in Wisconsin and Minnesota. 
We concentrated\non these three taxonomic levels due to their balance of interpretability and feature dimensionality, allowing for a\nmeaningful analysis of microbial community structure and its association with disease.\nThe dataset consists of microbial community data of pre-planting soils and the corresponding disease levels in the\nplants at harvest. Overall, we collected 256 soil samples, 108 of which were taken from 36 commercial fields in\nMinnesota, and 148 of which were taken from 50 fields in Wisconsin. This extensive dataset provides a comprehensive\nrepresentation of soil microbial communities across two major potato-growing regions in the Upper Midwest.\nDNA was extracted from the pre-planting soils and analyzed for microbial community data following the method\nin [18]. Bacterial and fungal DNA was sequenced to capture a diverse range of microbial taxa, enabling us to investigate\ntheir interactions and potential roles in disease resistance or susceptibility. At harvest, potatoes were hand-harvested\nfrom a one-meter hill (usually 3-4 plants) at each sampling location. Tubers from one plant were visually evaluated for\nthe presence of pitted scab lesions, a sign of serious common scab disease, as these tubers would be excluded\nfrom marketable yield. This binary disease label, 0 for healthy and 1 for diseased, serves as the target variable in our\nanalyses, linking microbial community features to agricultural outcomes.\nThe input data is a matrix of non-negative read counts generated by sequencing, filtered so\nthat we only include OTUs that appear in at least 15 samples [18]. This filtering ensures a focus on microbial taxa with\nsufficient prevalence to contribute meaningfully to statistical and network-based analyses, reducing noise from rare taxa.\nTable S1 displays the number of features (OTUs) before and after filtering for different taxonomic levels. 
To enhance\nreproducibility and facilitate further research, the raw sequencing data and preprocessing scripts are available upon\nrequest or through the project repository.\n2.2 Constructing microbiome networks\nIn the “Conditional mutual information algorithm for constructing microbiome networks (CMIMN)” section, we introduce\na novel Bayesian Network-based approach, CMIMN, which leverages Conditional Mutual Information (CMI) to infer\nmicrobial associations in a more robust and scalable manner. However, despite advancements in microbiome network\ninference, no single method consistently produces reliable results due to inherent differences in statistical assumptions,\ndata preprocessing, and sparsity constraints. Each inference method captures different aspects of microbial interactions,\noften leading to inconsistencies across constructed networks. Therefore, in the “Consensus network construction”\nsection, we present a consensus microbiome network that integrates results from multiple inference methods to enhance\nreliability and mitigate method-specific biases.\n
[Figure 1 panel text: Data Preparation (filter out OTUs present in fewer than 15 samples); Construct microbiome network (CMIMN on log-transformed data, SE_glasso, SPRING, SPARCC, consensus network); Feature Selection (ML) (inputs: CLR, original filtered data, log, TSS; methods: SelectKBest, logistic regression, decision tree, gradient boosting, random forest, mutual information; table columns Max, KBest, LR, DT, GB, RF, Mutual, Total; OTUs with a \"TOTAL\" value higher than 4 are selected); Feature Selection (Network-based) (centrality metrics: degree, betweenness, closeness, eigenvector, PageRank; Strategy 1, differential centrality analysis, and Strategy 2, weighted scoring for OTUs, each selecting the top 20% of OTUs); overlap between ML-selected and network-based-selected OTUs.]\n
Figure 1: Workflow of the microbiome analysis pipeline for identifying key microbial drivers of disease resistance. Using potato common scab as an example, this pipeline\nconsists of five main steps and is generally applicable to any microbiome dataset: (1) Data Preparation – Raw microbiome data is preprocessed to retain OTUs present\nin at least 15 samples, ensuring that low-prevalence taxa do not introduce noise. The resulting filtered data matrix serves as the foundation for downstream analysis,\nfocusing on taxonomic levels such as Phylum, Class, and Order. (2) Construct microbiome network – Microbiome networks are constructed using four inference methods:\nSE_glasso, SPRING, SPARCC, and the proposed CMIMN. These networks represent microbial taxa as nodes and their interactions as edges. To improve reliability, a\nconsensus microbiome network is constructed by integrating results from all four methods. Edge weights indicate the level of agreement among methods, with a weight of\n4 representing relationships confirmed by all methods and 0 indicating no agreement. (3) Feature Selection (ML) – The filtered data undergoes normalization using three\nmethods (CLR, log, and TSS) before applying ML-based feature selection methods. A \"TOTAL\" score is assigned to each OTU based on its selection frequency across\nML methods, identifying key taxa strongly associated with disease outcomes. (4) Feature Selection (Network-Based) – Microbiome networks are separately constructed\nfor ‘scab-infected’ and ‘clean tuber’ samples. 
Two strategies are applied to identify key OTUs based on network structure: (i) Differential Centrality Analysis, and (ii)\nWeighted Scoring of OTUs. Final OTU Selection: Identifying Overlap – The last step identifies the overlap between OTUs selected by ML-based and network-based\napproaches, ensuring robust and reliable feature selection for downstream microbiome analysis.\n2.2.1 Conditional mutual information algorithm for constructing microbiome networks (CMIMN)\nWe outline the methodology behind the CMIMN algorithm, a novel approach for constructing microbiome networks. First,\nwe introduce the foundational concepts of Mutual Information (MI) and Conditional Mutual Information (CMI), which\nare key components of the CMIMN framework. Next, we provide an overview of Bayesian Networks, their structure, and\ntheir applicability to microbiome research. Finally, we describe the detailed steps of the CMIMN algorithm, highlighting\nits dynamic thresholding and order-independent features that address the unique challenges of microbiome datasets.\nMutual information and conditional mutual information: MI and CMI have proven effective for detecting\nrelationships between variables due to their capability to measure nonlinear dependencies [20]. MI and CMI between\nthe variables X and Y, given the vector of variables Z, are defined as follows [21]:\n$$MI(X, Y) = \\int_{\\mathbb{R}} \\int_{\\mathbb{R}} p(x, y) \\log \\frac{p(x, y)}{p(x)\\, p(y)} \\, dx \\, dy, \\tag{1}$$\n$$CMI(X, Y \\mid Z) = \\int_{\\mathbb{R}^p} \\int_{\\mathbb{R}} \\int_{\\mathbb{R}} p(x, y, z) \\log \\frac{p(x, y \\mid z)}{p(x \\mid z)\\, p(y \\mid z)} \\, dx \\, dy \\, dz, \\tag{2}$$\nwhere the exponent p in the integration domain is the dimension of vector Z, and p(x, y), p(x) and p(y) represent the joint distribution of X and Y, the marginal\ndistribution of X, and the marginal distribution of Y, respectively. 
Likewise, p(x, y, z), p(x, y|z), p(x|z) and p(y|z) indicate the joint\ndistribution of X, Y and Z, the conditional density distribution of X and Y given Z, the conditional density distribution\nof X given Z and the conditional density distribution of Y given Z, respectively. Under the assumption that data follows\na Gaussian distribution, MI for two continuous variables X and Y can be calculated as [16, 22, 23]:\n$$MI(X, Y) = \\frac{1}{2} \\log \\frac{\\sigma_X^2 \\, \\sigma_Y^2}{|C(X, Y)|}, \\tag{3}$$\nwhere $\\sigma_X^2$ and $\\sigma_Y^2$ indicate the variance of X and the variance of Y, and $|C(X, Y)| = \\sigma_X^2 \\sigma_Y^2 - \\sigma_{XY}^2$ is the determinant of the covariance matrix of X and Y, with $\\sigma_{XY}$ the covariance between X and Y.\nWhen X and Y are independent, $\\sigma_{XY} = 0$ and therefore MI(X, Y) = 0. Similarly, under the Gaussian assumption, CMI(X, Y|Z) can be calculated as:\n$$CMI(X, Y \\mid Z) = \\frac{1}{2} \\log \\frac{|C(X, Z)| \\, |C(Y, Z)|}{|C(Z)| \\, |C(X, Y, Z)|}, \\tag{4}$$\nwhere C(·) is the covariance matrix of the listed variables and |·| denotes the determinant. For example, C(X, Y) and C(X, Y, Z) denote the covariance\nmatrix of variables X and Y and of variables X, Y, and Z, respectively. When X and Y are conditionally independent\ngiven Z, then CMI(X, Y|Z) = 0. These measures form the backbone of many network inference methods, including\nBayesian Networks (BNs), which are particularly suited for capturing complex dependencies in microbiome datasets.\nBelow, we provide an overview of BNs and their applications.\nOverview of Bayesian networks: BNs are probabilistic graphical models that represent complex relationships\namong variables using directed acyclic graphs (DAGs). Each node in a BN represents a random variable, while the\ndirected edges capture conditional dependencies between them. 
BNs have been extensively applied in various biological\nnetwork analyses, such as gene regulatory networks [24–26], but their use in microbiome research remains limited.\nThere are three main approaches for learning the structure of BNs: 1) Constraint-based methods, which use\nconditional independence tests to infer the network structure [22, 24, 25, 27–31]; 2) Score-based methods, which\noptimize a scoring function to search among candidate network structures [32–34]; and 3) Hybrid methods, which\ncombine elements of both constraint-based and score-based approaches to leverage their respective strengths [16, 35–39].\nAmong these, the PC algorithm and its derivatives, such as Fast Causal Inference, Really Fast Causal Inference, and\nPCA-CMI [22, 29, 30, 40–42], are prominent constraint-based methods. Despite their widespread use, these methods\nhave notable limitations: 1) Order dependence: The results can vary depending on the sequence in which the nodes are\nprocessed [43]. 2) Static threshold dependency: Using fixed thresholds for conditional independence tests often leads\nto false positives or false negatives, reducing the reliability of inferred networks [24].\nCMIMN algorithm: We propose the CMIMN algorithm to overcome the challenges posed by microbiome data, providing\nan order-independent, dynamically thresholded, and sparsity-controlled framework for microbiome network construction:\n1) Order independence: Traditional PC-based methods, such as PCA-CMI, are susceptible to order dependence, where\nthe sequence in which nodes are processed affects the inferred network structure. This occurs because, in these methods,\nthe tests for conditional independence and edge removal are performed simultaneously during each iteration, making\nthe results highly sensitive to the order of node traversal. In contrast, CMIMN eliminates this dependency by decoupling\nthese steps. 
Specifically, for each step of the algorithm, CMIMN begins by fixing the set of potential separators for every\nedge (X, Y). This set is determined as the intersection of the neighbors of X and Y in the current graph. By defining\nthe separators upfront, the algorithm ensures that all configurations are consistently evaluated, regardless of the order in\nwhich nodes or edges are processed. Once the potential separators are fixed, the algorithm proceeds to calculate the\nindependence measures (e.g., CMI) for each edge using the predefined separator sets. Edges whose independence\nmeasure falls below the dynamically determined threshold are then removed. This sequential separation of tasks (fixing\nseparators, calculating independence measures, and then removing edges) ensures that the outcome of each step is\nindependent of the traversal order of the nodes.\n2) Dynamic thresholds: In traditional PC-based methods, an edge between two nodes is removed if the independence\nmeasure (e.g., MI or CMI) falls below a predefined static threshold, θ, usually 0.05. However, this fixed-threshold\napproach is inherently rigid and can lead to significant issues. Specifically, static thresholds are often poorly calibrated\nto the scale and variability of different datasets, resulting in false positives (retaining spurious edges) or false negatives\n(removing meaningful edges). To overcome these limitations, CMIMN employs a quantile-based dynamic threshold\napproach. Instead of using a single, static threshold, thresholds are adaptively determined based on the statistical\nproperties of the dataset. For example, in each iteration, thresholds are set using specific quantiles (e.g., the 70th\npercentile) of the computed MI or CMI values. 
This ensures that the threshold dynamically adjusts to the distribution of\nindependence measures, accommodating variability in the data’s scale, density, and characteristics.\n3) Sparsity control: Network sparsity is a crucial factor in microbiome studies, as overly dense networks can obscure\nbiologically meaningful interactions, while overly sparse networks may omit key relationships. In traditional PC-based\nmethods, selecting a single static threshold does not allow for precise control over network sparsity. However, CMIMN\naddresses this challenge by offering precise control over network sparsity through quantile-based criteria for edge\nremoval. By dynamically tuning the sparsity threshold, researchers can specify the desired percentage of edges to\nretain, ensuring that the resulting network retains significant edges representing meaningful microbial interactions while\nreducing noise and redundancy. The steps of the CMIMN algorithm are outlined below:\nStep 0: Initialization: Generate a complete network with the number of nodes equal to the number of taxa.\nStep 1: Calculate MI of Order 0: Compute MI values for each pair of nodes.\nStep 2: Remove Edges: Remove edges for which MI values are smaller than θ1, the threshold for the MI test. The resulting network at this stage is denoted by S0.\nStep 3: Calculate CMI of Order 1: Calculate CMI(X, Y|Z) where Z belongs to the set VXY = ADJ(X) ∩ ADJ(Y) in S0. Here, ADJ(X) represents\nthe set of nodes that are adjacent to X. We consider ‘paths of length 2 between X and Y’ to mean that Z is a common neighbor of both X and Y. 
Thus, a path\nfrom X to Y via Z consists of two edges: one connecting X to Z and another connecting Z to Y. This configuration is used to assess the indirect interactions\nbetween X and Y mediated by Z, focusing on how Z influences the dependency between X and Y.\nStep 4: Remove Edges: Define CMI70(X, Y|Z) as the 70th percentile of all CMI(X, Y|Z) values. If CMI70(X, Y|Z) < θ2 (the threshold for the CMI\ntest), remove the edge between X and Y. The resulting skeleton at this stage is denoted by S1.\nFinal Outcome: The resulting network S1 is a fully undirected skeleton.\nThe primary challenge in using BN methods to infer microbiome networks lies in the normalization of count datasets for\nthe BN algorithm. To address this challenge, we apply a logarithmic transformation to the count data to stabilize variance,\nreduce skewness, and address compositional constraints inherent in microbiome datasets. This transformation ensures\nthat MI and CMI operate on a continuous, normalized scale, enhancing their reliability. Without this step, applying MI\nor CMI directly to raw counts would yield biased results due to the data’s non-normalized and highly variable nature.\nThe CMIMN algorithm is implemented in R and publicly available at https://github.com/solislemuslab/CMIMN.\n2.2.2 Constructing a consensus network\nWe apply three state-of-the-art network methods on the soil microbiome data: 1) SParse InversE Covariance Estimation\nfor Ecological Association Inference (SPIEC-EASI) method [44], using the graphical lasso option (referred to as\nSE_glasso); 2) SPRING: Semi-Parametric Rank-based approach for Inference in Graphical Models [45]; and 3)\nSPARCC: Sparse Correlations for Compositional Data [46]. 
The input to each of these methods is an abundance matrix, and the\noutput is an undirected network in which nodes represent OTUs and edges correspond to interactions between them.\nFirst, SE_glasso [44] estimates sparse inverse covariance matrices to infer ecological associations in microbial\ncommunities. The approach is designed to address the challenges of compositional data and high dimensionality\ncommonly encountered in microbiome studies. By accurately modeling microbial interactions and leveraging graphical\nlasso regularization, SE_glasso uncovers significant ecological relationships between different taxa.\nSecond, the SPRING [45] algorithm is a powerful method for inferring associations in complex biological networks that\ncombines the advantages of both parametric and non-parametric approaches to construct a co-occurrence network from\nthe abundance data commonly encountered in microbiome studies. By transforming the abundance values into ranks and\nutilizing rank-based statistical tests, SPRING overcomes the challenges of compositional data and improves robustness\nagainst outliers and extreme values. This algorithm effectively identifies significant interactions between different taxa,\nproviding valuable insights into the underlying structure and ecological relationships within microbial communities.\nThird, SPARCC [46] is a powerful algorithm used to analyze microbial communities and infer associations between\ndifferent taxa. Specifically designed for compositional data, which represents the relative abundances of microbial taxa,\nSPARCC addresses the challenges of dealing with non-negative and constrained data. By regularizing the correlation\nmatrix through a bootstrap procedure and using sparsity-inducing techniques, SPARCC efficiently estimates sparse\ncorrelations between taxa, revealing significant co-occurrence patterns and potential ecological interactions within the\nmicrobial community. 
The algorithm’s ability to handle compositional data makes it a valuable tool for investigating\ncomplex microbial ecosystems and unraveling the underlying relationships between taxa.\nBuilding reliable microbiome networks is difficult because different algorithms produce varying results. Each method\nhas its own strengths and weaknesses, which can lead to inconsistent interpretations of microbial interactions. To\nsolve this, we combine the results from four methods (CMIMN, SE_glasso, SPRING,\nand SPARCC) to create a consensus network. The consensus network is represented as a weighted adjacency matrix, where each edge (connection\nbetween two nodes) is assigned an integer weight ranging from 0 (not identified by any algorithm) to 4 (identified by all\nfour algorithms). This weight reflects the level of agreement among the algorithms regarding the presence of the edge.\nSpecifically, the consensus network is constructed by first generating individual networks using each of the four methods.\nThese networks are then overlaid, and the weight of each edge in the consensus network is computed as the sum of\nthe binary indicators (presence/absence) of that edge across all four networks. The resulting weighted network not\nonly highlights the most reliable edges (with higher weights) but also provides a comprehensive representation of\nmicrobial interactions. The weighted consensus network offers four advantages: robustness, as\nintegrating multiple algorithms reduces the impact of biases or errors associated with any single method; stability, as the\nweighted approach provides a holistic view of microbial interactions, capturing edges that are consistently supported\nacross algorithms; interpretability, as the weight values offer a straightforward measure of edge confidence, allowing\nresearchers to focus on highly reliable interactions for downstream analysis; and sparsity control, as selecting different\nthreshold values for edge weights (e.g., retaining only edges with weights ≥ 1, ≥ 2, ≥ 3, or = 4) lets researchers control the\nsparsity of the network to match their analytical goals. For instance, lower thresholds (e.g., ≥ 1) result in denser\nnetworks that include more potential interactions, while higher thresholds (e.g., = 4) produce sparser networks focused\nonly on the most reliable interactions identified by all algorithms. This weighted consensus network serves as a stable\nfoundation for exploring microbiome interactions and identifying key microbial taxa and their relationships within\ncomplex ecosystems. By incorporating agreement across multiple methods, it offers a more reliable and nuanced\nperspective on the underlying microbial community structure.\n2.3 Multi-method approach to identify key microbial drivers of disease resistance\nFeature selection is a crucial step in data analysis, involving the identification of significant features or covariates that\npossess high predictive power. In the context of high-dimensional data, such as microbiome datasets, feature selection\nbecomes indispensable to extract relevant information and reduce computational complexity. In particular, when\nstudying diseases, it becomes imperative to identify important OTUs that are strongly associated with the disease’s onset\nor progression. By identifying these key OTUs, we gain essential insights into potential driver pathogens or beneficial\nmicrobes. Subsequently, controlling the abundance or activity of these crucial OTUs can pave the way for novel disease\ninterventions and management strategies, opening up avenues for precision medicine and tailored therapies. 
To identify\nkey OTUs associated with disease outcomes, we employ a two-pronged feature selection approach: (1) machine\nlearning-based methods, and (2) network-based methods. By integrating these complementary approaches, we\nensure a comprehensive and biologically meaningful selection of microbial taxa associated with disease outcomes.\n2.3.1 Using machine learning-based methods\nTo identify important OTUs, we first prepare four versions of the filtered microbiome data:\ncentered log-ratio (CLR) transformed, raw filtered (untransformed), logarithmically transformed (log), and total sum scaling (TSS) normalized. These\nnormalization methods help account for compositional constraints and improve the reliability of machine learning\n(ML)-based feature selection.\nWe then apply six ML-based feature selection strategies, implemented in the scikit-learn library [47] in Python: (1) the\n“SelectKBest” method, which selects the k features with the highest analysis of variance F-value scores, (2) selection\nof the top k features based on the mutual information statistic, (3) Recursive Feature Elimination (RFE) with logistic\nregression, (4) RFE with a decision tree, (5) RFE with gradient boosting, and (6) RFE with Random Forest (RF).\nAdditionally, we introduce a 7th method that includes OTUs in the model if their maximum value falls within the top\n30% of the dataset. Running all seven strategies, we assign a value (\"TOTAL\") to each OTU equal to the number\nof times the OTU is selected as an important feature under the seven criteria. For example, an OTU that is selected\nas important by all seven strategies will have a \"TOTAL\" value of 7. Subsequently, we sort the OTUs by the\n\"TOTAL\" column and select important OTUs with a \"TOTAL\" value higher than a defined threshold. 
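As an illustration only, the vote counting behind the TOTAL value can be sketched in a few lines of Python. This is a minimal sketch, not the pipeline used in this study: the function names, strategy labels, OTU labels, and threshold below are hypothetical, and each strategy is assumed to have already returned its set of selected OTUs.

```python
from collections import Counter


def tally_votes(selections):
    # selections: dict mapping a strategy name to the OTUs it selected.
    # Each strategy contributes at most one vote per OTU.
    counts = Counter()
    for otus in selections.values():
        counts.update(set(otus))
    return counts


def select_by_total(counts, threshold):
    # Keep OTUs whose TOTAL is strictly higher than the threshold,
    # sorted by TOTAL (descending) and then by name for a stable order.
    kept = [(otu, total) for otu, total in counts.items() if total > threshold]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


# Toy input with three hypothetical strategies and three hypothetical OTUs.
selections = {
    'anova_f': ['otu_a', 'otu_b'],
    'mutual_info': ['otu_a', 'otu_c'],
    'rfe_logistic': ['otu_a', 'otu_b'],
}
totals = tally_votes(selections)
top = select_by_total(totals, threshold=1)
```

In this toy input, otu_a is selected by all three strategies, otu_b by two, and otu_c by one, so a threshold of 1 retains otu_a and otu_b. With the seven strategies described above, TOTAL would range from 0 to 7.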
Together, the seven strategies\nidentify the most influential OTUs for our feature selection analysis and inform the subsequent stages of our study.\n2.3.2 Using network-based methods\nWhile machine learning-based methods identify statistically relevant OTUs, they do not capture microbial interactions\nthat may play a crucial role in disease resistance. To address this limitation, we employ a network-based feature\nselection strategy that compares microbial co-occurrence patterns between diseased and healthy samples. We construct\nmicrobial interaction networks using four methods (SPARCC, SE_glasso, SPRING, and CMIMN)\nbased on samples from two classes: one representing samples without the disease (‘clean tubers’) and the other with the\ndisease (‘scab-infected tubers’). We then apply two complementary network-based feature selection strategies:\nStrategy 1: differential centrality analysis. This approach analyzes five centrality metrics for each OTU: (1) Degree\n(connectivity within the network), (2) Betweenness (importance in connecting other taxa), (3) Closeness (proximity to all\nother taxa), (4) Eigenvector Centrality (influence based on connected neighbors), and (5) PageRank (importance based\non link structure). We rank OTUs based on the difference in their centrality measures between the ‘clean tubers’ and\n‘scab-infected tubers’ networks. The top 20% of OTUs showing the largest differences are selected as key taxa.\nThe final taxa are considered important if they are selected by all four network inference methods.\nStrategy 2: weighted scoring of OTUs based on network topology. 
This method assigns a weighted score to each\nOTU based on its network properties using the following formula:\nScore_i^j = w1 × DE_i^j + w2 × EV_i^j + w3 × PR_i^j + w4 × CL_i^j + w5 × BE_i^j (5)\nwhere i denotes the OTU being evaluated and j represents the network inference method used for constructing the\nmicrobiome network. Here, DE, EV, PR, CL, and BE represent Degree, Eigenvector, PageRank, Closeness, and\nBetweenness centrality measures, respectively. The weights assigned to the centrality measures reflect their relative\nimportance in capturing biologically meaningful insights from the network structure. To ensure that the OTUs identified\nas significant have greater overlap with Strategy 1, we set the weights as follows: w1 = 0.1, w2 = 0.1, w3 = 0.1, w4 =\n0.2, w5 = 0.5, with Betweenness (50%) given the highest weight due to its role in structuring the microbial community.\nThe top 20% of OTUs with the highest scores are considered key players in microbial interactions related to\ndisease resistance.\nUnifying machine learning and network-based approaches for reliable microbiome feature selection: To improve\nthe reliability of microbiome feature selection, we integrate both machine learning and network-based strategies.\nSpecifically, we compare the OTUs identified by the two network-based strategies with those selected through multiple\nmachine learning algorithms. The final set of selected OTUs consists of taxa consistently prioritized across these\ncomplementary approaches, thereby increasing confidence in their biological relevance. 
This integrative strategy\ncombines the predictive power of machine learning with the structural insights derived from microbial interaction\nnetworks, resulting in a robust and interpretable set of microbial features. The selected OTUs represent strong candidates\nfor microbial drivers of disease resistance and may inform the development of microbiome-targeted interventions aimed\nat enhancing crop resilience and promoting sustainable agricultural practices.\n3 Results\n3.1 Robustness study of different algorithms for learning the microbiome network\nTo assess the robustness of the CMIMN algorithm in learning the microbiome network, we conducted a subsampling\nanalysis. In the initial step, we constructed the network using all samples. Subsequently, we\nrandomly selected 70% of the samples and generated 50 distinct datasets derived exclusively\nfrom this subset. On each of these 50 datasets, we executed the algorithm independently to construct separate networks.\nFor each method, we compare the networks constructed from each of the 50 generated datasets to the corresponding network generated\nusing the complete set of samples, based on the F-score. This comparative evaluation assesses how\nconsistently each algorithm reconstructs microbial associations across different subsamples of the data. The F-score is a\nwidely used metric that balances precision and recall, providing a single measure of a method’s accuracy in detecting true\nmicrobial interactions while minimizing false positives and false negatives. A higher F-score indicates better network\nreconstruction accuracy and reliability. To visually represent these comparisons, we created box plots for the F-score\nvalues obtained from each iteration. This allowed us to not only assess the overall performance but also identify any\npotential variations in performance across different taxonomic levels, including Phylum, Class, and Order. 
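For illustration, the F-score between a subsample network and the full-sample network can be computed from their edge sets as sketched below. This is a minimal Python sketch under stated assumptions, not the code used in this study: the full-sample network is treated as the reference, edges are compared as unordered node pairs, self-loops are ignored, and the function names and toy edge lists are hypothetical.

```python
def edge_set(edges):
    # Represent each undirected edge as a frozenset so (A, B) equals (B, A).
    return {frozenset(edge) for edge in edges if edge[0] != edge[1]}


def f_score(reference_edges, inferred_edges):
    # Harmonic mean of precision and recall of the inferred edges,
    # measured against the reference edges.
    reference = edge_set(reference_edges)
    inferred = edge_set(inferred_edges)
    true_positives = len(reference & inferred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(inferred)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Toy networks: four reference edges, three inferred edges, two shared.
full_network = [('A', 'B'), ('B', 'C'), ('C', 'D'), ('A', 'D')]
subsample_network = [('B', 'A'), ('C', 'B'), ('A', 'C')]
score = f_score(full_network, subsample_network)
```

Here precision is 2/3 and recall is 2/4, so the F-score is 4/7. Repeating this computation for each of the 50 subsample networks would give the values summarized in one box plot.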
Furthermore,\nwe extended this evaluation to the other network construction methods, so that CMIMN, SPRING,\nSE_glasso, and SPARCC are all assessed. The comparison was conducted at each taxonomic level, resulting in an\nassessment of each method’s robustness under different conditions. Figure S1 presents the box plots of the\nF-scores obtained with the different methods at the Phylum, Class, and Order levels. Our algorithm, CMIMN, exhibits superior performance as indicated by the narrower range of\nbox plots in all three taxonomic levels, demonstrating its robustness. Notably, among all algorithms, SE_glasso shows\nthe least favorable results at the Class level.\n3.2 Minimal Overlap Across Network Inference Methods Highlights the Need for Using a Consensus Approach\nWe constructed microbiome networks using four different inference methods (SE_glasso, SPRING, SPARCC, and\nCMIMN) at the Phylum, Class, and Order levels. Despite using the same dataset, the resulting networks exhibited minimal\noverlap, highlighting the high variability in microbial interaction patterns inferred by different methods.\nFigure 2 shows the Venn diagrams of common edges inferred by different methods at different taxonomic levels: only\n24 common edges at the Phylum level, 80 at the Class level, and 522 at the Order level. These results indicate that\nnetwork structures can vary significantly depending on the inference method used, which raises concerns about the\nreliability of conclusions drawn from any single approach.\nTables S2, S3, and S4 present network metrics for the four different methods at the Phylum, Class, and Order levels. 
The\nsubstantial differences in network topology metrics underscore the inherent differences in algorithmic assumptions and\ntheir impact on inferred microbial interactions. This variation highlights the importance of employing a consensus-based\napproach to enhance network robustness, reduce algorithm-specific biases, and improve biological interpretability.\nTables S5, S6, and S7 present the top nodes in microbial networks based on topological measures at the Phylum, Class,\nand Order levels. These networks were constructed using the SE_glasso, SPRING, SPARCC, and CMIMN methods, and\nthe analysis encompasses data from all samples, without distinguishing between diseased and healthy conditions.\nInterestingly, certain OTUs were consistently identified as highly connected nodes across all four network construction\nmethods, reinforcing their biological significance. According to these tables, Acidobacteriae was identified as important\nat the Class level by all algorithms and network metrics, while C0119 was consistently identified at the Order level. 
The\nrecurrence of these taxa across multiple network inference methods suggests their potential ecological importance and\nrole in microbial community stability.\n[Figure 2 panels, with the per-method totals printed in each panel: (a) Phylum level: SPARCC (148), SE_glasso (60), SPRING (172), CMIMN (136); (b) Class level: SPARCC (880), SE_glasso (228), SPRING (732), CMIMN (856); (c) Order level: SPARCC (3400), SE_glasso (2170), SPRING (2060), CMIMN (3574).]\nFigure 2: Venn diagrams illustrating the overlap of common edges in microbiome networks constructed using four different inference methods (SE_glasso, SPRING,\nSPARCC, and CMIMN) at different taxonomic levels based on all samples. (a) Phylum level: 24 common edges among all methods. (b) Class level: 80 common edges. (c)\nOrder level: 522 common edges.\n3.3 Consensus microbiome network: enhancing reliability through integration\nFigures 3, S2, and S3 visualize the microbiome networks at the Phylum, Class, and Order levels, respectively. Part (a) of\neach figure represents the ‘clean tubers’ network, which is characterized by a denser and more connected microbial\ncommunity, with more diverse interactions. Part (b) shows the ‘scab-infected tubers’ network, highlighting nodes\nand edges unique to diseased samples. Part (c) illustrates the common interactions between the two networks. Nodes\nrepresent OTUs and are color-coded: purple for common OTUs shared between the ‘clean tubers’ and ‘scab-infected tubers’\nnetworks, blue for OTUs unique to the ‘clean tubers’ network, and green for OTUs unique to the ‘scab-infected tubers’ network.\nNode size reflects connectivity (degree), while edges are distinguished as dashed (confirmed by three methods) or solid\n(confirmed by all four methods). 
At the Class and Order levels, due to the density of edges, only solid edges (confirmed\nby all four methods) are reported for clarity.\nIn both ‘clean tubers’ and ‘scab-infected tubers’ microbiome networks, we identified microbial associations consistently\nsupported by all four network inference methods. Many of these associations have also been independently reported\nas ecologically meaningful in soil ecosystems. The intersections of microbial associations between the ‘clean tubers’\nand ‘scab-infected tubers’ networks are summarized in Tables S8, S9, and S10, corresponding to the Phylum, Class,\nand Order levels, respectively. Each table includes three columns: the first lists associations unique to the ‘clean\ntubers’ network, the second shows associations exclusive to the ‘scab-infected tubers’ network, and the third highlights\nshared associations (those present in both networks), which represent conserved or stable microbial interactions across\nconditions.\nDue to the large number of associations observed at the Class and Order levels, we report the complete set of edges\nconfirmed by all four methods in Supplementary Tables S9 and S10. Here, we focus on selected Phylum-level\nassociations that are most frequently supported by existing literature. In Table S8, for the ‘clean tubers’ network,\nwe observed an edge between Planctomycetota and Patescibacteria, which likely reflects syntrophic or symbiotic\ninteractions, as both phyla possess reduced genomes and are known to co-occur in structured soil aggregates [48].\nThe edge between Methylomirabilota and WPS-2 may indicate shared adaptation to oligotrophic or co-contaminated\nsoil conditions, where both phyla are often involved in carbon and nitrogen cycling under stress [49]. 
Additionally,\nGemmatimonadota–Proteobacteria and Gemmatimonadota–Acidobacteriota were found only in the clean network;\nthese phyla are often associated with nutrient cycling and stable soil conditions, suggesting cooperative metabolic roles\nin healthy tuber-associated soils.\nFigure 3: The microbiome network at the Phylum taxonomic level. Part (a) represents the ‘clean tubers’ network, part (b) displays the ‘scab-infected tubers’ network, and\npart (c) shows the common interactions between them. Nodes represent Operational Taxonomic Units (OTUs) and are color-coded: purple for common OTUs shared\nbetween ‘clean tubers’ and ‘scab-infected tubers’ networks, blue for OTUs unique to the ‘clean tubers’ network, and green for OTUs unique to the ‘scab-infected tubers’\nnetwork. Node size indicates their degree of connectivity. Edges are categorized as dashed lines (confirmed by three methods) or solid lines (confirmed by all four\nmethods).\nIn the ‘scab-infected tubers’ network, the Actinobacteriota–Gemmatimonadota interaction was unique; it mirrors findings\nfrom boreal forest soils, where both phyla jointly contributed to the transformation of dissolved organic matter during\nfreeze–thaw cycles, pointing to potential functional synergy under stress [50].\nIn both the ‘clean tubers’ and ‘scab-infected tubers’ networks, the edges Actinobacteriota–Proteobacteria and\nProteobacteria–Acidobacteriota were consistently present, suggesting stable and ecologically relevant relationships across\nconditions. 
The co-occurrence of Actinobacteriota and Proteobacteria has been reported in sandy and layered soils, where\nboth phyla are dominant and likely contribute to complementary roles in organic matter degradation and nutrient\ncycling [51]. Meanwhile, Proteobacteria and Acidobacteriota are among the most abundant phyla in forest and agricultural\nsoils and are known to occupy distinct but co-existing niches, with Proteobacteria favoring copiotrophic (nutrient-rich)\nenvironments and Acidobacteriota preferring oligotrophic, acidic soils, indicating a functional partitioning that supports\nbroad microbial diversity and resilience [52].\n3.4 Feature selection using a multi-method approach\n3.4.1 Machine learning-based feature selection\nTables S11, S12, and S13 summarize the results of machine learning (ML)-based feature selection at the Phylum, Class,\nand Order levels, respectively. Each table includes five columns: four corresponding to different data normalization\nstrategies applied prior to ML analysis, and a fifth representing their intersection. The first column (ML_CLR) reports\nOTUs selected from count-filtered data normalized using the CLR transformation. The second column (ML_Original)\nshows OTUs selected from raw count-filtered data without transformation. The third column (ML_Log) includes\nOTUs identified from log-transformed data, and the fourth column (ML_TSS) presents selections based on data\nnormalized using TSS. The final column (ML_Intersection) lists OTUs consistently identified as important across all\nfour normalization methods, highlighting microbial taxa that are robust to normalization choice. According to these\ntables, the selections overlap well across the different normalization methods.\n3.4.2 Network-Based Selection: Strategy 1 (Differential Centrality Analysis)\nTables S14, S15, and S16 provide lists of selected OTUs at the Phylum, Class, and Order levels, respectively, based\non Strategy 1: Differential Centrality Analysis. 
OTUs were selected according to two criteria: (1) those exhibiting\nthe largest differences in centrality values between ‘clean tubers’ and ‘scab-infected tubers’ networks, and (2) those\nconsistently identified across all four network inference methods (SE_glasso, SPRING, SPARCC, and CMIMN),\nreinforcing their biological relevance. The first column of these tables shows the OTU name, while the second column\n(Features) indicates the centrality measure(s) responsible for the OTU’s selection. These taxa represent microbial\nfeatures whose connectivity patterns consistently differ between ‘clean tubers’ and ‘scab-infected tubers’ networks,\nsuggesting a potential role in disease dynamics.\n3.4.3 Network-Based Selection: Strategy 2 (Composite Scoring Approach)\nFirst, we constructed two distinct microbiome networks: one for ‘clean tubers’ and one for ‘scab-infected tubers’.\nNetworks were generated using four inference methods: SE_glasso, SPRING, SPARCC, and CMIMN.\nNext, we applied a composite scoring system that integrates multiple centrality metrics into a single weighted score\nfor each OTU, as defined in Equation (5). This score was calculated separately for each of the four network inference\nmethods. The top 20% of OTUs with the highest scores were considered significant. To evaluate the agreement\nbetween network-based selection (Strategy 2) and ML-based selection, we examined the overlap between the top 20%\nhighest-scoring OTUs and those selected by ML-based methods.\nTables S17, S18, and S19 summarize these overlaps at the Phylum, Class, and Order levels, respectively. 
In these tables,\nthe left section corresponds to results from the ‘clean tubers’ network, while the right section presents the results from\nthe ‘scab-infected tubers’ network. Each section contains five columns: the first column lists the OTUs selected by\nStrategy 2. Columns 2–5 indicate whether the same OTUs were also identified by ML methods under four normalization\nstrategies: CLR transformation, untransformed (raw) data, log-transformation, and TSS normalization. A value of “1”\nin these columns denotes agreement between Strategy 2 and the corresponding ML method for that normalization, while\n“0” indicates the OTU was uniquely selected by the network-based approach. This design enables a direct comparison\nof method overlap under different data preprocessing conditions. Figures S4 and S5 provide a visual representation\nof these overlaps, illustrating the average agreement between ML-based and network-based selection methods under\ndifferent conditions. For ‘clean tubers’, at the Phylum level, CMIMN and SPARCC demonstrated slightly better agreement\nwith ML-based selection across different normalization strategies. At the Order level, agreement was highest overall,\nparticularly under CLR and TSS normalization. For ‘scab-infected tubers’, CMIMN generally exhibited higher agreement\nwith ML-based selection at the Phylum level, while SPARCC showed better alignment at the Class level in certain cases.\nAt the Order level, both CMIMN and SPARCC consistently achieved strong agreement with ML-based methods. These\nfindings indicate that CMIMN and SPARCC consistently align more closely with ML-based feature selection methods,\nparticularly at the Order level and under CLR and TSS normalization. 
This underscores their robustness and reliability\nin identifying important OTUs across different experimental conditions.\nTables S20, S21, and S22 present the overlap between Strategy 2 and the ML approaches at the Phylum, Class, and Order\nlevels, focusing specifically on OTUs that were commonly identified in both the ‘clean tubers’ and ‘scab-infected tubers’\nnetworks for each method. The left sections of these tables report overlaps for the CMIMN and SE_glasso methods. The\nfirst column lists the microbial taxa consistently identified across both networks using each respective method. Columns\n2 through 5 indicate whether these taxa were also selected by ML methods under different normalization strategies. A\nvalue of “1” denotes agreement between the ML and network-based methods, while “0” indicates no overlap. Similarly,\nthe right sections of the tables summarize the results for the SPARCC and SPRING methods, following the same structure.\nFinally, Tables S23 and S24 illustrate the overlap among important OTUs identified by all four algorithms (CMIMN,\nSPARCC, SE_glasso, SPRING) for the ‘clean tubers’ and ‘scab-infected tubers’ networks, respectively, based on Strategy\n2.\n3.5 Overall OTUs identified by all methods as key drivers of disease\nTable 1 summarizes the key OTUs identified through different selection strategies, including Machine Learning-based\nfeature selection (ML), Network-Based Selection: Strategy 1 (Differential Centrality Analysis) (Strategy 1), and\nNetwork-Based Selection: Strategy 2 (Composite Scoring Approach) (Strategy 2), at the Phylum, Class, and Order\nlevels.\nAt the Phylum level, we found no overlap between the taxa selected by ML-based and network-based methods,\nhighlighting the complementary nature of these approaches. 
To provide a comprehensive view, we report the most\nconsistently selected taxa within each category.\nML-based selected taxa: Firmicutes, identified through ML-based feature selection, comprise 5.5% of the bacterial\ncommunity and promote plant growth and disease suppression, especially Bacillus spp., which enhance root colonization\nand pathogen inhibition through antimicrobial metabolites [53]. Cyanobacteria (0.9%) also emerged as important\nML-selected taxa, contributing to nitrogen fixation, biofilm formation, and soil structure improvement, benefiting\nmicrobial community stability [54]. Less abundant phyla such as Armatimonadota (0.2%) may also be directly related\nto the disease, as a negative relationship between the abundance of those phyla and the scab-suppressive ability of soil has\nbeen observed in one of our studies (unpublished data). The precise implications of NB1-j (0.1%) in disease progression\nare still unclear, but its involvement in nitrogen cycling and interactions with microalgae suggest potential indirect\ninfluences [55].\nNetwork-based selected taxa (intersection of both strategies): Bacteroidota, WPS.2, and Proteobacteria were\nconsistently identified across both network-based strategies, indicating strong and robust association with disease status.\nBacteroidota (5.5%) are involved in nutrient cycling and pathogen competition, both of which contribute to disease\nsuppression [56]. WPS.2, though less prevalent (0.3%), showed a negative relationship with suppressive soil capacity in\nour prior (unpublished) observations. Proteobacteria represent a taxonomically diverse group containing both beneficial\n(e.g., Rhizobium) and pathogenic (e.g., Pseudomonas syringae, Ralstonia solanacearum) members, reflecting their\ncomplex role in disease ecology [57].\nLevel ML Strategy 1 Strategy 2 Intersection\nPhylum\nFirmicutes Bacteroidota Bacteroidota ▲ Bacteroidota\nCyanobacteria WPS.2 WPS.2 ▲ WPS.2\nArmatimonadota Proteobacteria Proteobacteria ▲ Proteobacteria\nNB1.j\nClass\nBacilli Desulfotobacteriia Ktedonobacteria ▲ Actinobacteria\nKtedonobacteria Actinobacteria Vicinamibacteria ▲ AD3\nCyanobacteriia Syntrophobacteria Actinobacteria • Bacilli\nSaccharimonadia AD3 Gammaproteobacteria • Anaerolineae\nPlanctomycetes Alphaproteobacteria • Ktedonobacteria\nIgnavibacteria Acidobacteriae\nDehalococcoidia Anaerolineae\nAnaerolineae AD3\nMB.A2.108 Blastocatellia\nChthononomadetes Bacilli\nOrder\nSaccharimonadales C0119 C0119 * C0119\nBacillales Defluviicoccales Rhizobiales ■ Defluviicoccales\nC0119 Bacteroidales Chitinophagales ■ Bacteroidales\nSubgroup.2 Kryptoniales Ktedonobacterales • Ktedonobacterales\nXanthomonadales B12.WMSP1 Microtrichales\nAcidobacteriales Desulfotobacteriales\nChloroplast\nAlicyclobacillales\nPaenibacillales\nAcetobacterales\nPseudomonadales\nAnaerolineales\nElsterales\nBacteroidales\nKtedonobacterales\nTable 1: Important taxa at the Phylum, Class, and Order levels across three different strategies. The symbols indicate\nintersections between the strategies: * for taxa appearing in ML, Strategy 1, and Strategy 2; ■ for taxa appearing in ML\nand Strategy 1; • for taxa appearing in ML and Strategy 2; ▲ for taxa appearing in Strategy 1 and Strategy 2.\nAt the Class level, Network-based intersection: Actinobacteria and AD3 were selected by both network-based\nstrategies, indicating strong structural importance in microbial networks associated with disease. Actinobacteria are\nkey contributors to soil suppressiveness against plant pathogens. Notably, non-pathogenic Streptomyces spp. 
produce\nantibiotics that inhibit soil-borne pathogens, including Streptomyces scabies, the causative agent of common scab\ndisease [58]. Although less well characterized, Although less well-characterized, AD3 was identified as a robust\nClass-level taxon across both network-based strategies. This group has been associated with degraded or polluted soils\nand reduced organic matter content, suggesting its presence may indicate shifts in microbial community structure linked\nto soil stress and disease vulnerability [59].\nML and network-based Strategy 2 intersection: Bacilli, Anaerolineae, and Ktedonobacteria were jointly identified by\nML-based feature selection and network-based Strategy 2, suggesting these taxa are both predictive and structurally\ncentral in the disease-associated microbiome. Bacilli (notably Bacillus spp.) are widely recognized for their role in\nplant protection and disease suppression, particularly through Bacillus spp., which produce lipopeptides and hydrolytic\nenzymes that enhance root colonization and pathogen inhibition [60]. Ktedonobacteria exhibit complex morphologies\nand genomic features, leading to speculation that they may be a valuable microbial resource for novel compounds [61].\nAnaerolineae, frequently found in low-oxygen soil habitats, play a crucial role in carbon degradation processes,\nincluding the breakdown of plant-derived compounds [62,63]. This activity can modify the soil environment, potentially\nsuppressing plant diseases through nutrient competition or the production of inhibitory substances.\n12\n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted April 13, 2025. 
At the Order level, Confirmed by all ML methods and both network-based strategies: C0119 was the only taxon\nconsistently identified by all machine learning models and both network-based strategies, highlighting its strong and\nstable association with disease-relevant microbial networks. Although C0119 is taxonomically unclassified, recent studies\nhave shown that it is a dominant order in biochar-amended soils, environments known to support improved\nmicrobial diversity, carbon cycling, and root-associated community stability [64]. Its consistent emergence across\ndiverse analytical methods suggests an ecologically meaningful role in shaping disease-conducive or suppressive soil\nenvironments.\nSelected by ML and Network Strategy 1: Defluviicoccales, while often linked to anaerobic degradation, have been\nobserved in disease-prone soils, where they may contribute to microbial shifts influencing pathogen persistence [65].\nBacteroidales are involved in organic matter degradation and nutrient cycling. Some members have been associated with\npathogen suppression via competitive exclusion and enhancement of soil nutrient availability, contributing indirectly to\ndisease resistance [66].\nSelected by ML and Network Strategy 2: Ktedonobacterales have been associated with disease suppression due to their\npotential for producing antimicrobial compounds and their metabolic similarity to antibiotic-producing Actinomycetes\n[61].\nConclusion\nThis study introduces a comprehensive framework for robust microbiome network inference and the identification\nof disease-associated microbial taxa, specifically in the context of potato common scab. We developed CMIMN, a\nnovel Bayesian network algorithm based on conditional mutual information, which exhibited superior robustness and\ninterpretability across taxonomic levels. 
Recognizing the limitations of individual network inference methods, we\nintegrated CMIMN with three widely used approaches (SPIEC-EASI, SPRING, and SPARCC) to construct consensus\nmicrobiome networks. These consensus networks captured biologically meaningful co-occurrence patterns while\nreducing algorithm-specific variability, thereby enhancing confidence in the inferred microbial interactions.\nTo identify taxa relevant to disease, we implemented a multi-method feature selection framework combining different\nmachine learning algorithms with two network-based strategies. The machine learning component provided predictive\nstability by selecting features that consistently appeared across models, while the network-based methods leveraged\ncentrality metrics and topological differences between networks to capture taxa important to microbial structure and\ncommunity dynamics. This integrative approach enabled the detection of microbial taxa with both statistical significance\nand ecological relevance.\nOur results revealed clear distinctions in microbial community structure between clean and scab-infected tubers. At\nthe Phylum level, Bacteroidota, WPS-2, and Proteobacteria were identified through both network-based strategies,\nwhile Firmicutes and Cyanobacteria were highlighted by machine learning models. Interactions such as\nActinobacteriota–Proteobacteria and Planctomycetota–Patescibacteria were found to be consistently supported by all four network\ninference methods and corroborated by existing soil studies, reinforcing their ecological relevance.\nAt the Class level, Actinobacteria, AD3, Bacilli, Anaerolineae, and Ktedonobacteria were identified by either multiple\nstrategies or the intersection of both ML and network approaches. These classes are associated with key ecological\nfunctions such as carbon degradation, nutrient cycling, antimicrobial production, and disease suppression. 
At the\nOrder level, C0119 was the only taxon confirmed by all machine learning models and both network-based strategies,\nhighlighting its potential as a robust indicator of disease status. Other important orders included Bacteroidales,\nDefluviicoccales, and Ktedonobacterales, identified by at least two independent methods.\nThe topological analysis of microbial networks further revealed differences in connectivity and interaction density\nbetween clean and diseased tuber microbiomes. Clean tuber networks exhibited higher overall connectivity, suggesting\na more stable and cooperative microbial community. In contrast, disease-associated networks were more fragmented\nand featured shifts in taxa centrality, indicating structural reorganization in response to pathogen pressure. Several\ninteractions identified in these networks, such as Actinobacteriota–Gemmatimonadota and Methylomirabilota–WPS-2,\nhave also been observed in previous studies investigating soil response to stress or contamination.\nThe concept of disease-suppressive soil suggests fundamental differences in the microbiological environment between\nhealthy and disease-conducive soils; meanwhile, soil microorganisms have been proposed as bioindicators of general\nsoil health [67]. Because the microbiomes in this study were extracted from pre-planting soils of potato fields, the\nselected microbial features that distinguish healthy from disease-conducive soils appear to exist long before disease\nemergence. Despite the variation driven by geography, management, and climate legacy in this large-scale survey,\nthe microbial signal was strong for soils that tend to produce disease-free tubers. This suggests that the soil\nmicrobiome is a promising source of indicators for soil health and predictors of potato scab disease.\n
Altogether, our integrative approach provides a scalable and interpretable framework for microbiome network analysis\nand biomarker discovery. By combining Bayesian inference, consensus-based network construction, and multi-method\nfeature selection, we bridge predictive modeling with ecological insight. These findings not only improve our\nunderstanding of microbial community dynamics in disease contexts but also establish a foundation for\nmicrobiome-informed strategies in plant health management and sustainable agriculture. As a next step, our broader vision is to\ndevelop an interactive Shiny application that enables biologists to upload their microbiome and disease data to identify\nreliable taxa-disease associations and uncover robust co-occurrence relationships, making advanced analysis tools\nmore accessible and actionable for the biological research community. In addition, our approach extracts biologically\nmeaningful information by comparing network structures between clean tubers and scab-infected tubers: Strategy\n1 focuses on differences in centrality measures across the two networks, while Strategy 2 analyzes each network\nindependently to highlight key taxa. To further enhance this comparative analysis, we are interested in applying the\nMicrobiome Network Alignment (MiNAA) algorithm [68], which aligns microbial networks across conditions, allowing\nus to extract deeper biological insights about shifts in microbial interactions associated with disease.\nConsent for publication\nNot applicable.\nAvailability of data and materials\nThe 16S and ITS amplicon sequencing data associated with this study are publicly available at the NCBI Short Read\nArchive under the BioProject PRJNA1135141. 
The CMIMN R package and all R code for this paper are available in\nthe GitHub repository, https://github.com/solislemuslab/CMIMN.\nCompeting interests\nThe authors declare that they have no competing interests.\nFunding\nThis work was supported by the National Science Foundation (DEB-2144367 to CSL). The work was also supported by\nUSDA Specialty Crop Multi-State Grant Program award SCMP1701.\nAuthors’ contributions\nCSL and RL developed the idea. RL and SS collected the data. RA led all statistical analyses, from data preprocessing\nto fitting of machine learning models, and summarized the results in figures. RA wrote the initial\ncomplete draft of the manuscript. SS, RL, and CSL contributed to the interpretation, editing, and revision of the manuscript.\nAll authors read and approved the final manuscript.\nAcknowledgements\nReferences\n[1] Husein Ajwa, William J Ntow, Ruijun Qin, and Suduan Gao. Properties of soil fumigants and their fate in the\nenvironment. In Hayes’ Handbook of Pesticide Toxicology, pages 315–330. Elsevier, 2010.\n[2] Sarah Braun, Amanda Gevens, Amy Charkowski, Christina Allen, and Shelley Jansky. Potato common scab: A\nreview of the causal pathogens, management practices, varietal resistance screening methods, and host resistance.\nAmerican Journal of Potato Research, 94:283–296, 2017.\n[3] Linda L Kinkel, Daniel C Schlatter, Matthew G Bakker, and Brett E Arenz. Streptomyces competition and\nco-evolution in relation to plant disease suppression. Research in microbiology, 163(8):490–499, 2012.\n[4] Noah Rosenzweig, James M Tiedje, John F Quensen III, Qingxiao Meng, and Jianjun J Hao. Microbial communities\nassociated with potato common scab-suppressive soil determined by pyrosequencing analyses. Plant disease,\n96(5):718–725, 2012.\n[5] Noah Fierer, Christian L Lauber, Kelly S Ramirez, Jesse Zaneveld, Mark A Bradford, and Rob Knight. 
Comparative\nmetagenomic, phylogenetic and physiological analyses of soil microbial communities across nitrogen gradients.\nThe ISME Journal, 6(5):1007–1017, 2012.\n[6] Thea Whitman, Rachel Neurath, Adele Perera, Ilexis Chu-Jacoby, Daliang Ning, Jizhong Zhou, Peter Nico,\nJennifer Pett-Ridge, and Mary Firestone. Microbial community assembly differs across minerals in a rhizosphere\nmicrocosm. Environmental Microbiology, 20(12):4444–4460, 2018.\n[7] Anna M. Cates, Michael J. Braus, Thea L. Whitman, and Randall D. Jackson. Separate drivers for microbial\ncarbon mineralization and physical protection of carbon. Soil Biology and Biochemistry, 133:72–82, 2019.\n[8] Christina Kranz and Thea Whitman. Short communication: Surface charring from prescribed burning has minimal\neffects on soil bacterial community composition two weeks post-fire in jack pine barrens. Applied Soil Ecology,\n144:134–138, 2019.\n[9] Thea Whitman, Ellen Whitman, Jamie Woolet, Mike D. Flannigan, Dan K. Thompson, and Marc-André Parisien.\nSoil bacterial and fungal response to wildfires in the canadian boreal forest across a burn severity gradient. Soil\nBiology and Biochemistry, 138:107571, 2019.\n[10] Alex Carr, Christian Diener, Nitin S Baliga, and Sean M Gibbons. Use and abuse of correlation analyses in\nmicrobial ecology. The ISME journal, 13(11):2647–2655, 2019.\n[11] Cassandra Allsup and Richard Lankau. Migration of soil microbes may promote tree seedling tolerance to drying\nconditions. Ecology, 100:e02729, 04 2019.\n[12] R. A. Rioux, C. M. Stephens, and J. P. Kerns. 
Factors affecting pathogenicity of the turfgrass dollar spot pathogen\nin natural and model hosts. bioRxiv, page 630582, 01 2019.\n[13] Richard Lankau, Isabelle George, and Max Miao. Crop health optimized by microbial diversity across phylogenetic\nscales. Submitted, 2020.\n[14] Emily W Lankau, Diane Xue, Rachel Chrisensen, Amanda J Gevens, and Richard A Lankau. Management and\nsoil conditions influence common scab severity on potato tubers via indirect effects on soil microbial communities.\nPhytopathology™, 2020/02/27 2020.\n[15] Ksenia Guseva, Sean Darcy, Eva Simon, Lauren V Alteio, Alicia Montesinos-Navarro, and Christina Kaiser. From\ndiversity to complexity: Microbial networks in soils. Soil Biology and Biochemistry, 169:108604, 2022.\n[16] Rosa Aghdam, Mojtaba Ganjali, Xiujun Zhang, and Changiz Eslahchi. CN: a consensus algorithm for inferring\ngene regulatory networks using the sorder algorithm and conditional mutual information test. Molecular\nBioSystems, 11(3):942–949, 2015.\n[17] Cameron Wagg, Klaus Schlaeppi, Samiran Banerjee, Eiko E Kuramae, and Marcel GA van der Heijden.\nFungal-bacterial diversity and microbiome complexity predict ecosystem functioning. Nature communications, 10(1):4841,\n2019.\n[18] Rosa Aghdam, Xudong Tang, Shan Shan, Richard Lankau, and Claudia Solís-Lemus. Human limits in machine\nlearning: prediction of potato yield and disease using soil microbiome data. BMC bioinformatics, 25:366, 2024.\n[19] Evan Gorstein, Rosa Aghdam, and Claudia Solís-Lemus. HighDimMixedModels.jl: Robust high-dimensional\nmixed-effects models across omics data. PLOS Computational Biology, 21(1):e1012143, 2025.\n[20] Helena Brunel, Joan-Josep Gallardo-Chacón, Alfonso Buil, Montserrat Vallverdú, José Manuel Soria, Pere\nCaminal, and Alexandre Perera. Miss: a non-linear methodology based on mutual information for genetic\nassociation studies in both population and sib-pairs analysis. 
Bioinformatics, 26(15):1811–1818, 2010.\n[21] Gökmen Altay and Frank Emmert-Streib. Revealing differences in gene network inference algorithms on the\nnetwork level by ensemble methods. Bioinformatics, 26(14):1738–1744, 2010.\n[22] Xiujun Zhang, Xing-Ming Zhao, Kun He, Le Lu, Yongwei Cao, Jingdong Liu, Jin-Kao Hao, Zhi-Ping Liu, and\nLuonan Chen. Inferring gene regulatory networks from gene expression data by path consistency algorithm based\non conditional mutual information. Bioinformatics, 28(1):98–104, January 2012.\n[23] Xiujun Zhang, Juan Zhao, Jin-Kao Hao, Xing-Ming Zhao, and Luonan Chen. Conditional mutual inclusive\ninformation enables accurate quantification of associations in gene regulatory networks. Nucleic acids research,\n43(5):e31–e31, 2015.\n[24] Sayyed Hadi Mahmoodi, Rosa Aghdam, and Changiz Eslahchi. An order independent algorithm for inferring gene\nregulatory network using quantile value for conditional independence tests. Scientific reports, 11(1):7605, 2021.\n[25] Rosa Aghdam, Mojtaba Ganjali, and Changiz Eslahchi. IPCA-CMI: an algorithm for inferring gene regulatory\nnetworks based on a combination of pca-cmi and mit score. PloS one, 9(4):e92600, 2014.\n[26] Parisa Niloofar, Rosa Aghdam, and Changiz Eslahchi. Gaem: Genetic algorithm based expectation-maximization\nfor inferring gene regulatory networks from incomplete data. Computers in Biology and Medicine, 183:109238,\n2024.\n[27] Diego Colombo, Marloes H. Maathuis, Markus Kalisch, and Thomas S. Richardson. Learning high-dimensional\ndirected acyclic graphs with latent and selection variables. Ann. 
Statist., 40(1):294–321, 02 2012.\n[28] Judea Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, New York, NY, USA, 2nd\nedition, 2009.\n[29] P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. MIT press, 2nd edition, 2000.\n[30] Rosa Aghdam, Mojtaba Ganjali, Parisa Niloofar, and Changiz Eslahchi. Inferring gene regulatory networks by an\norder independent algorithm using incomplete data sets. Journal of Applied Statistics, 43(5):893–913, 2016.\n[31] Seyed Amir Malekpour, Maryam Shahdoust, Rosa Aghdam, and Mehdi Sadeghi. wplogicnet: logic gate and\nstructure inference in gene regulatory networks. Bioinformatics, 39(2):btad072, 2023.\n[32] Luis M. de Campos. A scoring function for learning bayesian networks based on mutual information and\nconditional independence tests. Journal of Machine Learning Research, 7:2149–2187, 2006.\n[33] Eli Faulkner. K2ga: Heuristically guided evolution of bayesian network structures from data. In CIDM, pages\n18–25. IEEE, 2007.\n[34] Seiya Imoto, Takao Goto, and Satoru Miyano. Estimation of genetic networks and functional structures between\ngenes by using bayesian networks and nonparametric regression. In Russ B. Altman, A. Keith Dunker, Lawrence\nHunter, and Teri E. Klein, editors, Pacific Symposium on Biocomputing, pages 175–186, 2002.\n[35] S. Acid and L. M. Campos. A hybrid methodology for learning belief networks: Benedict. International Journal\nof Approximate Reasoning, 27(3):235–262, 2001.\n[36] D. M. Chickering, D. Geiger, and D. Heckerman. Learning Bayesian networks: Search methods and experimental\nresults. In Preliminary papers of the 5th International Workshop on Artificial Intelligence and Statistics, pages\n112–128, 1995.\n[37] Markus Kalisch, Martin Mächler, Diego Colombo, Marloes H. Maathuis, and Peter Bühlmann. Causal inference\nusing graphical models with the r package pcalg. Journal of Statistical Software, 47(11):1–26, 5 2012.\n[38] Marloes H. 
Maathuis, Markus Kalisch, and Peter Bühlmann. Estimating high-dimensional intervention effects\nfrom observational data. Ann. Statist., 37(6A):3133–3164, 2009.\n[39] Ioannis Tsamardinos, Laura E. Brown, and Constantin F. Aliferis. The max-min hill-climbing bayesian network\nstructure learning algorithm. Machine Learning, 65(1):31–78, 2006.\n[40] Peter Spirtes, Christopher Meek, and Thomas Richardson. Causal inference in the presence of latent variables and\nselection bias. In Proceedings of the Eleventh conference on Uncertainty in artificial intelligence, pages 499–506.\nMorgan Kaufmann Publishers Inc., 1995.\n[41] Peter Spirtes. An anytime algorithm for causal inference. In Proc. of the Eighth International Workshop on\nArtificial Intelligence and Statistics, pages 213–221. Citeseer, 2001.\n[42] Jiji Zhang. On the completeness of orientation rules for causal discovery in the presence of latent confounders and\nselection bias. Artificial Intelligence, 172(16):1873–1896, 2008.\n[43] Rosa Aghdam, Vahid Rezaei Tabar, and Hamid Pezeshk. Some node ordering methods for the k2 algorithm.\nComputational Intelligence, 35(1):42–58, 2019.\n[44] Zachary D Kurtz, Christian L Müller, Emily R Miraldi, Dan R Littman, Martin J Blaser, and Richard A Bonneau.\nSparse and compositionally robust inference of microbial ecological networks. PLoS computational biology,\n11(5):e1004226, 2015.\n[45] Grace Yoon, Irina Gaynanova, and Christian L Müller. Microbial networks in spring-semi-parametric rank-based\ncorrelation and partial correlation estimation for quantitative microbiome data. Frontiers in genetics, 10:516,\n2019.\n[46] Jonathan Friedman and Eric J Alm. Inferring correlation networks from genomic survey data. PLoS computational\nbiology, 8(9):e1002687, 2012.\n[47] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss,\nV. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 
Scikit-learn:\nMachine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.\n[48] Cale O Seymour, Marike Palmer, Eric D Becraft, Ramunas Stepanauskas, Ariel D Friel, Frederik Schulz, Tanja\nWoyke, Emiley Eloe-Fadrosh, Dengxun Lai, Jian-Yu Jiao, et al. Hyperactive nanobacteria with host-dependent\ntraits pervade omnitrophota. Nature Microbiology, 8(4):727–744, 2023.\n[49] Ying Zhang, Fanghan Qian, and Yanyu Bao. Variations of microbiota and metabolites in rhizosphere soil of\ncarmona microphylla at the co-contaminated site with polycyclic aromatic hydrocarbons and heavy metals.\nEcotoxicology and Environmental Safety, 290:117734, 2025.\n[50] Yan Yang, Jing Geng, Shulan Cheng, Huajun Fang, Yifan Guo, Yuna Li, Yi Zhou, Fangying Shi, and Karen\nVancampenhout. Linking soil microbial community to the chemical composition of dissolved organic matter in a\nboreal forest during freeze–thaw cycles. Geoderma, 431:116359, 2023.\n[51] Zhen Guo, Haiou Zhang, Juan Li, Tianqing Chen, Huanyuan Wang, and Yang Zhang. Distribution of soil\nmicroorganisms in different complex soil layers in mu us sandy land. PLoS One, 18(4):e0283341, 2023.\n[52] Hua Wei, Changhui Peng, Bin Yang, Hanxiong Song, Quan Li, Lin Jiang, Gang Wei, Kefeng Wang, Hui Wang,\nShirong Liu, et al. Contrasting soil bacterial community, diversity, and function in two forests in china. Frontiers\nin Microbiology, 9:1693, 2018.\n[53] Roeland L Berendsen, Corné MJ Pieterse, and Peter AHM Bakker. 
The rhizosphere microbiome and plant health.\nTrends in plant science, 17(8):478–486, 2012.\n[54] Hongli He, Runyu Miao, Lilong Huang, Hongshan Jiang, and Yunqing Cheng. Vegetative cells may perform\nnitrogen fixation function under nitrogen deprivation in anabaena sp. strain pcc 7120 based on genome-wide\ndifferential expression analysis. PLoS One, 16(3):e0248155, 2021.\n[55] BLD Uthpala Pushpakumara, Kshitij Tandon, Anusuya Willis, and Heroen Verbruggen. Unravelling\nmicroalgal-bacterial interactions in aquatic ecosystems through 16s rrna gene-based co-occurrence networks. Scientific\nReports, 13(1):2743, 2023.\n[56] Rodrigo Mendes, Marco Kruijt, Irene De Bruijn, Ester Dekkers, Menno Van Der Voort, Johannes HM Schneider,\nYvette M Piceno, Todd Z DeSantis, Gary L Andersen, Peter AHM Bakker, et al. Deciphering the rhizosphere\nmicrobiome for disease-suppressive bacteria. Science, 332(6033):1097–1100, 2011.\n[57] Stéphane Genin and Timothy P Denny. Pathogenomics of the ralstonia solanacearum species complex. Annual\nreview of phytopathology, 50(1):67–89, 2012.\n[58] Marzieh Ebrahimi-Zarandi, Roohallah Saberi Riseh, and Mika T Tarkka. Actinobacteria as effective biocontrol\nagents against plant pathogens, an overview on their role in eliciting plant defense. Microorganisms, 10(9):1739,\n2022.\n[59] Gang Wang, Ying Ren, Xuanjiao Bai, Yuying Su, and Jianping Han. Contributions of beneficial microorganisms\nin soil remediation and quality improvement of medicinal plants. Plants, 11(23):3200, 2022.\n[60] Djordje Fira, Ivica Dimkić, Tanja Berić, Jelena Lozo, and Slaviša Stanković. Biological control of plant pathogens\nby bacillus species. Journal of biotechnology, 285:44–55, 2018.\n[61] Shuhei Yabe, Yasuteru Sakai, Keietsu Abe, and Akira Yokota. Diversity of ktedonobacteria with actinomycetes-like\nmorphology in terrestrial environments. 
Microbes and environments, 32(1):61–70, 2017.\n[62] Paige E Payne, Loren N Knobbe, Patricia Chanton, Julian Zaugg, Behzad Mortazavi, and Olivia U Mason.\nUncovering novel functions of the enigmatic, abundant, and active anaerolineae in a salt marsh ecosystem.\nMsystems, 10(1):e01162–24, 2025.\n[63] Yuqin Liang, Liang Wei, Shuang Wang, Can Hu, Mouliang Xiao, Zhenke Zhu, Yangwu Deng, Xiaohong Wu,\nYakov Kuzyakov, Jianping Chen, et al. Long-term fertilization suppresses rice pathogens by microbial volatile\ncompounds. Journal of environmental management, 336:117722, 2023.\n[64] Zhiqiang Tang, Liying Zhang, Na He, Diankai Gong, Hong Gao, Zuobin Ma, Liang Fu, Mingzhu Zhao, Hui Wang,\nChanghua Wang, et al. Soil bacterial community as impacted by addition of rice straw and biochar. Scientific\nReports, 11(1):22185, 2021.\n[65] Hazel A Barton, Juan G Giarrizzo, Paula Suarez, Charles E Robertson, Mark J Broering, Eric D Banks, Parag A\nVaishampayan, and Kasthisuri Venkateswaran. Microbial diversity in a venezuelan orthoquartzite cave is dominated\nby the chloroflexi (class ktedonobacterales) and thaumarchaeota group i. 1c. Frontiers in microbiology, 5:615,\n2014.\n[66] Ian DEA Lidbury, Chiara Borsetto, Andrew RJ Murphy, Andrew Bottrill, Alexandra ME Jones, Gary D Bending,\nJohn P Hammond, Yin Chen, Elizabeth MH Wellington, and David J Scanlan. Niche-adaptation in plant-associated\nbacteroidetes favours specialisation in organic phosphorus mineralisation. The ISME Journal, 15(4):1040–1055,\n2021.\n[67] Marketa Sagova-Mareckova, Marek Omelka, and Jan Kopecky. The golden goal of soil management: disease-\nsuppressive soils. Phytopathology®, 113(4):741–752, 2023.\n[68] Reed Nelson, Rosa Aghdam, and Claudia Solis-Lemus. 
Minaa: Microbiome network alignment algorithm.\nJournal of Open Source Software, 9(96):5448, 2024.\nSUPPLEMENTARY MATERIAL: LEVERAGING BAYESIAN\nNETWORKS FOR CONSENSUS NETWORK CONSTRUCTION AND\nMULTI-METHOD FEATURE SELECTION TO DECODE DISEASE\nPREDICTION\nRosa Aghdam\nWisconsin Institute for Discovery\nUniversity of Wisconsin-Madison\nMadison, WI\nShan Shan\nDepartment of Plant Pathology\nWisconsin Institute for Discovery\nUniversity of Wisconsin-Madison\nMadison, WI\nRichard Lankau\nDepartment of Plant Pathology\nWisconsin Institute for Discovery\nUniversity of Wisconsin-Madison\nMadison, WI\nClaudia Solís-Lemus∗\nDepartment of Plant Pathology\nWisconsin Institute for Discovery\nUniversity of Wisconsin-Madison\nMadison, WI\nList of Tables\n1 Taxonomic level (first column), number of OTUs in the original dataset (second column), and number\nof OTUs remaining after filtering out those that appear in fewer than 15 samples (third column).\n2 Network metrics for microbiome networks constructed using four different methods based on all\nsamples at the Phylum level.\n3 Network metrics for microbiome networks constructed using four different methods based on all\nsamples at the Class level.\n4 Network metrics for microbiome networks constructed using four different methods based on all\nsamples at the Order level. 
\n5 Commonly identified important OTUs based on topological features in microbiome networks\nconstructed from all samples at the Phylum level.\n6 Commonly identified important OTUs based on topological features in microbiome networks\nconstructed from all samples at the Class level.\n7 Commonly identified important OTUs based on topological features in microbiome networks\nconstructed from all samples at the Order level.\n8 Phylum-level microbial associations identified in ‘clean tuber’ and ‘scab-infected tubers’ microbiome\nnetworks. Only interactions confirmed by all four inference methods are shown.\n9 Class-level microbial associations identified in ‘clean tuber’ and ‘scab-infected tubers’ microbiome\nnetworks. Only interactions confirmed by all four inference methods are shown.\n10 Order-level microbial associations identified in ‘clean tuber’ and ‘scab-infected tubers’ microbiome\nnetworks. Only interactions confirmed by all four inference methods are shown.\n11 Important OTUs identified using Multi Machine Learning (ML) methods at the Phylum level.\n∗Corresponding author: solislemus@wisc.edu\nSoil BN\n12 Important OTUs identified using Multi Machine Learning (ML) methods at the Class level.\n13 Important OTUs identified using Multi Machine Learning (ML) methods at the Order level. 
\n14 Important OTUs identified as key features in response to pitted scab at the Phylum level using Strategy\n1: Differential Centrality Analysis.\n15 Important OTUs identified as key features in response to pitted scab at the Class level using Strategy 1:\nDifferential Centrality Analysis.\n16 Important OTUs identified as key features in response to pitted scab at the Order level using Strategy 1:\nDifferential Centrality Analysis.\n17 Selection of key Operational Taxonomic Units (OTUs) at the Phylum level using network-based feature\nselection (Strategy 2).\n18 Selection of key Operational Taxonomic Units (OTUs) at the Class level using network-based feature\nselection (Strategy 2).\n19 Selection of key Operational Taxonomic Units (OTUs) at the Order level using network-based feature\nselection (Strategy 2).\n20 Selection of Operational Taxonomic Units (OTUs) at the Phylum level in both networks of ‘Clean\nTubers’ and ‘Scab-Infected Tubers’ using Network-Based Method (Strategy 2) and Machine Learning\n(ML) methods.\n21 Selection of Operational Taxonomic Units (OTUs) at the Class level in both networks of ‘Clean Tubers’\nand ‘Scab-Infected Tubers’ using Network-Based Method (Strategy 2) and Machine Learning (ML)\nmethods. 
\n22 Selection of Operational Taxonomic Units (OTUs) at the Order level in both networks of ‘Clean Tubers’\nand ‘Scab-Infected Tubers’ using Network-Based Method (Strategy 2) and Machine Learning (ML)\nmethods.\n23 Operational Taxonomic Units (OTUs) of significance in the ‘Clean Tubers’ network, selected by all\nfour algorithms: CMIMN, SPARCC, SE_glasso, and SPRING.\n24 Operational Taxonomic Units (OTUs) of significance in the ‘scab-infected tubers’ network, selected by\nall four algorithms: CMIMN, SPARCC, SE_glasso, and SPRING.\nList of Figures\n1 Box plots illustrating F-scores obtained from the robustness analysis of different microbiome network\nconstruction methods.\n2 Microbiome network at the Class taxonomic level.\n3 Microbiome network at the Order taxonomic level.\n4 Overlap Between Machine Learning Methods based on different normalized data sets and Network-Based\nApproaches for ‘clean tubers’ network.\n5 Overlap Between Machine Learning Methods based on different normalized data sets and Network-Based\nApproaches for ‘scab-infected tubers’ network. 
24\nTable 1: Taxonomic level (first column), number of OTUs in the original dataset (second column), and number of OTUs remaining after filtering out those that appear in fewer than 15 samples (third column).\nLevel # OTUs # OTUs after filtering\nPhylum 57 42\nClass 152 108\nOrder 378 224\nbioRxiv preprint doi: https://doi.org/10.1101/2025.04.07.647660; this version posted April 13, 2025. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.\nSoil BN\n[Figure 1 plot: box plots by Algorithm (x-axis: CMIMN, SE_glasso, SPRING, SPARCC), grouped by taxonomic Level (phylum, class, order); y-axis ticks from 0.4 to 1.0.]\nFigure 1: Box plots illustrating F-scores obtained from the robustness analysis of different microbiome network construction methods. The evaluation involved constructing networks from 50 distinct datasets, each generated by randomly selecting 70% of the samples from the full dataset. Performance was assessed across different taxonomic levels, including Phylum, Class, and Order. Four network inference algorithms (CMIMN, SPRING, SE_glasso, and SPARCC) were compared. Our results demonstrate that CMIMN exhibits superior robustness, as indicated by consistently higher and more stable F-scores across all taxonomic levels. Notably, SE_glasso shows the least favorable performance at the Class level, with greater variability and lower F-scores.\n[Figure 2 panels: (a) ‘clean tubers’ network; (b) ‘scab-infected tubers’ network; (c) common interactions.]\nFigure 2: This figure illustrates the microbiome network at the Class taxonomic level. Part (a) represents the ‘clean tubers’ network, part (b) displays the ‘scab-infected tubers’ network, and part (c) shows the common interactions between them. Nodes correspond to Operational Taxonomic Units (OTUs) and are color-coded: purple for OTUs shared between ‘clean tubers’ and ‘scab-infected tubers’ networks, blue for OTUs unique to the ‘clean tubers’ network, and green for OTUs unique to the ‘scab-infected tubers’ network. Edges are shown as solid lines and are those confirmed by all four methods.\n[Figure 3 panels: (a) ‘clean tubers’ network; (b) ‘scab-infected tubers’ network; (c) common interactions.]\nFigure 3: This figure shows the microbiome network at the Order taxonomic level. Part (a) represents the ‘clean tubers’ network, part (b) displays the ‘scab-infected tubers’ network, and part (c) shows the common interactions between them. Nodes correspond to Operational Taxonomic Units (OTUs) and are color-coded: purple for OTUs shared between ‘clean tubers’ and ‘scab-infected tubers’ networks, blue for OTUs unique to the ‘clean tubers’ network, and green for OTUs unique to the ‘scab-infected tubers’ network. The top 20% of nodes with the highest centrality scores, as computed by Equation (5), are labeled in dark purple. Node size reflects their degree of connectivity. 
Edges are shown as solid lines and are those confirmed by all four methods.\nTable 2: Network metrics for microbiome networks constructed using four different methods based on all samples at the Phylum level.\nMethod num_edges average_path_length transitivity mean_degree mean_distance modularity\nSPARCC 148 1.917 0.506 3.895 1.917 0.238\nSE_glasso 60 2.322 0.429 1.579 2.322 0.351\nSPRING 172 2.658 0.272 4.526 2.658 0.456\nCMIMN 136 3.06 0.636 3.579 3.06 0.372\nTable 2 summarizes network metrics for different methods applied at the Phylum level of microbiome analysis. Notable variations in network characteristics are observed across these methods. SPRING constructs a network with 172 edges, indicating a relatively complex network structure, while SE_glasso results in a sparser network with only 60 edges. SPARCC falls in between with 148 edges, and CMIMN generates a network with 136 edges. In terms of average path length, SPARCC exhibits the shortest paths on average (1.917), suggesting high connectivity, whereas CMIMN has the longest average path length (3.06), indicating less direct connections. Transitivity, a measure of clustering, is highest for CMIMN (0.636), implying a higher level of clustering in its network. The mean_degree and mean_distance offer insights into network density and structure; SPARCC and SPRING have the highest mean degrees (3.895 and 4.526, respectively), but only SPARCC pairs a high mean degree with the shortest mean distance (1.917), whereas SPRING has the second-longest mean distance (2.658). Finally, SPRING exhibits the highest modularity value (0.456), indicating a potential modular structure within its network. 
These metrics collectively illustrate that different methods yield networks with distinct structural characteristics, highlighting the importance of selecting the method that aligns with specific research objectives and hypotheses.\nTable 3: Network metrics for microbiome networks constructed using four different methods based on all samples at the Class level.\nMethod num_edges average_path_length transitivity mean_degree mean_distance modularity\nSPARCC 880 2.249 0.597 8.889 2.249 0.195\nSE_glasso 228 2.388 0.562 2.303 2.388 0.333\nSPRING 732 2.697 0.208 7.394 2.697 0.409\nCMIMN 856 2.402 0.603 8.646 2.402 0.467\nTable 3 provides network metrics for various methods applied at the Class level of microbiome analysis. Notable differences in network characteristics are evident across these methods. SPARCC yields a network with 880 edges, indicating a relatively complex and highly connected structure. SE_glasso, in contrast, results in a sparser network with 228 edges, suggesting fewer connections between Class-level components. SPRING's network falls in between with 732 edges, while CMIMN constructs a network with 856 edges. When examining the average path length, SPARCC exhibits the shortest paths on average (2.249), implying higher overall connectivity. SE_glasso, CMIMN, and SPRING have longer average path lengths (2.388, 2.402, and 2.697, respectively), indicating relatively less direct connections. Transitivity, a measure of network clustering, is highest for CMIMN (0.603), signifying a substantial degree of clustering within its network. Mean degree and mean distance provide insights into the density and overall structure of the networks; SPARCC and CMIMN have the highest mean degrees (8.889 and 8.646, respectively), suggesting denser networks, and SPARCC also has the shortest mean distance (2.249), while CMIMN's mean distance (2.402) is close to that of SE_glasso (2.388). 
Finally, in terms of modularity, CMIMN exhibits the highest value (0.467), indicating a potential modular structure within its network.\nTable 4: Network metrics for microbiome networks constructed using four different methods based on all samples at the Order level.\nMethod num_edges average_path_length transitivity mean_degree mean_distance modularity\nSPARCC 3400 2.091 0.647 17.989 2.091 0.212\nSE_glasso 2170 2.414 0.503 11.481 2.414 0.324\nSPRING 2060 2.595 0.175 10.899 2.595 0.399\nCMIMN 3574 2.391 0.626 18.91 2.391 0.426\nTable 4 presents network metrics for different methods applied at the Order level of microbiome analysis. Notable variations in network characteristics are observed across these methods. SPARCC produces a network with 3,400 edges, indicating a complex and highly connected structure. SE_glasso results in a network with 2,170 edges, suggesting a somewhat sparser network compared to SPARCC. SPRING yields the sparsest network with 2,060 edges, while CMIMN constructs the most highly connected network with 3,574 edges. Examining average path length, SPARCC exhibits the shortest paths on average (2.091), indicating a high degree of connectivity. SE_glasso and SPRING have longer average path lengths (2.414 and 2.595, respectively), implying less direct connections. Transitivity, a measure of network clustering, is highest for SPARCC (0.647), signifying a substantial level of clustering within its network. 
Mean degree and mean distance provide insights into network density and structure; SPARCC and CMIMN have higher mean degrees (17.989 and 18.91, respectively) and shorter mean distances (2.091 and 2.391, respectively), suggesting dense and compact networks. Finally, in terms of modularity, CMIMN exhibits the highest value (0.426), indicating a potential modular structure within its network. These metrics collectively demonstrate that different methods yield networks with distinct structural characteristics at the Order level, emphasizing the importance of method selection based on specific research objectives and hypotheses.\nAcross Tables 2 to 4, which present network metrics for the three taxonomic levels, we observed two consistent trends:\n• For average_path_length, mean_distance, and modularity, SPARCC consistently produced the lowest values, indicating that its networks are the most compact and the least modular at every taxonomic level.\n• Conversely, for transitivity, SPRING consistently yielded the lowest values, indicating that its networks show the least local clustering, irrespective of the taxonomic level under consideration.\nTable 5: Commonly identified important OTUs based on topological features in microbiome networks constructed from all samples at the Phylum level. Microbiome networks were constructed using four different inference methods: SE_glasso, SPRING, SPARCC, and CMIMN. For each network, key topological features (Degree, Betweenness, Closeness, Eigenvector Centrality, and PageRank) were computed to assess the importance of individual OTUs. The top 20% of OTUs based on each centrality measure were selected as important. 
An OTU was included in this table if it consistently ranked in the top 20% across all four network inference methods based on at least one topological feature. For example, Proteobacteria was ranked in the top 20% of networks generated by all four methods based on Degree, Closeness, and PageRank.\nImportant OTUs Features\nProteobacteria Degree, Closeness, PageRank\nAcidobacteriota Degree, Closeness, PageRank\nWPS.2 Betweenness, Closeness\nNB1.j PageRank\nTable 6: Commonly identified important OTUs based on topological features in microbiome networks constructed from all samples at the Class level. Microbiome networks were constructed using four different inference methods: SE_glasso, SPRING, SPARCC, and CMIMN. For each network, key topological features (Degree, Betweenness, Closeness, Eigenvector Centrality, and PageRank) were computed to assess the importance of individual OTUs. The top 20% of OTUs based on each centrality measure were selected as important. 
An OTU was included in this table if it consistently ranked in the top 20% across all four network inference methods based on at least one topological feature. For example, Acidobacteriae was consistently ranked in the top 20% of networks generated by all four methods based on all topological features.\nOTUs Topology Measures\nAcidobacteriae Degree, Betweenness, Closeness, Eigenvector Centrality, PageRank\nAlphaproteobacteria Degree, Betweenness, Closeness, PageRank\nAnaerolineae Degree, Betweenness, Closeness\nIgnavibacteria Degree, Eigenvector Centrality, PageRank\nKtedonobacteria Betweenness\nGammaproteobacteria Betweenness, PageRank\nMB.A2.108 Closeness\nOM190 Closeness\nSyntrophobacteria Eigenvector Centrality\nKryptonia Eigenvector Centrality\nDesulfobulbia Eigenvector Centrality\nTable 7: Commonly identified important OTUs based on topological features in microbiome networks constructed from all samples at the Order level. Microbiome networks were constructed using four different inference methods: SE_glasso, SPRING, SPARCC, and CMIMN. For each network, key topological features (Degree, Betweenness, Closeness, Eigenvector Centrality, and PageRank) were computed to assess the importance of individual OTUs. The top 20% of OTUs based on each centrality measure were selected as important. 
An OTU was included in this table if it consistently ranked in the top 20% across all four network inference methods based on at least one topological feature. For example, C0119 was consistently ranked in the top 20% of networks generated by all four methods based on all topological features.\nOTUs Topology Measures\nC0119 Degree, Betweenness, Closeness, Eigenvector Centrality, PageRank\nSphingomonadales Degree, Closeness, PageRank\nGammaproteobacteria.Incertae Degree, Betweenness, Closeness, PageRank\nDefluviicoccales Degree, PageRank\nMicrotrichales Degree, PageRank\nGemmatimonadales Betweenness\nPropionibacteriales Betweenness\nB10.SB3A Betweenness\nErysipelotrichales Closeness\nReyranellales Closeness\nTable 8: Phylum-level microbial associations identified in ‘clean tubers’ and ‘scab-infected tubers’ microbiome networks. Only interactions confirmed by all four inference methods are shown.\n‘clean tubers’ Network Links ‘scab-infected tubers’ Network Links Common Links\nChloroflexi–Acidobacteriota Actinobacteriota–Gemmatimonadota Actinobacteriota–Proteobacteria\nChloroflexi–Planctomycetota Actinobacteriota–Acidobacteriota Proteobacteria–Acidobacteriota\nGemmatimonadota–Acidobacteriota Gemmatimonadota–Acidobacteriota\nGemmatimonadota–Proteobacteria Proteobacteria–Firmicutes\nPlanctomycetota–Patescibacteria Proteobacteria–Myxococcota\nMethylomirabilota–WPS.2 Bacteroidota–Patescibacteria\nArmatimonadota–WPS.2 NB1.j–MBNT15\nProteobacteria–Bacteroidota Spirochaetota–MBNT15\nDesulfobacterota–MBNT15\nVerrucomicrobiota–Bacteroidota\nMyxococcota–Bacteroidota\nTable 9: Class-level microbial associations identified in ‘clean tubers’ and ‘scab-infected tubers’ microbiome networks. Only interactions confirmed by all four inference methods are shown.\n‘clean tubers’ Network Links ‘scab-infected tubers’ Network Links Common Links\nAD3–Kapabacteria AD3–Chthonomonadetes AD3–Acidobacteriae\nAD3–Verrucomicrobiae AD3–Ktedonobacteria Acidobacteriae–Chthonomonadetes\nAcidobacteriae–Armatimonadia AD3–Syntrophobacteria Acidobacteriae–Gemmatimonadetes\nAcidobacteriae–Lineage.IIa Actinobacteria–Gemmatimonadetes Acidobacteriae–Ktedonobacteria\nAcidobacteriae–OLB14 Alphaproteobacteria–Blastocatellia Alphaproteobacteria–Gammaproteobacteria\nActinobacteria–Alphaproteobacteria Aminicenantia–Desulfobulbia Alphaproteobacteria–Thermoleophilia\nActinobacteria–Bacilli Aminicenantia–Thermodesulfovibrionia Bacteroidia–Gammaproteobacteria\nActinobacteria–Bacteroidia Anaerolineae–Dehalococcoidia Chthonomonadetes–Ktedonobacteria\nAlphaproteobacteria–Bacteroidia Anaerolineae–MB.A2.108 
Gemmatimonadetes–TK10\nAlphaproteobacteria–Gemmatimonadetes Bacilli–Blastocatellia\nAlphaproteobacteria–Saccharimonadia Bacteroidia–Verrucomicrobiae\nAlphaproteobacteria–TK10 Chloroflexia–TK10\nAlphaproteobacteria–Vampirivibrionia Coriobacteriia–Desulfobulbia\nAnaerolineae–Phycisphaerae Coriobacteriia–Syntrophia\nAnaerolineae–Thermoanaerobaculia Coriobacteriia–Syntrophobacteria\nAnaerolineae–Vicinamibacteria Desulfitobacteriia–Desulfobaccia\nBabeliae–Chlamydiae Desulfitobacteriia–Desulfobulbia\nBacilli–Gemmatimonadetes Desulfitobacteriia–Syntrophobacteria\nBlastocatellia–SHA.26 Desulfobaccia–Desulfobulbia\nBlastocatellia–Thermoanaerobaculia Desulfobaccia–Kryptonia\nBlastocatellia–Vicinamibacteria Desulfobaccia–Syntrophobacteria\nCyanobacteriia–Planctomycetes Desulfobaccia–Thermodesulfovibrionia\nGammaproteobacteria–JG30.KF.CM66 Desulfobacteria–Kryptonia\nGammaproteobacteria–Saccharimonadia Desulfobacteria–Syntrophobacteria\nGammaproteobacteria–Vampirivibrionia Desulfobacteria–Thermodesulfovibrionia\nGitt.GS.136–MB.A2.108 Desulfobulbia–Syntrophia\nGitt.GS.136–Methylomirabilia Desulfobulbia–Syntrophobacteria\nGitt.GS.136–SHA.26 Desulfobulbia–Thermodesulfovibrionia\nGitt.GS.136–Vicinamibacteria Gammaproteobacteria–Polyangia\nHolophagae–Thermoanaerobaculia Gammaproteobacteria–Thermoleophilia\nJG30.KF.CM66–Ktedonobacteria Kryptonia–Syntrophobacteria\nKapabacteria–Lineage.IIa Spirochaetia–Syntrophia\nKtedonobacteria–Planctomycetes Spirochaetia–Thermodesulfovibrionia\nKtedonobacteria–Saccharimonadia Syntrophia–Thermodesulfovibrionia\nLineage.IIa–Vampirivibrionia\nMB.A2.108–Methylomirabilia\nMB.A2.108–P2.11E\nMB.A2.108–Subgroup.25\nMB.A2.108–Vicinamibacteria\nMethylomirabilia–SHA.26\nMethylomirabilia–Subgroup.25\nPlanctomycetes–Verrucomicrobiae\nSubgroup.22–Subgroup.25\nSubgroup.22–Subgroup.5\nSubgroup.22–Vicinamibacteria\nSubgroup.25–bacteriap25\nSubgroup.5–Vicinamibacteria\nTable 10: Order-level microbial associations identified in ‘clean tubers’ and ‘scab-infected tubers’ microbiome networks. Only interactions confirmed by all four inference methods are shown.\nClean Network Links Disease Network Links Common Links\nAcetobacterales–Armatimonadales AKIW659–Aminicenantales Acetobacterales–C0119\nAcetobacterales–B12.WMSP1 AKIW659–Bacteroidales Acidobacteriales–Bryobacterales\nAcetobacterales–Bryobacterales AKIW659–Defluviicoccales Acidobacteriales–Elsterales\nAcetobacterales–Elsterales AKIW659–Syntrophales Acidobacteriales–Ktedonobacterales\nAcetobacterales–Frankiales AKIW659–Syntrophobacterales Actinomarinales–PAUC26f\nAcetobacterales–Isosphaerales Acetobacterales–Acidobacteriales Anaerolineales–SBR1031\nAcetobacterales–Saccharimonadales Acetobacterales–Ktedonobacterales B12.WMSP1–Elev.1554\nAcidobacteriales–Chthonomonadales Acetobacterales–Reyranellales Bacillales–Paenibacillales\nAcidobacteriales–Gaiellales Acidiferrobacterales–Desulfobaccales Blastocatellales–Chloroflexales\nAcidobacteriales–Micropepsales Acidiferrobacterales–Kryptoniales Bryobacterales–Micropepsales\nAcidobacteriales–Subgroup.2 Acidiferrobacterales–Methylomirabilales Bryobacterales–Solibacterales\nAcidobacteriales–Xanthomonadales Acidobacteriales–Ardenticatenales Burkholderiales–Pedosphaerales\nActinomarinales–PLTA13 Acidobacteriales–C0119 Burkholderiales–Rhizobiales\nAzospirillales–Chloroflexales Acidobacteriales–Solibacterales Catenulisporales–Elev.1554\nAzospirillales–Propionibacteriales Actinomarinales–Ardenticatenales 
Chitinophagales–Sphingomonadales\nAzospirillales–Pyrinomonadales Actinomarinales–Candidatus.Adlerbacteria Elev.1554–Ktedonobacterales\nAzospirillales–Rhodobacterales Actinomarinales–Ignavibacteriales Elsterales–Frankiales\nAzospirillales–Subgroup.2 Actinomarinales–Rhodothermales Erysipelotrichales–Peptostreptococcales.Tissierellales\nB10.SB3A–B12.WMSP1 Aminicenantales–Desulfobaccales Gaiellales–IMCC26256\nB10.SB3A–Candidatus.Liptonbacteria Aminicenantales–Desulfobulbales Ignavibacteriales–SJA.15\nB10.SB3A–Catenulisporales Aminicenantales–SJA.15 Micrococcales–Sphingomonadales\nB10.SB3A–Diplorickettsiales Aminicenantales–Syntrophales Microtrichales–Subgroup.17\nB10.SB3A–Elev.1554 Aminicenantales–Syntrophobacterales PLTA13–Subgroup.17\nB10.SB3A–Gemmatales Anaerolineales–Kryptoniales Rhizobiales–Solirubrobacterales\nB10.SB3A–Isosphaerales Anaerolineales–S085\nB10.SB3A–Ktedonobacterales Anaerolineales–Subgroup.17\nB10.SB3A–Micropepsales B12.WMSP1–Subgroup.13\nB10.SB3A–Nannocystales Bacillales–Chitinophagales\nB10.SB3A–Subgroup.2 Bacillales–Sphingomonadales\nB10.SB3A–X24.Nov Bacteroidales–Candidatus.Moranbacteria\nB12.WMSP1–CCD24 Bacteroidales–Defluviicoccales\nB12.WMSP1–Catenulisporales Bacteroidales–Desulfobaccales\nB12.WMSP1–Ktedonobacterales Bacteroidales–Desulfobulbales\nB12.WMSP1–Microtrichales Bacteroidales–OPB41\nB12.WMSP1–Steroidobacterales Bacteroidales–SJA.15\nB12.WMSP1–Subgroup.17 Bacteroidales–Spirochaetales\nB12.WMSP1–WD260 Bacteroidales–Syntrophobacterales\nBacillales–Gemmatimonadales Bryobacterales–C0119\nBlastocatellales–Subgroup.7 Bryobacterales–Chthonomonadales\nBlastocatellales–Thermoanaerobaculales C0119–Candidatus.Adlerbacteria\nBryobacterales–Candidatus.Liptonbacteria C0119–Chthonomonadales\nBryobacterales–Elsterales C0119–Elsterales\nBryobacterales–Gammaproteobacteria.Incertae.Sedis C0119–Gammaproteobacteria.Incertae.Sedis\nBryobacterales–Gemmatimonadales C0119–Ktedonobacterales\nBurkholderiales–Caulobacterales 
C0119–Micropepsales\nBurkholderiales–Haliangiales C0119–PAUC26f\nBurkholderiales–Sphingobacteriales C0119–Sphingomonadales\nC0119–Gemmatimonadales Candidatus.Adlerbacteria–Desulfitobacteriales\nC0119–PLTA13 Candidatus.Adlerbacteria–Kryptoniales\nC0119–Solibacterales Candidatus.Adlerbacteria–PAUC26f\nCCD24–Caldilineales Candidatus.Moranbacteria–Desulfobaccales\nCCD24–Catenulisporales Candidatus.Moranbacteria–Desulfobulbales\nCCD24–RBG.13.54.9 Candidatus.Moranbacteria–Ignavibacteriales\nCCD24–Rokubacteriales Candidatus.Moranbacteria–Ktedonobacterales\nCCD24–Steroidobacterales Candidatus.Moranbacteria–Syntrophobacterales\nCCD24–Subgroup.17 Candidatus.Yanofskybacteria–Kryptoniales\nCCD24–Vicinamibacterales Candidatus.Yanofskybacteria–PLTA13\nCaldilineales–Catenulisporales Candidatus.Yanofskybacteria–Subgroup.17\nCaldilineales–Microtrichales Chitinophagales–Chthoniobacterales\nCaldilineales–Propionibacteriales Chitinophagales–Sphingobacteriales\nCandidatus.Jorgensenbacteria–Candidatus.Kaiserbacteria Cytophagales–Polyangiales\nCandidatus.Jorgensenbacteria–Candidatus.Liptonbacteria Defluviicoccales–Desulfitobacteriales\nCandidatus.Jorgensenbacteria–KF.JG30.C25 Defluviicoccales–Kryptoniales\nCatenulisporales–Elsterales Defluviicoccales–PAUC26f\nCatenulisporales–Kapabacteriales Defluviicoccales–PLTA13\nCatenulisporales–Ktedonobacterales Defluviicoccales–S085\nCatenulisporales–WD260 Defluviicoccales–SJA.15\nCaulobacterales–Rhizobiales Desulfitobacteriales–Desulfobaccales\nCaulobacterales–Sphingomonadales 
Desulfitobacteriales–Kryptoniales\nCaulobacterales–Xanthomonadales Desulfobaccales–Desulfobulbales\nChitinophagales–Micrococcales Desulfobaccales–Ignavibacteriales\nChitinophagales–Rhizobiales Desulfobaccales–Kryptoniales\nChloroflexales–Kallotenuales Desulfobaccales–SJA.15\nChloroflexales–Propionibacteriales Desulfobaccales–Syntrophales\nChloroflexales–Pyrinomonadales Desulfobaccales–Syntrophobacterales\nChloroflexales–RBG.13.54.9 Desulfobulbales–Kryptoniales\nChthoniobacterales–Solibacterales Desulfobulbales–SJA.15\nChthonomonadales–Elsterales Desulfobulbales–Syntrophales\nChthonomonadales–PLTA13 Desulfobulbales–Syntrophobacterales\nCyanobacteriales–Leptolyngbyales Elsterales–Gemmatimonadales\nDS.100–X24.Nov Elsterales–Ktedonobacterales\nDiplorickettsiales–Xanthomonadales Elsterales–Micropepsales\nElev.1554–Gammaproteobacteria.Incertae.Sedis Elsterales–Reyranellales\nElev.1554–Gemmatales Frankiales–Gemmatimonadales\nElev.1554–Subgroup.13 Gammaproteobacteria.Incertae.Sedis–Paenibacillales\nElev.1554–WD260 Gemmatimonadales–Kallotenuales\nElsterales–Gammaproteobacteria.Incertae.Sedis Ignavibacteriales–JG36.TzT.191\nElsterales–Subgroup.13 Ignavibacteriales–Kryptoniales\nFFCH16263–Nitrospirales Ignavibacteriales–Ktedonobacterales\nFFCH16263–Subgroup.2 Ignavibacteriales–Syntrophobacterales\nFFCH16263–Vicinamibacterales Kryptoniales–Syntrophobacterales\nFrankiales–Gaiellales Ktedonobacterales–Rickettsiales\nGammaproteobacteria.Incertae.Sedis–Xanthomonadales Ktedonobacterales–Syntrophobacterales\nGemmatales–Isosphaerales Methylomirabilales–OPB41\nGemmatimonadales–Paenibacillales Methylomirabilales–SJA.15\nHaliangiales–Pedosphaerales Methylomirabilales–Syntrophales\nIMCC26256–Subgroup.2 OPB41–Syntrophales\nIsosphaerales–Ktedonobacterales OPB41–Syntrophobacterales\nKF.JG30.C25–Subgroup.13 PAUC26f–PLTA13\nKtedonobacterales–Micropepsales PAUC26f–SJA.15\nKtedonobacterales–PAUC26f PLTA13–Syntrophobacterales\nKtedonobacterales–Subgroup.2 
Paenibacillales–Rickettsiales\nLeptolyngbyales–Oxyphotobacteria.Incertae.Sedis Paenibacillales–Sphingomonadales\nMicromonosporales–Propionibacteriales Paenibacillales–Xanthomonadales\nMicromonosporales–Rhizobiales Propionibacteriales–Thermomicrobiales\nMicropepsales–Subgroup.2 Rhodothermales–SAR202.clade\nMicropepsales–Xanthomonadales SJA.15–Syntrophobacterales\nMicrotrichales–Phycisphaerales Sphingomonadales–Xanthomonadales\nMicrotrichales–Rhodobacterales Spirochaetales–Syntrophales\nMicrotrichales–SBR1031\nMicrotrichales–Thermoanaerobaculales\nMicrotrichales–Vicinamibacterales\nNannocystales–PLTA13\nNannocystales–Steroidobacterales\nNannocystales–WD260\nNitrospirales–Rokubacteriales\nNitrospirales–Vicinamibacterales\nPedosphaerales–Thermomicrobiales\nPhycisphaerales–SBR1031\nPropionibacteriales–Rhodobacterales\nPyrinomonadales–RBG.13.54.9\nPyrinomonadales–Vicinamibacterales\nPyrinomonadales–X24.Nov\nRBG.13.54.9–Rokubacteriales\nRBG.13.54.9–SBR1031\nRBG.13.54.9–Subgroup.7\nReyranellales–Solibacterales\nRhodobacterales–Verrucomicrobiales\nRhodobacterales–WD260\nRokubacteriales–Subgroup.17\nRokubacteriales–X24.Nov\nSBR1031–Steroidobacterales\nSBR1031–Subgroup.17\nSBR1031–Thermoanaerobaculales\nSaccharimonadales–Xanthomonadales\nSphingobacteriales–Xanthomonadales\nSteroidobacterales–Vicinamibacterales\nSubgroup.17–Vicinamibacterales\nSubgroup.17–X24.Nov\nSubgroup.2–WD260\nSubgroup.7–Thermoanaerobaculales\nVicinamibacterales–X24.Nov\nTable 11: Important OTUs identified using multiple machine learning (ML) methods at the Phylum level. Microbiome data were normalized using four different techniques: (1) Centered Log-Ratio (CLR), (2) the original dataset (no transformation), (3) Log transformation, and (4) Total Sum Scaling (TSS). Seven ML-based feature selection methods were applied to each normalized dataset to identify important OTUs. An OTU was selected if it was identified as important by at least five out of the seven ML methods. The table presents the OTUs identified as important for each normalization method, with separate columns for ML_Clr (CLR-normalized data), ML_Original (original dataset), ML_Log (log-transformed data), and ML_TSS (TSS-normalized data). 
The final ML_Intersection column lists the OTUs that were consistently selected as important across all normalization methods, highlighting robust microbial taxa whose selection does not depend on the normalization approach.\nML_Clr ML_Original ML_Log ML_TSS ML_Intersection\nFirmicutes Armatimonadota Firmicutes Firmicutes Firmicutes\nCyanobacteria Firmicutes Armatimonadota Cyanobacteria Cyanobacteria\nMethylomirabilota Deinococcota Cyanobacteria Patescibacteria Armatimonadota\nArmatimonadota Cyanobacteria Deinococcota Armatimonadota NB1.j\nDeinococcota NB1.j Verrucomicrobiota Bacteroidota\nWPS.2 Verrucomicrobiota Spirochaetota Spirochaetota\nAcidobacteriota Patescibacteria NB1.j Methylomirabilota\nMBNT15 Methylomirabilota Bacteroidota Acidobacteriota\nNB1.j WPS.2 Nitrospirota\nBacteroidota Desulfobacterota\nVerrucomicrobiota NB1.j\nTable 12: Important OTUs identified using multiple machine learning (ML) methods at the Class level. Microbiome data were normalized using four different techniques: (1) Centered Log-Ratio (CLR), (2) the original dataset (no transformation), (3) Log transformation, and (4) Total Sum Scaling (TSS). Seven ML-based feature selection methods were applied to each normalized dataset to identify important OTUs. An OTU was selected if it was identified as important by at least five out of the seven ML methods. The table presents the OTUs identified as important for each normalization method, with separate columns for ML_Clr (CLR-normalized data), ML_Original (original dataset), ML_Log (log-transformed data), and ML_TSS (TSS-normalized data). 
The final ML_Intersection column lists the OTUs that were consistently selected as important across all normalization methods, highlighting robust microbial taxa whose selection does not depend on the normalization approach.\nML_Clr ML_Original ML_Log ML_TSS ML_Intersection\nBacilli Bacilli Bacilli Vicinamibacteria Bacilli\nKtedonobacteria Latescibacteria Ignavibacteria Saccharimonadia Ktedonobacteria\nCyanobacteriia Saccharimonadia Verrucomicrobiae Gemmatimonadetes Cyanobacteriia\nSaccharimonadia Ktedonobacteria Deinococci Gitt.GS.136 Saccharimonadia\nPlanctomycetes Spirochaetia Latescibacteria Oligoflexia Planctomycetes\nLineage.IIa Planctomycetes Saccharimonadia Ktedonobacteria Ignavibacteria\nVampirivibrionia Kryptonia Ktedonobacteria Anaerolineae Dehalococcoidia\nVerrucomicrobiae Verrucomicrobiae OLB14 Bacilli Anaerolineae\nIgnavibacteria Deinococci Kryptonia Acidimicrobiia MB.A2.108\nDehalococcoidia Ignavibacteria MB.A2.108 Planctomycetes Chthonomonadetes\nThermoplasmata OLB14 Cyanobacteriia Nitrospiria Kryptonia\nDeinococci Cyanobacteriia Planctomycetes Methylomirabilia\nAnaerolineae Dehalococcoidia AD3 Ignavibacteria\nPla4.lineage Methylomirabilia Gitt.GS.136 MB.A2.108\nMB.A2.108 Chthonomonadetes Dehalococcoidia Acidobacteriae\nMethylomirabilia MB.A2.108 Anaerolineae TK10\nChthonomonadetes Gitt.GS.136 Armatimonadia Bacteroidia\nOLB14 Vicinamibacteria Chthonomonadetes Polyangia\nBacteroidia Anaerolineae Oligoflexia Thermoleophilia\nOligoflexia Vampirivibrionia Clostridia KD4.96\nKryptonia Holophagae Spirochaetia Cyanobacteriia\nTK10 Chthonomonadetes\nDehalococcoidia\nKryptonia\nTable 13: Important OTUs identified using multiple machine learning (ML) methods at the Order level. Microbiome data were normalized using four different techniques: (1) Centered Log-Ratio (CLR), (2) the original dataset (no transformation), (3) Log transformation, and (4) Total Sum Scaling (TSS). Seven ML-based feature selection methods were applied to each normalized dataset to identify important OTUs. An OTU was selected if it was identified as important by at least five out of the seven ML methods. The table presents the OTUs identified as important for each normalization method, with separate columns for ML_Clr (CLR-normalized data), ML_Original (original dataset), ML_Log (log-transformed data), and ML_TSS (TSS-normalized data). The final ML_Intersection column lists the OTUs that were consistently selected as important across all normalization methods, highlighting robust microbial taxa whose selection does not depend on the normalization approach.\nML_Clr ML_Original ML_Log ML_TSS ML_Intersection\nSaccharimonadales Anaerolineales Saccharimonadales Sphingomonadales Saccharimonadales\nBacillales Acetobacterales Bacillales Saccharimonadales Bacillales\nC0119 Chloroplast Subgroup.2 SBR1031 C0119\nIsosphaerales PAUC26f Solibacterales Kryptoniales Subgroup.2\nSubgroup.2 Subgroup.17 Micropepsales Vicinamibacterales Xanthomonadales\nBlastocatellales Bacteroidales Anaerolineales Anaerolineales Acidobacteriales\nXanthomonadales Alicyclobacillales Latescibacterales C0119 Chloroplast\nAcidobacteriales Micropepsales Frankiales Bryobacterales Alicyclobacillales\nChloroplast Bacillales Gaiellales Micropepsales Paenibacillales\nAlicyclobacillales Sphingomonadales Alicyclobacillales Chloroplast Acetobacterales\nPaenibacillales Defluviicoccales Bacteroidales Subgroup.2 Pseudomonadales\nAcetobacterales Frankiales B10.SB3A Gaiellales Anaerolineales\nPseudomonadales Saccharimonadales 
Acetobacterales PLTA13 Elsterales\nAnaerolineales Caulobacterales Chloroplast Bacillales Bacteroidales\nElsterales Acidobacteriales Defluviicoccales Microtrichales Ktedonobacterales\nOligoflexales Pseudomonadales Propionibacteriales Candidatus.Levybacteria Sphingomonadales\nBacteroidales Bryobacterales Acidobacteriales Pseudomonadales Kineosporiales\nDeinococcales C0119 Sphingomonadales Rokubacteriales SBR1031\nKtedonobacterales Gaiellales Caulobacterales Acetobacterales Rokubacteriales\nSphingomonadales B10.SB3A B12.WMSP1 Clostridiales Frankiales\nChloroflexales Microtrichales SBR1031 PAUC26f Micropepsales\nRhodobacterales Chthoniobacterales Chthoniobacterales Elsterales Gaiellales\nCandidatus.Yanofskybacteria SBR1031 Deinococcales Frankiales PLTA13\nKF.JG30.C25 Kineosporiales Pseudomonadales B10.SB3A Defluviicoccales\nKineosporiales Ktedonobacterales Obscuribacterales Kineosporiales Obscuribacterales\nSBR1031 Subgroup.2 Rokubacteriales Ktedonobacterales\nRokubacteriales Latescibacterales Subgroup.17 Subgroup.17\nFrankiales Chitinophagales Lineage.IV Lineage.IV\nRickettsiales B12.WMSP1 PAUC26f Micromonosporales\nMicropepsales Blastocatellales Spirochaetales Gammaproteobacteria.Incertae\nEnterobacterales Xanthomonadales Chitinophagales Blastocatellales\nMicrococcales Syntrophobacterales Kineosporiales Cytophagales\nGaiellales Sphingobacteriales Isosphaerales Obscuribacterales\nPLTA13 SJA.15 Paenibacillales Deinococcales\nSolibacterales Elsterales Xanthomonadales Chthoniobacterales\nCCD24 Pseudonocardiales C0119 Chthonomonadales\nDefluviicoccales Clostridiales Ktedonobacterales Acidobacteriales\nDesulfitobacteriales Candidatus.Yanofskybacteria Micromonosporales Pseudonocardiales\nS085 Kryptoniales Kryptoniales Actinomarinales\nObscuribacterales Chthonomonadales Chthonomonadales Rickettsiales\nSJA.28 Thermoanaerobaculales Paenibacillales\nPaenibacillales PLTA13 Alicyclobacillales\nPropionibacteriales Bryobacterales 
Xanthomonadales\nS085 Rickettsiales SJA.28\nPLTA13 Elsterales Chitinophagales\nIsosphaerales Ardenticatenales\nObscuribacterales Bacteroidales\nRokubacteriales Gemmatimonadales\nSolibacterales Nitrospirales\nRhizobiales\nDefluviicoccales\nCCD24\nSolirubrobacterales\nS085\nTable 14: Important OTUs identified as key features in response to pitted scab at the Phylum level using Strategy 1:\nDifferential Centrality Analysis. First, microbiome networks were constructed separately for ‘scab-infected tubers’\nand ‘clean tubers’ using four inference methods: SE_glasso, SPRING, SPARCC, and CMIMN. Then, centrality metrics\n(Degree, Betweenness, Closeness, Eigenvector, and PageRank) were calculated for both networks, and the differences\nin centrality values between diseased and healthy conditions were computed for each method. OTUs ranked in the\ntop 20% based on these centrality differences across all four methods were selected as important. The table presents\nOTUs that were consistently identified as important across all four inference methods. The Features column indicates\nthe centrality measure that determined the significance of each OTU. These OTUs represent microbial taxa whose\nnetwork connectivity consistently exhibited significant differences between healthy and diseased conditions across all\nfour methods.\nImportant OTUs Topological Features\nBacteroidota Betweenness\nWPS.2 Betweenness\nProteobacteria Closeness\nTable 15: Important OTUs identified as key features in response to pitted scab at the Class level using Strategy 1:\nDifferential Centrality Analysis. 
First, microbiome networks were constructed separately for ‘scab-infected tubers’\nand ‘clean tubers’ using four inference methods: SE_glasso, SPRING, SPARCC, and CMIMN. Then, centrality metrics\n(Degree, Betweenness, Closeness, Eigenvector, and PageRank) were calculated for both networks, and the differences\nin centrality values between diseased and healthy conditions were computed for each method. OTUs ranked in the\ntop 20% based on these centrality differences across all four methods were selected as important. The table presents\nOTUs that were consistently identified as important across all four inference methods. The Features column indicates\nthe centrality measure that determined the significance of each OTU. These OTUs represent microbial taxa whose\nnetwork connectivity consistently exhibited significant differences between healthy and diseased conditions across all\nfour methods.\nImportant OTUs Topological Features\nDesulfitobacteriia Degree, Eigenvector Centrality, PageRank\nActinobacteria, AD3 Betweenness, Closeness\nSyntrophobacteria Eigenvector Centrality\nTable 16: Important OTUs identified as key features in response to pitted scab at the Order level using Strategy 1:\nDifferential Centrality Analysis. First, microbiome networks were constructed separately for ‘scab-infected tubers’\nand ‘clean tubers’ using four inference methods: SE_glasso, SPRING, SPARCC, and CMIMN. Then, centrality metrics\n(Degree, Betweenness, Closeness, Eigenvector, and PageRank) were calculated for both networks, and the differences\nin centrality values between diseased and healthy conditions were computed for each method. OTUs ranked in the\ntop 20% based on these centrality differences across all four methods were selected as important. The table presents\nOTUs that were consistently identified as important across all four inference methods. The Features column indicates\nthe centrality measure that determined the significance of each OTU. 
These OTUs represent microbial taxa whose\nnetwork connectivity consistently exhibited significant differences between healthy and diseased conditions across all\nfour methods.\nImportant OTUs Topological Features\nC0119 Degree, Betweenness, Closeness, Eigenvector Centrality, PageRank\nDefluviicoccales Closeness, Eigenvector Centrality\nBacteroidales Closeness, Eigenvector Centrality\nKryptoniales Eigenvector Centrality\nB12.WMSP1 Eigenvector Centrality\nDesulfitobacteriales PageRank\nTable 17: Selection of key Operational Taxonomic Units (OTUs) at the Phylum level in microbiome networks\nconstructed separately for ‘clean tubers’ (left panel) and ‘scab-infected tubers’ (right panel) using network-based feature\nselection (Strategy 2: Composite Scoring Approach). Steps for Identifying These OTUs: 1- Network Construction:\nMicrobiome networks were separately built for clean tubers and scab-infected tubers using four inference methods:\nSE_glasso, SPRING, SPARCC, and CMIMN. 2- Weighted Scoring Within Each Method: A weighted score was assigned\nto each OTU within each method based on multiple centrality metrics (Degree, Betweenness, Closeness, Eigenvector,\nand PageRank). 3- Selection of Important OTUs: The top 20% of OTUs with the highest Score 1 were selected within each\nindividual method. 
Table Column Explanations: First column in each panel: OTUs selected by Strategy 2 for each\nspecific network inference method (i.e., these OTUs are among the top 20% highest-scoring OTUs for that method).\nNext four columns (CLR, Original, Log, TSS): Overlap between Strategy 2-selected OTUs and those identified by\nML-based feature selection under different normalization approaches. A 1 in a column means that the ML method\nalso identified the OTU as important under that normalization method. A 0 in a column means that the OTU was not\nselected by the ML method under that normalization method. Note: This table presents the weighted score for each\nOTU within each inference method. Unlike later steps in Strategy 2, this table does not include the combined score\nacross all methods.\n‘clean tubers’ network ‘scab-infected tubers’ network\nCMIMN clr original log TSS CMIMN clr original log TSS\nMethylomirabilota 1 1 0 1 Nitrospirota 0 0 0 1\nMyxococcota 0 0 0 0 Desulfobacterota 0 0 0 1\nNitrospirota 0 0 0 1 Myxococcota 0 0 0 0\nDesulfobacterota 0 0 0 1 Bacteroidota 1 0 1 1\nMBNT15 1 0 0 0 NB1.j 1 1 1 1\nArmatimonadota 1 1 1 1 Patescibacteria 0 1 0 1\nProteobacteria 0 0 0 0 Proteobacteria 0 0 0 0\nWPS.2 1 0 1 0 Acidobacteriota 1 0 0 1\nSPARCC SPARCC\nWPS.2 1 0 1 0 Firmicutes 1 1 1 1\nMethylomirabilota 1 1 0 1 Desulfobacterota 0 0 0 1\nAcidobacteriota 1 0 0 1 Gemmatimonadota 0 0 0 0\nProteobacteria 0 0 0 0 Acidobacteriota 1 0 0 1\nPatescibacteria 0 1 0 1 Proteobacteria 0 0 0 0\nGemmatimonadota 0 0 0 0 Verrucomicrobiota 1 1 1 0\nActinobacteriota 0 0 0 0 Bacteroidota 1 0 1 1\nPlanctomycetota 0 0 0 0 Cyanobacteria 1 1 1 1\nSE_glasso SE_glasso\nWPS.2 1 0 1 0 Proteobacteria 0 0 0 0\nAcidobacteriota 1 0 0 1 MBNT15 1 0 0 0\nProteobacteria 0 0 0 0 Myxococcota 0 0 0 0\nChloroflexi 0 0 0 0 Bacteroidota 1 0 1 1\nPatescibacteria 0 1 0 1 NB1.j 1 1 1 1\nPlanctomycetota 0 0 0 0 Desulfobacterota 0 0 0 1\nGemmatimonadota 0 0 0 0 Firmicutes 1 1 1 1\nArmatimonadota 1 1 1 1 Actinobacteriota 
0 0 0 0\nSPRING SPRING\nWPS.2 1 0 1 0 Proteobacteria 0 0 0 0\nNB1.j 1 1 1 1 Chloroflexi 0 0 0 0\nPatescibacteria 0 1 0 1 Nitrospirota 0 0 0 1\nActinobacteriota 0 0 0 0 Bacteroidota 1 0 1 1\nMethylomirabilota 1 1 0 1 Firmicutes 1 1 1 1\nProteobacteria 0 0 0 0 NB1.j 1 1 1 1\nGAL15 0 0 0 0 RCP2.54 0 0 0 0\nMBNT15 1 0 0 0 Acidobacteriota 1 0 0 1\nTable 18: Selection of key Operational Taxonomic Units (OTUs) at the Class level in microbiome networks constructed\nseparately for ‘clean tubers’ (left panel) and ‘scab-infected tubers’ (right panel) using network-based feature selection\n(Strategy 2: Composite Scoring Approach). Steps for Identifying These OTUs: 1- Network Construction: Microbiome\nnetworks were separately built for clean tubers and scab-infected tubers using four inference methods: SE_glasso,\nSPRING, SPARCC, and CMIMN. 2- Weighted Scoring Within Each Method: A weighted score was assigned to each OTU\nwithin each method based on multiple centrality metrics (Degree, Betweenness, Closeness, Eigenvector, and PageRank).\n3- Selection of Important OTUs: The top 20% of OTUs with the highest Score 1 were selected within each individual\nmethod. Table Column Explanations: First column in each panel: OTUs selected by Strategy 2 for each specific\nnetwork inference method (i.e., these OTUs are among the top 20% highest-scoring OTUs for that method). Next four\ncolumns (CLR, Original, Log, TSS): Overlap between Strategy 2-selected OTUs and those identified by ML-based\nfeature selection under different normalization approaches. 
A 1 in a column means that the ML method also identified\nthe OTU as important under that normalization method. A 0 in a column means that the OTU was not selected by the\nML method under that normalization method. Note: This table presents the weighted score for each OTU within each\ninference method. Unlike later steps in Strategy 2, this table does not include the combined score across all methods.\nclean tuber Net clr original log TSS scab-infected tubers Net clr original log TSS\nCMIMN\nKtedonobacteria 1 1 1 1 Ignavibacteria 1 1 1 1\nIgnavibacteria 1 1 1 1 Parcubacteria 0 0 0 0\nOM190 0 0 0 0 Anaerolineae 1 1 1 1\nAcidimicrobiia 0 0 0 1 Vicinamibacteria 0 1 0 1\nVicinamibacteria 0 1 0 1 Kryptonia 1 1 1 1\nActinobacteria 0 0 0 0 Acidimicrobiia 0 0 0 1\nDesulfobaccia 0 0 0 0 Gammaproteobacteria 0 0 0 0\nGammaproteobacteria 0 0 0 0 Alphaproteobacteria 0 0 0 0\nAlphaproteobacteria 0 0 0 0 AD3 0 0 1 0\nBlastocatellia 0 0 0 0 Acidobacteriae 0 0 0 1\nGitt.GS.136 0 1 1 1 Nitrospiria 0 0 0 1\nParcubacteria 0 0 0 0 Microgenomatia 0 0 0 0\nAcidobacteriae 0 0 0 1 Myxococcia 0 0 0 0\nPolyangia 0 0 0 1 BD2.11.terrestrial.group 0 0 0 0\nChloroflexia 0 0 0 0 Thermoleophilia 0 0 0 1\nKD4.96 0 0 0 1 Gitt.GS.136 0 1 1 1\nThermoleophilia 0 0 0 1 Blastocatellia 0 0 0 0\nMB.A2.108 1 1 1 1 Holophagae 0 1 0 0\nBabeliae 0 0 0 0 KD4.96 0 0 0 1\nLongimicrobia 0 0 0 0 Bacilli 1 1 1 1\nSPARCC\nVicinamibacteria 0 1 0 1 Actinobacteria 0 0 0 0\nAnaerolineae 1 1 1 1 Bacilli 1 1 1 1\nOLB14 1 1 1 0 AD3 0 0 1 0\nActinobacteria 0 0 0 0 Acidobacteriae 0 0 0 1\nKtedonobacteria 1 1 1 1 Anaerolineae 1 1 1 1\nMethylomirabilia 1 1 0 1 Blastocatellia 0 0 0 0\nAcidobacteriae 0 0 0 1 Chloroflexia 0 0 0 0\nBacteroidia 1 0 0 1 Thermoleophilia 0 0 0 1\nSaccharimonadia 1 1 1 1 Ignavibacteria 1 1 1 1\nChlamydiae 0 0 0 0 Ktedonobacteria 1 1 1 1\nAlphaproteobacteria 0 0 0 0 Nitrospiria 0 0 0 1\nPlanctomycetes 1 1 1 1 Parcubacteria 0 0 0 0\nGammaproteobacteria 0 0 0 0 Alphaproteobacteria 0 0 0 0\nMB.A2.108 1 1 1 1 Rhodothermia 0 0 0 0\nGemmatimonadetes 0 0 0 1 Kryptonia 1 1 1 1\nGitt.GS.136 0 1 1 1 Gemmatimonadetes 0 0 0 1\nAD3 0 0 1 0 Syntrophobacteria 0 0 0 0\nSubgroup.25 0 0 0 0 Thermodesulfovibrionia 0 0 0 0\nBlastocatellia 0 0 0 0 Polyangia 0 0 0 1\nThermoanaerobaculia 0 0 0 0 Gammaproteobacteria 0 0 0 0\nSE_glasso\nKtedonobacteria 1 1 1 1 AD3 0 0 1 0\nSyntrophobacteria 0 0 0 0 Ktedonobacteria 1 1 1 1\nAnaerolineae 1 1 1 1 Bacilli 1 1 1 1\nAcidobacteriae 0 0 0 1 Desulfobaccia 0 0 0 0\nDesulfobaccia 0 0 0 0 Gammaproteobacteria 0 0 0 0\nAD3 0 0 1 0 Desulfobulbia 0 0 0 0\nVicinamibacteria 0 1 0 1 Syntrophobacteria 0 0 0 0\nActinobacteria 0 0 0 0 Thermodesulfovibrionia 0 0 0 0\nThermoleophilia 0 0 0 1 Alphaproteobacteria 0 0 0 0\nPlanctomycetes 1 1 1 1 Ignavibacteria 1 1 1 1\nMethylomirabilia 1 1 0 1 Kryptonia 1 1 1 1\nMB.A2.108 1 1 1 1 Actinobacteria 0 0 0 0\nGemmatimonadetes 0 0 0 1 Blastocatellia 0 0 0 0\nAlphaproteobacteria 0 0 0 0 Anaerolineae 1 1 1 1\nGammaproteobacteria 0 0 0 0 Dehalococcoidia 1 1 1 1\nLatescibacteria 0 1 1 0 Verrucomicrobiae 1 1 1 0\nSubgroup.25 0 0 0 0 Acidobacteriae 0 0 0 1\nGitt.GS.136 0 1 1 1 TK10 1 0 0 1\nSaccharimonadia 1 1 1 1 Syntrophia 0 0 0 0\nDesulfobulbia 0 0 0 0 Gemmatimonadetes 0 0 0 1\nSPRING\nActinobacteria 0 0 0 0 AD3 0 0 1 0\nBlastocatellia 0 0 0 0 Syntrophobacteria 0 0 0 0\nVicinamibacteria 0 1 0 1 Bacteroidia 1 0 0 1\nAcidimicrobiia 0 0 0 1 Rhodothermia 0 0 0 0\nAlphaproteobacteria 0 0 0 0 Polyangia 0 0 0 1\nAcidobacteriae 0 0 0 
1 Nitrospiria 0 0 0 1\nParcubacteria 0 0 0 0 Alphaproteobacteria 0 0 0 0\nPolyangia 0 0 0 1 Bdellovibrionia 0 0 0 0\nGammaproteobacteria 0 0 0 0 Cyanobacteriia 1 1 1 1\nRhodothermia 0 0 0 0 Acidimicrobiia 0 0 0 1\nLongimicrobia 0 0 0 0 Microgenomatia 0 0 0 0\nAKAU4049 0 0 0 0 Kazania 0 0 0 0\nAnaerolineae 1 1 1 1 Parcubacteria 0 0 0 0\nKtedonobacteria 1 1 1 1 Gammaproteobacteria 0 0 0 0\nKazania 0 0 0 0 Thermoleophilia 0 0 0 1\nGracilibacteria 0 0 0 0 Blastocatellia 0 0 0 0\nGemmatimonadetes 0 0 0 1 OM190 0 0 0 0\nThermoplasmata 1 0 0 0 Desulfitobacteriia 0 0 0 0\nVerrucomicrobiae 1 1 1 0 Anaerolineae 1 1 1 1\nDesulfobulbia 0 0 0 0 Bacilli 1 1 1 1\nTable 19: Selection of key Operational Taxonomic Units (OTUs) at the Order level in microbiome networks constructed\nseparately for ‘clean tubers’ (left panel) and ‘scab-infected tubers’ (right panel) using network-based feature selection\n(Strategy 2: Composite Scoring Approach). Steps for Identifying These OTUs: 1- Network Construction: Microbiome\nnetworks were separately built for clean tubers and scab-infected tubers using four inference methods: SE_glasso,\nSPRING, SPARCC, and CMIMN. 2- Weighted Scoring Within Each Method: A weighted score was assigned to each OTU\nwithin each method based on multiple centrality metrics (Degree, Betweenness, Closeness, Eigenvector, and PageRank).\n3- Selection of Important OTUs: The top 20% of OTUs with the highest Score 1 were selected within each individual\nmethod. 
Table Column Explanations: First column in each panel: OTUs selected by Strategy 2 for each specific\nnetwork inference method (i.e., these OTUs are among the top 20% highest-scoring OTUs for that method). Next four\ncolumns (CLR, Original, Log, TSS): Overlap between Strategy 2-selected OTUs and those identified by ML-based\nfeature selection under different normalization approaches. A 1 in a column means that the ML method also identified\nthe OTU as important under that normalization method. A 0 in a column means that the OTU was not selected by the\nML method under that normalization method. Note: This table presents the weighted score for each OTU within each\ninference method. Unlike later steps in Strategy 2, this table does not include the combined score across all methods.\nclean tuber Net clr original log TSS scab-infected tubers Net clr original log TSS\nCMIMN\nKtedonobacterales 1 1 1 1 Ktedonobacterales 1 1 1 1\nC0119 1 1 1 1 Elev.1554 0 0 0 0\nAcidobacteriales 1 1 1 1 Microtrichales 0 1 0 1\nRhizobiales 0 0 0 1 Bryobacterales 0 1 1 1\nBacillales 1 1 1 1 C0119 1 1 1 1\nPropionibacteriales 0 1 1 0 Acetobacterales 1 1 1 1\nMicropepsales 1 1 1 1 Haliangiales 0 0 0 0\nPLTA13 1 1 1 1 Pedosphaerales 0 0 0 0\nChitinophagales 0 1 1 1 Rickettsiales 1 0 1 1\nActinomarinales 0 0 0 1 Rhizobiales 0 0 0 1\nSubgroup.17 0 1 1 1 Phycisphaerales 0 0 0 0\nChloroplast 1 1 1 1 PLTA13 1 1 1 1\nBryobacterales 0 1 1 1 Chloroplast 1 1 1 1\nRhodobacterales 1 0 0 0 Subgroup.17 0 1 1 1\nB12.WMSP1 0 1 1 0 Subgroup.2 1 1 1 1\nElev.1554 0 0 0 0 Sphingobacteriales 0 1 0 0\nChloroflexales 1 0 0 0 Reyranellales 0 0 0 0\nAcetobacterales 1 1 1 1 Actinomarinales 0 0 0 1\nDesulfobaccales 0 0 0 0 Thermoactinomycetales 0 0 0 0\nElsterales 1 1 1 1 S085 1 1 0 1\nGaiellales 1 1 1 1 Erysipelotrichales 0 0 0 0\nB10.SB3A 0 1 1 1 SBR1031 1 1 1 1\nMicrotrichales 0 1 0 1 Gaiellales 1 1 1 1\nGemmatimonadales 0 0 0 1 Myxococcales 0 0 0 0\nVicinamibacterales 0 0 0 1 Kryptoniales 0 1 1 1\nMicrococcales 1 0 0 0 Opitutales 0 0 0 0\nFrankiales 1 1 1 1 CCD24 1 0 0 1\nSubgroup.2 1 1 1 1 Solirubrobacterales 0 0 0 1\nThermoanaerobaculales 0 0 1 0 Blastocatellales 1 1 0 1\nAnaerolineales 1 1 1 1 Saccharimonadales 1 1 1 1\nBacteroidales 1 1 1 1 Micromonosporales 0 0 1 1\nCCD24 1 0 0 1 IMCC26256 0 0 0 0\nVerrucomicrobiales 0 0 0 0 Ardenticatenales 0 0 0 1\nKryptoniales 0 1 1 1 Desulfitobacteriales 1 0 0 0\nSphingobacteriales 0 1 0 0 Candidatus.Yanofskybacteria 1 1 0 0\nIgnavibacteriales 0 0 0 0 Catenulisporales 0 0 0 0\nBurkholderiales 0 0 0 0 PAUC26f 0 1 1 1\nBabeliales 0 0 0 0 Streptomycetales 0 0 0 0\nSPARCC\nC0119 1 1 1 1 Ktedonobacterales 1 1 1 1\nMicrotrichales 0 1 0 1 Bacteroidales 1 1 1 1\nAcidobacteriales 1 1 1 1 Gemmatimonadales 0 0 0 1\nVicinamibacterales 0 0 0 1 Bryobacterales 0 1 1 1\nElsterales 1 1 1 1 Microtrichales 0 1 0 1\nChloroplast 1 1 1 1 Subgroup.17 0 1 1 1\nKtedonobacterales 1 1 1 1 Thermomicrobiales 0 0 0 0\nBryobacterales 0 1 1 1 Blastocatellales 1 1 0 1\nXanthomonadales 1 1 1 1 Sphingomonadales 1 1 1 1\nLeptolyngbyales 0 0 0 0 Micrococcales 1 0 0 0\nElev.1554 0 0 0 0 Elsterales 1 1 1 1\nMicropepsales 1 1 1 1 Actinomarinales 0 0 0 1\nAcetobacterales 1 1 1 1 Ignavibacteriales 0 0 0 0\nChloroflexales 1 0 0 0 Reyranellales 0 0 0 0\nB12.WMSP1 0 1 1 0 Anaerolineales 1 1 1 1\nChitinophagales 0 1 1 1 C0119 1 1 1 1\nSphingomonadales 1 1 1 1 Solibacterales 1 1 1 0\nB10.SB3A 0 1 1 1 Streptomycetales 0 0 0 0\nFrankiales 1 1 1 1 SJA.15 0 1 0 0\nSolibacterales 1 1 
1 0 Frankiales 1 1 1 1\nGaiellales 1 1 1 1 Chloroflexales 1 0 0 0\nX24.Nov 0 0 0 0 Defluviicoccales 1 1 1 1\nSBR1031 1 1 1 1 Candidatus.Yanofskybacteria 1 1 0 0\nSubgroup.2 1 1 1 1 Chitinophagales 0 1 1 1\nPropionibacteriales 0 1 1 0 Bacillales 1 1 1 1\nPaenibacillales 1 1 1 1 Erysipelotrichales 0 0 0 0\nNitrospirales 0 0 0 1 Kryptoniales 0 1 1 1\nSaccharimonadales 1 1 1 1 Acidobacteriales 1 1 1 1\nSphingobacteriales 0 1 0 0 Streptosporangiales 0 0 0 0\nBacillales 1 1 1 1 Azospirillales 0 0 0 0\nCCD24 1 0 0 1 Acetobacterales 1 1 1 1\nSubgroup.17 0 1 1 1 SBR1031 1 1 1 1\nRhizobiales 0 0 0 1 Syntrophobacterales 0 1 0 0\nNannocystales 0 0 0 0 Gammaproteobacteria.Incertae 0 0 0 1\nCatenulisporales 0 0 0 0 CCD24 1 0 0 1\nRokubacteriales 1 1 1 1 Desulfobaccales 0 0 0 0\nCaldilineales 0 0 0 0 Xanthomonadales 1 1 1 1\nPyrinomonadales 0 0 0 0 Sphingobacteriales 0 1 0 0\nSE_glasso\nKtedonobacterales 1 1 1 1 Sphingomonadales 1 1 1 1\nAcetobacterales 1 1 1 1 Chitinophagales 0 1 1 1\nElsterales 1 1 1 1 Ktedonobacterales 1 1 1 1\nMicrotrichales 0 1 0 1 Elev.1554 0 0 0 0\nXanthomonadales 1 1 1 1 Ignavibacteriales 0 0 0 0\nVicinamibacterales 0 0 0 1 Bacteroidales 1 1 1 1\nDesulfobaccales 0 0 0 0 Pedosphaerales 0 0 0 0\nIgnavibacteriales 0 0 0 0 Subgroup.17 0 1 1 1\nMicropepsales 1 1 1 1 C0119 1 1 1 1\nAcidobacteriales 1 1 1 1 Defluviicoccales 1 1 1 1\nSBR1031 1 1 1 1 PLTA13 1 1 1 1\nC0119 1 1 1 1 Elsterales 1 1 1 1\nSJA.15 0 1 0 0 Burkholderiales 0 0 0 0\nBryobacterales 0 1 1 1 
Syntrophobacterales 0 1 0 0\nSphingomonadales 1 1 1 1 SJA.15 0 1 0 0\nSubgroup.2 1 1 1 1 Desulfobulbales 0 0 0 0\nB10.SB3A 0 1 1 1 Gemmatimonadales 0 0 0 1\nPAUC26f 0 1 1 1 Anaerolineales 1 1 1 1\nElev.1554 0 0 0 0 Actinomarinales 0 0 0 1\nRhizobiales 0 0 0 1 PAUC26f 0 1 1 1\nCaldilineales 0 0 0 0 Rhodothermales 0 0 0 0\nB12.WMSP1 0 1 1 0 Frankiales 1 1 1 1\nPropionibacteriales 0 1 1 0 Candidatus.Yanofskybacteria 1 1 0 0\nSolibacterales 1 1 1 0 Desulfobaccales 0 0 0 0\nCaulobacterales 0 1 1 0 Paenibacillales 1 1 1 1\nActinomarinales 0 0 0 1 Kryptoniales 0 1 1 1\nChloroflexales 1 0 0 0 Bryobacterales 0 1 1 1\nSubgroup.17 0 1 1 1 Acidobacteriales 1 1 1 1\nCandidatus.Jorgensenbacteria 0 0 0 0 Methylomirabilales 0 0 0 0\nRhodobacterales 1 0 0 0 Armatimonadales 0 0 0 0\nChitinophagales 0 1 1 1 Reyranellales 0 0 0 0\nFrankiales 1 1 1 1 Syntrophales 0 0 0 0\nSyntrophobacterales 0 1 0 0 Acetobacterales 1 1 1 1\nMethylomirabilales 0 0 0 0 Bacillales 1 1 1 1\nGemmatimonadales 0 0 0 1 Micrococcales 1 0 0 0\nAminicenantales 0 0 0 0 Microtrichales 0 1 0 1\nDesulfobulbales 0 0 0 0 AKIW659 0 0 0 0\nSyntrophales 0 0 0 0 Ardenticatenales 0 0 0 1\nSPRING\nJG36.TzT.191 0 0 0 0 Erysipelotrichales 0 0 0 0\nPaludibaculum 0 0 0 0 Opitutales 0 0 0 0\nCaldilineales 0 0 0 0 Candidatus.Adlerbacteria 0 0 0 0\nKF.JG30.C25 1 0 0 0 Brevibacillales 0 0 0 0\nHaliangiales 0 0 0 0 Elev.16S.1166 0 0 0 0\nGammaproteobacteria.Incertae 0 0 0 1 Sphingobacteriales 0 1 0 0\nJG36.GS.52 0 0 0 0 Cytophagales 0 0 0 1\nBacteroidales 1 1 1 1 Flavobacteriales 0 0 0 0\nC0119 1 1 1 1 Enterobacterales 1 0 0 0\nRhodothermales 0 0 0 0 Haliangiales 0 0 0 0\nErysipelotrichales 0 0 0 0 Chloroplast 1 1 1 1\nChitinophagales 0 1 1 1 Xanthomonadales 1 1 1 1\nPaenibacillales 1 1 1 1 WD260 0 0 0 0\nPB19 0 0 0 0 KF.JG30.C25 1 0 0 0\nSJA.15 0 1 0 0 Paracaedibacterales 0 0 0 0\nSAR202.clade 0 0 0 0 Syntrophales 0 0 0 0\nChloroplast 1 1 1 1 Isosphaerales 1 1 1 0\nThermomicrobiales 0 0 0 0 PAUC26f 0 1 1 1\nSphingomonadales 1 1 1 1 Desulfitobacteriales 1 0 0 0\nRhodospirillales 0 0 0 0 Sphingomonadales 1 1 1 1\nMSB.4B10 0 0 0 0 Steroidobacterales 0 0 0 0\nPAUC26f 0 1 1 1 JG36.GS.52 0 0 0 0\nSBR1031 1 1 1 1 Paludibaculum 0 0 0 0\nEntomoplasmatales 0 0 0 0 PB19 0 0 0 0\nS085 1 1 0 1 SJA.28 0 1 0 1\nIsosphaerales 1 1 1 0 Saccharimonadales 1 1 1 1\nPolyangiales 0 0 0 0 Obscuribacterales 1 1 1 1\nDefluviicoccales 1 1 1 1 Rickettsiales 1 0 1 1\nElev.16S.1166 0 0 0 0 Fibrobacterales 0 0 0 0\nCandidatus.Kaiserbacteria 0 0 0 0 Chitinophagales 0 1 1 1\nCandidatus.Liptonbacteria 0 0 0 0 Pyrinomonadales 0 0 0 0\nXanthomonadales 1 1 1 1 Lactobacillales 0 0 0 0\nFFCH16263 0 0 0 0 FCPU453 0 0 0 0\nSubgroup.7 0 0 0 0 Ktedonobacterales 1 1 1 1\nRhodobacterales 1 0 0 0 Chthoniobacterales 0 1 1 
1\nRhizobiales 0 0 0 1 Oxyphotobacteria.Incertae.Sedis 0 0 0 0\nPedosphaerales 0 0 0 0 Kineosporiales 1 1 1 1\nBlastocatellales 1 1 0 1 Microtrichales 0 1 0 1\n22\n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted April 13, 2025. ; https://doi.org/10.1101/2025.04.07.647660doi: bioRxiv preprint \n\nSoil BN\n(/) \n\"O0..c\n0.8 \nci3 E -o 0.6 \nQ) \n(/) \ncu \n.0 \n.:: . 0 \n3 ci3 \n� 0.4 cu \n....J \n� \nQ) \nQ) \n3 ci3 \n.0 \nc \nQ) 0.2\nE \nQ) \nQ) \nOl \n<( \n0.0 I \n• \no• \ncir \nPhylum \n• • \n• \n0 • - ---\n0 \n•o•\n• • \nlog original TSS \nClass \n•o\no• \nI \n- - - - !.. \n• eo •\n• \n• \n• \no• • \n• \ncir log original TSS \nNormalization Methods \n• \n• \n• \n0 \ncir \nOrder \n• \no• o \n_._ i.,_ -I \no•• • \n• \n• \nlog original TSS \n• CMiMN\n0 SE_glasso\n• • SPARCC\n• SPRING\nFigure 4: Average Overlap Between Machine Learning (ML) Methods based on different nomalized data sets (clr,\noriginal data, log transformation, and TSS transformation) and Network-Based Approaches for ‘clean tubers’ network\nin Phylum, Class, and Order levels.\n23\n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted April 13, 2025. ; https://doi.org/10.1101/2025.04.07.647660doi: bioRxiv preprint \n\nSoil BN\n0.8 \n(/) \n0 \nci3 \nE \n-o 0.6\nQ) \n(/) \ncu \n.0 \n,._ \n0 \n3 \nci3 \n� 0.4 cu \n....J \n� \nQ) \nQ) \n3 \n.0 \nc \nQ) 0.2\nE \nQ) \nQ) ,._ \n0.0 I \n• \n• \n0 • • \ncir \n•o- -\nlog \nPhylum \n0 \n- .•\n• \n• •o •\n• • \noriginal TSS \nClass \n• \n0 \n• • \n0 \n- .,.•o\n� .. 
I \n\"\"'\"-• \n• • • • \n• \ncir log original TSS \nNormalization Methods \no• \n• \n• \ncir \nOrder \n• \n• • • • \n0 \n• - .- 0 I \n0 \n• \n• CMiMN\n0 SE_glasso\n• • SPARCC\n• \n• SPRING\nlog original TSS \nFigure 5: Average Overlap Between Machine Learning (ML) Methods based on different nomalized data sets (clr,\noriginal data, log transformation, and TSS transformation) and Network-Based Approaches for ‘scab-infected tubers’\nnetwork in Phylum, Class, and Order levels.\nTable 20: Selection of Operational Taxonomic Units (OTUs) at thePhylum level in both networks of ‘Clean Tubers’\nand ‘Scab-Infected Tubers’ using Network-Based Method (Strategy 2) and Machine Learning (ML) methods. The\nleft part presents results from CMIMN and SE_glasso, while the right part displays results from SPARCC and SPRING\nalgorithms. The first column represents OTUs selected by the network-based method (Strategy 2), and columns 2\nto 4 show the overlap between OTUs selected by Strategy 2 and those chosen by ML methods using different data\nnormalization approaches (clr, original, log, and TSS). A \"1\" indicates that the ML method selected the respective OTU,\nwhile \"0\" signifies that the ML method did not select the respective OTU.\nCMIMN clr original log TSS SPARCC clr original log TSS\nMyxococcota 0 0 0 0 Acidobacteriota 1 0 0 1\nNitrospirota 0 0 0 1 Proteobacteria 0 0 0 0\nDesulfobacterota 0 0 0 1 Gemmatimonadota 0 0 0 0\nProteobacteria 0 0 0 0\nSE_glasso clr original log TSS SPRING clr original log TSS\nProteobacteria 0 0 0 0 NB1.j 1 1 1 1\n24\n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted April 13, 2025. 
Table 21: Selection of Operational Taxonomic Units (OTUs) at the Class level in both the ‘Clean Tubers’ and ‘Scab-Infected Tubers’ networks using the Network-Based Method (Strategy 2) and Machine Learning (ML) methods. The left part presents results from CMIMN and SE_glasso, while the right part displays results from the SPARCC and SPRING algorithms. The first column represents OTUs selected by the network-based method (Strategy 2), and columns 2 to 5 show the overlap between OTUs selected by Strategy 2 and those chosen by ML methods using different data normalization approaches (clr, original, log, and TSS). A \"1\" indicates that the ML method selected the respective OTU, while \"0\" signifies that the ML method did not select the respective OTU.\nCMIMN clr original log TSS SPARCC clr original log TSS\nIgnavibacteria 1 1 1 1 Anaerolineae 1 1 1 1\nAcidimicrobiia 0 0 0 1 Actinobacteria 0 0 0 0\nVicinamibacteria 0 1 0 1 Ktedonobacteria 1 1 1 1\nGammaproteobacteria 0 0 0 0 Acidobacteriae 0 0 0 1\nAlphaproteobacteria 0 0 0 0 Alphaproteobacteria 0 0 0 0\nBlastocatellia 0 0 0 0 Gammaproteobacteria 0 0 0 0\nGitt.GS.136 0 1 1 1 Gemmatimonadetes 0 0 0 1\nParcubacteria 0 0 0 0 AD3 0 0 1 0\nAcidobacteriae 0 0 0 1 Blastocatellia 0 0 0 0\nKD4.96 0 0 0 1\nThermoleophilia 0 0 0 1\nSE_glasso clr original log TSS SPRING clr original log TSS\nKtedonobacteria 1 1 1 1 Blastocatellia 0 0 0 0\nSyntrophobacteria 0 0 0 0 Acidimicrobiia 0 0 0 1\nAnaerolineae 1 1 1 1 Alphaproteobacteria 0 0 0 0\nAcidobacteriae 0 0 0 1 Parcubacteria 0 0 0 0\nDesulfobaccia 0 0 0 0 Polyangia 0 0 0 1\nAD3 0 0 1 0 Gammaproteobacteria 0 0 0 0\nActinobacteria 0 0 0 0 Rhodothermia 0 0 0 0\nGemmatimonadetes 0 0 0 1 Anaerolineae 1 1 1 1\nAlphaproteobacteria 0 0 0 0 Kazania 0 0 0 0\nGammaproteobacteria 0 0 0 0\nDesulfobulbia 0 0 0 0\n
Table 22: Selection of Operational Taxonomic Units (OTUs) at the Order level in both the ‘Clean Tubers’ and ‘Scab-Infected Tubers’ networks using the Network-Based Method (Strategy 2) and Machine Learning (ML) methods. The left part presents results from CMIMN and SE_glasso, while the right part displays results from the SPARCC and SPRING algorithms. The first column represents OTUs selected by the network-based method (Strategy 2), and columns 2 to 5 show the overlap between OTUs selected by Strategy 2 and those chosen by ML methods using different data normalization approaches (clr, original, log, and TSS). A \"1\" indicates that the ML method selected the respective OTU, while \"0\" signifies that the ML method did not select the respective OTU.\nCMIMN clr original log TSS SPARCC clr original log TSS\nKtedonobacterales 1 1 1 1 C0119 1 1 1 1\nC0119 1 1 1 1 Microtrichales 0 1 0 1\nRhizobiales 0 0 0 1 Acidobacteriales 1 1 1 1\nPLTA13 1 1 1 1 Elsterales 1 1 1 1\nActinomarinales 0 0 0 1 Ktedonobacterales 1 1 1 1\nSubgroup.17 0 1 1 1 Bryobacterales 0 1 1 1\nChloroplast 1 1 1 1 Xanthomonadales 1 1 1 1\nBryobacterales 0 1 1 1 Acetobacterales 1 1 1 1\nElev.1554 0 0 0 0 Chloroflexales 1 0 0 0\nAcetobacterales 1 1 1 1 Chitinophagales 0 1 1 1\nGaiellales 1 1 1 1 Sphingomonadales 1 1 1 1\nMicrotrichales 0 1 0 1 Frankiales 1 1 1 1\nSubgroup.2 1 1 1 1 Solibacterales 1 1 1 0\nCCD24 1 0 0 1 SBR1031 1 1 1 1\nKryptoniales 0 1 1 1 Sphingobacteriales 0 1 0 0\nSphingobacteriales 0 1 0 0 Bacillales 1 1 1 1\nCCD24 1 0 0 1\nSubgroup.17 0 1 1 1\nSE_glasso clr original log TSS SPRING clr original log TSS\nKtedonobacterales 1 1 1 1 Paludibaculum 0 0 0 0\nAcetobacterales 1 1 1 1 KF.JG30.C25 1 0 0 0\nElsterales 1 1 1 1 Haliangiales 0 0 0 0\nMicrotrichales 0 1 0 1 JG36.GS.52 0 0 0 
0\nDesulfobaccales 0 0 0 0 Erysipelotrichales 0 0 0 0\nIgnavibacteriales 0 0 0 0 Chitinophagales 0 1 1 1\nAcidobacteriales 1 1 1 1 PB19 0 0 0 0\nC0119 1 1 1 1 Chloroplast 1 1 1 1\nSJA.15 0 1 0 0 Sphingomonadales 1 1 1 1\nBryobacterales 0 1 1 1 PAUC26f 0 1 1 1\nSphingomonadales 1 1 1 1 Isosphaerales 1 1 1 0\nPAUC26f 0 1 1 1 Elev.16S.1166 0 0 0 0\nElev.1554 0 0 0 0 Xanthomonadales 1 1 1 1\nActinomarinales 0 0 0 1\nSubgroup.17 0 1 1 1\nChitinophagales 0 1 1 1\nFrankiales 1 1 1 1\nSyntrophobacterales 0 1 0 0\nMethylomirabilales 0 0 0 0\nGemmatimonadales 0 0 0 1\nDesulfobulbales 0 0 0 0\nSyntrophales 0 0 0 0\nTable 23: Operational Taxonomic Units (OTUs) of significance in the ‘Clean Tubers’ network, selected by all four algorithms: CMIMN, SPARCC, SE_glasso, and SPRING.\nLevel Important OTUs\nPhylum Proteobacteria WPS.2\nClass Ktedonobacteria Vicinamibacteria Actinobacteria Gammaproteobacteria Alphaproteobacteria Acidobacteriae\nOrder C0119 Rhizobiales Chitinophagales\nTable 24: Operational Taxonomic Units (OTUs) of significance in the ‘scab-infected tubers’ network, selected by all four algorithms: CMIMN, SPARCC, SE_glasso, and SPRING.\nLevel Important OTUs\nPhylum Bacteroidota Proteobacteria\nClass Anaerolineae Gammaproteobacteria Alphaproteobacteria AD3 Blastocatellia Bacilli\nOrder Ktedonobacterales Microtrichales\n
","source_license":"CC-BY-4.0","license_restricted":false}