KODA: An Agentic Framework for KEGG Orthology-Driven Discovery of Antimicrobial Drug Targets in Gut Microbiome

doi:10.1101/2025.05.27.656480

KODA: An Agentic Framework for KEGG Orthology-Driven Discovery of Antimicrobial Drug Targets in Gut Microbiome

2025 · doi:10.1101/2025.05.27.656480

preprint OA: closed

📄 Open PDF Full text JSON View at publisher

Full text 36,307 characters · extracted from preprint-html · click to expand

KODA: An Agentic Framework for KEGG Orthology-Driven Discovery of Antimicrobial Drug Targets in Gut Microbiome | bioRxiv /* */ /* */ <!-- <!-- /*! * yepnope1.5.4 * (c) WTFPL, GPLv2 */ (function(a,b,c){function d(a){return"[object Function]"==o.call(a)}function e(a){return"string"==typeof a}function f(){}function g(a){return!a||"loaded"==a||"complete"==a||"uninitialized"==a}function h(){var a=p.shift();q=1,a?a.t?m(function(){("c"==a.t?B.injectCss:B.injectJs)(a.s,0,a.a,a.x,a.e,1)},0):(a(),h()):q=0}function i(a,c,d,e,f,i,j){function k(b){if(!o&&g(l.readyState)&&(u.r=o=1,!q&&h(),l.onload=l.onreadystatechange=null,b)){"img"!=a&&m(function(){t.removeChild(l)},50);for(var d in y[c])y[c].hasOwnProperty(d)&&y[c][d].onload()}}var j=j||B.errorTimeout,l=b.createElement(a),o=0,r=0,u={t:d,s:c,e:f,a:i,x:j};1===y[c]&&(r=1,y[c]=[]),"object"==a?l.data=c:(l.src=c,l.type=a),l.width=l.height="0",l.onerror=l.onload=l.onreadystatechange=function(){k.call(this,r)},p.splice(e,0,u),"img"!=a&&(r||2===y[c]?(t.insertBefore(l,s?null:n),m(k,j)):y[c].push(l))}function j(a,b,c,d,f){return q=0,b=b||"j",e(a)?i("c"==b?v:u,a,b,this.i++,c,d,f):(p.splice(this.i++,0,a),1==p.length&&h()),this}function k(){var a=B;return a.loader={load:j,i:0},a}var l=b.documentElement,m=a.setTimeout,n=b.getElementsByTagName("script")[0],o={}.toString,p=[],q=0,r="MozAppearance"in l.style,s=r&&!!b.createRange().compareNode,t=s?l:n.parentNode,l=a.opera&&"[object Opera]"==o.call(a.opera),l=!!b.attachEvent&&!l,u=r?"object":l?"script":"img",v=l?"script":u,w=Array.isArray||function(a){return"[object Array]"==o.call(a)},x=[],y={},z={timeout:function(a,b){return b.length&&(a.timeout=b[0]),a}},A,B;B=function(a){function b(a){var a=a.split("!"),b=x.length,c=a.pop(),d=a.length,c={url:c,origUrl:c,prefixes:a},e,f,g;for(f=0;f<d;f++)g=a[f].split("="),(e=z[g.shift()])&&(c=e(c,g));for(f=0;f<b;f++)c=x[f](c);return c}function g(a,e,f,g,h){var i=b(a),j=i.autoCallback;i.url.split(".").pop().split("?").shift(),i.bypass||(e&&(e=d(e)?e:e[a]||e[g]||e[a.split("/").pop().split("?")[0]]),i.instead?i.instead(a,e,f,g,h):(y[i.url]?i.noexec=!0:y[i.url]=1,f.load(i.url,i.forceCSS||!i.forceJS&&"css"==i.url.split(".").pop().split("?").shift()?"c":c,i.noexec,i.attrs,i.timeout),(d(e)||d(j))&&f.load(function(){k(),e&&e(i.origUrl,h,g),j&&j(i.origUrl,h,g),y[i.url]=2})))}function h(a,b){function c(a,c){if(a){if(e(a))c||(j=function(){var a=[].slice.call(arguments);k.apply(this,a),l()}),g(a,j,b,0,h);else if(Object(a)===a)for(n in m=function(){var b=0,c;for(c in a)a.hasOwnProperty(c)&&b++;return b}(),a)a.hasOwnProperty(n)&&(!c&&!--m&&(d(j)?j=function(){var a=[].slice.call(arguments);k.apply(this,a),l()}:j[n]=function(a){return function(){var b=[].slice.call(arguments);a&&a.apply(this,b),l()}}(k[n])),g(a[n],j,b,n,h))}else!c&&l()}var h=!!a.test,i=a.load||a.both,j=a.callback||f,k=j,l=a.complete||f,m,n;c(h?a.yep:a.nope,!!i),i&&c(i)}var i,j,l=this.yepnope.loader;if(e(a))g(a,0,l,0);else if(w(a))for(i=0;i (function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0];var j=d.createElement(s);var dl=l!='dataLayer'?'&l='+l:'';j.src='//www.googletagmanager.com/gtm.js?id='+i+dl;j.type='text/javascript';j.async=true;f.parentNode.insertBefore(j,f);})(window,document,'script','dataLayer','GTM-M677548'); Skip to main content Home About Submit ALERTS / RSS Search for this keyword Advanced Search New Results KODA: An Agentic Framework for KEGG Orthology-Driven Discovery of Antimicrobial Drug Targets in Gut Microbiome View ORCID Profile Javad Aminian-Dehkordi , View ORCID Profile Mohammad Parsa , View ORCID Profile Mohsen Naghipourfar , View ORCID Profile Mohammad R.K. Mofrad doi: https://doi.org/10.1101/2025.05.27.656480 Javad Aminian-Dehkordi 1 Department of Bioengineering, University of California , Berkeley, California, US Find this author on Google Scholar Find this author on PubMed Search for this author on this site ORCID record for Javad Aminian-Dehkordi Mohammad Parsa 1 Department of Bioengineering, University of California , Berkeley, California, US Find this author on Google Scholar Find this author on PubMed Search for this author on this site ORCID record for Mohammad Parsa Mohsen Naghipourfar 1 Department of Bioengineering, University of California , Berkeley, California, US Find this author on Google Scholar Find this author on PubMed Search for this author on this site ORCID record for Mohsen Naghipourfar Mohammad R.K. Mofrad 1 Department of Bioengineering, University of California , Berkeley, California, US Find this author on Google Scholar Find this author on PubMed Search for this author on this site ORCID record for Mohammad R.K. Mofrad For correspondence: mofrad{at}berkeley.edu Abstract Full Text Info/History Metrics Data/Code Preview PDF Abstract The gut microbiome plays a crucial role in human health and disease, influencing diverse biological processes such as immune regulation and nutrient metabolism. However, the complexity of micro-bial interactions and their metabolic cross-feeding dynamics remains poorly understood. This study proposes KODA, an agentic framework that integrates large language models (LLMs) and knowledge graphs (KGs) to facilitate the discovery of targets in antimicrobial drugs in the gut microbiome. Our approach employs a multi-agent system to interpret natural language queries and translate them into precise graph database queries, enabling intuitive interactions with complex microbiome data. Focusing on KEGG orthologies related to essential microbial genes, KODA identifies potential antimicrobial drug targets by analyzing microbial metabolic pathways. The system employs a Neo4j-based microbiome KG, which integrates microbial interaction data, metabolic models, and KEGG annotations. A dedicated evaluation framework, which incorporates LLM-based reviewers, assesses the quality of generated queries and analytical reports. Our results demonstrate the efficacy of KODA in providing actionable insights for antimicrobial research, particularly in identifying conserved essential genes as potential drug targets. This framework holds the potential to democratize microbiome research by lowering technical barriers and accelerating hypothesis generation in drug discovery. 1. Introduction Collective metabolic activities and interactions within the gut microbiome can be aptly described as microbial social networks, shaped by complex ecological dynamics. A central feature of these networks is metabolic cross-feeding, in which the byproducts of one group of microbes serve as essential nutrients or substrates for others. This interdependence plays a key role in shaping both the structure and functional output of the microbial community ( Ponomarova & Patil, 2015 ; Sung et al., 2017 ). Despite the acknowledged importance of these interdependencies, the precise underlying mechanisms, the full spectrum of exchanged metabolites, and the functional consequences of these complex metabolic handoffs often remain poorly characterized, rendering significant portions of gut microbial ecosystem dynamics a “black box”. This opacity limits our ability to predict how microbial communities respond to perturbations and to rationally design interventions, such as targeted probiotics or dietary modulations. To address this challenge, systematic integration and transparent representation of available data are crucial. In this context, knowledge graphs (KGs) are a powerful paradigm for representing such complex, interconnected biological data ( Goetz et al., 2024 ; Ma et al., 2024 ). KGs model information as networks of entities and their relationships, making them inherently well-suited to capture the networked nature of biological systems and enabling complex queries. However, effective use of these KGs often requires proficiency in graph query languages ( e . g ., Cypher for Neo4j), which limits accessibility for many domain scientists. Large language models (LLMs) and the agentic systems built upon them offer a transformative approach to bridge this gap ( Brown et al., 2020 ; Wang et al., 2023 ). AI agents, powered by LLMs, can interpret natural language, interact with tools, and perform complex reasoning, thereby enabling more intuitive and accessible interfaces to structured data repositories. We propose KODA, an LLM-powered, multi-agent framework designed to bridge the gap between natural language inquiry and the structured yet technically demanding world of human gut microbiome KGs. We hypothesize that such a system will make querying complex microbial interaction data more accessible to researchers in the field by leveraging intuitive, domain-specific natural language. More than a data retrieval tool, KODA delivers contextually relevant analyses that speed up hypothesis generation and testing in microbiome research. Our system coordinates a team of specialized AI agents that translate natural language queries (NLQs) into precise Cypher commands, execute these queries against a Neo4j database, analyze the retrieved data with a focus on biological significance, particularly the role of KOs linked to essential genes as potential drug targets, and synthesize comprehensive reports. A key component of the system is a detailed LLM-consumable graph schema description, which guides the agents in accurately interpreting user intent and interacting with the KG. Our approach offers a novel, conversational interface for microbiome research, democratizing access to complex graph data and speeding up discovery, especially in identifying potential antimicrobial drug targets based on essential gene functions. Our contributions are: A KG of the human gut microbiome, constructed based on mechanistic pairwise simulations by metabolic networks, gene essentiality analyses associated wtih descriptive KEGG orthologies (KOs), and multi-source biochemical data, providing a use-case for targeted therapeutic discovery. The design and implementation of a pipeline for natural language querying of a gut microbiome KG, leveraging schema-guided LLM reasoning. A specialized analytical agent within the pipeline focused on interpreting KOs associated with essential genes as candidate antimicrobial drug targets. An evaluation framework using LLM-based reviewers to systematically assess the quality of generated Cypher queries and analytical reports. 2. Methods 2.1. Pairwise GEM-based modeling of microbial interactions We analyzed gut microbial interactions using microbiome data from individuals under high-fiber diet ( Diener et al., 2020 ). All identifiable microbial taxa were extracted, regardless of their relative abundances. For each taxon, we retrieved available genome-scale metabolic models (GEMs) from the AGORA ( Heinken et al., 2020 ) database, resulting in a total of 75 SBML models. GEMs are computational reconstructions of an organism’s metabolic network, integrating genomic and biochemical data used to simulate metabolic fluxes under various conditions ( Cook & Nielsen, 2017 ). These models are particularly useful for predicting microbial growth and interactions in different environments using constraint-based modeling approaches. To simulate microbial interactions, we constrained each GEM based on an averaged high-fiber diet profile and anaerobic conditions. We then performed 2,775 pairwise simulations, representing all possible combinations among the 75 GEMs. In each simulation, the paired microbes shared a common compartment that allowed for metabolic exchanges: both microbes could secrete metabolites into, or uptake metabolites from, this shared environment. Dietary compounds were introduced into a shared compartment, and metabolic byproducts were allowed to exit the system to simulate realistic environmental turnover. We performed optimized general parallel sampler, a Monte Carlo sampling method ( Megchelenbrink et al., 2014 ), generating 10,000 flux distributions with a thinning factor of 100. From these simulations, we identified cross-feeding metabolites—those exchanged between microbes in the shared environment—and characterized their directionality. We also quantified the contribution of individual metabolic pathways in each microbe based on flux distributions from pairwise simulations. Reaction-to-pathway mappings were obtained from the Virtual Metabolic Human (VMH) and KEGG databases. For each microbe, pathway activity scores were computed by aggregating the fluxes of reactions associated with each pathway. All resulting data, including metabolic cross-feeding relationships and individual pathway activities, were used to construct a graph-based representation of microbial interactions in Neo4j. This enables integrative visualization and analysis of the gut microbial metabolic network. 2.2 Antimicrobial drug targets and KEGG orthologies To identify potential drug targets, we performed a gene essentiality analysis using GEMs corresponding to the microbial strains of the community. The essentiality of metabolic genes was assessed through single-gene deletion simulations using flux balance analysis (FBA) ( Sahu et al., 2021 ), incorporating gene-protein-reaction (GPR) associations. For each gene, we simulated a knockout by constraining the flux through all associated reactions to zero. FBA was then conducted to evaluate the impact of gene deletion on the organism’s growth, using the wild-type biomass reaction as the objective function. The simulations were performed under an averaged high-fiber diet and anaerobic conditions. A gene was classified as essential if its deletion reduced the predicted growth rate to less than 10% of the maximum wild-type growth rate (see Figure S1 ). Rather than examining the full genome, we focused on genes involved in a curated set of biologically critical pathways known to be conserved in pathogens and commonly exploited as drug targets ( Naclerio & Sintim, 2020 ). These pathways include those responsible for cofactor and vitamin biosynthesis, cell envelope biogenesis, and central carbon metabolism. Table S1 (Appendix A.1) outlines the selection criteria for these pathways. Then, we refined our list of candidate drug targets by removing any genes with human homologs, based on VMH data, to minimize potential host toxicity. Finally, to facilitate cross-species functional annotation and drug development, we retrieved KO identifiers for the essential genes using KEGG (see Figure S2 for shared and unique KOs per microbes and Figure S3 for hierarchical clustering of microbes based on essential KO similarity). These KO assignments provide standardized functional categories, aiding in comparative analysis and identification of conserved essential targets across strains. 2.3 Human gut microbiome knowledge graph At the core of our system is a structured Neo4j KG that serves as the primary data repository, capturing key relationships within the human gut microbiome. The KG was initially constructed using the NetworkX library ( Hagberg et al., 2008 ) and subsequently loaded into a Neo4j graph database for persistent storage and querying. The KG was populated by integrating data explained above. A comprehensive programmatically accessible graph schema description was provided to the AI agents (for more details, see Table S2 in the Appendix). Grounded in the schema, the LLM interprets user intent, formulates precise Cypher queries, and interacts effectively with KG, an approach aligned with recent research in automated KG querying and enrichment ( Chen et al., 2025 ; Tiwari et al., 2025 ). 2.4 Multi-agent LLM framework architecture The framework was implemented using a modular agent architecture powered by the GPT-4o LLM and orchestrated through a task coordination environment designed for multiagent workflows. The general objective was to support biological interpretation and drug target discovery in microbiome research. The system comprises four sequentiallyoperating AI agents (Table S3 in the Appendix), each tailored for a specific subtask: Researcher Agent is equipped with tools to interact with Neo4j KG and to gather more information from external sources, e . g ., a web search tool. Researcher Agent serves in preliminary research stages to generate hypotheses by combining the KG insights with the external literature and also to contextualize user queries before entering the main processing pipeline. Data Engineering Agent is conceptualized as an expert in Neo4j operations with deep knowledge of the microbiome graph schema and Cypher query language. Its primary role is to transform a user’s NLQ, guided by the graph schema, into precise and efficient Cypher queries required to retrieve relevant data from our microbiome KG. It dynamically consults with the schema and executes queries against the database using retry logic, returning either the raw query results or any error messages encountered during execution. The expected output is a structured representation of the generated Cypher queries along with their execution results. Content Analyst Agent , an expert in microbial ecology and systems biology, is tasked with interpreting data retrieved by the Data Engineering Agent with a focus on identifying antimicrobial drug targets and prioritizing KOs linked to essential microbial genes. It analyzes the retrieved data, or error messages, in light of the original user query. When data is available, the agent identifies key biological entities (microbes, metabolites, pathways, KOs), describes relationships, quantifies findings where possible, and highlights their biological significance. Special emphasis is given to KOs, detailing their functions, essentiality, and potential as drug targets for the specified microbe. If no data was found or an error occurred, the agent clearly reports this. The output is a detailed textual analysis or a notification of data absence or errors. Report Writer Agent: This agent functions as an expert scientific report writer specializing in microbial ecology. It synthesizes the detailed output from the Content Analysis Agent into a clear, concise, and wellstructured report that directly addresses the original user query and undescores any identified antimicrobial drug target implications. The final output is a formatted document suitable for researchers or other informed users. The workflow follows a sequential structure, with the output of one agent forming the primary input for the next. To promote consistency and determinism, particularly for query generation and scientific interpretation, the LLM temperature was set to 0.2. 2.5 System implementation The framework was implemented in Python, using several open-source libraries, including CrewAI for the multiagent architecture, Langchain, a dependency of CrewAI, for handling the LLM integration, Pydantic for data validation and schema definition, and the official Neo4j Python driver for database interaction. The OpenAI API provided access to the GPT-4o model. A detailed system log was configured to capture operational activity throughout the pipeline. All codes are publicly available on GitHub ( https://github.com/mofradlab/koda ), 2.6 Evaluation framework To systematically evaluate the performance and reliability of the pipeline, we developed a dedicated evaluation framework using LLM-based reviewer agents. This framework assesses the quality of generated Cypher queries and the final analytical reports using a benchmark dataset. Benchmark dataset: A curated set of NLQs was designed to reflect typical microbiome research inquiries. These covered a broad range of analytical tasks, such as identifying KOs linked to specific microbes, evaluating metabolite production under gene presence/absence scenarios, and conducting comparative KO analyses across taxa. Each NLQ was paired with a manually curated gold-standard Cypher query for reference. LLM-based reviewers: Two reviewer agents, each modeled as a subject-matter expert, were implemented: (1) Query Reviewer Agent evaluates the generated Cypher queries based on syntactic correctness, schema compliance, semantic alignment with the original NLQ, parameter usage, and naming clarity. Each criterion is scored on a 1–5 scale, with qualitative feedback provided to guide further optimization; (2) Report Reviewer Agent assesses the final scientific report for factual accuracy, completeness, relevance to the original query, interpretative depth, and the strength of drug target discussion. It similarly provides both scores and narrative feedback. Both agents were configured with a low-temperature setting (0.1) to ensure consistent and critical evaluations. For each NLQ, the system was run end-to-end to generate Cypher queries and a corresponding report, which were then independently reviewed. Outputs included quantitative scores and qualitative comments, all systematically recorded. A query or report was considered successful if it achieved an average score above 4.0 on key metrics, specifically, syntactic and semantic quality for queries and factual accuracy and relevance for reports. Aggregate results across all NLQs were used to evaluate overall system robustness and generalizability. 3. Results We evaluated our framework rigorously using a benchmark suite shown in Table 1 . The evaluation centered on the precision of the generation of Cypher queries against KG (depicted in Figure 2 ) and the scientific quality of the analytical reports resulting, particularly regarding KOs as potential targets for antimicrobial drugs. View this table: View inline View popup Download powerpoint Table 1. Natural language queries (NLQs) and their specific gold-standard Cypher queries (GSCQ). Download figure Open in new tab Figure 1. KODA pipeline diagram. The pipeline initiates with a user’s NLQ. Agents are guided by the comprehensive Graph Schema description. Data Engineer generates and executes a Cypher query against the Neo4j knowledge graph (containing microbe, metabolite, pathway, and KO nodes), fetching relevant data. Content Analyst receives the retrieved data, the updated NLQ, and the graph schema description. It performs a biological interpretation, focusing on the significance of identified KOs as essential gene functions and potential antimicrobial drug targets. Finally, the report is generated. Download figure Open in new tab Figure 2. An overview of the Neo4j knowledge graph representing the relationships of the gut microbiome of individuals under an averaged high-fiber diet. The graph includes 745 nodes of four types: Microbe, Metabolite, KOs, and Pathway as well as 16,370 relationships of five types: INVOLVED IN, CONSUMES, PRO- DUCES, HAS KEGG ORTHOLOGY, and CROSS FEEDS WITH. The evaluation process involved two steps. The first stage was designed to assess the technical quality and semantic accuracy of the Cypher queries generated by the pipeline. This step ensures that queries are not only syntactically correct and executable on a Neo4j database but also semantically aligned with the user’s natural language intent and compliant with the defined graph schema, covering correct usage of node labels, relationships, properties, and functions. By systematically evaluating criteria such as schema adherence, parameterization, and semantic alignment (compared to both the input NLQ and gold-standard queries), this agent helps quantify the reliability and precision of the NLto-Cypher translation, a critical component for trustworthy data retrieval. The Data Engineer Agent showed strong performance in translating NLQs into executable Cypher queries (refer to Table 2 for more details of the evaluation). Each query was assessed for syntactic validity, schema adherence, and semantic accuracy, achieving high scores (average syntactic validity: 5.00, schema adherence: 5.00, semantic accuracy NLQ: 4.83). View this table: View inline View popup Download powerpoint Table 2. Average LLM-reviewer scores for the pipeline performance. Scores were averaged across the benchmark NLQs for Cypher query generation and analytical report quality, based on a 1-5 scale (5 being optimal). In the second stage of the evaluation, we assessed the quality of the final analytical outputs delivered to the user. This involved a detailed review of the scientific reports generated by the pipeline, focusing on their biological relevance, factual grounding in retrieved data, depth of interpretation, and overall clarity. The reviewer also evaluated how effectively each report addressed the original NLQ and highlighted key insights, particularly the discussion of KOs associated with essential genes as potential antimicrobial drug targets, a central aim of our framework. Regarding Table 2 , analytical reports produced by the Content Analyst and Report Writer agents were informative and contextually relevant across benchmarked NLQs. Reports received pretty good scores for factual accuracy, completeness, and relevance to NLQs where data was available. For instance, in the first NLQ focused on Klebsiella pneumoniae , the report earned top scores (5/5) across all criteria. The reviewer noted: “ The report provides a comprehensive and accurate analysis and effectively identifies and describes the functional roles of each KO. The discussion on the essentiality of these KOs and their potential as drug targets is insightful and scientifically valuable ”. The report correctly identified and listed KOs, such as K03151 (Thiazole phosphate synthesis) and K01646 (Citrate Lyase), and discussed their importance as potential drug targets. Overall, the system demonstrated strong capabilities in both query generation and scientific interpretation, effectively synthesizing complex microbial interaction data into meaningful, actionable insights. This underscores the analytical depth and biological significance of our framework. 4. Discussion Our study introduces a novel tool that integrates advanced foundation models with an architecture to enable sophisticated interaction with and analysis of complex gut microbial data structured within a KG. This directly addresses the pressing need for advanced computational tools capable of managing, interpreting, and extracting insights from the exponentially growing volume of microbiome data. Our system adoptes a schema-guided approach to translate NLQs into precise Cypher commands, and employs specialized agents for biological reasoning with a focus on KOs corresponding to essential gene functions and their implications as drug targets. This structured, interpretable workflow improves the transparency and reliability of LLM-driven analyses, offering outputs that are not only accurate but also verifiable. This study reinforces the value of KGs as structured, machine-readable repositories of microbiome knowledge. When combined with the natural language processing and reasoning capabilities of LLMs, KGs can be powerfully and intuitively interrogated. This synergistic approach aligns with and contributes to recent trends in combining KGs with LLMs for enhanced reasoning, information retrieval, and hypothesis generation in the biomedical domain. A key practical outcome of our framework is its potential to significantly accelerate the cycle of hypothesis generation and subsequent experimental validation, especially in highpriority areas such as antimicrobial drug discovery. By targeting essential microbial genes annotated with KOs, the system helps identify novel therapeutic targets with potential cross-species relevance. The application of this approach to the human gut microbiome, a data-rich and biologically complex system, further shows its practical utility and scalability. 5. Conclusion The presented framework aligns with the growing trend of using LLMs and microbiome-specific KGs for automated scientific discovery. The specific focus on interpreting KOs linked to essential genes as potential drug targets offers a direct application for accelerating hypothesis generation in antimicrobial research, a critical area given the rise of antibiotic resistance. The use of specialized AI agents allows for different sub-tasks, from technical query generation to nuanced biological interpretation, which is a novel approach in the context of microbiome KG interrogation. Additionally, our LLM-based evaluation pipeline provides a systematic method for assessing both the technical and analytical performance of such systems. In summary, our work offers a promising pathway towards democratizing access to complex microbiome datasets and enhancing scientific inquiry. By translating NLQs into structured analyses, it enables researchers to derive actionable insights from large-scale biological data more efficiently. Impact Statement We aim to advance biological discovery by targeting the gut microbiome to identify potential antimicrobial drug targets by combining large language models and knowledge graphs. Our framework offers a scalable tool to streamline hypothesis generation and enhance data-driven microbiology research. A. Appendix View this table: View inline View popup Download powerpoint A.3 Table S1: Metabolic pathways included in the analysis of gene essentiality and their relevance as antimicrobial drug targets View this table: View inline View popup Download powerpoint A.2 Table S2: Complete Graph Schema View this table: View inline View popup A.1 Table S3: Agents descriptions A.4 Supplementary Figures Download figure Open in new tab Figure S1. Essential gene count per pathway per microbe. Download figure Open in new tab Figure S2. Shared vs unique essential KOs per microbe. Download figure Open in new tab Figure S3. Hierarchical clustering of microbes based on essential KO similarity. Footnotes https://github.com/mofradlab/koda References ↵ Brown , T. , Mann , B. , Ryder , N. , Subbiah , M. , Kaplan , J. D. , Dhariwal , P. , Neelakantan , A. , Shyam , P. , Sastry , G. , Askell , A. , et al. Language models are few-shot learners . Advances in neural information processing systems , 33 : 1877 – 1901 , 2020 . OpenUrl ↵ Chen , Y. , Sawhney , H. , Gydé , N. , Jian , Y. , Saunders , J. , Vela , P. , and Lundell , B. A schema-guided reason-while-retrieve framework for reasoning on scene graphs with large-language-models (llms) . arXiv preprint arXiv:2502.03450, 2025 . ↵ Cook , D. J. and Nielsen , J. Genome-scale metabolic models applied to human health and disease . Wi- ley Interdisciplinary Reviews: Systems Biology and Medicine , 9 ( 6 ): e1393 , 11 2017 . ISSN 1939-005X . doi: 10.1002/WSBM.1393 . URL https://onlinelibrary.wiley.com/doi/full/10.1002/wsbm.1393 https://onlinelibrary.wiley.com/doi/abs/10.1002/wsbm.1393 https://wires.onlinelibrary.wiley.com/doi/10.1002/wsbm.1393 . OpenUrl CrossRef ↵ Diener , C. , Gibbons , S. M. , and Resendis-Antonio , O. MICOM: Metagenome-Scale Modeling To Infer Metabolic Interactions in the Gut Micro-biota . mSystems , 5 ( 1 ), 2 2020 . ISSN 2379-5077 . doi: 10.1128/msystems.00606-19 . URL https://journals.asm.org/journal/msystems https://journals.asm.org/doi/10.1128/msystems.00606-19 . OpenUrl CrossRef ↵ Goetz , S. L. , Glen , A. K. , and Glusman , G. Microbiomekg: Bridging microbiome research and host health through knowledge graphs . bioRxiv , 2024 . ↵ Hagberg , A. , Swart , P. J. , and Schult , D. A. Exploring network structure, dynamics, and function using networkx. Technical report, Los Alamos National Laboratory (LANL) , Los Alamos, NM (United States ), 2008 . ↵ Heinken , A. , Acharya , G. , Ravcheev , D. A. , Hertel , J. , Nyga , M. , Okpala , O. E. , Hogan , M. , Magnúsdóttir , S. , Martinelli , F. , Preciat , G. , Edirisinghe , J. N. , Henry , C. S. , Fleming , R. M. T. , and Thiele , I. AGORA2: Large scale reconstruction of the microbiome highlights wide-spread drug-metabolising capacities . bioRxiv , pp. 2020.11.09.375451, 11 2020 . doi: 10.1101/2020.11.09.375451 . URL https://www.biorxiv.org/content/10.1101/2020.11.09.375451v1 https://www.biorxiv.org/content/10.1101/2020.11.09.375451v1.abstract https://doi.org/10.1101/2020.11.09.375451 . OpenUrl Abstract / FREE Full Text ↵ Ma , C. , Liu , S. , and Koslicki , D. Metagenomickg: a knowledge graph for metagenomic applications . bioRxiv , 2024 . ↵ Megchelenbrink , W. , Huynen , M. , and Marchiori , E. optGp-Sampler: An Improved Tool for Uniformly Sampling the Solution-Space of Genome-Scale Metabolic Networks . PLoS ONE , 9 ( 2 ): e86587 , 2 2014 . ISSN 1932-6203 . doi: 10.1371/journal.pone.0086587 . URL https://dx.plos.org/10.1371/journal.pone.0086587 . OpenUrl CrossRef PubMed ↵ Naclerio , G. A. and Sintim , H. O. Multiple ways to kill bacteria via inhibiting novel cell wall or membrane targets . Future medicinal chemistry , 12 ( 13 ): 1253 – 1279 , 2020 . OpenUrl CrossRef PubMed ↵ Ponomarova , O. and Patil , K. R. Metabolic interactions in microbial communities: untangling the gordian knot . Current opinion in microbiology , 27 : 37 – 44 , 2015 . OpenUrl CrossRef PubMed ↵ Sahu , A. , Blätke , M. A. , Szymanńki , J. J. , and Töpfer , N. Advances in flux balance analysis by integrating machine learning and mechanism-based models , 1 2021 . ISSN 20010370 . ↵ Sung , J. , Kim , S. , Cabatbat , J. J. T. , Jang , S. , Jin , Y.-S. , Jung , G. Y. , Chia , N. , and Kim , P.-J. Global metabolic interaction network of the human gut microbiota for context-specific community-scale analysis . Nature communica- tions , 8 ( 1 ): 15393 , 2017 . OpenUrl CrossRef ↵ Tiwari , A. , Malay , S. K. R. , Yadav , V. , Hashemi , M. , and Madhusudhan , S. T. Auto-cypher: Improving llms on cypher generation via llm-supervised generation-verification framework . In Proceedings of the 2025 Con- ference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Lan- guage Technologies (Volume 2: Short Papers) , pp. 623 – 640 , 2025 . ↵ Wang , G. , Xie , Y. , Jiang , Y. , Mandlekar , A. , Xiao , C. , Zhu , Y. , Fan , L. , and Anandkumar , A. Voyager: An open-ended embodied agent with large language models . arXiv preprint arXiv:2305.16291, 2023 . View the discussion thread. Back to top Previous Next Posted June 01, 2025. Download PDF Data/Code Email Thank you for your interest in spreading the word about bioRxiv. NOTE: Your email address is requested solely to identify you as the sender of this article. Your Email * Your Name * Send To * Enter multiple addresses on separate lines or separate them with commas. You are going to email the following KODA: An Agentic Framework for KEGG Orthology-Driven Discovery of Antimicrobial Drug Targets in Gut Microbiome Message Subject (Your Name) has forwarded a page to you from bioRxiv Message Body (Your Name) thought you would like to see this page from the bioRxiv website. Your Personal Message CAPTCHA This question is for testing whether or not you are a human visitor and to prevent automated spam submissions. Share KODA: An Agentic Framework for KEGG Orthology-Driven Discovery of Antimicrobial Drug Targets in Gut Microbiome Javad Aminian-Dehkordi , Mohammad Parsa , Mohsen Naghipourfar , Mohammad R.K. Mofrad bioRxiv 2025.05.27.656480; doi: https://doi.org/10.1101/2025.05.27.656480 Share This Article: Copy Citation Tools KODA: An Agentic Framework for KEGG Orthology-Driven Discovery of Antimicrobial Drug Targets in Gut Microbiome Javad Aminian-Dehkordi , Mohammad Parsa , Mohsen Naghipourfar , Mohammad R.K. Mofrad bioRxiv 2025.05.27.656480; doi: https://doi.org/10.1101/2025.05.27.656480 Citation Manager Formats BibTeX Bookends EasyBib EndNote (tagged) EndNote 8 (xml) Medlars Mendeley Papers RefWorks Tagged Ref Manager RIS Zotero Tweet Widget Facebook Like Google Plus One Subject Area Bioinformatics Subject Areas All Articles Animal Behavior and Cognition (7629) Biochemistry (17660) Bioengineering (13881) Bioinformatics (41910) Biophysics (21436) Cancer Biology (18576) Cell Biology (25480) Clinical Trials (138) Developmental Biology (13368) Ecology (19887) Epidemiology (2067) Evolutionary Biology (24302) Genetics (15598) Genomics (22482) Immunology (17726) Microbiology (40360) Molecular Biology (17163) Neuroscience (88534) Paleontology (666) Pathology (2830) Pharmacology and Toxicology (4821) Physiology (7637) Plant Biology (15129) Scientific Communication and Education (2045) Synthetic Biology (4290) Systems Biology (9817) Zoology (2269)

Text is read by the "Ask this paper" AI Q&A widget below. Extraction quality varies by source — PMC NXML preserves structure cleanly, OA-HTML may include some navigation residue, and OA-PDF can have broken hyphenation. The publisher copy (via DOI) is the canonical version.

My notes (saved in your browser only)

⚙ Ask this paper AI returns verbatim quotes from the full text · source: preprint-html ⓘ

Answers must be backed by verbatim quotes from this paper's full text. Hallucinated quotes are dropped automatically; if no verbatim passage answers the question, we say so. How this works

Citation neighborhood (no data yet)

We don't have any in-corpus citations linked to this paper yet. This is a recent paper (2025) — citers typically take a year or two to land, and the OpenAlex reference graph may still be filling in.

Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00