Deciphering the links between metabolism and health by building tailored knowledge graphs: application to endometriosis and persistent pollutants

article · preprint · OA: green · CC0
AI-generated summary by claude@2026-06, 2026-06-07

This study presents Kg4j, a framework for creating tailored biomedical knowledge graphs from large databases, which, when applied to endometriosis and persistent organic pollutants, identified known associations and suggested new hypothetical links.

One-sentence paraphrase of the abstract; not a substitute for reading it. No clinical advice.

Abstract

Motivation: Knowledge graphs (KGs) are a robust formalism for structuring biomedical knowledge, but large-scale KGs often require complex queries, are difficult for non-experts to explore, and lack real-world context (such as experimental data, clinical conditions, and patient symptoms). This limits their usability for addressing specific research questions.

Results: We present Kg4j, a computational framework built on FORVM (a large-scale KG containing 82 million compound-biological concept associations) that constructs local, keyword-based sub-graphs tailored to biomedical research questions. The resulting graphs support hypothetical relationships and can integrate experimental datasets, enabling the discovery of plausible but as yet unknown connections. Starting from a conceptual definition of a research field of interest (e.g., disease, symptoms, exposure), the framework extracts relevant associations from FORVM and identifies potential biological mechanisms and chemical compounds.

We applied this approach to endometriosis, exploring links between exposure to Persistent Organic Pollutants (POPs) and disease risk. We propose a novel validation strategy that compares the resulting sub-graph (2,706 nodes and 23,243 edges, 0.002% of FORVM) with recent scientific literature, showing consistency with known findings while also revealing new hypothetical associations that require further investigation. We also showed that removing duplicated nodes and edges from the KG raises the proportion of validated nodes (from 8.4% to 16%) and more than doubles precision (from 0.085 to 0.197) while maintaining recall (0.954 to 0.952), illustrating a trade-off between the loss of potentially relevant but redundant information and the reliability of the remaining associations.

By combining automated knowledge mining with experimental data integration, this framework supports reproducible, context-based exploration of biomedical knowledge and systematic hypothesis generation. Applied to endometriosis, it highlights potential mechanisms linking exposure to POPs to the aetiology of the disease, offering a scalable strategy for constructing disease-specific KGs.

Availability: The code and data underlying this article are available in the MetExplore repository at https://forge.inrae.fr/metexplore/kg4j .

Contact: [email protected]

Key Messages:
Kg4j builds targeted knowledge maps from large biomedical databases using simple keywords.
Keyword-driven exploration reveals the most relevant disease–exposure relationships without navigating millions of connections.
Applied to endometriosis, the method recovered known links with persistent organic pollutant exposure.
Removing redundant information and formatting the knowledge graph as a Labeled Property Graph improves the reliability of extracted knowledge.
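
The abstract describes three steps: keyword-based sub-graph extraction, removal of duplicates, and validation against the literature with precision and recall. The following is a minimal Python sketch of that general idea, not the Kg4j implementation. The function names, the substring matching rule, the case-folding used for deduplication, and the toy data are all assumptions made for illustration; the placeholder compound names are not results from the paper.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (compound, biological concept); hypothetical schema


def extract_subgraph(edges: Iterable[Edge], keywords: Set[str]) -> List[Edge]:
    """Keep associations whose concept contains any keyword (case-insensitive)."""
    lowered = {k.lower() for k in keywords}
    return [
        (compound, concept)
        for compound, concept in edges
        if any(k in concept.lower() for k in lowered)
    ]


def deduplicate(edges: Iterable[Edge]) -> List[Edge]:
    """Drop repeated edges after lower-casing, preserving first-seen order."""
    seen: Set[Edge] = set()
    unique: List[Edge] = []
    for compound, concept in edges:
        key = (compound.lower(), concept.lower())
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def precision_recall(predicted: Set[str], reference: Set[str]) -> Tuple[float, float]:
    """Precision and recall of predicted items against a reference set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Toy associations (placeholders, not data from FORVM).
toy_edges = [
    ("Compound_A", "Endometriosis"),
    ("compound_a", "endometriosis"),  # duplicate differing only in case
    ("Compound_B", "Endometriosis"),
    ("Compound_C", "Glycolysis"),     # unrelated concept, filtered out
    ("Compound_D", "Endometriosis"),
]

sub_graph = deduplicate(extract_subgraph(toy_edges, {"endometriosis"}))
compounds = {compound for compound, _ in sub_graph}

# Hypothetical set of compounds reported in the literature.
literature = {"compound_a", "compound_b", "compound_e"}
precision, recall = precision_recall(compounds, literature)
```

In this toy case, deduplication reduces the number of extracted edges without changing which literature compounds are recovered, the same kind of effect the abstract reports at scale, where precision rises while recall stays nearly constant.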

Condition tags

endometriosis

Source provenance

europepmc
last seen: 2026-06-17T06:30:59.472361+00:00
openalex
last seen: 2026-06-10T17:14:06.276822+00:00
License: CC0 · commercial use OK