RNA foundation models enable generalizable endometriosis disease classification and stable gene-level interpretation

Article · Open access (green) · CC0
AI-generated summary by claude@2026-06, 2026-06-07

RNA foundation models significantly improved endometriosis classification across independent cohorts, and a new interpretability method revealed conserved, biologically plausible predictive genes.

One-sentence paraphrase of the abstract; not a substitute for reading it. No clinical advice.

Abstract

Endometriosis is a chronic inflammatory condition that affects one in ten women of reproductive age worldwide and is marked by significant diagnostic delays. While machine learning (ML) models trained on transcriptomic data show promise for disease prediction, limited generalizability across independent patient cohorts has hindered clinical translation. Foundation models (FMs) pretrained on large-scale transcriptomic data offer the promise of learning transferable, biologically meaningful representations that could support cross-cohort prediction. We assembled a 12-cohort bulk RNA-seq benchmark (334 samples) and developed a computationally efficient pipeline to test whether FMs improve endometriosis classification, an approach not previously applied to this disease. Using AutoXAI4Omics with cohort-aware validation, we compared embeddings derived from five state-of-the-art RNA FMs against transcripts-per-million (TPM) baselines. In cross-cohort prediction, FM embeddings significantly improved performance, achieving a weighted F1-score of 0.83 versus 0.68 for the baseline. To enable gene-level interpretation of FM embedding models, we introduce classifier-aligned integrated gradients (CA-IG), an interpretability approach that aligns gene-level attributions with the downstream classifier without end-to-end finetuning. CA-IG revealed a conserved set of predictive genes from FM embeddings across cohort-validation regimes, in contrast to unstable baseline explainability, suggesting that FM embeddings prioritized transferable disease-related signal over cohort-specific effects. These genes include novel candidates that converge on biologically plausible pathways for endometriosis.
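The abstract does not specify how CA-IG is computed. As a rough illustration only, the sketch below assumes one plausible reading: integrated gradients taken on the logit of a downstream classifier, with gradients propagated through a frozen embedding model back to gene-level inputs, so the embedding model is never finetuned. The tiny tanh "encoder", the random weights, the all-zero baseline, and every function name (`encode`, `logit`, `grad_logit`, `integrated_gradients`) are hypothetical stand-ins, not the authors' pipeline and not an AutoXAI4Omics API.

```python
import math
import random

def encode(x, W):
    """Hypothetical frozen stand-in for an RNA foundation model: h = tanh(W x)."""
    return [math.tanh(sum(w_kj * x_j for w_kj, x_j in zip(row, x))) for row in W]

def logit(x, W, v, b):
    """Downstream linear classifier applied to the frozen embedding."""
    h = encode(x, W)
    return sum(v_k * h_k for v_k, h_k in zip(v, h)) + b

def grad_logit(x, W, v):
    """Analytic gradient of the classifier logit w.r.t. the gene-level input x.

    Chain rule through tanh: d logit / d x_j = sum_k v_k * (1 - h_k**2) * W[k][j].
    """
    h = encode(x, W)
    return [
        sum(v[k] * (1.0 - h[k] ** 2) * W[k][j] for k in range(len(W)))
        for j in range(len(x))
    ]

def integrated_gradients(x, baseline, W, v, steps=200):
    """Integrated gradients along the straight path baseline -> x (midpoint Riemann sum)."""
    total = [0.0] * len(x)
    for i in range(steps):
        alpha = (i + 0.5) / steps
        point = [b_j + alpha * (x_j - b_j) for x_j, b_j in zip(x, baseline)]
        for j, g in enumerate(grad_logit(point, W, v)):
            total[j] += g
    # Scale the averaged path gradient by the input difference for each gene.
    return [(x_j - b_j) * t / steps for x_j, b_j, t in zip(x, baseline, total)]

# Toy example with random weights (hypothetical; real weights come from a pretrained FM
# and a classifier fit on its frozen embeddings).
rng = random.Random(0)
n_genes, n_dims = 6, 4
W = [[rng.gauss(0, 0.5) for _ in range(n_genes)] for _ in range(n_dims)]
v = [rng.gauss(0, 1) for _ in range(n_dims)]
b = 0.1
x = [rng.uniform(0, 3) for _ in range(n_genes)]  # e.g. log-scaled expression of one sample
baseline = [0.0] * n_genes

attr = integrated_gradients(x, baseline, W, v)
gap = logit(x, W, v, b) - logit(baseline, W, v, b)
print(sum(attr), gap)  # completeness check: the two values should be nearly equal
```

The last line checks the completeness property of integrated gradients: per-gene attributions sum, up to discretization error, to the change in the classifier's logit between the baseline and the sample. With a real foundation model, the hand-derived gradient would be replaced by automatic differentiation, and the choice of baseline would need its own justification.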


Condition tags

endometriosis


References (52)

Source provenance

europepmc
last seen: 2026-06-04T01:45:00.660873+00:00
openalex
last seen: 2026-06-04T00:00:01.174412+00:00
License: CC0 · commercial use OK