Revisiting the Risk Factors for Endometriosis: A Machine Learning Approach

article · OA: gold · CC0 · 22 in-corpus citations
AI-generated summary by claude@2026-06, 2026-06-07

This study developed a machine learning model using UK Biobank data to predict endometriosis, identifying irritable bowel syndrome and menstrual cycle length as significant risk factors.

One-sentence paraphrase of the abstract; not a substitute for reading it. No clinical advice.

Abstract

Endometriosis is a condition characterized by implants of endometrial tissue at extrauterine sites, mostly within the pelvic peritoneum. Endometriosis is under-diagnosed, and its prevalence is estimated at 5-10% of women of reproductive age. The goal of this study was to develop a prediction model for endometriosis based on the UK Biobank (UKB) and to re-assess the contribution of known risk factors to endometriosis. We partitioned the data into women diagnosed with endometriosis (5924; ICD-10: N80) and a control group (142,723). We included over 1000 variables from the UKB covering personal information about female health, lifestyle, self-reported data, genetic variants, and medical history prior to endometriosis diagnosis. We applied machine learning algorithms to train an endometriosis prediction model. The best prediction was achieved with the CatBoost gradient boosting algorithm on the data-combined model, with an area under the ROC curve (ROC-AUC) of 0.81. The same results were obtained for women from a mixed ethnicity population of the UKB (7112; ICD-10: N80). We discovered that, prior to being diagnosed with endometriosis, affected women had significantly more ICD-10 diagnoses than the average unaffected woman. We used SHAP, an explainable AI tool, to estimate the marginal impact of each feature, given all other features. The informative features ranked by SHAP values included irritable bowel syndrome (IBS) and the length of the menstrual cycle. We conclude that the rich population-based retrospective data from the UKB are valuable for developing unified machine learning models of endometriosis, despite the limitations of missing data, noisy medical input, and participant age. The informative features of the model may improve clinical utility for endometriosis diagnosis.
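
The abstract describes SHAP as estimating the marginal impact of a feature given all other features. As a rough illustration of that idea only, the sketch below computes exact Shapley values for a tiny hand-written risk score. It is not the paper's pipeline and does not use the SHAP or CatBoost libraries: the function `toy_risk`, its coefficients, the three binary features, and the all-zero baseline are hypothetical and chosen purely for readability. Real SHAP implementations for tree models average over background data rather than a single baseline.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature of x, relative to a baseline input."""
    n = len(x)
    features = range(n)

    def value(subset):
        # Features in `subset` take their value from x, the rest from baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(features):
    # Hypothetical score with an interaction term; NOT the paper's model.
    ibs, short_cycle, smoker = features
    return 0.05 + 0.10 * ibs + 0.06 * short_cycle + 0.04 * ibs * short_cycle


x = [1, 1, 0]          # hypothetical individual: IBS, short cycle, non-smoker
baseline = [0, 0, 0]   # hypothetical reference individual
phi = shapley_values(toy_risk, x, baseline)

for name, contribution in zip(["ibs", "short_cycle", "smoker"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the score for `x` and the score for the baseline, and the interaction term is split evenly between the two features that share it. Exact enumeration grows exponentially with the number of features, so it would not be practical for the 1000+ variables mentioned above.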

Condition tags

endometriosis, irritable_bowel_syndrome

References (66)

Cited by (22)

Source provenance

europepmc
last seen: 2026-06-04T01:30:01.192114+00:00
openalex
last seen: 2026-06-04T00:00:01.174412+00:00
pubmed
last seen: 2026-05-27T00:34:36.928807+00:00
License: CC0 · commercial use OK