An accessible, non-invasive tool for endometriosis diagnosis reveals an association between age at symptom onset and endometriosis symptom prevalence

article OA: green CC0
AI-generated summary by claude@2026-06, 2026-06-08

This study developed an 81.76% accurate non-invasive symptom-based predictive model for endometriosis, finding that age at symptom onset correlates with the prevalence of specific symptoms like dyspareunia and pelvic pain.

One-sentence paraphrase of the abstract; not a substitute for reading it. No clinical advice.

AI-generated deep summary by claude@2026-06, 2026-06-12 · read from full text

Using data from a Puerto Rico endometriosis patient registry (1560 participants who underwent diagnostic laparoscopy, including 1189 surgically confirmed endometriosis cases and 371 controls with benign diagnoses), the study analyzed 230 self-reported demographic and symptom variables to build logistic-regression models and to examine how symptom prevalence relates to age at symptom onset. After removing missing/unknown and highly correlated variables, the authors identified predictors distinguishing cases from controls and obtained a symptom-only non-invasive diagnostic model with AUC 77.30% (95% CI 66.94%–87.66%), sensitivity 80.46%, and specificity 78.28% in an unbalanced model, applying cross-validation and Bonferroni correction for multiple testing. The paper also modeled how age at onset categories (teens, 20s, 30s) relate to predictor variables, and reports that cases were younger than controls at the time of the study while showing higher prevalence of incapacitating pain, dyspareunia, and difficulty getting pregnant. A key limitation explicitly stated is that the dataset reflects surgically evaluated chronic pelvic pain/infertility populations from one registry, which may constrain generalizability. This paper is centrally about endometriosis — it develops a non-invasive symptom-based diagnostic tool and analyzes age at symptom onset as it relates to endometriosis symptom prevalence in surgically confirmed patients.

Read from the paper's body, not the abstract. Not a substitute for reading the paper. No clinical advice.

Abstract

OBJECTIVE: To determine what symptom differences are prevalent in patients with differing ages of endometriosis symptom onset. MATERIAL AND METHODS: We obtained clinical and demographic data from 1560 individuals with suspected pelvic conditions undergoing laparoscopy from the Endometriosis Patient Registry at Ponce Health Science University-Ponce Research Institute. We then generated predictive models by fitting logistic regressions to the patient data. We determined the association between symptoms and age at symptom onset in patients with endometriosis by generating predictive linear and multinomial logistic regression models. RESULTS: Our best model had an accuracy of 81.76%, with a sensitivity of 89.32% and a specificity of 64.57% at an optimal threshold of 0.75. Classic endometriosis symptoms such as dyspareunia and pelvic pain showed different prevalence rates depending on patient age at symptom onset. CONCLUSION: Symptom-based predictive models can predict a patient's likelihood of having endometriosis in a non-invasive and accessible manner. Gynecologic and pelvic symptoms, including dyspareunia and presence of uterine fibroids, are significantly associated with age at symptom onset.
Full text 19,976 characters · extracted from pmc-nxml · 4 sections

Results

To determine which variables best distinguish patients with endometriosis (“cases”) from patients without it (“controls”), and to add to the literature on demographics and symptoms associated with an endometriosis diagnosis, we examined 122 variables, including demographics, clinical history, and symptoms, for statistically significant differences between 1189 surgically confirmed endometriosis cases and 371 controls confirmed not to have endometriosis (Table S3). Several demographic variables differed significantly between cases and controls in our dataset. Cases were significantly younger than controls at the time of the study (Wilcoxon test W = 255,042.5, p < 1.8 × 10⁻¹⁸; Table S3). We then looked for differences in reported symptoms and clinical history between individuals with and without endometriosis. We found no significant differences between cases and controls in reported age at menarche, cycle regularity, menstrual cycle length, or period length (Table S3). Cases showed a higher prevalence of diabetes (χ² = 15.00, p < 0.0005), problems getting pregnant (χ² = 122.26, p < 0.0005), incapacitating pain (χ² = 83.86, p < 0.0005), and dyspareunia (χ² = 106.98, p < 0.0005; Table S3). In line with previous studies, 80% of cases reported dysmenorrhea, 59% reported incapacitating pelvic pain, 50% reported dyspareunia, and 43% reported difficulty becoming pregnant. 34, 39–48

To develop a non-invasive diagnostic test for endometriosis, we used logistic regression to distinguish cases from controls in our cohort based on self-reported symptoms alone. We included in our models the 18 variables found to differ significantly between cases and controls (p < 0.001; Table 1) and performed k-fold cross validation.
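The χ² values above come from comparing categorical variables between cases and controls; the Materials section names Pearson's chi-squared test with Yates' continuity correction and a Bonferroni-corrected threshold. The minimal Python sketch below runs that test on a single 2×2 table. It mirrors the correction rule of R's `chisq.test` (never subtract more than the observed deviation), but the function names and counts are hypothetical, not the authors' code or registry data. The Materials section gives the corrected threshold as 0.0005; the strict quotient 0.05/122 is about 0.00041, so the sketch takes the number of tests as a parameter.

```python
import math


def yates_chi2_2x2(a, b, c, d):
    """Pearson's chi-squared test with Yates' continuity correction, 2x2 table.

    Layout (hypothetical binary symptom):
                   symptom yes  symptom no
        cases           a           b
        controls        c           d
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = (a, b, c, d)
    expected = (rows[0] * cols[0] / n, rows[0] * cols[1] / n,
                rows[1] * cols[0] / n, rows[1] * cols[1] / n)
    # With fixed margins, every cell of a 2x2 table deviates by the same |O - E|;
    # like R's chisq.test, never subtract more than that deviation.
    correction = min(0.5, abs(a - expected[0]))
    statistic = sum((abs(o - e) - correction) ** 2 / e
                    for o, e in zip(observed, expected))
    # Survival function of a chi-squared variable with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


# Hypothetical counts, not the registry data.
stat, p = yates_chi2_2x2(20, 10, 10, 20)
threshold = bonferroni_threshold(0.05, 122)
print(f"chi2 = {stat:.2f}, p = {p:.4f}, significant: {p < threshold}")
```

With these toy counts the uncorrected p-value falls below 0.05 but not below the Bonferroni threshold, which is the situation the correction is meant to guard against when 122 variables are screened.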
For the statistical approach, we used the full, unbalanced dataset (n = 374), which includes all cases and controls with complete data in a single model iteration, and obtained an AUC (area under the curve) of 77.30% (95% CI: 66.94%–87.66%), a sensitivity of 80.46%, and a specificity of 78.28% at an optimal threshold of 0.75, with a PPV of 50.9% and an NPV of 90.5% (Tables 2 and S5). Patients with endometriosis were twice as likely to have a higher level of education (over 10 times as likely to hold a master’s degree) and twice as likely to have chronic pelvic pain, dyspareunia, and a family history of endometriosis (Table S5). To assess whether the large proportion of cases (n = 1189) relative to controls (n = 371) in our dataset might bias our model, we created a set of models, henceforth referred to as balanced models, with equal numbers of cases and controls. We ran a logistic regression on 10 iterations of a balanced dataset built from a random subsample of cases (n = 90) and controls (n = 90). These models had a mean AUC of 71.81% (95% CI: 66.97%–76.65%), a mean sensitivity of 77.56%, and a mean specificity of 69.11% at an optimal mean threshold of 0.551, with a mean PPV of 71.92% and a mean NPV of 28.15% (Tables 2 and S6). In these models, patients with endometriosis were more than twice as likely to experience dysmenorrhea, incapacitating pain, and dyspareunia, and twice as likely to have a family history of endometriosis, problems getting pregnant, or a hysterectomy, and to be currently taking OCP (Table S6). Thus, our statistical model fitted on the unbalanced dataset is slightly more accurate, with a higher AUC than the balanced models (77.30% vs. 71.81%).

To help elucidate potential age-based differences, we used linear regression models to determine which variables varied significantly with age at symptom onset in patients.
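Statements such as "twice as likely" presumably reflect the step, described in the Materials section, of converting the logit models' odds ratios into risk ratios (reference 31, whose formula the text does not give). One widely used conversion is Zhang and Yu's approximation, RR = OR / (1 − P0 + P0 × OR), where P0 is the outcome probability in the reference group; that the authors used exactly this formula is an assumption. A minimal sketch:

```python
import math


def odds_ratio_to_risk_ratio(odds_ratio, baseline_risk):
    """Approximate a risk ratio from an odds ratio (Zhang & Yu, 1998).

    baseline_risk: outcome probability in the reference group (P0), in [0, 1).
    """
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")
    if not 0 <= baseline_risk < 1:
        raise ValueError("baseline_risk must be in [0, 1)")
    return odds_ratio / ((1 - baseline_risk) + baseline_risk * odds_ratio)


# A logit coefficient beta corresponds to an odds ratio of exp(beta).
beta = math.log(4.0)  # hypothetical coefficient, not taken from Table S5
rr = odds_ratio_to_risk_ratio(math.exp(beta), baseline_risk=0.5)
print(round(rr, 3))
```

With an odds ratio of 4 and a baseline risk of 0.5, the approximate risk ratio is 1.6, which illustrates how odds ratios overstate risk ratios when the outcome is common.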
In the models predicting age in years, the following predictors from the balanced statistical model were significant: number of pregnancies, whether the patient is currently taking an oral contraceptive pill (OCP), presence of uterine fibroids, and whether the patient has had a hysterectomy (Table 3). Within these models, current OCP use decreases with age, while presence of uterine fibroids, a greater number of pregnancies, and having had a hysterectomy increase with age (Table 3). We also found that incapacitating pain was more likely in patients whose symptoms began in their 20s or in their 30s than in those whose symptoms began in their teens (Table 4). The results of the biological approach are documented in the supplemental material (Tables S7–S9).
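The age groups behind Table 4 are defined in the Materials section: "teens" (onset before 20), "20s" (20–29), and "30s" (30 or later), with teens as the reference group. The sketch below bins ages that way and computes a crude, unadjusted odds ratio for a single symptom. The paper's multinomial model adjusts for all predictors jointly, so its odds ratios will generally differ; the field names (`age_at_onset`, `dyspareunia`) and toy records are hypothetical.

```python
from collections import Counter


def onset_group(age_at_onset):
    """Bin age at symptom onset into the paper's three groups."""
    if age_at_onset < 0:
        raise ValueError("age must be non-negative")
    if age_at_onset < 20:
        return "teens"  # symptoms started before age 20
    if age_at_onset < 30:
        return "20s"    # symptoms began between ages 20 and 29
    return "30s"        # symptoms began at age 30 or later


def crude_odds_ratio_vs_teens(records, group, symptom):
    """Unadjusted odds of being in `group` rather than "teens",
    with the symptom present versus absent."""
    counts = Counter((onset_group(r["age_at_onset"]), bool(r[symptom]))
                     for r in records)
    odds_present = counts[(group, True)] / counts[("teens", True)]
    odds_absent = counts[(group, False)] / counts[("teens", False)]
    return odds_present / odds_absent


# Hypothetical toy records, not registry data.
records = (
    [{"age_at_onset": 16, "dyspareunia": True}] * 2
    + [{"age_at_onset": 16, "dyspareunia": False}] * 8
    + [{"age_at_onset": 24, "dyspareunia": True}] * 6
    + [{"age_at_onset": 24, "dyspareunia": False}] * 4
)
print(crude_odds_ratio_vs_teens(records, "20s", "dyspareunia"))
```

With a single binary predictor, a multinomial logit reproduces this crude ratio exactly; adding covariates, as the paper does, changes it.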

Materials

We analyzed data collected from 2001 to 2010 as part of a Patient Registry at the Endometriosis Research Program of the Ponce Research Institute, in Ponce, Puerto Rico. The study design and scope were approved by an institutional review board. All participants signed a consent form before donating tissues and data. This NIH-funded (grant #HD050559) patient registry consists of data from 1560 subjects in Puerto Rico who underwent diagnostic laparoscopy to identify the cause of chronic pelvic pain or of infertility with or without pelvic pain. Of these subjects, 1189 were surgically and visually confirmed to have endometriosis at any stage (I–IV; “cases”), while 371 were diagnosed with benign gynecologic conditions (e.g. uterine fibroids, dysfunctional bleeding) and had no visible endometriotic lesions at surgery (“controls”). Patient diagnoses were documented by a surgeon on a Surgery Report Form that recorded the size, depth, and extent of lesions and adhesions, the presence of endometriomas, the date of last menstrual period, and the type of surgery, among other clinical findings. All study participants completed a self-administered survey documenting 230 variables on demographics, clinical history, symptoms, and treatments, including the age at which symptoms began, the extent to which severe pain prevented or impaired daily activities (incapacitating pain), and problems getting pregnant (Supplemental File 2).

First, we removed variables with high rates (>50%) of “unknown” or missing responses. Second, we removed highly correlated variables (Spearman’s ρ > 0.7 or ρ < −0.7), beginning with variables that were highly correlated with two or more other variables. For variables highly correlated with only one other variable, we removed the variable with more missing data. If two variables had an equal number of missing observations, we removed the variable encompassed by the other (e.g.
hypothyroidism is encompassed by thyroid problems). Finally, because of the wide variety of conditions reported in this dataset, we grouped comorbidities into categories based on the broader systems they affected, including cardiovascular, gynecological, musculoskeletal, gastrointestinal, and autoimmune (Table S1). These steps reduced the number of variables from the initial 230 to 122 (Table S2).

We used two approaches to generate a predictive model. The first used a series of statistical tests to identify significant predictors specific to this dataset (hereafter the “statistical model”). The second drew on prior studies that showed high correlations between specific symptoms and endometriosis to choose predictors (hereafter the “biological model”). The methods and results for the biological approach are detailed in the Supplemental Material. To select predictors for the statistical model, we tested the 122 variables for significant differences between cases and controls using a Wilcoxon rank-sum test for numeric variables and Pearson’s chi-squared test with Yates’ continuity correction for categorical variables (Table S3). We applied the Bonferroni correction to account for multiple testing, lowering our threshold for statistical significance from 0.05 to 0.0005. We developed predictive models using logistic regression, including the 18 variables that differed significantly between cases and controls (Table 1). Following the TRIPOD guidelines, 29 we created a model using complete observations from the entire dataset (n = 374). We refer to these models as using an unbalanced dataset, since they were generated with more cases (n = 275) than controls (n = 99).
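The correlation-pruning rules described above can be sketched in Python under two assumptions: Spearman correlations are computed on pairwise-complete observations, and "beginning with variables highly correlated with two or more others" means repeatedly dropping the variable with the most links (|ρ| > 0.7) while it still has at least two. Remaining pairs are broken by missingness; pairs with equal missingness are returned for manual review, since deciding which variable "encompasses" the other needs domain knowledge. All names are hypothetical; the authors worked in R.

```python
import math
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho on pairwise-complete observations (None marks missing)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 3:
        return math.nan
    rx = average_ranks([a for a, _ in pairs])
    ry = average_ranks([b for _, b in pairs])
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else math.nan


def prune_correlated(data, cutoff=0.7):
    """data maps variable name -> list of values (None = missing).

    Returns (kept, dropped, ties); ties are pairs needing a manual decision.
    """
    missing = {name: sum(v is None for v in vals) for name, vals in data.items()}
    links = {name: set() for name in data}
    for a, b in combinations(data, 2):
        rho = spearman(data[a], data[b])
        if not math.isnan(rho) and abs(rho) > cutoff:
            links[a].add(b)
            links[b].add(a)
    dropped, ties = [], []
    # Step 1: drop variables highly correlated with two or more others, worst first.
    while links:
        worst = max(links, key=lambda name: len(links[name]))
        if len(links[worst]) < 2:
            break
        for other in links.pop(worst):
            links[other].discard(worst)
        dropped.append(worst)
    # Step 2: each remaining link is an isolated pair; drop the member with more missing data.
    for a, b in sorted({tuple(sorted((a, b))) for a in links for b in links[a]}):
        if missing[a] == missing[b]:
            ties.append((a, b))
        else:
            dropped.append(a if missing[a] > missing[b] else b)
    kept = [name for name in data if name not in dropped]
    return kept, dropped, ties


# Hypothetical survey columns (1 = yes, 0 = no, None = unknown).
survey = {
    "thyroid_problems": [1, 0, 1, 1, 0, 0, 1, 0],
    "hypothyroidism": [1, 0, 1, 1, 0, 0, 1, None],
    "dysmenorrhea": [1, 1, 0, 1, 0, 1, 0, 0],
}
print(prune_correlated(survey))
```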
For 10 iterations of the models, using K-fold cross validation (K = 2), we split the data into training and testing sets containing 70% and 30% of the original data, respectively. All models and their formulas are listed in Table S4. To reduce bias, we also created balanced training datasets, sampling 90 cases and 90 controls for each of the 10 iterations. Only complete observations were used in the iterative models, and categorical variables with substantially sparse data (i.e. education, health insurance) were removed, reducing the number of variables to 17 (Table 1). We evaluated the performance of all models with receiver operating characteristic (ROC) curves and the area under the curve (AUC) statistic. We calculated sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) from confusion matrices. 30 We calculated plausible risk ratios from the odds ratios generated in all logit models. 31 We determined the threshold for each model by calculating Youden’s J statistic. 32, 33 We used binomial exact tests to calculate the 95% confidence interval (CI) for the AUC values. 32

To analyze the relationship between patient symptoms and age, we performed linear regressions with patient age as the response variable and the 16 predictor variables included in the balanced statistical model (Table 1). To further analyze differences in symptoms by age at symptom onset, we grouped patients into three categories: “teens” (symptoms started before age 20), “20s” (symptoms began between ages 20 and 29), and “30s” (symptoms began at age 30 or later). We then ran a multinomial logistic regression on the statistical and biological predictors, using the teens group as the reference group, and calculated the odds ratio for each variable (Tables 4 and S7).
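The evaluation steps above (confusion-matrix metrics, the AUC, and a threshold chosen by maximizing Youden's J = sensitivity + specificity − 1) can be illustrated with a small stdlib-only Python sketch. It computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control (the Mann–Whitney form of the statistic) and tries each observed score as a candidate threshold; ROC packages may instead place thresholds between observed scores, so chosen thresholds can differ slightly. Scores, labels, and function names are hypothetical.

```python
import math


def confusion_metrics(scores, labels, threshold):
    """Sensitivity, specificity, PPV and NPV when score >= threshold predicts a case (label 1)."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_case = score >= threshold
        if predicted_case and label == 1:
            tp += 1
        elif predicted_case:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def youden_threshold(scores, labels):
    """Observed score maximizing J = sensitivity + specificity - 1 (first one wins ties)."""
    best_threshold, best_j = None, -math.inf
    for threshold in sorted(set(scores)):
        m = confusion_metrics(scores, labels, threshold)
        j = m["sensitivity"] + m["specificity"] - 1
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


def auc(scores, labels):
    """P(random case outscores random control); ties count one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical predicted probabilities and true labels (1 = case, 0 = control).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 0]
threshold, j = youden_threshold(scores, labels)
print(threshold, j, auc(scores, labels))
print(confusion_metrics(scores, labels, threshold))
```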
All statistical tests were performed in the R programming language 34 using the tidyverse, 35 dplyr, 36 ggpubr, 37 kableExtra, 38 pROC, 39 and caret 40 packages.
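The balanced-model resampling described above (10 iterations of 90 cases and 90 controls, with a 70% training and 30% testing split) can be sketched as follows. The text leaves open whether the 90 + 90 sample is itself split or serves only as training data; this sketch assumes the 180 sampled patients are split 70/30 within each class. The ID ranges in the usage lines only mirror the complete-case counts reported above (275 cases, 99 controls) and are placeholders, not registry identifiers.

```python
import random


def balanced_splits(case_ids, control_ids, n_per_class=90, n_iter=10,
                    train_frac=0.7, seed=0):
    """Yield (train_ids, test_ids) for each balanced resampling iteration."""
    rng = random.Random(seed)  # fixed seed so iterations are reproducible
    n_train = round(n_per_class * train_frac)  # 63 of 90 per class by default
    for _ in range(n_iter):
        # sample() returns a random order, so slicing gives a random split.
        cases = rng.sample(case_ids, n_per_class)
        controls = rng.sample(control_ids, n_per_class)
        # Split each class separately so train and test stay balanced (assumption).
        train = cases[:n_train] + controls[:n_train]
        test = cases[n_train:] + controls[n_train:]
        rng.shuffle(train)
        rng.shuffle(test)
        yield train, test


case_ids = list(range(275))            # placeholder IDs for complete-case cases
control_ids = list(range(1000, 1099))  # placeholder IDs for complete-case controls
splits = list(balanced_splits(case_ids, control_ids))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # prints: 10 126 54
```

A logistic regression would then be fitted on each training list and scored on the matching test list; averaging the ten resulting AUCs gives summary figures comparable in form to the mean values reported in the Results.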

Discussion

We developed symptom-based models to assess differences in symptom prevalence among patients with endometriosis with differing demographics. To guide our variable selection, we created predictive models that use patient-reported symptoms to determine the likelihood of a patient having endometriosis in a cohort of Hispanic patients. Our model, based on 18 statistically determined, dataset-specific, patient-reported variables, can detect all stages of endometriosis without lesion localization limitations. Importantly, these variables can be obtained from a patient questionnaire independently of a clinician’s assessment, clinical exam, or imaging. We found significant differences in demographic and clinical variables between cases and controls. Differences in demographic variables such as “cities,” “regions,” and “education” may indicate socio-economic factors associated with endometriosis (Table S3). The age at onset of endometriosis symptoms can range from pre-menarche in adolescents to post-menopause. 48 To our knowledge, few endometriosis studies to date have analyzed patient data by age at symptom onset. We found that the number of pregnancies a patient has had is significantly correlated with age for both variable selection approaches (Tables 3 and S9). Additionally, both approaches showed increased occurrence of dysmenorrhea, gynecological and pelvic symptoms, and uterine fibroids with age (Tables 3 and S9), consistent with previous studies. 21–23 Additional correlates of age at symptom onset in the statistical approach include current OCP use, having uterine fibroids, and having had a hysterectomy.
The correlation between current OCP use and age at symptom onset is difficult to interpret because we lack information on both the timing of OCP use relative to symptom onset and the reasons patients are using OCPs. We therefore cannot determine whether OCP use prevented symptoms or whether the onset of symptoms prompted OCP use. Patient-reported gynecological and pelvic symptoms associated with younger age at symptom onset might indicate a need to survey younger populations so that endometriosis can be diagnosed and treated earlier. We observed differences in symptom prevalence between age groups: cases in the teens group were less likely to have dyspareunia than those in the 20s group (Table 4). These results open the possibility of developing targeted clinical evaluations and predictive models based on age at symptom presentation. Future studies should stratify data by age at symptom onset to confirm whether significant differences exist between these groups and to determine how those differences can inform clinical practice. We found discrepancies between our data and previous findings. Previous studies have reported that cardiovascular and gastrointestinal diseases resulting from increased inflammation are comorbid with endometriosis, and have also reported a higher incidence of uterine fibroids. 30, 48 However, our study did not find a significant difference between cases and controls for these conditions. Additionally, previous studies have applied stringent requirements for symptom inclusion when generating predictive models, whereas our predictive models did not use the same exclusion criteria, making our model more accessible in clinical settings.
43 The discrepancies between our findings and those previously reported are likely due to our controls having a suspected pelvic condition and experiencing symptoms such as bloating, abdominal pain, and inflammation due to uterine fibroids or cardiovascular conditions, unlike the asymptomatic, healthy controls used in other studies. Current screening tools for early diagnosis of endometriosis generated with machine learning algorithms include patient-reported yes/no questionnaires, age, BMI, pain types, and absenteeism, and report AUCs ranging from 0.5 to 0.82, with sensitivities ranging from 0.81 to 0.95 and specificities from 0 to 0.8. 13, 18, 19, 22 While previous models are similar to the predictive models we used to guide variable selection in number of predictors and performance, our model has key advantages. It can predict endometriosis at any stage of disease and does not require imaging or laboratory-based tests. This latter feature reduces the cost of our model and makes screening more convenient for patients. One limitation of our study is that all study subjects were from a single country and ethnicity. Since endometriosis affects people of all races worldwide, follow-up studies should validate this model on a sample that better represents all racial, ethnic, and geographic backgrounds. 9, 48, 49 Additionally, our study had a much larger case sample (n = 1189) than control sample (n = 371) and included only symptomatic patients actively pursuing treatment, so our results may not apply to asymptomatic patients. The predictive model should therefore be validated on a larger, balanced dataset. Here, we present one of the first investigations of age at symptom onset in relation to symptom prevalence in patients with endometriosis.
Additionally, we created an inclusive and inexpensive symptom-based predictive model applicable to all stages of endometriosis that validates significant clinical symptoms through unbiased statistical variable selection. Our data suggest that symptom-based predictive models using demographics together with patient-reported clinical history and symptoms can be developed to diagnose endometriosis with 80% accuracy. Using controls who were also being evaluated for pelvic conditions through laparoscopy may better reveal which variables matter for diagnosing endometriosis, even in individuals with classical symptoms of the disease. Finally, we provide evidence that the prevalence of certain symptoms and patient characteristics differs by age at symptom onset, indicating the need for new, potentially informative age-specific diagnostic models.

Introduction

Endometriosis is a widespread chronic disease characterized by the growth of endometrial-like tissue outside the uterine cavity. 1 It can present with dysmenorrhea, non-cyclic pelvic pain, dyspareunia, abnormal uterine bleeding, infertility, low back pain, fatigue, and gastrointestinal or urinary manifestations. 1 Endometriosis affects 6%–10% of women of reproductive age worldwide and an unknown number of transmasculine and non-binary individuals. 2–4 Despite its prevalence, endometriosis is challenging to diagnose; on average, there is a 7-year delay between the onset of symptoms and diagnosis. 1 The current gold standard for endometriosis diagnosis is exploratory inspection of the abdominopelvic cavity through laparoscopic surgery with histopathological confirmation of ectopic endometrial-like tissue. 1 This approach is limited in that laparoscopic findings correlate poorly with pain severity, especially in minimal-mild (Stage I–II) disease. 5, 6 Although laparoscopy is considered relatively safe, it is an invasive procedure that can add psychological strain and economic burden through interruptions of daily life. 7 Moreover, current professional standards recommend reducing reliance on laparoscopic diagnosis and opting for empirical treatment based on symptoms regardless of a positive ultrasound examination. 8 The lack of non-invasive, cost-effective diagnostic tools for endometriosis has prompted multiple analyses of endometriosis symptoms and patient demographics aimed at developing symptom-based models that predict the likelihood of endometriosis from self-reported symptoms and clinical variables, with areas under the curve (AUCs) ranging from 0.74 to 0.83 and sensitivities ranging from 10% to 90.2%. 2, 9–25 These studies support the possibility of using a symptom-based screening tool to accurately diagnose endometriosis and calculate individualized risk.
Currently, no predictive model for endometriosis has been deemed sufficiently accurate for clinical use. Recently, Konrad et al. generated a predictive model with 90% sensitivity and 75% specificity; however, according to Nisenblat et al., a clinically useful diagnostic test should have higher sensitivity and specificity than laparoscopic surgery, which has a sensitivity of 94% and a specificity of 79%. 21, 26–28 Common limitations of current diagnostic models include long questionnaires and the identification of predictors only at specific sites, such as the ovary or pelvic peritoneum, or only for specific stages of endometriosis. 16 Advances in non-invasive diagnostic methods are necessary to reduce diagnostic delays, the biopsychosocial impact on patients, and healthcare costs. Moreover, few studies have examined the relationship between age at symptom onset and symptom outcomes in patients with and without endometriosis. Here, we analyze multiple common risk factors, endometriosis symptoms, and demographic variables associated with an increased likelihood of endometriosis in patients with surgically confirmed presence or absence of endometriosis. We describe the relationship between these factors and age at symptom onset, which could help decrease the diagnostic delay that patients face, lead to prompt treatment, and improve quality of life for these individuals.

Condition tags

endometriosis, dyspareunia

References (44)

Source provenance

europepmc
last seen: 2026-06-14T06:08:20.186862+00:00
openalex
last seen: 2026-06-10T17:14:06.276822+00:00
pubmed
last seen: 2026-06-14T06:06:10.573907+00:00
License: CC0 · commercial use OK