Methods
This international multicenter observational cohort study was conducted between August 2018 and November 2019 across six countries (Australia, Austria, Germany, Spain, Italy and Israel). Data were collected prospectively but analyzed retrospectively. The study followed the Standards for Reporting of Diagnostic Accuracy Studies guidelines [8]. A subset of the data (n = 273) included in the current study were analyzed and reported in a separate prospective study evaluating the performance of ultrasound to predict DE, known colloquially as the ‘IDEA pilot study’ [9]. Additional participants were recruited to the present study at Nepean Hospital, Kingswood, NSW, Australia, during the same time period and following the same inclusion and exclusion criteria as in the pilot study. After recruitment, patients underwent a transvaginal ultrasound (TVS) scan in accordance with the IDEA protocol, and laparoscopic surgery was scheduled if indicated by routine clinical management in order to excise all suspected endometriosis, unless the surgical risk outweighed the potential clinical benefit.
Eligible participants were aged 18–50 years, with clinical suspicion of endometriosis based on symptoms and/or signs, and were scheduled to undergo laparoscopy to excise endometriosis. Individuals were excluded if they had suspected or diagnosed malignancy, were premenarchal or postmenopausal, were currently pregnant, were unable to undergo TVS or had undergone surgery > 12 months after their most recent TVS scan. Informed verbal consent was obtained and documented for each participant. Ethical approval was granted by the Nepean Blue Mountains Local Health District, Sydney, NSW, Australia (reference: 16‐90LNR/16/NEPEAN/152).
TVS (the index test) was performed according to the IDEA consensus methodology, and findings were recorded contemporaneously in a local database. Ovarian endometrioma was reported in accordance with International Ovarian Tumor Analysis definitions and terminology [10]. All participants underwent laparoscopic excision of endometriosis by an experienced gynecological surgeon, with involvement of other appropriate specialists, such as urological, colorectal or upper gastrointestinal surgeons, when required. Following the four‐step IDEA approach, Step 1 was considered positive if endometrioma, hydrosalpinx or an ‘ear sign’ (referring to an anteverted but retroflexed uterus) [11] was noted; Step 2 was considered positive if either unilateral or bilateral ovarian fixation was recorded (site‐specific tenderness was not recorded consistently and thus not included as a positive finding); Step 3 was considered positive if there was a negative POD sliding sign, indicating an obliterated POD; and Step 4 was considered positive if any DE lesions were noted in any compartment of the pelvis. In our cumulative scoring system, a TVS scan was considered positive if any of the included steps was scored as positive. For example, when evaluating the cumulative performance of TVS for Steps 1–3, the scan would be recorded as positive if the ovaries were immobile, even though the POD sliding sign was positive (i.e. no POD obliteration).
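The cumulative scoring rule described above amounts to a logical OR across the included steps. The following is a minimal sketch of that rule, not the study's own analysis code; the function name `cumulative_positive` and the dictionary layout are assumptions made for illustration.

```python
from typing import Iterable, Mapping


def cumulative_positive(step_results: Mapping[int, bool], steps: Iterable[int]) -> bool:
    """Return True if any of the included IDEA steps was scored positive."""
    return any(step_results[step] for step in steps)


# Hypothetical scan (illustrative only): no Step 1 finding, a fixed ovary
# (Step 2 positive), a positive POD sliding sign (Step 3 negative) and no
# DE lesion seen (Step 4 negative).
scan = {1: False, 2: True, 3: False, 4: False}

print(cumulative_positive(scan, [1]))        # Step 1 only -> False
print(cumulative_positive(scan, [1, 2, 3]))  # Steps 1-3 -> True
print(cumulative_positive(scan, [1, 3, 4]))  # Steps 1, 3 and 4 -> False
```

In this hypothetical scan, the only positive finding is ovarian fixation, so the result depends entirely on whether Step 2 is among the included steps.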
Endometriosis was excised rather than ablated for histological confirmation of the diagnosis, and no biopsies were taken without visual suspicion of endometriosis. The definition of DE described by Tomassetti et al. [3] is in line with surgeons' understanding of DE in this study, albeit this succinct definition was published after the data herein were collected. Tomassetti et al. [3] highlight the inaccuracy associated with the historical distinction between superficial endometriosis (SE) and DE on the basis of the histological depth of endometrial‐like tissue. As there is no international consensus on the histological distinction between SE and DE, and no such delineation was recorded in our retrospective database, we adopted visual diagnosis at laparoscopy as the reference test instead of histology. For the purpose of analysis, results were dichotomized as DE present at laparoscopy or DE not present at laparoscopy. Participants with endometrioma but no other DE lesions were recorded as DE not present.
Sensitivity, specificity, accuracy, positive (PPV) and negative (NPV) predictive values and positive (LR+) and negative (LR−) likelihood ratios for individual and cumulative steps of the IDEA protocol in diagnosing DE were calculated with corresponding 95% CIs. Statistical analysis was performed using Excel version 16.78.3 (Microsoft Corp., Redmond, WA, USA). Participants with missing or incomplete data were excluded from the analysis. If we were unable to clarify indeterminate results upon review, this was noted as an incomplete dataset and data were excluded accordingly.
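As an illustration of how these measures follow from a 2 × 2 table, the sketch below computes them in Python. It is not the Excel workbook used in the study. The counts are hypothetical values that we chose to be consistent with the reported four‐step proportions (255 participants with DE, 212 without); the paper does not report them. The method used for the 95% CIs is not stated in the text, so the Wilson score interval shown here is an assumption and may not reproduce the published intervals exactly.

```python
import math


def wilson_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (assumed method, see lead-in)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard accuracy measures from the four cells of a 2 x 2 table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
    }


# Hypothetical counts, back-calculated by us for illustration only.
tp, fn, tn, fp = 240, 15, 123, 89

metrics = diagnostic_metrics(tp=tp, fp=fp, fn=fn, tn=tn)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")

low, high = wilson_ci(tp, tp + fn)
print(f"sensitivity 95% CI (Wilson): {low:.2f}-{high:.2f}")
```

With these illustrative counts, the point estimates round to the four‐step values given in the Results (sensitivity 0.94, specificity 0.58, accuracy 0.78, PPV 0.73, NPV 0.89, LR− 0.10). Other nearby counts would round to the same figures, so the counts should not be read as the study data.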
Results
Of 640 participants recruited initially, 173 were excluded owing to incomplete ultrasound data, leaving 467 participants for analysis (Figure 1). The mean ± SD age of the participants was 33.6 ± 6.3 years. DE was diagnosed and excised at laparoscopy in 255 (54.6%) participants. Histology reports were available for 273 (58.5%) participants. Endometriosis was confirmed on histology in 255 (93.4%) of these patients. However, given that the histology reports did not differentiate between DE and SE, which is problematic for diagnostic accuracy assessment for the prediction of DE, the reference standard used in our analysis was visual diagnosis at laparoscopy.
Figure 1 Flowchart summarizing inclusion of patients in study population. DE, deep endometriosis; TVS, transvaginal ultrasound.
The diagnostic performance of the four steps of the IDEA consensus protocol in isolation and in combination is summarized in Table 1. With the addition of each step, the sensitivity of TVS in identifying DE increased progressively (Step 1 only, 0.65 (95% CI, 0.59–0.71); Steps 1–4, 0.94 (95% CI, 0.90–0.97)), as did the accuracy (Step 1 only, 0.71 (95% CI, 0.67–0.75); Steps 1–4, 0.78 (95% CI, 0.74–0.81)), whereas the specificity decreased progressively (Step 1 only, 0.78 (95% CI, 0.72–0.84); Steps 1–4, 0.58 (95% CI, 0.52–0.65)). PPV decreased slightly with the addition of each step (Step 1 only, 0.78 (95% CI, 0.73–0.82); Steps 1–4, 0.73 (95% CI, 0.70–0.76)). Conversely, NPV increased with the addition of each step (Step 1 only, 0.65 (95% CI, 0.61–0.69); Steps 1–4, 0.89 (95% CI, 0.83–0.92)). LR− for the four‐step protocol was 0.10 (95% CI, 0.06–0.17), indicating an approximately 10‐fold decrease in the odds of DE being present if there is no positive finding at TVS. Using the cumulative approach, the greatest incremental increase in sensitivity and decrease in specificity were seen with the addition of Step 2, with respective changes of +0.17 and −0.12 compared with Step 1 alone.
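A likelihood ratio acts on odds rather than directly on probability, so it can help to convert the reported LR− into a post‐test probability. The sketch below does this using the cohort prevalence of DE (255/467) as the pre‐test probability. The function name `post_test_probability` is an assumption made for illustration, and the calculation is a standard odds conversion rather than part of the study's analysis.

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


prevalence = 255 / 467  # DE at laparoscopy in the study cohort
lr_neg = 0.10           # reported LR- for the four-step protocol

p_after_negative = post_test_probability(prevalence, lr_neg)
print(f"Probability of DE after a negative four-step scan: {p_after_negative:.2f}")  # about 0.11
```

The resulting probability of about 0.11 is the complement of the reported NPV of 0.89, as expected when the pre‐test probability equals the cohort prevalence.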
Table 1 Diagnostic performance of individual and cumulative steps of International Deep Endometriosis Analysis four‐step consensus protocol for diagnosing deep endometriosis on transvaginal ultrasound
Data in parentheses are 95% CI. Step 1 includes evaluation of the uterus and adnexa, including the presence or absence of endometrioma; Step 2 assesses ‘soft markers’, including ovarian mobility and site‐specific tenderness; Step 3 assesses the pouch of Douglas sliding sign; and Step 4 searches for deep endometriosis nodules in the anterior and posterior compartments. LR+, positive likelihood ratio; LR−, negative likelihood ratio; NPV, negative predictive value; PPV, positive predictive value.
When assessing each step individually, Step 4 had the highest sensitivity (0.82 (95% CI, 0.77–0.86)) and accuracy (0.83 (95% CI, 0.79–0.86)), while Step 3 had the highest specificity (0.89 (95% CI, 0.83–0.93)), albeit with low sensitivity that matched that of Step 1 (0.65 (95% CI, 0.58–0.71)). PPV was highest for Step 3, at 0.87 (95% CI, 0.82–0.91), while NPV was highest for Step 4, at 0.79 (95% CI, 0.75–0.84). Similarly, LR+ was highest for Step 3 and LR− was lowest for Step 4, with respective values of 5.75 (95% CI, 3.90–8.47) and 0.21 (95% CI, 0.16–0.28).
When combining Step 1 with each of Steps 2, 3 and 4 individually, the sensitivity was highest for the combination of Steps 1 and 4 (0.93 (95% CI, 0.89–0.95)), the specificity was highest for the combination of Steps 1 and 3 (0.74 (95% CI, 0.67–0.79)), PPV was highest for the combination of Steps 1 and 3 (0.79 (95% CI, 0.75–0.82)) and NPV was highest for the combination of Steps 1 and 4 (0.88 (95% CI, 0.83–0.92)). Both the combination of Steps 1 and 3 and that of Steps 1 and 4 demonstrated superior diagnostic performance compared with the combination of Steps 1 and 2.
Using Steps 1, 3 and 4 as a cumulative three‐step test within our study cohort yielded a sensitivity of 0.94 (95% CI, 0.90–0.96), specificity of 0.67 (95% CI, 0.60–0.73), accuracy of 0.81 (95% CI, 0.78–0.85), PPV of 0.77 (95% CI, 0.74–0.80), NPV of 0.90 (95% CI, 0.84–0.93), LR+ of 2.80 (95% CI, 2.32–3.39) and LR− of 0.09 (95% CI, 0.06–0.15). Thus, this strategy demonstrated better diagnostic performance compared with the four‐step protocol.
Discussion
This is the first analysis of the diagnostic accuracy of each step of the four‐step IDEA consensus protocol in predicting the presence of DE at the time of laparoscopy. With the addition of each step, we noted a progressive increase in sensitivity for predicting DE. However, this came at the cost of decreasing specificity. With a sensitivity of 0.94 (95% CI, 0.90–0.97) within our study cohort, the four‐step protocol performed well in identifying patients with DE. However, the specificity of 0.58 (95% CI, 0.52–0.65) indicates that, in a significant number of patients with a positive TVS result, no DE is identified at the time of operative laparoscopy. Importantly, the NPV of 0.89 (95% CI, 0.83–0.92) and LR− of 0.10 (95% CI, 0.06–0.17) indicate that a negative four‐step TVS assessment is a good predictor of the absence of DE at the time of laparoscopy. Indeed, the European Society of Human Reproduction and Embryology clinical guidelines for endometriosis recommend imaging to replace laparoscopy as the gold standard for diagnosis [5]. The high NPV observed in our study suggests that a four‐step TVS assessment may be helpful for patient counseling around surgical procedures. However, the limitations of imaging should be acknowledged, including possible false negatives in predicting DE and the high false‐positive rate in predicting SE, particularly when an individual's symptoms are not managed adequately by empirical treatment despite a negative four‐step TVS assessment [12].
There are mixed reports of interobserver reproducibility for evaluating ovarian/adnexal mobility or adhesions using TVS, with Guerriero et al. [13] reporting a Cohen's kappa (κ) of 0.5 and Holland et al. [14] reporting a κ of 0.93. In addition, the diagnostic accuracy of ovarian mobility on TVS seems to depend on the presence of ovarian endometrioma, as the PPV of unilateral ovarian immobility was reported to be < 10% for normal ovaries [15]. Therefore, there was reason to consider excluding Step 2 of the IDEA protocol, in which soft markers are assessed, during our diagnostic accuracy analysis. Compared with the four‐step protocol, a three‐step protocol excluding Step 2 had the same sensitivity, but demonstrated a marginal improvement in specificity, accuracy, PPV, NPV, LR+ and LR− for DE at laparoscopy within this study cohort. However, the exclusion of Step 2 may have significant negative clinical impact, despite improving the diagnostic accuracy for DE. The utility of assessing adnexal mobility is demonstrated by the association between a negative ovarian sliding sign and subsequent requirement for ureterolysis at surgery, which is a procedure of higher surgical complexity [16, 17]. Therefore, in our opinion, this step should not be omitted from the protocol.
As a retrospective analysis, our study is inherently less robust than a prospective cohort study. We were limited in our ability to obtain histological reports, and those that were accessible did not distinguish between SE and DE. As the IDEA consensus statement describes DE and not SE, we adopted visual assessment at laparoscopy as the reference standard. It is important to note that all patients in this cohort were recruited from tertiary endometriosis ultrasound and surgery units, where the prevalence of the disease is much higher than that in the general population, which probably limits the generalizability of our findings to a low‐prevalence population. Laparoscopy as the reference standard also has limitations. Endometriotic lesions exhibit significant variation in appearance, including in color, size, shape and depth. The diagnosis of endometriosis during laparoscopy depends solely on the surgeon's visual judgement. To minimize bias, surgeons should follow consensus definitions and terminology when diagnosing and classifying DE vs SE at laparoscopy. This approach helps to ensure that preoperative imaging and surgical observations are evaluated more accurately [3]. Although the exclusion of 27% of participants owing to incomplete data renders our study at risk of attrition bias, these cases were assessed by clinicians who did not record both uterine version and flexion as part of their scanning protocol. Therefore, their exclusion was not related to patient or disease factors.
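The dependence of predictive values on prevalence, noted above as a limit on generalizability, can be illustrated with Bayes' theorem. The sketch below holds the reported four‐step sensitivity (0.94) and specificity (0.58) fixed and varies only the prevalence. The 10% prevalence is a hypothetical value chosen for illustration, not an estimate from this study, and the assumption that sensitivity and specificity carry over unchanged to another population may not hold in practice.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a given prevalence using Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


sens, spec = 0.94, 0.58  # reported four-step estimates

# Prevalence in the study cohort (255/467) versus a hypothetical 10% setting.
ppv_cohort, npv_cohort = predictive_values(sens, spec, 255 / 467)
ppv_low, npv_low = predictive_values(sens, spec, 0.10)

print(f"Cohort prevalence: PPV {ppv_cohort:.2f}, NPV {npv_cohort:.2f}")  # 0.73, 0.89
print(f"10% prevalence:    PPV {ppv_low:.2f}, NPV {npv_low:.2f}")        # 0.20, 0.99
```

Under these assumptions, a positive scan would be far less informative in a low‐prevalence setting, whereas a negative scan would remain strongly reassuring.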
Despite these limitations, our study utilized prospectively recorded data, which increased the accuracy of data collection, and recruited a large number of participants across multiple sites, which improved the generalizability of our findings across populations. Following the reporting language outlined in the IDEA consensus statement improved the homogeneity of data across recruitment sites.
In considering variations of the comprehensive TVS scan, we observed that the addition of Step 4 alone to Step 1 increased the sensitivity from 0.65 (95% CI, 0.59–0.71) to 0.93 (95% CI, 0.89–0.95), decreased the specificity from 0.78 (95% CI, 0.72–0.84) to 0.68 (95% CI, 0.63–0.75), and improved the performance of negative findings, with a NPV of 0.88 (95% CI, 0.83–0.92) and LR− of 0.11 (95% CI, 0.07–0.17) (Table 1). Regarding the diagnostic utility of each standalone step, Step 4 had the highest sensitivity and is therefore a critical component of the scan.
In conclusion, our findings support our hypothesis that, with the addition of each step of the four‐step IDEA consensus protocol for pelvic examination using TVS, there is an incremental improvement in diagnostic accuracy for the prediction of DE. With the addition of each step, there is a progressive increase in cumulative sensitivity, albeit at the expense of specificity. Meanwhile, each additional ‘negative’ step improves the predictive performance for the absence of DE. Thus, this study demonstrates that the complete four‐step TVS scan offers a significant improvement in diagnostic performance over a basic pelvic TVS scan or a limited protocol in which steps are omitted. While future prospective validation of the four‐step protocol would provide a more thorough and robust analysis, this study supports the consensus methodology as a robust investigative instrument for DE. In the future, analysis of the IDEA addendum to the consensus publication, which describes evaluation of the parametrium, should be conducted [18].
S. Guerriero , Department of Obstetrics and Gynecology, University of Cagliari, Policlinico Universitario Duilio Casula, Cagliari, Italy
M. Zajicek , Obstetrics and Gynecology, Sheba Medical Center, Tel Hashomer, Tel Aviv, Israel
A. Dueckelmann , Department of Gynecology, Charité University Hospital, Berlin, Germany
F. Filippi , Centro Procreazione Medicalmente Assistita, Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico di Milano, Milan, Italy
F. Buonomo , Institute for Maternal and Child Health, IRCCS ‘Burlo Garofolo’, Trieste, Italy
M. A. Pascual , Department of Obstetrics, Gynecology, and Reproduction, Hospital Universitari Dexeus, Barcelona, Spain
A. Stepniewska , Department of Obstetrics and Gynecology, Gynecology Oncology and Minimally Invasive Pelvic Surgery, International School of Surgical Anatomy, IRCCS ‘Sacro Cuore’ Don Calabria Hospital, Verona, Italy
M. Ceccaroni , Department of Obstetrics and Gynecology, Gynecology Oncology and Minimally Invasive Pelvic Surgery, International School of Surgical Anatomy, IRCCS ‘Sacro Cuore’ Don Calabria Hospital, Verona, Italy
T. Van den Bosch , Department of Obstetrics and Gynaecology, University Hospital Leuven and Department of Development and Regeneration, KU Leuven, Leuven, Belgium
D. Timmerman , Department of Obstetrics and Gynaecology, University Hospital Leuven and Department of Development and Regeneration, KU Leuven, Leuven, Belgium
G. Hudelist , Centre for Endometriosis, Hospital St John of God, Vienna, Austria; Rudolfinerhaus Private Clinic & Campus, Vienna, Austria
Introduction
Endometriosis occurs in up to 11% of the Australian female population, and may be difficult to identify early and without surgery [1]. Of those affected, up to 20% may have deep endometriosis (DE), which involves an element of anatomical distortion [2, 3]. There has been significant progress towards the non‐invasive diagnosis of endometriosis, epitomized by the consensus statement of the International Deep Endometriosis Analysis (IDEA) group, which established a robust language for the sonographic diagnosis of DE, as well as clinical guidelines supporting the use of imaging for first‐line diagnosis [4, 5]. According to the IDEA consensus statement, there are four basic sonographic steps for examining patients with clinical suspicion of endometriosis: Step 1 includes evaluation of the uterus and adnexa, including the presence or absence of endometrioma; Step 2 assesses ‘soft markers’, including ovarian mobility and site‐specific tenderness; Step 3 assesses the pouch of Douglas (POD) sliding sign; and Step 4 searches for DE nodules in the anterior and posterior compartments. The consensus statement protocol is utilized not only for diagnosing endometriosis, but also for surgical planning and triaging of procedures to an appropriate surgical team [6]. For example, a negative POD sliding sign in the third step accurately predicts the presence of POD obliteration, which may be secondary to intestinal endometriosis and thus require more complicated surgical planning [7].
We hypothesized that, with each additional step of the IDEA protocol, there would be a corresponding improvement in cumulative diagnostic performance for DE. Delineating the incremental improvement in diagnostic accuracy offered by each additional step would have relevance for trainee sonologists on the learning curve, and would clarify the difference in performance between simple and comprehensive ultrasound scans.