Results
The literature search of the Embase, Google Scholar, Medline, PubMed and Scopus databases, from inception to 1 May 2020, identified 1977 references. The PRISMA flow diagram in Fig. 1 represents the selection of studies. Of the 45 studies ( Fedele et al. , 1998 ; Dessole et al. , 2003 ; Delpy et al. , 2005 ; Takeuchi et al. , 2005 ; Bahr et al. , 2006 ; Abrao et al. , 2007 ; Biscaldi et al. , 2007 ; Guerriero et al. , 2007 ; Griffiths et al. , 2008 ; Guerriero et al. , 2008 ; Ribeiro et al. , 2008 ; Valenzano Menada et al. , 2008 ; Bazot et al. , 2009 ; Hottat et al. , 2009 ; Hudelist et al. , 2009 ; Piketty et al. , 2009 ; Bergamini et al. , 2010 ; Chassang et al. , 2010 ; Faccioli et al. , 2010 ; Goncalves et al. , 2010 ; Grasso et al. , 2010 ; Pascual et al. , 2010 ; Ferrero et al. , 2011 ; Hudelist et al. , 2011 ; Fiaschetti et al. , 2012 ; Savelli et al. , 2012 ; Bazot et al. , 2013 ; Holland et al. , 2013 ; Hudelist et al. , 2013 ; Manganaro et al. , 2013 ; Stabile Ianora et al. , 2013 ; Leon et al. , 2014 ; Tammaa et al. , 2015 ; Baggio et al. , 2016 ; Menakaya et al. , 2016 ; Ferrero et al. , 2017 ; Guerriero et al. , 2017 ; Jiang et al. , 2017 ; Ros et al. , 2017 ; Alborzi et al. , 2018 ; Carfagna et al. , 2018 ; Di Giovanni et al. , 2018 ; Reid et al. , 2018 ; Zhang et al. , 2019 ; Barra et al. , 2020 ), 10 ( Pascual et al. , 2010 ; Hudelist et al. , 2011 ; Fiaschetti et al. , 2012 ; Bazot et al. , 2013 ; Holland et al. , 2013 ; Manganaro et al. , 2013 ; Tammaa et al. , 2015 ; Menakaya et al. , 2016 ; Alborzi et al. , 2018 ; Zhang et al. , 2019 ), all published from 2010 onwards, specifically assessed the USL, RVS and vagina and were included in the analysis.
Flow of studies identified in literature for systematic review on imaging modalities for the preoperative diagnosis of uterosacral ligament/torus uterinus, rectovaginal septum and vaginal deep endometriosis.
The 10 studies included a total of 1188 women with a median of 91.5 per study (range 23 to 317) ( Pascual et al. , 2010 ; Hudelist et al. , 2011 ; Fiaschetti et al. , 2012 ; Bazot et al. , 2013 ; Holland et al. , 2013 ; Manganaro et al. , 2013 ; Tammaa et al. , 2015 ; Menakaya et al. , 2016 ; Alborzi et al. , 2018 ; Zhang et al. , 2019 ). Of the 10 studies, seven were conducted in Europe, one in Asia, one in Australia and one in the Middle East.
A total of nine studies assessed USL DE (1150 participants) ( Hudelist et al. , 2011 ; Fiaschetti et al. , 2012 ; Bazot et al. , 2013 ; Holland et al. , 2013 ; Manganaro et al. , 2013 ; Tammaa et al. , 2015 ; Menakaya et al. , 2016 ; Alborzi et al. , 2018 ; Zhang et al. , 2019 ). Of these, seven studies assessed TVS (1085 women): five used two-dimensional (2D) TVS (568 participants) ( Hudelist et al. , 2011 ; Fiaschetti et al. , 2012 ; Holland et al. , 2013 ; Tammaa et al. , 2015 ; Zhang et al. , 2019 ), one used SVG ( Menakaya et al. , 2016 ) and one used TVS with BP ( Alborzi et al. , 2018 ). A total of four studies assessed MRI (440 women, with 521 examinations included in the analysis because two studies compared more than one MRI technique in each woman). All four studies used 2D MRI ( Fiaschetti et al. , 2012 ; Bazot et al. , 2013 ; Manganaro et al. , 2013 ; Alborzi et al. , 2018 ), one used three-dimensional (3D) MRI ( Bazot et al. , 2013 ) and two used MRI with gel ( Fiaschetti et al. , 2012 ; Manganaro et al. , 2013 ). One study assessed RES ( Alborzi et al. , 2018 ). The pre-test probabilities of disease for TVS, 2D TVS, MRI and 2D MRI were 33%, 34%, 47% and 47%, respectively.
A total of seven studies assessed RVS DE (1005 participants) ( Pascual et al. , 2010 ; Hudelist et al. , 2011 ; Fiaschetti et al. , 2012 ; Holland et al. , 2013 ; Tammaa et al. , 2015 ; Menakaya et al. , 2016 ; Alborzi et al. , 2018 ), all of which assessed TVS (1005 participants). Of these, four studies assessed 2D TVS (450 participants) ( Hudelist et al. , 2011 ; Fiaschetti et al. , 2012 ; Holland et al. , 2013 ; Tammaa et al. , 2015 ), one used SVG ( Menakaya et al. , 2016 ), one used 3D TVS ( Pascual et al. , 2010 ) and one used TVS-BP ( Alborzi et al. , 2018 ). Two studies assessed MRI (432 participants) ( Fiaschetti et al. , 2012 ; Alborzi et al. , 2018 ) of which Alborzi et al. (2018) assessed 2D MRI and Fiaschetti et al. (2012) compared MRI with and without vaginal gel. One study assessed RES ( Alborzi et al. , 2018 ). The pre-test probabilities of DE for both TVS and 2D TVS were 14%.
A total of five studies assessed vaginal DE (474 participants) ( Hudelist et al. , 2011 ; Fiaschetti et al. , 2012 ; Bazot et al. , 2013 ; Tammaa et al. , 2015 ; Menakaya et al. , 2016 ), of which four studies assessed TVS (451 participants), from which five data sets were obtained (516 participants) as Tammaa et al. (2015) assessed the interobserver agreement of two experts. One study used SVG ( Menakaya et al. , 2016 ) and the remaining three studies used 2D TVS (251 participants) ( Hudelist et al. , 2011 ; Fiaschetti et al. , 2012 ; Tammaa et al. , 2015 ), from which four data sets were used (316 participants), including the data sets of both experts from Tammaa et al. (2015). Three studies assessed MRI (137 participants), from which four data sets were obtained (160 participants) as Fiaschetti et al. (2012) compared MRI with and without vaginal gel. The pre-test probabilities of disease for TVS, 2D TVS and MRI were 10%, 14% and 20%, respectively. The study characteristics are summarized in Table I and the summary results are shown in Table II .
Characteristics of included studies.
Only the first author of each study is given. All studies were prospective and included women with clinical suspicion of uterosacral/torus uterinus, rectovaginal septum or vaginal deep endometriosis (DE). Observers refer to the number of observers involved with each imaging modality.
BP, bowel preparation; RES, transrectal endoscopic sonography; RWC, rectal water contrast; SVG, sonovaginography; TVS, transvaginal ultrasound.
Summary of findings of the pooled results of the preoperative diagnostic accuracy of imaging modalities.
Preoperative diagnostic accuracy of imaging modalities for the detection of uterosacral/torus uterinus (USL), rectovaginal septum (RVS) and vaginal DE.
LR+, positive likelihood ratio; LR−, negative likelihood ratio.
Corresponding to 521 examinations owing to some studies performing more than one MRI technique in the same patients ( Fiaschetti et al. , 2012 ; Bazot et al. , 2013 ).
Corresponding to 516 examinations (TVS overall) and 316 examinations (2D TVS) owing to Tammaa et al. (2015) reporting the findings of two observers in the same patients.
Corresponding to 160 examinations owing to Fiaschetti et al. (2012) performing more than one MRI technique in the same patients.
The methodological quality, as per QUADAS-2 ( Gerges et al. , 2021 ), of most of the studies was poor and is represented in Figs 2 and 3 .
QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies-2) quality evaluation of all 10 included studies.
Traffic-light plot summarizing the authors' review of the QUADAS-2 risk of bias and applicability concerns.
Six studies were considered to be low risk for patient selection bias ( Pascual et al. , 2010 ; Hudelist et al. , 2011 ; Holland et al. , 2013 ; Tammaa et al. , 2015 ; Menakaya et al. , 2016 ; Alborzi et al. , 2018 ), three were high risk ( Fiaschetti et al. , 2012 ; Bazot et al. , 2013 ; Manganaro et al. , 2013 ) and one was unclear ( Zhang et al. , 2019 ). With reference to the index test domain, seven studies were assessed to be low risk ( Pascual et al. , 2010 ; Hudelist et al. , 2011 ; Bazot et al. , 2013 ; Holland et al. , 2013 ; Tammaa et al. , 2015 ; Menakaya et al. , 2016 ; Zhang et al. , 2019 ) and three were high risk ( Holland et al. , 2013 ; Menakaya et al. , 2016 ; Alborzi et al. , 2018 ). Aside from Zhang et al. (2019), which was unclear, the remaining nine studies were considered high risk of bias for the reference standard domain as surgeons were not blinded to the preoperative imaging results. With respect to the flow and timing domain, five were considered unclear ( Hudelist et al. , 2011 ; Fiaschetti et al. , 2012 ; Bazot et al. , 2013 ; Tammaa et al. , 2015 ; Alborzi et al. , 2018 ) and the remaining five were low risk ( Pascual et al. , 2010 ; Holland et al. , 2013 ; Manganaro et al. , 2013 ; Menakaya et al. , 2016 ; Zhang et al. , 2019 ). With regard to applicability concerns, all the studies were deemed low risk as they were only included if they: had a clinically relevant population that would have undergone the index test in real practice; used an imaging modality (all modalities were eligible) for which the index test was described in sufficient detail; and had surgery as the reference standard.
The overall pooled sensitivity and specificity, from which LR+, LR− and DOR were calculated, for the detection of USL DE with TVS and the sub-analysis with 2D TVS are shown in Table II . There was significant heterogeneity for sensitivity ( Fig. 4 ). The sROC curves are displayed in Fig. 5 . There was no evidence of publication bias for any of these analyses ( P = 0.93 and P = 0.77, respectively) ( Supplementary Fig. S1 ).
Forest plots of studies included for the evaluation of uterosacral ligaments/torus uterinus deep endometriosis (TVS). Imaging modalities analysed are ( a ) ALL transvaginal ultrasound (TVS) and ( b ) sub-analysis of 2D TVS, displaying the pooled sensitivity, specificity and heterogeneity statistics (Cochran’s Q and I²).
Summary ROC curves of studies included for the evaluation of uterosacral ligaments/torus uterinus deep endometriosis. Imaging modalities analysed are ( a ) ALL transvaginal ultrasound ( b ) sub-analysis of 2D transvaginal ultrasound ( c ) ALL MRI and ( d ) sub-analysis of 2D MRI. SENS, sensitivity; SPEC, specificity; SROC, summary receiver-operating characteristic.
Given the low number of studies, it was not possible to perform sub-analyses for TVS-BP or SVG. However, whilst the results were poorer with SVG, with a sensitivity and specificity of 24% and 98%, respectively ( Menakaya et al. , 2016 ), they were much improved with BP, with a sensitivity and specificity of 71% and 93%, respectively ( Alborzi et al. , 2018 ).
The overall pooled sensitivity and specificity, from which LR+, LR− and DOR were calculated, for the detection of USL DE with MRI and the sub-analysis with 2D MRI are shown in Table II . There was significant heterogeneity for sensitivity and specificity ( Fig. 6 ). The sROC curves are displayed in Fig. 5 . There was no evidence of publication bias for any of these analyses ( P = 0.53 and P = 0.79, respectively) ( Supplementary Fig. S1 ).
Forest plots of studies included for the evaluation of uterosacral ligaments/torus uterinus deep endometriosis (MRI). Imaging modalities analysed are ( a ) ALL MRI and ( b ) sub-analysis of 2D MRI, displaying the pooled sensitivity, specificity and heterogeneity statistics (Cochran’s Q and I²).
Given the low number of studies, it was not possible to perform sub-analyses for 3D MRI or MRI with ultrasound gel. 3D MRI had a slightly higher sensitivity of 88% but significantly lower specificity of 33% ( Bazot et al. , 2013 ) whilst the results of MRI with ultrasound gel were improved, with sensitivities and specificities ranging from 81% to 91% and 89% to 92%, respectively ( Fiaschetti et al. , 2012 ; Manganaro et al. , 2013 ).
There was one study assessing RES, with a sensitivity and specificity of 83% and 90%, respectively ( Alborzi et al. , 2018 ).
The overall pooled sensitivity and specificity, from which LR+, LR− and DOR were calculated, for the detection of RVS DE with TVS and the sub-analysis with 2D TVS are shown in Table II . There was significant heterogeneity for sensitivity and specificity ( Fig. 7 ). The sROC curves are displayed in Supplementary Fig. S2 . There was no evidence of publication bias for any of these analyses ( P = 0.44 and P = 0.57, respectively) ( Supplementary Fig. S3 ).
Forest plots of studies included for the evaluation of rectovaginal septum endometriosis. Imaging modalities analysed are ( a ) ALL TVS and ( b ) sub-analysis of 2D TVS, displaying the pooled sensitivity, specificity and heterogeneity statistics (Cochran’s Q and I²).
Given the low number of studies, it was not possible to perform sub-analyses for 3D TVS, TVS-BP or SVG. There was one study for each of these modalities, with sensitivity and specificity of 89% and 95% for 3D TVS ( Pascual et al. , 2010 ), 86% and 95% for TVS-BP ( Alborzi et al. , 2018 ) and 22% and 100% for SVG ( Menakaya et al. , 2016 ).
Only two studies assessed MRI. Alborzi et al. (2018) reported a sensitivity and specificity of 73% and 95%, respectively, using 2D MRI, whilst Fiaschetti et al. (2012) found that sensitivity improved from 69% without gel to 94% with gel, with very similar specificities of 93% and 91%, respectively.
Only one study assessed RES, with a sensitivity and specificity of 84% and 94%, respectively ( Alborzi et al. , 2018 ).
The overall pooled sensitivity and specificity, from which LR+, LR− and DOR were calculated, for the detection of vaginal DE with TVS and the sub-analysis with 2D TVS are shown in Table II . There was significant heterogeneity for sensitivity and specificity ( Fig. 8 ). The sROC curves are displayed in Fig. 9 . There was no evidence of publication bias for any of these analyses ( P = 0.05 and P = 0.09, respectively) ( Supplementary Fig. S4 ).
Forest plots of studies included for the evaluation of vaginal deep endometriosis. Imaging modalities analysed are ( a ) ALL TVS and ( b ) sub-analysis of 2D TVS, displaying the pooled sensitivity, specificity and heterogeneity statistics (Cochran’s Q and I²).
SROC curves of studies included for the evaluation of vaginal deep endometriosis. Imaging modalities analysed are ( a ) ALL TVS, ( b ) sub-analysis of 2D TVS and ( c ) ALL MRI.
As there was only one study assessing SVG, it was not possible to perform sub-analyses, although sensitivity and specificity were 20% and 99%, respectively ( Menakaya et al. , 2016 ).
The overall pooled sensitivity and specificity, from which LR+, LR− and DOR were calculated, for the detection of vaginal DE with MRI are shown in Table II . There was significant heterogeneity for sensitivity and specificity ( Supplementary Fig. S5 ). The sROC curve is displayed in Fig. 9 . There was no evidence of publication bias for this analysis ( P = 0.81) ( Supplementary Fig. S4 ).
Given the low number of studies, it was not possible to perform sub-analyses for 2D MRI, 3D MRI and MRI with ultrasound gel. For 2D MRI, the sensitivities ranged widely from 36% ( Fiaschetti et al. , 2012 ) to 60% ( Bazot et al. , 2013 ), although the specificities were similar, ranging from 94% to 98% ( Fiaschetti et al. , 2012 ; Bazot et al. , 2013 ). MRI with ultrasound gel outperformed 2D MRI with a sensitivity and specificity of 82% and 98%, respectively ( Fiaschetti et al. , 2012 ).
Materials
This review was designed as per the Synthesizing Evidence from Diagnostic Accuracy Tests (SEDATE) guidelines ( Sotiriadis et al. , 2016 ) and the PRISMA statement ( Moher et al. , 2009 ). Prior to commencement, prospective registration of the protocol was obtained with PROSPERO (CRD42017059872) including the detailing of inclusion/exclusion criteria, data extraction and quality assessment. This study is one of a series of subgroups of the larger systematic review protocol. The protocol and following methodology, whilst standard for systematic reviews and meta-analyses, were used in a previously published study ( Gerges et al. , 2021 ).
Peer-reviewed, published studies which evaluated preoperative imaging modalities to assess the presence of DE and compared with the reference standard of surgical/histological diagnosis were included, as per the criteria defined by Bazot et al. (2007) . The studies were included if they were prospective cohort studies including women of reproductive age presenting with a clinical suspicion of DE, based on symptoms and/or physical examination from any healthcare centre setting.
Any imaging modalities used for the detection of DE of the USL, RVS and vagina were included, namely, MRI, RES, sonovaginography (SVG) and TVS. We also included the variations of standard techniques, such as the addition of gel contrast, rectal water or bowel preparation (BP), with the outcome being the presence and location of DE. The imaging techniques were assessed as a group and separately. Only those studies with sufficient data to construct 2 × 2 contingency tables were included. The risk of selection bias was reduced by only including studies with at least 10 affected and 10 unaffected women by the reference standard. There were no restrictions on language.
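The 2 × 2 table requirement and the minimum-size criterion described above can be illustrated with a minimal Python sketch. The class name `TwoByTwo`, its method names and all counts are hypothetical and are not taken from any included study; the sketch only shows how sensitivity, specificity and the pre-test probability follow from the four cells, and how the threshold of 10 affected and 10 unaffected women could be checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Index test result cross-tabulated against the surgical reference standard."""
    tp: int  # index test positive, disease confirmed at surgery
    fp: int  # index test positive, no disease at surgery
    fn: int  # index test negative, disease confirmed at surgery
    tn: int  # index test negative, no disease at surgery

    @property
    def affected(self) -> int:
        return self.tp + self.fn

    @property
    def unaffected(self) -> int:
        return self.fp + self.tn

    def sensitivity(self) -> float:
        return self.tp / self.affected

    def specificity(self) -> float:
        return self.tn / self.unaffected

    def pretest_probability(self) -> float:
        # Proportion of women with disease according to the reference standard.
        return self.affected / (self.affected + self.unaffected)

    def meets_size_criterion(self, minimum: int = 10) -> bool:
        # At least `minimum` affected and `minimum` unaffected women.
        return self.affected >= minimum and self.unaffected >= minimum


# Hypothetical counts, for illustration only.
example = TwoByTwo(tp=30, fp=5, fn=20, tn=95)
too_small = TwoByTwo(tp=4, fp=1, fn=3, tn=40)

print(f"sensitivity = {example.sensitivity():.2f}")
print(f"specificity = {example.specificity():.2f}")
print(f"pre-test probability = {example.pretest_probability():.2f}")
print("eligible:", example.meets_size_criterion(), too_small.meets_size_criterion())
```

In this illustration the second table would be excluded because it contains only seven affected women.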
Searches were conducted using Embase, Google Scholar, Medline, PubMed and Scopus to identify published studies from inception (1946) until 1 May 2020; only those published from 2010 onwards were screened for eligibility, owing to the increased proficiency of sonographers and advancements in technology. Filters were not used, to avoid excluding potentially relevant studies ( Leeflang et al. , 2006 ). Furthermore, the references from included studies and relevant reviews were hand-searched by the authors. Where necessary, the authors of primary studies were contacted.
The search criteria used with the aforementioned databases are outlined in Supplementary Data. The studies were then screened for those that assessed USL, RVS or vaginal DE to ensure that studies that used inconsistent or outdated descriptions of DE were not excluded.
Initial screening of the records was based on titles and abstracts, after which the full texts of the potentially eligible records were reviewed. Compliance with the inclusion criteria was assessed and eligible studies were selected following the independent and blinded examination of these full texts by two authors (B.G. and G.C.). Where studies included either all or part of the same previously published study population, the most complete and recent study was selected to avoid duplication of studies or participants. Similarly, the most accurate and senior reviewer’s (G.C.) results were included in inter-observer diagnostic studies. The author M.L. was consulted to resolve any disagreements. A ‘PRISMA’ flow chart ( Moher et al. , 2009 ) was used to document the selection process.
B.G. extracted the data, and the risk of bias and applicability of individual studies were independently assessed by B.G. and M.L. as per QUADAS-2 ( Whiting et al. , 2011 ; Gerges et al. , 2021 ). The four domains evaluated were: patient selection; index test; reference standard; and flow and timing (only risk of bias). An overall quality summary score for each study was not calculated ( Whiting et al. , 2005 ).
Mixed-effects diagnostic meta-analysis was performed to determine the overall pooled sensitivity and specificity, from which the positive and negative likelihood ratios (LR+, LR−) ( Zwinderman and Bossuyt, 2008 ), diagnostic odds ratios (DORs) and AUC of the summary receiver-operating characteristic (sROC) curves were calculated, with their respective 95% CIs, for all diagnostic modalities. At least four studies are required to perform a meta-analysis with this method ( Sotiriadis et al. , 2016 ). Forest plots of sensitivity and specificity were produced for the diagnostic modalities with enough studies to be assessed. sROC curves were plotted to illustrate the AUC and the relationship between sensitivity and specificity. Sub-group analyses, where possible, were performed using the same methods.
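As an illustration of how the likelihood ratios and DOR relate to a given pair of sensitivity and specificity values, the following Python sketch applies the standard textbook definitions. The function names and input values are hypothetical. It is not the mixed-effects model fitted in STATA, and summary estimates from such a model need not match these simple ratios exactly. The last function shows how a likelihood ratio converts a pre-test probability into a post-test probability.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for proportions strictly between 0 and 1."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """DOR = LR+ / LR-; a unitless ratio, not a percentage."""
    lr_pos, lr_neg = likelihood_ratios(sensitivity, specificity)
    return lr_pos / lr_neg


def post_test_probability(pre_test: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test / (1.0 - pre_test)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Hypothetical inputs, for illustration only.
sens, spec, prevalence = 0.80, 0.90, 0.30
lr_pos, lr_neg = likelihood_ratios(sens, spec)
dor = diagnostic_odds_ratio(sens, spec)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, DOR = {dor:.1f}")
print(f"post-test probability after a positive test: {post_test_probability(prevalence, lr_pos):.2f}")
print(f"post-test probability after a negative test: {post_test_probability(prevalence, lr_neg):.2f}")
```

With these illustrative inputs, a positive test raises the probability of disease from 30% to about 77%, and a negative test lowers it to about 9%.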
The presence and magnitude of heterogeneity for sensitivity and specificity were assessed using Cochran’s Q test and the I² index. A P -value <0.1 for Cochran’s Q test suggests the presence of heterogeneity. The I² index describes the percentage of total variation across studies that is attributable to heterogeneity rather than chance. I² values of 25%, 50% and 75% were considered to indicate low, moderate and high heterogeneity, respectively ( Higgins et al. , 2003 ).
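The heterogeneity statistics can be sketched in Python as follows, assuming inverse-variance weights on the logit scale with a 0.5 continuity correction added to every study. These are common conventions but are assumptions here, as are the function names and the counts; the weighting used by the STATA routines may differ. The P -value for Q would be obtained from a chi-squared distribution with k − 1 degrees of freedom and is not computed in this sketch.

```python
import math


def logit_with_variance(events: int, total: int) -> tuple[float, float]:
    """Logit of a proportion and its approximate variance (0.5 added to both cells)."""
    successes = events + 0.5
    failures = total - events + 0.5
    return math.log(successes / failures), 1.0 / successes + 1.0 / failures


def cochran_q_and_i2(estimates: list[float], variances: list[float]) -> tuple[float, int, float]:
    """Return (Q, degrees of freedom, I2 as a percentage)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I2 = (Q - df) / Q, truncated at zero (Higgins et al., 2003).
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


# Hypothetical (true positives, affected women) per study, for illustration only.
studies = [(30, 50), (45, 60), (12, 40), (70, 80)]
logits, variances = zip(*(logit_with_variance(tp, n) for tp, n in studies))
q, df, i2 = cochran_q_and_i2(list(logits), list(variances))
print(f"Q = {q:.2f} on {df} df, I2 = {i2:.0f}%")
```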
Deeks’ funnel plot asymmetry test was used to assess publication bias by regressing the diagnostic log odds ratio against 1/√(effective sample size), weighted by effective sample size. A P -value <0.10 for the slope coefficient suggests significant asymmetry and possible publication bias ( Deeks et al. , 2005 ). All analyses were performed using STATA version 16.1 for Windows (Stata Corporation, College Station, TX, USA).
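The regression underlying the funnel plot asymmetry test can be sketched in Python as below. This is an illustrative implementation under stated assumptions and not the STATA routine used for the analyses: the effective sample size is taken as 4·n1·n2/(n1 + n2), where n1 and n2 are the numbers of affected and unaffected women, 0.5 is added to every cell before computing the log DOR, and the function names and counts are hypothetical. The sketch returns the slope, its standard error and the t statistic; the P -value would come from a t distribution with k − 2 degrees of freedom.

```python
import math


def weighted_linear_fit(xs, ys, ws):
    """Weighted least squares for y = a + b*x; returns (a, b, standard error of b)."""
    if len(xs) < 3:
        raise ValueError("at least three points are needed to estimate the slope error")
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    slope_se = math.sqrt(rss / (len(xs) - 2) / sxx)
    return intercept, slope, slope_se


def deeks_asymmetry(tables):
    """tables: iterable of (tp, fp, fn, tn). Returns (slope, slope_se, t, df)."""
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        a, b, c, d = (cell + 0.5 for cell in (tp, fp, fn, tn))  # continuity correction
        n_affected, n_unaffected = tp + fn, fp + tn
        ess = 4.0 * n_affected * n_unaffected / (n_affected + n_unaffected)
        xs.append(1.0 / math.sqrt(ess))
        ys.append(math.log((a * d) / (b * c)))  # log diagnostic odds ratio
        ws.append(ess)
    _, slope, slope_se = weighted_linear_fit(xs, ys, ws)
    t_stat = slope / slope_se if slope_se > 0 else math.inf
    return slope, slope_se, t_stat, len(xs) - 2


# Hypothetical 2 x 2 tables (tp, fp, fn, tn), for illustration only.
tables = [(30, 5, 20, 95), (45, 3, 15, 60), (12, 2, 28, 150), (70, 10, 10, 200), (25, 4, 25, 40)]
slope, slope_se, t_stat, df = deeks_asymmetry(tables)
print(f"slope = {slope:.2f}, SE = {slope_se:.2f}, t = {t_stat:.2f} on {df} df")
```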
Discussion
While USL DE is one of the most common sites of DE, found in up to 61% of women during laparoscopy ( Fratelli et al. , 2013 ), assessment of disease in this region via TVS seems to be one of the most difficult, with sensitivities of less than 70% reported in the literature ( Deslandes et al. , 2020 ). This is consistent with the findings of our meta-analysis, where the detection of USL DE using TVS was poorer than with MRI, with a pooled sensitivity, specificity, DOR and AUC of 61%, 95%, 24 and 93%, respectively, for the former and 81%, 86%, 27 and 89%, respectively, for the latter. MRI consistently outperformed TVS for both RVS and vaginal DE. The overall pooled sensitivity, specificity, DOR and AUC of MRI were 75%, 95%, 68 and 96%, respectively, for the detection of RVS DE and 70%, 96%, 55 and 90%, respectively, for the detection of vaginal DE. Meanwhile, the overall pooled sensitivity, specificity, DOR and AUC of TVS were 72%, 98%, 154 and 97%, respectively, for the detection of RVS DE and 58%, 97%, 46 and 95%, respectively, for the detection of vaginal DE. While MRI seems to outperform TVS, it is important to note that the CIs overlap and that the absence of differences is associated with the significant heterogeneity for sensitivity and specificity of both techniques.
Our results were comparable to those of previously published meta-analyses with regard to TVS being outperformed by MRI for the detection of USL DE. Nisenblat et al. (2016) compared all imaging modalities and obtained a sensitivity and specificity of 64% and 97%, respectively, for TVS (seven studies), and 86% and 84%, respectively, for MRI (four studies). Similarly, Guerriero et al. published two reviews: the first, in 2016, assessed TVS ( Guerriero et al. , 2016 ), while the most recent, in 2018 ( Guerriero et al. , 2018 ), compared TVS and MRI in women who had both tests. In 2016, a total of 11 studies were included, from which the sensitivity and specificity of TVS for the detection of USL DE were 53% and 93%, for RVS DE were 49% and 98%, and for vaginal DE were 58% and 96%, respectively. Aside from RVS DE, these results were very similar to ours, with the differences likely linked to the smaller number of studies included; the assessment of these regions is also likely to have improved given the increased experience in the time between reviews. In the head-to-head review in 2018 ( Guerriero et al. , 2018 ), a total of six studies were included, from which the sensitivity and specificity of TVS for the detection of USL DE were 67% and 86%, respectively, compared with 70% and 93% for MRI. For RVS DE, the sensitivity and specificity of TVS were 59% and 97%, respectively, compared with 66% and 97%, respectively, for MRI ( Guerriero et al. , 2018 ). As only head-to-head studies were included, it is not surprising that there were some differences from our results, particularly given the limitations of the small number of studies included. Noventa et al. (2019) performed a similar head-to-head meta-analysis, although they included retrospective studies, and interestingly found TVS to be marginally superior to MRI for the detection of USL DE, with sensitivities of 71% and 67%, respectively. This, however, was reversed for RVS DE (as with other studies), with sensitivities of 47% and 61%, respectively ( Noventa et al. , 2019 ). When comparing the performance of MRI, Medeiros et al. (2015) reported very similar results in their meta-analysis reviewing the accuracy of MRI for DE, with sensitivities and specificities of 85% and 80%, respectively, for the detection of USL DE, 77% and 95%, respectively, for the detection of RVS DE, and 82% and 82%, respectively, for the detection of vaginal DE.
In contrast to some of the studies discussed above ( Medeiros et al. , 2015 ; Noventa et al. , 2019 ), the present analysis only included studies which were prospective with at least 10 non-affected and affected women to reduce the risk of selection bias. Aside from an attempt to reduce selection bias, the reasoning for specifying the minimum number of women affected and not affected by the disease was to increase the applicability of the results to the general population, as inevitably many of these studies are performed in tertiary level referral centres. In addition to these strengths, the primary searches were purposely broad to capture all potentially applicable studies, particularly given the discrepancies in the definitions of USL, RVS and vaginal DE. Although the risk of studies not being identified in a search is a limitation of any systematic review, an attempt was made to reduce this by including all studies with any reference to ‘endometriosis’ and ‘deep’.
As with many systematic reviews and meta-analyses of similar diagnostic studies, one of the limitations is the low quality of evidence given the high risk of bias and heterogeneity in the included studies. Similarly, there are potential biases secondary to the risk of misdiagnosis at surgery owing to the lack of either histopathological findings or expertise, coupled with the surgeons not being blinded. Furthermore, and importantly, many of the studies do not report the experience or the number of surgeons involved. The potential for varying surgical experience and the lack of clarity regarding complete surgical clearance, which also contributes to the lack of histopathology, could also explain the wide range of pre-test probability of disease. This would be particularly problematic with RVS and vaginal DE, both of which are less common than USL DE. Of note, while the Bazot et al. (2007) criteria were used, as with other studies, two of the included studies ( Fiaschetti et al. , 2012 ; Holland et al. , 2013 ) met the criteria based on pouch of Douglas obliteration but did not include histopathology: this implies a lack of accurate surgical mapping of the exact locations of DE posterior to the cervix, since dissection of the retroperitoneum was not performed. In these cases, the Bazot criteria are insufficient for evaluating diagnostic accuracy at these specific DE sites. Indeed, there is the impression that the diagnostic accuracy of TVS and MRI is very high and similar for both techniques in studies performed at dedicated endometriosis centres where there are both expert imaging operators and surgeons. This is further confirmation of the effectiveness of endometriosis units, where more accurate diagnoses are a result of the collaboration of experts in all fields.
Finally, as the number of studies which met the criteria was limited, it was not possible to perform pooled analyses of other imaging modalities, or sub-analyses within the modalities regarding the addition of BP or contrast media. More published prospective studies are necessary to obtain unbiased data in this regard. In addition, given the lack of standardized nomenclature prior to 2016, there is the risk that the defined regions assessed may be inaccurate, for example owing to the difficulty in differentiating between vaginal and retrocervical lesions.