Introduction
Endometriosis, a chronic oestrogen-dependent disorder, affects 5–10% of women of reproductive age worldwide. 1 Common symptoms include chronic pelvic pain, dysmenorrhoea, dyspareunia and infertility. These manifestations significantly reduce patients’ quality of life and impose a substantial socioeconomic burden. 2–4 Emerging evidence further characterises endometriosis as a systemic condition associated with extra-pelvic symptoms such as anxiety, depression and chronic fatigue. 5 6
Both pain and infertility are primary concerns for patients. Although the association between endometriosis and these symptoms is well documented, the underlying mechanisms remain incompletely understood. Angiogenesis and neurogenesis are considered key to lesion establishment and to the activation of peripheral pain pathways. 7 8 Additionally, immune and inflammatory cells release mediators that directly stimulate nociceptors. 7 8 The effects on fertility are equally multifactorial: while healthy couples exhibit a 15–20% monthly fecundity rate, this drops to 2–10% in women with untreated endometriosis. 9 Conception requires that intercourse be feasible, yet over 50% of patients report dyspareunia that disrupts it. 9 Other contributors include diminished oocyte quality, inflammation-mediated pelvic microenvironment alterations and structural damage to reproductive organs in severe disease. 10
The diagnosis of endometriosis is suspected based on the history, symptoms and signs, is corroborated by physical examination and imaging techniques and is often confirmed by histological examination of specimens collected during surgery. 11 Currently, diagnostic laparoscopy has not demonstrated superior long-term symptom management compared with empirical medical therapy. 11
Treatment strategies aim to alleviate symptoms, improve fertility and prevent recurrence. 11 12 Laparoscopic treatment can effectively reduce endometriosis-associated pain and improve fertility; 11 13 however, 40–50% of patients experience pain recurrence post-surgery. 14 Medical therapies (including non-steroidal anti-inflammatory drugs (NSAIDs), combined oral contraceptives, progestogens, gonadotropin-releasing hormone (GnRH) agonists/antagonists and aromatase inhibitors) are recommended for symptom control and recurrence prevention. 11 15 Despite these options, therapeutic decision-making remains challenging due to insufficient head-to-head comparisons in randomised controlled trials (RCTs) and variable risk-benefit profiles across treatments. 16–19
Given the above background, the objective of this systematic review and network meta-analysis (NMA) is to evaluate the efficacy and safety of all available treatments for endometriosis in women of reproductive age. In contrast to previous analyses limited to hormonal therapies or surgeries only, 13 18 20–22 this study will encompass the full range of interventions used for endometriosis, including surgical, hormonal and non-hormonal medical treatments, and will compare their effects on clinically important outcomes such as pain reduction, fertility outcomes, disease recurrence, quality of life and adverse events.
Methods
This protocol has been reported in accordance with the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) guideline, and the planned report will follow the PRISMA extension for Network Meta-Analyses. 23 The protocol has been registered in the International Prospective Register of Systematic Reviews under the number CRD420251051917. We will conduct the study from 1 December 2025 to 30 June 2026.
This review will include RCTs evaluating treatments for endometriosis. Non-randomised studies, observational designs and quasi-experimental studies will be excluded.
Women of reproductive age (generally ranging from adolescence up to menopausal age) diagnosed with endometriosis will be the target population. We will include participants with endometriosis at any anatomical location (eg, ovarian, peritoneal, deep infiltrating) and any stage of disease severity. The diagnosis of endometriosis in the trials need not be strictly defined by surgical or histological confirmation; studies using clinical diagnosis or imaging-based diagnosis will also be included, reflecting real-world diagnostic practice (we will consider performing sensitivity analyses based on diagnostic method if data permit). Women with coexisting conditions (such as infertility) will be included as long as endometriosis is the primary condition under treatment. There will be no restrictions based on prior treatments; studies may include treatment-naïve patients or those who have had previous surgical or medical therapy for endometriosis. If feasible, we will later stratify or perform subgroup analyses to account for prior treatment history.
Any therapeutic intervention for endometriosis will be considered. This includes, but is not limited to, surgical interventions (eg, excision or ablation of endometriotic lesions, with or without adhesiolysis); hormonal therapies such as GnRH agonists, GnRH antagonists, combined hormonal contraceptives, progestogens (eg, dydrogesterone, medroxyprogesterone, dienogest) and aromatase inhibitors; non-hormonal medical treatments such as analgesics (NSAIDs, other pain modulators), immunomodulators or anti-inflammatory agents; and any other pharmacological or non-pharmacological interventions aimed at managing endometriosis (eg, neuromodulatory agents for pain, complementary therapies), provided they have been evaluated in RCTs. Combination treatments (eg, surgery followed by medical therapy, or multidrug regimens) will also be included. Eligible comparators include placebo, sham, no treatment or any active treatment. In short, we will include any RCT that compares one eligible endometriosis treatment against another treatment or against placebo/sham.
For the NMA, our primary analytic approach is to define network nodes with high clinical specificity to avoid the inappropriate aggregation of distinct treatments. Our a priori criteria for defining intervention nodes are as follows:
Surgical interventions: excision and ablation of endometriotic lesions will be considered two distinct nodes. We will also differentiate between primary surgery and surgery for recurrence, addressing this through subgroup or sensitivity analysis if sufficient data are available.
Pharmacological interventions: we will define pharmacological nodes by the specific drug (eg, progestogens: dienogest, dydrogesterone, medroxyprogesterone; GnRH analogues: GnRH agonists, GnRH antagonists; aromatase inhibitors: letrozole, anastrozole; and other types of medications), route of administration (eg, oral, injection, intrauterine system) and distinct doses.
Complementary therapies: given their inherent heterogeneity, we will categorise each therapy, such as acupuncture, specific herbal medicines or defined dietary interventions, as a distinct node.
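The node criteria above can be sketched as a small labelling rule. This is only an illustration under stated assumptions: the `Arm` record, its field names and the label format are hypothetical and not part of the protocol; the point is that drug, route and dose each split pharmacological nodes, while surgical techniques and complementary therapies are kept apart by name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Arm:
    """One trial arm as it might appear on a (hypothetical) extraction form."""
    category: str                 # "surgical", "pharmacological" or "complementary"
    name: str                     # e.g. "excision", "dienogest", "acupuncture"
    route: Optional[str] = None   # e.g. "oral", "injection", "intrauterine system"
    dose: Optional[str] = None    # e.g. "2 mg/day"


def node_label(arm: Arm) -> str:
    """Return the network node an arm would be assigned to under the criteria above."""
    name = arm.name.strip().lower()
    if arm.category == "pharmacological":
        # Drug, route and dose each define a separate node;
        # missing values are flagged rather than guessed.
        route = (arm.route or "route not reported").strip().lower()
        dose = (arm.dose or "dose not reported").strip().lower()
        return f"{name} | {route} | {dose}"
    # Surgical techniques and complementary therapies are distinct nodes by name.
    return name
```

Under this rule, excision and ablation map to different nodes, as do two doses of the same drug. Whether sparse dose-level nodes would later need merging is a separate judgement that this sketch does not address.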
Our primary analysis will focus on short-term endpoints at clinically relevant time points, such as 3 and 6 months post-intervention. To evaluate the durability of treatment effects—a key concern in endometriosis management—we will conduct a pre-planned secondary analysis of outcomes at longer-term follow-up points (eg, 12 and 24 months), contingent on the availability of sufficient data.
Trials will be included irrespective of whether they report all of our outcomes of interest, as long as they report at least one relevant clinical outcome.
Overall pain: typically defined as endometriosis-related pain severity or pain improvement, measured on a validated scale (such as a Visual Analogue Scale or numeric rating scale for pelvic pain, or composite pain scores). If a trial reports multiple pain measures (eg, dysmenorrhoea and dyspareunia separately), we will extract each of them and, where possible, prioritise a global pain score or the most clinically significant pain endpoint for analysis. Pain outcomes will be assessed at the end of the treatment period or at the trial’s designated primary endpoint.
Live birth rate—defined as the occurrence of at least one live-born baby per woman randomised, among trials that enrol women seeking pregnancy. This outcome is relevant for interventions aimed at improving fertility in women with endometriosis (eg, comparisons of surgery vs expectant management for infertility). In trials not focused on fertility, this outcome may not be applicable; we will include it in the analysis of the subset of trials that report fertility outcomes.
Clinical pregnancy rate, as determined by ultrasound confirmation of intrauterine pregnancy or as defined by trial authors.
Miscarriage rate, defined as spontaneous pregnancy loss before viability, among those who became pregnant.
Endometriosis recurrence—typically measured as recurrence of symptoms or lesions after treatment completion. This could include pain recurrence or the need for repeat surgery. If defined by imaging or repeat surgery detection of lesions, we will record that definition.
Improvement in endometriosis-related symptoms other than pain, such as fatigue, heavy menstrual bleeding, dyschezia (painful bowel movements) or dysuria, as reported in trials (eg, proportion of women with improvement in these symptoms or change in symptom scores).
Adverse events—any treatment-related adverse outcomes. We will extract data on overall adverse event rates and specific serious adverse events (such as thromboembolic events, bone density loss, menopausal symptoms, etc., as applicable to each intervention).
Quality of life: typically measured by a validated instrument (eg, the Endometriosis Health Profile (EHP-30 or EHP-5), the 36-Item Short Form Health Survey (SF-36) or the European Quality of Life–5 Dimensions questionnaire (EQ-5D)), reported as change from baseline or final scores.
Obstetrical outcomes for those who conceive: including gestational age at birth, birth weight of the newborn, neonatal mortality and major congenital abnormalities in offspring. These outcomes will be considered in the context of safety for treatments used around the time of conception or during pregnancy (although most interventions are not continued in pregnancy, it is possible that some treatments prior to conception could have lingering effects).
We will collect data on all these outcomes when available, but not all outcomes will be applicable to every study. For example, trials focusing on pain might not report fertility or obstetric outcomes, and vice versa for infertility-focused trials. We will perform separate network meta-analyses by outcome as appropriate (eg, one network for pain outcomes and another for pregnancy outcomes), or analyse subsets of studies for particular outcomes. No language or publication status restrictions will be applied in selecting studies.
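The fertility outcomes listed above use different denominators, which matters when counts are extracted. The sketch below illustrates this under assumptions: live birth is counted per woman randomised, as defined above; taking clinical pregnancy per woman randomised and miscarriage per clinical pregnancy are assumptions made for illustration, and the function name is hypothetical.

```python
def fertility_rates(randomised: int, clinical_pregnancies: int,
                    women_with_live_birth: int, miscarriages: int) -> dict:
    """Compute arm-level fertility rates with outcome-specific denominators."""
    if randomised <= 0:
        raise ValueError("number randomised must be positive")
    if not (0 <= miscarriages <= clinical_pregnancies <= randomised):
        raise ValueError("counts are not internally consistent")
    if not (0 <= women_with_live_birth <= randomised):
        raise ValueError("live births cannot exceed women randomised")
    return {
        # Per woman randomised, as stated in the outcome definition.
        "live_birth_rate": women_with_live_birth / randomised,
        # Assumption: also expressed per woman randomised.
        "clinical_pregnancy_rate": clinical_pregnancies / randomised,
        # Among women who became pregnant; undefined if nobody conceived.
        "miscarriage_rate": (miscarriages / clinical_pregnancies
                             if clinical_pregnancies else None),
    }
```

Trials that report these outcomes per cycle or per woman attempting conception would need their denominators recorded explicitly before pooling.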
We will identify studies through systematic searches of electronic databases and other resources. The following databases will be searched for relevant studies from their inception to the present (no date restrictions): PubMed/MEDLINE, EMBASE and The Cochrane Central Register of Controlled Trials (CENTRAL). In addition, we will search the WHO International Clinical Trials Registry Platform for ongoing or completed trial registry entries to identify any trials that have been conducted but not yet published in the academic literature, or unpublished results of completed trials. We will also check the reference lists of relevant systematic reviews or meta-analyses (if any are found during our search) and the bibliographies of all included studies for any additional trials we might have missed. To reduce publication and language bias, we will include non-English studies and plan to translate key sections of any potentially relevant articles that are not in English. Where needed, we will contact study authors for clarification or additional data (eg, if outcome data are incomplete or only presented in abstract form).
We will use a combination of controlled vocabulary (eg, MeSH terms in PubMed, Emtree terms in EMBASE) and free-text keywords to represent the condition (endometriosis and related terms such as “endometrioma”) and will apply a filter to restrict the results to RCTs. We will search each database from its inception through 1 January 2026. We will not restrict the search by specific interventions or outcomes, so that trials of any treatment modality are captured. Instead, broad terms for endometriosis will be paired with terms indicative of randomised trials (such as “randomized controlled trial”, “randomised” and “placebo”, or validated RCT filters for each database). A sample search strategy for PubMed might be the following:
("Endometriosis"[MeSH] OR endometriosis[tiab] OR endometrioma[tiab])
AND (randomized controlled trial OR controlled clinical trial OR randomi*ed[tiab] OR placebo[tiab] OR trial[ti])
Similar strategies will be adapted for EMBASE and CENTRAL, using their specific thesauri and indexing terms.
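The structure of the sample strategy (a block of condition terms combined with a block of study-design terms) can be sketched as follows. The helper names are hypothetical; the terms are copied from the sample strategy and would need to be replaced by database-specific thesaurus terms and validated filters when adapted.

```python
# Terms taken from the sample PubMed strategy above.
CONDITION_TERMS = ['"Endometriosis"[MeSH]', "endometriosis[tiab]", "endometrioma[tiab]"]
DESIGN_TERMS = [
    "randomized controlled trial",
    "controlled clinical trial",
    "randomi*ed[tiab]",
    "placebo[tiab]",
    "trial[ti]",
]


def or_block(terms):
    """Join synonyms for one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(condition_terms, design_terms):
    """Combine the condition block and the study-design block with AND."""
    return or_block(condition_terms) + " AND " + or_block(design_terms)


query = build_query(CONDITION_TERMS, DESIGN_TERMS)
```

No intervention or outcome block is added, which mirrors the decision not to restrict the search by treatment modality.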
Data from each included study will be extracted independently by two reviewers using a pre-designed data collection form or database. Before full data extraction, we will pilot the extraction form on a few studies to ensure that all relevant information is captured and that there is consistency between reviewers.
The two data extractors will compare their extracted data for consistency. Any discrepancies will be rechecked against the original articles and resolved by discussion, involving a third reviewer if necessary. If outcome data are missing or reported only incompletely (for instance, an outcome is stated to have been measured but results are not reported, or only graphically presented), we will attempt to contact the study authors to request the necessary data. In cases where only an aggregate measure is reported (eg, only an OR with CI, without raw counts), we will note that and consider methods to derive approximate data if possible. We will also extract data on follow-up duration and timing of outcome measurements, as these may be relevant for analysing heterogeneity.
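One standard way to derive approximate data when only a ratio measure with its CI is reported is to recover the log effect and its SE from the interval limits. The protocol does not specify which method will be used, so the sketch below is an assumption; it relies on the CI being symmetric on the log scale, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def log_effect_from_ci(estimate: float, lower: float, upper: float,
                       level: float = 0.95):
    """Return (log effect, SE) for a ratio measure reported with a CI.

    Assumes the interval was computed as exp(log_effect +/- z * SE).
    """
    if not (0 < lower <= estimate <= upper):
        raise ValueError("expected 0 < lower <= estimate <= upper")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    log_effect = math.log(estimate)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return log_effect, se
```

For example, an OR of 2.0 with a 95% CI of 1.0 to 4.0 gives a log OR of about 0.69 with an SE of about 0.35. Converting such an OR to the RR scale would additionally require a baseline risk, which this sketch does not attempt.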
Two reviewers will independently assess the risk of bias in each included RCT using the Risk of Bias in Network Meta-Analysis tool. 24 Disagreements between reviewers on risk of bias judgments will be resolved through consensus discussion, and if needed, a third reviewer will mediate.
We will first map out the network of comparisons for each outcome of interest. In an NMA, nodes will represent each specific intervention (or grouping of similar interventions) and edges (connections between nodes) will represent direct head-to-head comparisons that have been evaluated in the included trials. We anticipate a multinode network given the numerous treatments (eg, nodes might include surgery, placebo/no treatment, NSAIDs, combined oral contraceptives, progestogens, GnRH agonists, GnRH antagonists and aromatase inhibitors, as well as possibly specific subsets such as different surgical techniques or different progestogens if they cannot be grouped). We will produce a network diagram for the primary outcome(s) to illustrate which interventions have been compared directly in trials and the number of trials informing each comparison.
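The information behind such a network diagram is a count of trials per direct comparison. A minimal sketch, assuming each trial has already been reduced to a list of node labels (the trial identifiers and labels in the usage lines are invented for illustration):

```python
from collections import Counter
from itertools import combinations


def count_direct_comparisons(trials):
    """Count trials informing each direct comparison (edge).

    trials: mapping of trial id -> list of node labels, one per arm.
    Returns a Counter keyed by alphabetically ordered node pairs.
    """
    edges = Counter()
    for arms in trials.values():
        nodes = sorted(set(arms))  # arms on the same node give no comparison
        for pair in combinations(nodes, 2):
            edges[pair] += 1       # a three-arm trial contributes three edges
    return edges


# Hypothetical example data, not taken from any included study.
example_trials = {
    "trial_1": ["placebo", "dienogest"],
    "trial_2": ["dienogest", "placebo"],
    "trial_3": ["placebo", "dienogest", "gnrh agonist"],
}
example_edges = count_direct_comparisons(example_trials)
```

Edge counts of this kind can be passed to a plotting library to draw node and edge sizes; nodes that share no path with the rest of the network would have to be analysed separately.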
We will perform a Bayesian NMA to simultaneously compare all included treatments for a given outcome. We will use Markov chain Monte Carlo methods to fit the model (OpenBUGS/JAGS via R). For each outcome, we will likely use a random-effects NMA model, which accounts for heterogeneity in treatment effects across trials (assuming that different studies may have underlying differences leading to variation in effect sizes). The random-effects model will assume a distribution for the true effects across studies with a between-study variance (τ²). In cases where data are very sparse or heterogeneity appears minimal, we may also examine a fixed-effect model for comparison.
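For orientation, one common contrast-based formulation of the random-effects consistency model described above is given below. The notation is an assumption introduced here, not taken from the protocol: y_{ik} is the observed relative effect of arm k versus arm 1 in trial i, s_{ik} its standard error, t_{ik} the treatment in that arm, and d_t the effect of treatment t relative to the reference treatment.

```latex
\begin{align*}
y_{ik} &\sim \mathcal{N}\!\left(\delta_{ik},\; s_{ik}^{2}\right), \\
\delta_{ik} &\sim \mathcal{N}\!\left(d_{t_{ik}} - d_{t_{i1}},\; \tau^{2}\right), \\
d_{1} &= 0 .
\end{align*}
```

Here τ² is the between-study variance, assumed common to all comparisons. In multi-arm trials the trial-specific effects δ_{ik} share a baseline arm and are therefore correlated, with covariance τ²/2 under this common-variance assumption. The fixed-effect model is the special case τ² = 0, in which δ_{ik} = d_{t_{ik}} − d_{t_{i1}}. Prior distributions for d_t and τ are not specified in this section and are not filled in here.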
The effect measure will be chosen according to the nature of the outcome: for dichotomous outcomes (such as live birth, pregnancy, adverse events, recurrence), we will use the risk ratio (RR) as the measure of effect. For continuous outcomes (such as pain scores or quality of life measures), we will use either the mean difference (MD) if all studies use the same scale, or the standardised mean difference (SMD) if different scales are used across studies. Before pooling, we will align the direction of all scales within each outcome so that higher scores carry the same meaning (eg, we may multiply some scales by −1 so that, for all pain scales, higher values mean worse pain), ensuring consistent interpretation of effect direction.
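The effect measures named above can be computed from arm-level summary data as sketched below. This is an illustration under assumptions: the function names are hypothetical, the SMD shown is Hedges' g with the usual small-sample correction (the protocol does not state which SMD variant will be used), and zero-event arms are rejected rather than corrected.

```python
import math


def log_risk_ratio(events_t: int, n_t: int, events_c: int, n_c: int):
    """Return (log RR, SE) for treatment vs control from 2x2 counts."""
    if min(events_t, events_c) <= 0:
        raise ValueError("zero-event arms need a correction or an exact likelihood")
    log_rr = math.log((events_t / n_t) / (events_c / n_c))
    se = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    return log_rr, se


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c, higher_is_worse=True):
    """Return the SMD (Hedges' g), oriented so that higher always means worse."""
    sign = 1.0 if higher_is_worse else -1.0  # flip scales where higher is better
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2)
                          / (n_t + n_c - 2))
    d = sign * (mean_t - mean_c) / pooled_sd
    correction = 1 - 3 / (4 * (n_t + n_c) - 9)  # small-sample bias correction
    return correction * d
```

With this orientation, a negative SMD on a pain outcome means less pain in the treatment arm regardless of how the original scale was scored.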
All treatments will be compared against each other through the network. We will designate a reference treatment (possibly placebo or no treatment, if such a node exists in the network) for computational purposes, but inference will be made on all pairwise contrasts. The Bayesian analysis will generate a posterior distribution for each parameter of interest, namely the relative treatment effects (eg, log RRs or MDs) between every pair of treatments. From these, we will obtain the posterior median (or mean) as the point estimate and the 95% credible interval (CrI) as the interval estimate for each comparison. CrIs represent the Bayesian equivalent of CIs, providing the range within which the true effect lies with 95% probability given the model and data.
We will derive the probability that each treatment is the best, second best, third best, etc. for the outcome in question, based on the posterior distributions. From the rank probabilities, we will also compute the surface under the cumulative ranking curve (SUCRA) or the median rank for each treatment as a summary of its overall ranking performance. 25 These ranking metrics will be reported with caution, noting that small differences in rank may not be clinically meaningful and that rank results depend on the set of treatments analysed. For outcomes like adverse events, ‘best’ might mean lowest risk. We will clearly define whether a higher rank means better efficacy (eg, more pain reduction, higher pregnancy rate) or better safety (fewer side effects) as appropriate for each outcome.
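The surface under the cumulative ranking curve can be computed directly from a treatment's rank probabilities: it is the average of the cumulative probabilities of being ranked within the top 1, top 2, and so on up to the top a − 1 of a treatments. A minimal sketch (the function name is hypothetical; rank 1 is taken to mean best):

```python
def sucra(rank_probs):
    """Surface under the cumulative ranking curve for one treatment.

    rank_probs[j] is the posterior probability that the treatment has
    rank j + 1, with rank 1 being the best. Returns a value in [0, 1].
    """
    n_treatments = len(rank_probs)
    if n_treatments < 2:
        raise ValueError("need at least two treatments")
    if abs(sum(rank_probs) - 1.0) > 1e-6:
        raise ValueError("rank probabilities must sum to 1")
    cumulative = 0.0
    total = 0.0
    for p in rank_probs[:-1]:   # the last cumulative value is always 1
        cumulative += p
        total += cumulative
    return total / (n_treatments - 1)
```

A treatment that is certain to be best scores 1 and one that is certain to be worst scores 0; rank probabilities of 0.5, 0.3 and 0.2 in a three-treatment network give 0.65.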
We will assess statistical heterogeneity in the network by examining the magnitude of the between-study variance (τ²) from the random-effects model for each outcome. Because τ² cannot be negative, its CrI will never extend below zero; a large τ², or a 95% CrI for τ² whose lower bound lies well above zero, would suggest substantial heterogeneity in effect sizes between studies. We will also look at forest plots of the pairwise results and compare consistency of effects across studies informally. If sufficient studies are available, we may calculate a generalised I² statistic for the NMA to describe the percentage of variability due to heterogeneity rather than chance (though interpreting I² in a multi-arm context has limitations). In the event of notable heterogeneity, we will explore possible sources through subgroup analyses or meta-regression.
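For the informal pairwise checks, a simple frequentist summary can sit alongside the forest plots. The sketch below uses the DerSimonian-Laird moment estimator of τ² and the conventional pairwise I²; this is a swapped-in technique for illustration only, not the Bayesian estimate of τ² from the NMA model and not the generalised I² mentioned above.

```python
def dersimonian_laird(effects, ses):
    """Return (tau2, I2) for one pairwise comparison.

    effects: study-level effects (e.g. log RRs); ses: their standard errors.
    """
    if len(effects) != len(ses) or len(effects) < 2:
        raise ValueError("need matching lists with at least two studies")
    weights = [1 / se ** 2 for se in ses]
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)                 # truncated at zero
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return tau2, i2
```

Two equally precise studies with effects of 0.0 and 1.0 and SEs of 0.1 give τ² = 0.49 and I² = 0.98, whereas identical effects give zero for both.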
Inconsistency refers to disagreement between direct and indirect evidence in the network, that is, a violation of the NMA assumption that all evidence is statistically coherent. We will employ both global and local methods to detect inconsistency. For a global test of inconsistency, we may use the design-by-treatment interaction model to see if allowing for inconsistency improves model fit significantly. 26 In our Bayesian framework, one approach is to compare the deviance information criterion (DIC) or posterior residual deviance of a consistency model (which assumes all evidence is coherent) with an inconsistency model (which relaxes the consistency assumption for all loops in the network). A substantially better fit (lower DIC) for the inconsistency model could indicate global inconsistency. Locally, we will use the node-splitting method to assess inconsistency on specific comparisons. 27 We will perform node-splitting for key nodes (especially where there is a closed loop of evidence with both direct and indirect data) to identify any local inconsistencies. If inconsistency is detected, we will attempt to investigate potential reasons for it (differences in populations or interventions across the loop, risk of bias issues, etc.). If we find serious inconsistency that cannot be explained or resolved, we will report it and interpret the network results with caution, or in extreme cases, consider not combining certain inconsistent parts of the network.
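The idea behind a local inconsistency check is to contrast the direct and indirect estimates of the same comparison. The sketch below is a simple frequentist analogue (a z-test on the difference, assuming the two estimates are independent); in the Bayesian node-splitting analysis the same contrast would instead be summarised from the posterior distribution of the difference. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def inconsistency_check(direct, se_direct, indirect, se_indirect):
    """Return (difference, SE, two-sided p value) for direct vs indirect evidence.

    Inputs are on the analysis scale (e.g. log RR). Independence of the two
    estimates is assumed, so their variances add.
    """
    difference = direct - indirect
    se = math.sqrt(se_direct ** 2 + se_indirect ** 2)
    z = difference / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return difference, se, p_value
```

A direct log RR of 1.0 (SE 0.3) against an indirect log RR of 0.0 (SE 0.4) gives a difference of 1.0 with SE 0.5 and a two-sided p value of about 0.046. Such checks have low power in sparse networks, so a non-significant result does not establish consistency.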
All analyses will primarily be conducted separately for the main outcomes (pain and live birth).
We plan several subgroup analyses to explore whether treatment effects differ across certain subsets of trials or patients, which could explain heterogeneity or inform specific clinical scenarios:
Prior treatment status: one key subgroup analysis will stratify trials based on whether participants had received prior treatment for endometriosis. This could be operationalised as first-line therapy trials (those enrolling patients with no prior medical or surgical therapy for endometriosis) versus secondary therapy trials (those enrolling patients after failure of a previous treatment or after surgery). We hypothesise that the relative effectiveness of interventions might differ in treatment-naïve patients compared with those with recurrent or persistent disease after prior treatments.
Diagnostic confirmation: although we include trials with or without surgical diagnosis, we may compare outcomes between trials that required laparoscopic confirmation of endometriosis and those that included patients based on clinical diagnosis. The certainty of diagnosis might affect the patient population (clinical-only diagnosis could include some false positives or milder cases) and thus outcomes.
Endometriosis severity: if data are available, we will attempt subgroup analysis by disease severity (for instance, studies predominantly involving stage I–II vs stage III–IV endometriosis, or presence of deep infiltrating disease vs superficial). Treatment effects might vary by burden of disease (eg, surgery might be more beneficial in those with moderate-severe disease than mild).
Follow-up duration: some outcomes like recurrence might depend on length of follow-up. We may explore if trials with longer follow-up report different relative effects. If needed, we can restrict an analysis to studies with similar follow-up times for consistency.
For sensitivity analyses, we will test the robustness of our findings to various assumptions and decisions made in the analysis:
Risk of bias exclusion: we will conduct a sensitivity analysis excluding studies judged to be at high risk of bias.
Outcome definitions: we will examine if using alternative outcome measures affects results. For example, for pain, if some studies reported only dysmenorrhoea and others overall pain, we might in one analysis use dysmenorrhoea scores to represent pain in all studies (or conversely drop studies that did not report a composite pain score) to see if that changes the network conclusions.
All subgroup and sensitivity analyses will be clearly documented and deviations from the primary analysis will be reported. We will use these analyses to gauge the robustness of our findings; our primary conclusions will be based on the main analysis but qualified by any notable differences observed in these additional analyses.
We will assess the potential for publication bias and small-study effects in our meta-analyses, to the extent possible given the data. For outcomes where a large number of studies (eg, ≥10 trials) contribute to a particular pairwise comparison or to the network as a whole, we will employ funnel plot techniques. Specifically, for direct pairwise meta-analyses (eg, if many trials compare hormone therapy vs placebo on pain), we will create conventional funnel plots of effect size versus SE and examine them for asymmetry, which could indicate publication bias (as small studies with negative or null results may be missing). In the context of the NMA, we will use a comparison-adjusted funnel plot to detect small-study effects across the network. If we identify asymmetry or suspect publication bias, we will discuss the potential impact on our findings.
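A comparison-adjusted funnel plot places each study's effect, centred on the summary effect of its own comparison, against its SE. The sketch below computes those plot coordinates under two assumptions that are not stated in the protocol: the comparison-specific summary is a fixed-effect inverse-variance mean, and all comparisons have already been oriented in a consistent direction (eg, active versus placebo, or newer versus older treatment).

```python
from collections import defaultdict


def comparison_adjusted_points(studies):
    """Return (centred effect, SE) pairs for a comparison-adjusted funnel plot.

    studies: list of (comparison label, effect, SE) tuples.
    """
    by_comparison = defaultdict(list)
    for comparison, effect, se in studies:
        by_comparison[comparison].append((effect, se))

    pooled = {}
    for comparison, rows in by_comparison.items():
        weights = [1 / se ** 2 for _, se in rows]
        # Inverse-variance mean of the studies in this comparison.
        pooled[comparison] = (sum(w * y for w, (y, _) in zip(weights, rows))
                              / sum(weights))

    return [(effect - pooled[comparison], se)
            for comparison, effect, se in studies]
```

In the absence of small-study effects the points should scatter symmetrically around zero; the choice of direction for each comparison determines what an asymmetry would mean and should be reported with the plot.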
Two review authors will independently assess the certainty of the evidence for each outcome, following the Grading of Recommendations Assessment, Development and Evaluation (GRADE) working group approach. 28
This systematic review of published literature does not require ethical approval. Results will be disseminated through peer-reviewed publication. The findings aim to establish a robust evidence base for clinical decision-making and to inform future research priorities in endometriosis management.
Patients and/or the public were not involved in the design, conduct, reporting or dissemination plans of this research.