Introduction
Endometriosis is a common gynaecological
condition in which there is often a long time
between first primary care consultation and
diagnosis.
1–4 A longer time to diagnosis
is associated with prolonged symptoms,
particularly pain and
5,6 subfertility, along
with patient frustration and demoralisation. 7
Endometriosis can be difficult to
diagnose clinically; its symptoms are both
common
8 and non-specific, so are often
considered by GPs as part of the normal
menstrual experience, 9 or attributed to
other conditions. 5 The use of very detailed
questions about symptoms can increase
diagnostic accuracy.
10 However, current
biomarkers 11 and imaging 12 have limited
benefit, and there is substantial variation in
guideline recommendations for diagnosis
and management of this condition. 13
Most research on the clinical features of
endometriosis in primary care has focused
on features present at a single point in time,
typically the time of diagnosis.
5,14 However,
with endometriosis, the symptoms at
any single point in time have only limited
predictive value
2 and the problem of delays
in diagnosis requires an understanding
of when symptoms first appear. Although
data in electronic records contain many
single items, experienced practitioners
typically recognise composite patterns that
involve combinations of items. For example,
repeated episodes of dysmenorrhoea, except
when taking hormonal contraception,
15 are
recognised by experienced clinicians as
having diagnostic value in endometriosis.
Although such knowledge-derived features 16
are not immediately present in electronic
records, they can be constructed. 17 However,
the authors are not aware of studies that
have attempted to do this using primary care
data or for endometriosis.
This study aimed to: (a) construct
enriched datasets from electronic health
records, which contained conventional and
composite features potentially predictive of
endometriosis; (b) examine the association
of these features with a subsequent
diagnosis of endometriosis in a nested
case-control study; and (c) examine the
relationship of these features to diagnosis
at different time periods before the date of
diagnosis.
Method
Data source
Data from the Practice Team Information
(PTI) database, a subset of the Primary Care
Clinical Informatics Unit Research database
held by the University of Aberdeen, were
obtained. It includes anonymised data from
primary care electronic health records of
approximately 224 000 patients registered
with a primary care physician, and is broadly
representative of the Scottish population
C Burton, MD, professor of primary medical
care, Academic Unit of Primary Medical Care,
University of Sheffield, Sheffield; Institute of
Applied Health Sciences, University of Aberdeen,
Aberdeen. D Ayansina, MBBS, research fellow;
S Bhattacharya, PhD, senior lecturer, Institute of
Applied Health Sciences; L Iversen, PhD, research
fellow, Institute of Applied Health Sciences;
D Sleeman, PhD, emeritus professor, Computing
Sciences, Natural and Computing Sciences,
University of Aberdeen, Aberdeen. L Saraswat,
PhD, consultant gynaecologist, Aberdeen Royal
Infirmary, Aberdeen.
Address for correspondence
Christopher Burton, Academic Unit of Primary
Medical Care, University of Sheffield, Samuel Fox
House, Sheffield, S5 7AU, UK.
E-mail:
[email protected]
Submitted: 16 May 2017; Editor’s response:
9 June 2017; final acceptance: 11 August 2017.
©British Journal of General Practice
This is the full-length article (published online
7 Nov 2017) of an abridged version published in
print. Cite this version as: Br J Gen Pract 2017;
DOI: https://doi.org/10.3399/bjgp17X693497
Christopher Burton, Lisa Iversen, Sohinee Bhattacharya, Dolapo Ayansina, Lucky Saraswat
and Derek Sleeman
Pointers to earlier diagnosis of endometriosis:
a nested case-control study using primary care electronic health records
Abstract
Background
Endometriosis is a condition with relatively non-
specific symptoms, and in some cases a long
time elapses from first-symptom presentation
to diagnosis.
Aim
To develop and test new composite pointers to
a diagnosis of endometriosis in primary care
electronic records.
Design and setting
This is a nested case-control study
using the Practice Team Information database
of anonymised primary care electronic health
records from Scotland. Data were analysed
from 366 cases of endometriosis between 1994
and 2010, and two sets of age and GP practice
matched controls: (a) 1453 randomly selected
females and (b) 610 females whose records
contained codes indicating consultation for
gynaecological symptoms.
Method
Composite pointers comprised patterns of
symptoms, prescribing, or investigations, in
combination or over time. Conditional logistic
regression was used to examine the presence
of both new and established pointers during the
3 years before diagnosis of endometriosis and
to identify time of appearance.
Results
A number of composite pointers that were
strongly predictive of endometriosis were
observed. These included pain and menstrual
symptoms occurring within the same year
(odds ratio [OR] 6.5, 95% confidence interval
[CI] = 3.9 to 10.6), and lower gastrointestinal
symptoms occurring within 90 days of
gynaecological pain (OR 6.1, 95% CI = 3.6 to
10.6). Although the association of infertility with
endometriosis was only detectable in the year
before diagnosis, several pain-related features
were associated with endometriosis several
years earlier.
Conclusion
Useful composite pointers to a diagnosis of
endometriosis in GP records were identified.
Some of these were present several years
before the diagnosis and may be valuable
targets for diagnostic support systems.
Keywords
diagnosis; electronic health records;
endometriosis; primary care.
e816 British Journal of General Practice, December 2017
with regard to age, sex, deprivation, and
urban/rural mix. It includes data
collected annually between 1994 and 2010.
Practices in the PTI project were expected
to record every clinical encounter using
Read Codes for clinical diagnoses and/
or main reasons for consultation. All GP
prescriptions were automatically recorded.
Investigations and therapeutic procedures
were coded inconsistently over time, with
coding increasing towards the end of the
database period.
Populations
This study was a nested case-control study.
Cases were females with a diagnosis of
endometriosis, who were born after
1 January 1974 and were, therefore,
≤36 years on 1 January 2010. This enabled
us to capture teenage menstrual symptoms
for the majority of females and avoid the
possibility that an apparent new diagnosis
in an older female was actually a historical
diagnosis being recorded for the first time
due to the creation of computerised record
summaries.
Population controls were randomly
selected for each case and individually
matched by age and GP practice, with up to
four controls per case (subject to availability).
A second control group comprised
females with codes for gynaecological
symptoms (pain, menstrual symptoms, or
infertility) but with no recorded diagnosis
of endometriosis. These controls were
also randomly selected for each case and
individually matched by age and GP practice,
with up to four symptomatic controls per
case. The index date for cases was defined
as the date of diagnosis of endometriosis
and for controls as the date of diagnosis
of endometriosis in the matched case. All
cases and controls were required to have
been registered with their GP practice for at
least 1 year before the index date.
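The matching rules above can be illustrated with a minimal sketch. It is not the authors' extraction code: the record layout (`practice`, `birth_year`, `registered_from`), the use of year of birth as a stand-in for age matching, and the fixed random seed are all assumptions made for illustration.

```python
import random
from datetime import date, timedelta


def select_controls(case, pool, max_controls=4, seed=0):
    """Randomly pick up to max_controls people matched to the case.

    Matching on GP practice and year of birth is an illustrative assumption;
    the pool is assumed to be already restricted to eligible controls.
    """
    one_year_before = case["index_date"] - timedelta(days=365)
    eligible = [
        person for person in pool
        if person["practice"] == case["practice"]
        and person["birth_year"] == case["birth_year"]
        # Registered with the practice for at least 1 year before the index date.
        and person["registered_from"] <= one_year_before
    ]
    rng = random.Random(seed)
    chosen = rng.sample(eligible, min(max_controls, len(eligible)))
    # Each control takes the index date of the case it is matched to.
    return [dict(person, index_date=case["index_date"]) for person in chosen]


# Hypothetical case and candidate pool.
case = {"id": "case-1", "practice": "A", "birth_year": 1982,
        "index_date": date(2007, 5, 1)}
pool = [
    {"id": f"p{i}", "practice": "A" if i % 2 == 0 else "B",
     "birth_year": 1982, "registered_from": date(2000 + i, 1, 1)}
    for i in range(10)
]
controls = select_controls(case, pool)
```

Where fewer than four eligible people exist, the sketch simply returns those available, mirroring the "subject to availability" condition.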
Data extraction and preparation
Box 1 lists the key data extracted and the
categories into which related items were
grouped. Most items were allocated to a
single time point. However, for contraception
prescriptions, which commonly lasted for
6 months or longer, details about each
prescription were used to estimate the
onset and offset of contraception using
methods previously employed to ascertain
the continuity of prescribing. 18
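One plausible way to turn individual prescriptions into estimated periods of contraception is sketched below. The cited method is not described in this article, so this is a swapped-in illustration rather than the procedure actually used: the `(issue_date, supply_days)` input format and the `grace_days` tolerance are hypothetical.

```python
from datetime import date, timedelta


def contraception_episodes(prescriptions, grace_days=30):
    """Merge (issue_date, supply_days) prescriptions into (onset, offset) episodes.

    grace_days is a hypothetical tolerance: a prescription issued within this
    many days of the previous expected offset continues the same episode.
    """
    episodes = []
    for issued, supply_days in sorted(prescriptions):
        offset = issued + timedelta(days=supply_days)
        if episodes and issued <= episodes[-1][1] + timedelta(days=grace_days):
            onset, previous_offset = episodes[-1]
            episodes[-1] = (onset, max(previous_offset, offset))
        else:
            episodes.append((issued, offset))
    return episodes


# Hypothetical prescriptions: two back-to-back 6-month supplies, then a long gap.
prescriptions = [
    (date(2006, 1, 1), 180),
    (date(2006, 7, 5), 180),
    (date(2008, 2, 1), 90),
]
episodes = contraception_episodes(prescriptions)
```

In this example the first two prescriptions form a single episode and the third starts a new one, giving an estimated onset and expected offset for each period of use.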
The data were enriched by introducing
composite features that were based on
the clinical experience of the investigators
and on interviews with 10 experts
(six gynaecologists, two specialists in
reproductive health, and two representatives
of a lay support organisation). Interviews,
which sought to identify tacit patterns in
symptoms that clinicians thought might be
predictive of a diagnosis, were audio recorded,
transcribed, and analysed thematically.
Composite features were specified
according to one of five relationships:
proximity, following, separated, during, and
exclusive. These are summarised in Box 2.
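The three relationships that involve only discrete, dated events (proximity, following, and separated) can be expressed compactly in code. The sketch below is an illustration under assumptions, not the study's implementation: features are represented as plain lists of dates, and "separated" uses a strict "more than N days" comparison.

```python
from datetime import date


def proximity(xs, ys, n_days):
    """True if any X falls within n_days before or after any Y (order ignored)."""
    return any(abs((x - y).days) <= n_days for x in xs for y in ys)


def following(xs, ys, n_days):
    """True if any X occurs between 1 and n_days days after some Y."""
    return any(1 <= (x - y).days <= n_days for x in xs for y in ys)


def separated(xs, n_days):
    """True if two consecutive recordings of X are more than n_days apart."""
    ordered = sorted(xs)
    return any((b - a).days > n_days for a, b in zip(ordered, ordered[1:]))


# Hypothetical record: coded dates for gynaecological pain and lower GI symptoms.
pain = [date(2006, 3, 1), date(2006, 3, 20), date(2007, 1, 15)]
lower_gi = [date(2006, 5, 10)]

features = {
    "lower_gi_proximity_pain_90": proximity(lower_gi, pain, 90),
    "pain_separated_180": separated(pain, 180),
}
```

Here the two March pain codes count as one episode, while the January code, more than 180 days later, marks a separate episode. For "following" with contraception, the reference date would be the estimated offset of the prescription.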
The presence of each feature (single and
composite) was ascertained in the record
of each individual at any time, and during a
How this fits in
Endometriosis is a relatively common
condition but the time from first
presentation to diagnosis is often
longer than ideal as symptoms are non-
specific. This study used anonymised
GP record data to construct new
pointers to diagnosis, which identified
patterns of symptoms in time. Distinct
episodes of gynaecological pain and
combinations of gynaecological pain on
one occasion with menstrual symptoms
or lower gastrointestinal symptoms on
another appear to be useful pointers to
endometriosis. Patterns such as these
make sense to clinicians and could be
integrated into electronic diagnostic
support systems.
Box 1. Categories of data grouped by data type
Data type: Specific features
Data description: Classical features of endometriosis (pelvic pain, dysmenorrhoea, dyspareunia and infertility) 2,5,9,14
Included data categories: Pain (pelvic pain, dyspareunia, dysmenorrhoea); Menstrual (flow); Infertility; Ovarian (for example, cysts)
Data type: Non-specific symptoms
Data description: Abdominal pain and gastrointestinal symptoms, fatigue, urinary symptoms; additional diagnoses, including irritable bowel syndrome 5
Included data categories: Menstrual (timing); Genital/other gynaecological; Urinary; Lower GI; Upper GI; Fatigue
Data type: Diagnostic tests and procedures
Data description: Primary care tests, referred investigations such as diagnostic ultrasound, and specialist procedures such as laparoscopy
Included data categories: Full blood count; Genital swabs; Laparoscopy; Abdominal or pelvic ultrasound; Thyroid function
Data type: Treatments
Data description: Hormonal treatment for endometriosis (for example, gonadotropin-releasing hormone agonists); Prescriptions for contraception; Analgesic drugs; Antidepressant drugs
Included data categories: Hormonal treatment; Contraception; NSAID; Codeine or other opioids; Tricyclic; SSRI and related antidepressants
Lower GI = pain, bloating, irritable bowel syndrome. NSAID = non-steroidal anti-inflammatory drugs.
SSRI = selective serotonin reuptake inhibitor. Upper GI = dyspepsia, reflux, nausea.
series of overlapping 3-year time windows
set at different intervals from the index date
(for diagnosis or matching). The windows
were defined using intervals between the
end of the window and the index date of 0, 3,
6, 12, 18, 24, and 36 months. The appearance
of statistical associations between available
information in the record and diagnosis
over time were examined by comparing
the same measure in different windows.
The purpose of this was to differentiate
between features that were present long
before diagnosis (and may thus indicate
missed diagnostic opportunities) and those
that appeared only shortly before diagnosis
(and may thus have triggered referral).
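The windowing scheme can be sketched as follows. This is an illustrative reading of the description, not the authors' code: the calendar-month arithmetic, the half-open interval `[start, end)`, and the example dates are assumptions.

```python
import calendar
from datetime import date

GAP_MONTHS = [0, 3, 6, 12, 18, 24, 36]  # gaps between window end and index date
WINDOW_MONTHS = 36                      # each window spans 3 years


def minus_months(d, months):
    """Return the date `months` calendar months before d, clamping the day."""
    total = d.year * 12 + (d.month - 1) - months
    year, month = divmod(total, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def windows(index_date):
    """Yield (gap, window_start, window_end) for each overlapping window."""
    for gap in GAP_MONTHS:
        end = minus_months(index_date, gap)
        start = minus_months(end, WINDOW_MONTHS)
        yield gap, start, end


def present_in_window(event_dates, start, end):
    """A feature is present if it was coded at least once in [start, end)."""
    return any(start <= d < end for d in event_dates)


index_date = date(2009, 6, 15)   # hypothetical index date
pain_codes = [date(2006, 9, 1)]  # hypothetical coded event
flags = {gap: present_in_window(pain_codes, start, end)
         for gap, start, end in windows(index_date)}
```

In this example the pain code falls inside every window except the one ending 36 months before the index date, so the same feature yields a different present/absent flag depending on the gap being examined.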
Analysis of association of features and
patterns with diagnosis
Conditional logistic regression was carried
out to examine the association between each
feature (conventional or composite) and the
diagnosis of endometriosis. Each feature
was reported as either present or absent
within the time period. Rather than use
counts of how often a feature occurred, the
‘separated’ composite variables were used
to indicate multiple episodes. Conditional
logistic regression was conducted for all
features for which at least 10 individuals
(cases or controls) had the feature present
and reported as the odds ratio (OR), with
95% confidence intervals (CIs). All analyses
were conducted in R version 3.3.2 (2016).
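For orientation, an odds ratio and its 95% CI can be computed from a simple 2 × 2 table as sketched below. This is deliberately not the conditional logistic regression used in the study: it is a crude, unmatched calculation with a Woolf (log-scale) interval, written in Python rather than R, and because it ignores the matching it will generally differ from the matched estimates reported in the tables.

```python
import math


def crude_odds_ratio(cases_with, cases_total, controls_with, controls_total, z=1.96):
    """Unmatched odds ratio with a Woolf (log-scale) 95% confidence interval."""
    a = cases_with
    b = cases_total - cases_with
    c = controls_with
    d = controls_total - controls_with
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts for pain in the 3 years before the index date (Table 1):
# 145 of 366 cases and 79 of 1453 population controls.
odds_ratio, lower, upper = crude_odds_ratio(145, 366, 79, 1453)
```

The crude figure is useful only as a sanity check on the direction and rough size of an association; the matched analysis conditions on each case-control set.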
The analysis was conducted separately
with population and symptomatic control
groups. For the population comparison all
cases and their matched controls were
included. For the symptomatic comparison,
only cases that had recorded symptoms
and their matched controls were included.
For the time window analysis, the data were
limited to females who had been registered
with their practice for at least 1 year before
the beginning of the gap. The odds ratios for
each feature at each of the seven different time
gaps were plotted in order to visualise the
appearance of predictive features over time.
Results
Patient characteristics
Data from 366 cases and 1453 matched
population controls were obtained. Of these,
243 cases had gynaecological symptoms
(pain, menstrual symptoms, and infertility)
and were matched to a further 610 controls
with comparable symptoms. The median
age at diagnosis was 25 years, interquartile
range 22–28 years, and age at diagnosis
was <20 years in 47 (12.8%) cases.
Data quality
In total, 191 cases (52.2%) were registered
with the same GP practice for at least
5 years before diagnosis and, therefore, had
continuous records in the PTI database; 114
(31.2%) were registered for at least 8 years
before diagnosis. Similar proportions were
seen for population controls (746 [51.3%]
and 469 [32.3%] respectively), but more
of the symptomatic controls had been
registered for these time periods (414/610,
67.9% and 273/610, 44.8%). A recorded code
for laparoscopy was found in only 47 (12.8%)
cases despite this being the commonest
diagnostic procedure for endometriosis.
This is likely to represent a preference for
recording the diagnosis rather than the
procedure by which it was made, although
instances of a clinical diagnosis being
entered without any confirmatory tests
cannot be excluded. Likewise, there were
few coded surgical procedures, for example,
13 cases (3.5%) had a recorded operation
for tubal or ovarian problems excluding
diagnostic laparoscopy. These procedures
were excluded from the analysis, focusing
instead on clinical features, investigations,
and medical treatments.
Occurrence of diagnostic features
There were 145 cases (39.6%) that had
a code recorded for gynaecological pain
(dysmenorrhoea, pelvic pain) during the
3 years prior to diagnosis and 39 (10.7%) had
a code for infertility. A further 198 cases (54.1%)
had neither of these during the 3 years prior
to diagnosis.
The numbers and proportions of females
with at least one instance of each feature,
Box 2. Types of composite features used in constructing predictors
Relationship: Proximity
Specification: An occurrence of one feature within a given number of days of the other but with no specification of which should come first
Example: Pain and fatigue within 90 days of each other
Relationship: Following
Specification: An occurrence of one feature within a given number of days of the other with specification of which should come first
Example: Pain occurring within 90 days of estimated cessation of contraception
Relationship: Separated
Specification: Two consecutive recordings of a single feature occurring at least a given number of days apart (this permits differentiation of separate episodes from repeated consultation during the same episode)
Example: Two consecutive episodes of pain separated by at least 180 days
Relationship: During
Specification: An occurrence of a symptom or other feature after the onset, and before the expected offset, of a contraception prescription
Example: Pain during estimated duration of prescription for contraception
Relationship: Exclusive
Specification: A feature only occurring in the absence of another
Example: Pain but only outside of estimated periods of prescribed contraception
either in the 3 years prior to the index date
or at any time, are shown in Table 1 (all
cases [N = 366] and population controls)
and Table 2 (symptomatic cases [N = 261]
and controls). Table 1 and Table 2 also
show the odds ratios (OR), with 95% CIs
for the two comparisons: all cases versus
population controls and symptomatic cases
(gynaecological pain, menstrual symptoms,
or infertility) versus matched symptomatic
controls.
As expected, pain was more common
in cases in both comparisons: OR 14.9,
95% CI = 10.1 to 21.9 versus population
controls and OR 5.6, 95% CI = 3.9 to 8.1
versus symptomatic controls over 3 years’
data. Menstrual bleeding and timing
symptoms were coded more commonly
than in population controls, OR 3.8, 95%
CI = 2.8 to 5.0 and 2.1, 95% CI = 1.4 to 3.2,
but not in comparison with symptomatic
controls, OR 1.0, 95% CI = 0.7 to 1.4 and 1.2,
95% CI = 0.7 to 1.9. Non-specific clinical
features such as fatigue, vulvo-vaginal
problems, and lower gastrointestinal
symptoms were all more common in cases
than population controls.
Although simple tests such as full blood
count were more common in cases than
population controls, there was no significant
difference in the symptomatic comparison.
Genitourinary swab tests (presumably
ordered because of the possibility that
symptoms were due to pelvic inflammation)
were more common in cases than controls
in both comparisons.
Occurrence of prescribed treatments
In both the population and the symptomatic
group comparisons, both analgesics (OR
3.0, 95% CI = 2.3 to 4.0 and OR 2.7, 95%
CI = 1.9 to 3.9, in 3 years before index date
Table 1. Numbers, proportions, and odds ratios (95% CI) for features in cases of endometriosis compared with
population controls
Occurrence of features in 3 years before index date a Occurrence of features at any time before index date a
Cases ( N = 366) Controls ( N = 1453) Cases ( N = 366) Controls ( N = 1453)
Specific features n % n % OR 95% CI n % n % OR 95% CI
Subfertility 39 10.7 24 1.7 7.7 (4.4 to 13.3) 41 11.2 31 2.1 5.9 (3.6 to 9.7)
Menstrual — bleeding 121 33.1 179 12.3 3.8 (2.8 to 5.0) 151 41.3 267 18.4 3.3 (2.6 to 4.3)
Menstrual — timing 39 10.7 80 5.5 2.1 (1.4 to 3.2) 45 12.3 117 8.1 1.6 (1.1 to 2.3)
Ovarian 24 6.6 7 0.5 13.7 (5.9 to 31.8) 25 6.8 11 0.8 9.8 (4.7 to 20.4)
Pain 145 39.6 79 5.4 14.9 (10.1 to 21.9) 169 46.2 146 10.1 9.9 (7.1 to 13.6)
Non-specific symptoms
Fatigue 56 15.3 121 8.3 2.0 (1.4 to 2.8) 79 21.6 178 12.3 2.0 (1.5 to 2.7)
Gynaecological 51 13.9 47 3.2 5.0 (3.3 to 7.7) 77 21.0 97 6.7 4.0 (2.8 to 5.6)
Lower GI 104 28.4 144 9.9 3.7 (2.8 to 5.0) 126 34.4 213 14.7 3.3 (2.5 to 4.3)
Upper GI 27 7.4 62 4.3 1.8 (1.1 to 3.0) 50 13.7 107 7.4 2.1 (1.4 to 3.0)
Urinary 25 6.8 49 3.4 2.1 (1.3 to 3.5) 42 11.5 80 5.5 2.3 (1.5 to 3.5)
Tests and procedures
Full blood count 40 10.9 102 7.0 2.0 (1.2 to 3.2) 50 13.7 112 7.7 2.6 (1.6 to 4.2)
Genital swabs 64 17.5 77 5.3 4.5 (3.0 to 6.7) 73 20.0 111 7.6 3.5 (2.5 to 5.0)
Laparoscopy 42 11.5 13 0.9 14.6 (7.5 to 28.4) 47 12.8 15 1.0 13.9 (7.5 to 25.7)
Thyroid function 53 14.5 112 7.7 2.4 (1.6 to 3.5) 67 18.3 132 9.1 2.8 (1.9 to 4.1)
Ultrasound 14 3.8 5 0.3 12.3 (4.0 to 37.8) 14 3.8 11 0.8 5.0 (2.2 to 11.4)
Treatments
Contraception 201 54.9 716 49.3 1.3 (1.0 to 1.6) 234 63.9 800 55.1 1.5 (1.2 to 2.0)
NSAID 171 46.7 276 19.0 4.8 (3.6 to 6.4) 191 52.2 393 27.1 3.8 (2.9 to 5.1)
Analgesic 136 37.2 254 17.5 3.0 (2.3 to 4.0) 156 42.6 343 23.6 2.7 (2.1 to 3.5)
SSRI 65 17.8 188 12.9 1.5 (1.1 to 2.0) 85 23.2 229 15.8 1.7 (1.2 to 2.2)
Tricyclic 29 7.9 60 4.1 2.2 (1.3 to 3.6) 42 11.5 82 5.6 2.4 (1.6 to 3.6)
a Index date: date of diagnosis for cases, date of diagnosis of matched case for controls. CI = confidence interval. Gynaecological = vulvo-vaginal symptoms, pelvic inflammation.
Lower GI = pain, bloating, irritable bowel syndrome. NSAID = non-steroidal anti-inflammatory drug. OR = odds ratio. Ovarian = coded diagnosis of ovarian cysts and related
conditions. SSRI = selective serotonin reuptake inhibitor and related antidepressants. Upper GI = dyspepsia, reflux, nausea.
comparison) and NSAIDs (OR 4.8, 95%
CI = 3.6 to 6.4 and OR 3.0, 95% CI = 2.1 to
4.2, in 3 years before index date comparison)
were more commonly prescribed to cases
than controls. When comparing cases
and symptomatic controls, there was no
association with antidepressant drugs
(either tricyclic or SSRI and related).
Composite features
Table 3 shows the number and proportion of
patients with at least one instance of each
of the composite features over the 3 years
before date of diagnosis/matching. Several
composite features had high ORs when
cases were compared with symptomatic
controls: pain and menstrual symptoms
within the same year (pain proximity
menstrual [360]), OR 6.5, 95% CI = 3.9 to
10.6 and lower gastrointestinal symptoms
occurring within 90 days of gynaecological
pain (OR 6.1, 95% CI = 3.6 to 10.6). Episodes
of gynaecological pain separated by at least
180 days were approximately eight times as
likely in cases as in symptomatic controls
(OR 8.5, 95% CI = 4.3 to 16.9). Although pain
or analgesic use on stopping contraception
was suggested by some of the experts,
these composite features occurred in less
than 10% of cases, and with only moderate
ORs of approximately 3.
Occurrence of diagnostic features over
the time prior to diagnosis
Figure 1 shows plots of eight diagnostic
features, describing the ORs for 3-year time
windows with different intervals between the
end of the 3-year window and the diagnosis/
matching date. Each plot compares cases
with matched population controls and
symptomatic cases with their matched
symptomatic controls. In all plots, 95% CIs
Table 2. Numbers, proportions, and odds ratios (95% CI) for features in cases of endometriosis compared with
symptomatic controls
Occurrence of features in 3 years before index date a Occurrence of features at any time before index date a
Cases ( N = 261) Controls ( N = 610) Cases ( N = 261) Controls ( N = 610)
Specific features n % n % OR 95% CI n % n % OR 95% CI
Subfertility 39 16.1 52 8.5 2.4 (1.4 to 3.9) 41 16.9 64 10.5 1.9 (1.2 to 3.1)
Menstrual — bleeding 121 49.8 304 49.8 1.0 (0.7 to 1.4) 151 62.1 443 72.6 0.7 (0.5 to 0.9)
Menstrual — timing 30 12.4 64 10.5 1.2 (0.7 to 1.9) 34 14.0 111 18.2 0.7 (0.5 to 1.1)
Ovarian 14 5.8 3 0.5 12.2 (3.5 to 42.7) 15 6.2 6 1.0 7.0 (2.7 to 18.1)
Pain 145 59.7 148 24.3 5.6 (3.9 to 8.1) 169 69.6 241 39.5 4.0 (2.8 to 5.6)
Non-specific symptoms
Fatigue 45 18.5 84 13.8 1.4 (0.9 to 2.1) 66 27.2 138 22.6 1.3 (0.9 to 1.9)
Gynaecological 41 16.9 34 5.6 4.2 (2.4 to 7.4) 64 26.3 68 11.2 3.6 (2.3 to 5.6)
Lower GI 79 32.5 109 17.9 2.3 (1.6 to 3.2) 95 39.1 180 29.5 1.7 (1.2 to 2.3)
Upper GI 24 9.9 51 8.4 1.3 (0.8 to 2.3) 44 18.1 87 14.3 1.5 (1.0 to 2.3)
Urinary 20 8.2 29 4.8 1.8 (1.0 to 3.4) 36 14.8 64 10.5 1.5 (1.0 to 2.4)
Tests and procedures
Full blood count 34 14.0 82 13.4 1.2 (0.7 to 2.2) 42 17.3 97 15.9 1.4 (0.8 to 2.4)
Genital swabs 43 17.7 71 11.6 2.2 (1.3 to 3.5) 50 20.6 90 14.8 1.9 (1.2 to 3.0)
Laparoscopy 31 12.8 4 0.7 20.0 (7.0 to 57.1) 35 14.4 13 2.1 7.2 (3.7 to 14.1)
Thyroid function 43 17.7 86 14.1 1.5 (0.9 to 2.4) 53 21.8 103 16.9 1.7 (1.1 to 2.7)
Ultrasound 11 4.5 6 1.0 5.2 (1.6 to 17.0) 11 4.5 7 1.2 4.3 (1.4 to 13.0)
Treatments
Contraception 151 62.1 373 61.2 1.1 (0.8 to 1.5) 178 73.3 421 69.0 1.3 (0.9 to 1.9)
NSAID 133 54.7 185 30.3 3.0 (2.1 to 4.2) 150 61.7 264 43.3 2.6 (1.8 to 3.7)
Analgesic 100 41.2 142 23.3 2.7 (1.9 to 3.9) 116 47.7 203 33.3 2.3 (1.6 to 3.4)
SSRI 43 17.7 115 18.9 1.0 (0.7 to 1.5) 57 23.5 148 24.3 1.1 (0.8 to 1.6)
Tricyclic 20 8.2 37 6.1 1.5 (0.8 to 2.7) 29 11.9 58 9.5 1.3 (0.8 to 2.1)
a Index date: date of diagnosis for cases, date of diagnosis of matched case for controls. CI = confidence interval. Gynaecological = vulvo-vaginal symptoms, pelvic inflammation.
Lower GI = pain, bloating, irritable bowel syndrome. NSAID = non-steroidal anti-inflammatory drug. OR = odds ratio. Ovarian = coded diagnosis of ovarian cysts and related
conditions. SSRI = selective serotonin reuptake inhibitor and related antidepressants. Upper GI = dyspepsia, reflux, nausea.
are indicated. These show differing patterns.
The plot for fertility problems (infertility)
shows that until 1.5 years before diagnosis
there was no association with a diagnosis
of endometriosis, but from there the
OR increased until about 0.5 years before
diagnosis, at which point it stayed elevated.
This is interpreted as indicating that the
time delay from the occurrence of infertility
to diagnosis is relatively short, presumably
as infertility leads to referral including
diagnostic laparoscopy.
The plot for gynaecological pain shows
that the OR was significantly elevated
several years prior to diagnosis and that
this increased in the year prior to diagnosis
(at least in the population comparison). The
two plots for non-specific symptoms (fatigue
and lower gastrointestinal symptoms) show
patterns of longstanding modest elevation.
The bottom row of plots in Figure 1 shows
two composite features: lower GI symptoms
within 90 days of gynaecological pain and
episodes of gynaecological pain >180 days
apart. Although CIs for these composites
were wider there was a suggestion of a
trend over time in the lower GI plus pain
combination.
Discussion
Summary
This study has two important new findings.
First, the predictive value of several
composite features for a subsequent
diagnosis of endometriosis in routine
records was evaluated. Second, for the
first time, different time trends in the
appearance of recorded clinical features of
endometriosis were demonstrated.
Strengths and limitations
The choice of features as pointers used
principles of feature selection based on
expert input,
19 and methods of data
consolidation and aggregation that have been
developed for use with clinical data sources
Table 3. Numbers, proportions, and odds ratios (95% CI) for composite features in the 3 years before
diagnosis/matching a
Comparison with population controls Comparison with symptomatic controls
Composite feature Cases ( N = 366) Controls ( N = 1453) Cases ( N = 261) Controls ( N = 610)
n % n % OR 95% CI n % n % OR 95% CI
Pain during contraception 40 10.9 24 1.7 7.4 (4.3 to 12.7) 40 16.5 38 6.2 3.0 (1.9 to 5.0)
Pain follow contraception (180) 17 4.6 8 0.6 8.5 (3.7 to 19.7) 17 7.0 17 2.8 3.1 (1.5 to 6.4)
Pain exclusive contraception 105 28.7 55 3.8 14.2 (9.1 to 22.0) 105 43.2 110 18.0 4.3 (2.9 to 6.2)
Menstrual during contraception 38 10.4 65 4.5 2.6 (1.7 to 4.1) 38 15.6 87 14.3 1.1 (0.7 to 1.8)
Menstrual follow contraception (180) 14 3.8 8 0.6 7.0 (2.9 to 16.7) 14 5.8 17 2.8 2.0 (1.0 to 4.2)
Analgesic during contraception 51 13.9 90 6.2 2.5 (1.7 to 3.7) 39 16.1 59 9.7 2.0 (1.3 to 3.1)
Analgesic follow contraception (180) 27 7.4 26 1.8 4.5 (2.5 to 7.8) 21 8.6 21 3.4 2.8 (1.5 to 5.3)
Analgesic exclusive contraception 116 31.7 68 4.7 12.0 (8.1 to 17.8) 116 47.7 132 21.6 3.9 (2.7 to 5.6)
NSAID during contraception 56 15.3 92 6.3 2.9 (2.0 to 4.2) 48 19.8 68 11.2 2.0 (1.3 to 3.0)
NSAID follow contraception (90) 27 7.4 28 1.9 4.0 (2.3 to 6.8) 21 8.6 19 3.1 3.0 (1.6 to 5.8)
Pain proximity menstrual (360) 61 16.7 23 1.6 15.1 (8.5 to 26.6) 61 25.1 34 5.6 6.5 (3.9 to 10.6)
Analgesic proximity menstrual (90) 29 7.9 19 1.3 6.3 (3.5 to 11.4) 29 11.9 30 4.9 2.6 (1.5 to 4.6)
Analgesic proximity pain (90) 45 12.3 15 1.0 15.5 (8.0 to 30.1) 45 18.5 20 3.3 7.1 (4.0 to 12.5)
NSAID proximity pain (90) 63 17.2 28 1.9 10.9 (6.7 to 17.7) 63 25.9 40 6.6 6.0 (3.7 to 9.7)
Lower GI proximity pain (90) 48 13.1 12 0.8 15.9 (8.4 to 29.9) 48 19.8 24 3.9 6.1 (3.6 to 10.6)
Lower GI proximity menstrual (90) 35 9.6 23 1.6 6.3 (3.7 to 10.7) 35 14.4 39 6.4 2.6 (1.6 to 4.1)
Pain separated by > 180 days 36 9.8 14 1.0 12.5 (6.3 to 24.6) 36 14.8 14 2.3 8.5 (4.3 to 16.9)
a Composite feature names follow the format X relationship Y (N), where the relationship is defined as follows:
X during Y: only used where Y = contraception. X = feature and occurs at least once after the onset date and before the expected offset date of at least one contraceptive prescription.
X follow Y (N): N = number of days. Y = discrete time point event. X = feature and occurs between 1 and N days after Y. Where Y = contraception, N days relate to the expected offset date.
X proximity Y (N): used where X and Y = discrete time point events and N is a number of days. X occurs between N days before and N days after Y.
X exclusive Y: currently only used where Y = contraception; X = feature. X and Y are present but criteria for X during Y are never met. A single prescription of contraception occurring on the same day as a code for dysmenorrhoea would meet X exclusive Y criteria, as X during Y requires X after the onset of contraception.
X separated by >(N) days: two consecutive occurrences of X separated by more than N days.
CI = confidence interval. GI = gastrointestinal. NSAID = non-steroidal anti-inflammatory drug. OR = odds ratio.
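The "during" and "exclusive" definitions in the footnote to Table 3 depend on intervals rather than single dates, and the same-day example is easy to check in code. The following is a minimal sketch under assumptions: contraception is represented as a list of hypothetical `(onset, expected_offset)` date pairs, and the strict inequalities follow the wording "after the onset" and "before the expected offset".

```python
from datetime import date


def during(xs, intervals):
    """True if any X falls after the onset and before the expected offset
    of at least one contraceptive prescription interval."""
    return any(onset < x < offset for x in xs for onset, offset in intervals)


def exclusive(xs, intervals):
    """True if X and Y are both present but 'X during Y' is never met."""
    return bool(xs) and bool(intervals) and not during(xs, intervals)


# Hypothetical data: one contraception interval (onset, expected offset).
contraception = [(date(2008, 1, 10), date(2008, 7, 10))]
same_day_pain = [date(2008, 1, 10)]   # coded on the day of the prescription
mid_course_pain = [date(2008, 3, 1)]  # coded part-way through the prescription
```

With these definitions, `exclusive(same_day_pain, contraception)` is true because a code on the prescription date is not after the onset, whereas `mid_course_pain` satisfies "during" and therefore not "exclusive".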
Funding
This study was funded by the Chief
Scientist Office of NHS Scotland through
its first health informatics call (reference
HICG/1/25). The funder played no role in
conducting the research or in writing the
article.
Ethical approval
The study involved analysis of anonymised
data. Access to the data was approved
by the Research Applications and Data
Management Team at the University of
Aberdeen.
Provenance
Freely submitted; externally peer reviewed.
Competing interests
The authors have declared no competing
interests.
other than GP records. 17,20 This sequence
of steps is broadly comparable with other
recent approaches to the summarisation of
clinical data.
20,21 An established anonymised
GP record set was used that contained both
diagnostic and symptom codes using the
Read Code format, which means that the
method is transferable to other research
datasets and potentially to clinical use.
There were limitations relating to the data,
as the data were from standalone primary
care records with no linkage to secondary
care records, meaning that the reliability
of GPs’ diagnoses of endometriosis could
not be assessed. However, in the authors’
experience, GP practices tend not to code
such diagnoses without specialist opinion.
The data were more sparse than anticipated,
with only around half of cases having
cardinal clinical features of endometriosis
recorded prior to diagnosis. This probably
reflects the limited use of symptom codes
by GPs, even in this database where a
reason for consultation was meant to be
given for each attendance. The rate of
coding of procedures such as laparoscopy
was surprisingly low; the authors suspect
this is because GP practices had coded the
findings of the laparoscopy rather than the
procedure itself. Finally, as the duration of
the database was shorter than a female’s
reproductive period, a decision was made
to exclude some females aged >35 years
and diagnosed with endometriosis in order
to maintain a focus on females for whom
electronic health records were more likely
to have data about earlier menstrual and
related symptoms.
Comparison with existing literature
The authors are not aware of other studies
that have looked for combinations of
features in time as predictors of diagnoses
in GP records. Although combinations of
symptoms are commonly used in cancer
prediction tools, these are usually simply
recorded as present or absent,
22 whereas
in this study temporal relationships were
specified in order to increase the specificity
of pointers. Other studies of endometriosis
have only reported single items. 5
Implications for research and practice
The composite predictors of a diagnosis
of endometriosis reflect the
patterns that clinicians observe, and, for
the first time, they have been tested using
data in routine GP records over time.
These combinations — including pain and
menstrual symptoms in the same year;
pain and lower GI symptoms in the same
90 days; and episodes of pain separated by
at least 6 months — are likely to be clinically
useful as pointers to a diagnosis in their
own right. Moreover, the fact that they
can be derived from existing data means
that they have potential to be included
in diagnostic support software within GP
records.
23 This study did not have sufficient
cases to split the data into derivation and
test sets, but future studies can use these
composite features to test their predictive
value in larger and better linked datasets.
Additionally, machine learning techniques
have potential value in feature reduction
and model selection. 24,25 Ultimately, the aim
must be to apply these observations within
predictive models for earlier referral and
diagnosis of endometriosis.
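As an illustration of how composite features of this kind might be derived from coded consultation data, the following is a minimal sketch in Python. It is not the authors' implementation. The event layout (a list of date and category pairs), the category labels, and the function names are all hypothetical, and "in the same year" is assumed here to mean within 365 days rather than within the same calendar year.

```python
from datetime import date


def co_occur_within(events, cat_a, cat_b, max_days):
    """True if any cat_a event and any cat_b event are at most max_days apart."""
    dates_a = [d for d, c in events if c == cat_a]
    dates_b = [d for d, c in events if c == cat_b]
    return any(abs((a - b).days) <= max_days for a in dates_a for b in dates_b)


def episodes_more_than_days_apart(events, cat, days):
    """True if the earliest and latest events in a category are more than `days` apart."""
    dates = sorted(d for d, c in events if c == cat)
    return len(dates) >= 2 and (dates[-1] - dates[0]).days > days


def composite_features(events):
    """Derive three illustrative composite features for one patient's events.

    The category labels are assumptions for this sketch, not the study's code lists.
    """
    return {
        "pain_and_menstrual_within_1y": co_occur_within(
            events, "gynae_pain", "menstrual", 365
        ),
        "pain_and_lower_gi_within_90d": co_occur_within(
            events, "gynae_pain", "lower_gi", 90
        ),
        "pain_episodes_over_180d_apart": episodes_more_than_days_apart(
            events, "gynae_pain", 180
        ),
    }


# Hypothetical record for a single patient: (event date, symptom category).
example_events = [
    (date(2015, 1, 10), "gynae_pain"),
    (date(2015, 2, 20), "lower_gi"),
    (date(2015, 9, 1), "gynae_pain"),
    (date(2016, 3, 1), "menstrual"),
]

features = composite_features(example_events)
```

In this sketch each feature is a simple present or absent flag, but unlike a single coded item it is only set when the stated temporal relationship holds, which is the property that distinguishes the composite features from single items.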
[Figure 1: eight panels. Panel titles: Infertility; Gynaecological pain; Fatigue; Lower GI symptoms; Analgesic prescription; NSAID prescription; Gynaecological pain and lower GI within 90 days; Episodes gynaecological pain >180 days apart. In each panel the x-axis is years gap before index date (3.0 to 0.0) and the y-axis is odds ratio (ticks at 0.1, 0.5, 2.0, and 10.0). Each panel shows two series: OR versus population controls and OR versus symptomatic controls.]
Figure 1. Plots of OR for individual features over
3 years, by gap between the end of the 3-year window
and the date of diagnosis/matching. Dotted lines
indicate 95% CI for ORs.
CI = confidence interval. OR = odds ratio.
Acknowledgements
The authors thank the expert clinicians and
representatives of Endometriosis UK for
their interviews.
e822 British Journal of General Practice, December 2017
References
1. Ballard K, Lowton K, Wright J. What’s the delay? A qualitative study of
women’s experiences of reaching a diagnosis of endometriosis. Fertil Steril
2006; 86(5): 1296–1301.
2. Dunselman GA, Vermeulen N, Becker C, et al. ESHRE guideline: management
of women with endometriosis. Hum Reprod 2014; 29(3): 400–412.
3. Pugsley Z, Ballard K. Management of endometriosis in general practice: the
pathway to diagnosis. Br J Gen Pract 2007; 57(539): 470–476.
4. Staal AH, van der Zanden M, Nap AW. Diagnostic delay of endometriosis in
the Netherlands. Gynecol Obstet Invest 2016; 81(4): 321–324.
5. Ballard KD, Seaman HE, de Vries CS, Wright JT. Can symptomatology help in
the diagnosis of endometriosis? Findings from a national case-control study
— Part 1. BJOG 2008; 115(11): 1382–1391.
6. Simoens S, Dunselman G, Dirksen C, et al. The burden of endometriosis:
costs and quality of life of women with endometriosis and treated in referral
centres. Hum Reprod 2012; 27(5): 1292–1299.
7. Culley L, Law C, Hudson N, et al. The social and psychological impact of
endometriosis on women’s lives: a critical narrative review. Hum Reprod
Update 2013; 19(6): 625–639.
8. Abbas S, Ihle P, Köster I, Schubert I. Prevalence and incidence of diagnosed
endometriosis and risk of endometriosis in patients with endometriosis-related
symptoms: findings from a statutory health insurance-based cohort in
Germany. Eur J Obstet Gynecol Reprod Biol 2012; 160(1): 79–83.
9. Lemaire GS. More than just menstrual cramps: symptoms and uncertainty
among women with endometriosis. J Obstet Gynecol Neonatal Nurs 2004;
33(1): 71–79.
10. Nnoaham KE, Hummelshoj L, Kennedy SH, et al. Developing symptom-based
predictive models of endometriosis as a clinical screening tool: results from a
multicenter study. Fertil Steril 2012; 98(3): 692–701.
11. Gupta D, Hull ML, Fraser I, et al. Endometrial biomarkers for the
non-invasive diagnosis of endometriosis. Cochrane Database Syst Rev 2016; (4):
CD012165.
12. Nisenblat V, Bossuyt PM, Farquhar C, et al. Imaging modalities for the
non-invasive diagnosis of endometriosis. Cochrane Database Syst Rev 2016; (2):
CD009591.
13. Hirsch M, Begum MR, Paniz E, et al. Diagnosis and management of
endometriosis: a systematic review of international and national guidelines.
BJOG 2017; Jul 29. DOI: 10.1111/1471-0528.14838
14. Ballard K, Lane H, Hudelist G, et al. Can specific pain symptoms help in the
diagnosis of endometriosis? A cohort study of women with chronic pelvic pain.
Fertil Steril 2010; 94(1): 20–27.
15. Chapron C, Souza C, Borghese B, et al. Oral contraceptives and
endometriosis: the past use of oral contraceptives for treating severe primary
dysmenorrhea is associated with endometriosis, especially deep infiltrating
endometriosis. Hum Reprod 2011; 26(8): 2028–2035.
16. Sleeman D, Moss L, Aiken A, et al. Detecting and resolving inconsistencies
between domain experts’ different perspectives on (classification) tasks. Artif
Intell Med 2012; 55(2): 71–86.
17. Reis BY, Kohane IS, Mandl KD. Longitudinal histories as predictors of future
diagnoses of domestic abuse: modelling study. BMJ 2009; 339: b3677.
18. Burton C, Cochran AJ, Cameron IM. Restarting antidepressant treatment
following early discontinuation — a primary care database study. Fam Pract
2015; 32(5): 520–524.
19. Sleeman D, Moss L, Sim M, Kinsella J. Predicting adverse events: detecting
myocardial damage in intensive care unit (ICU) patients. KCAP 2011, the Sixth
International Conference on Knowledge Capture. Banff, Alberta, Canada:
2011. New York: ACM Press: 73–79. DOI: 10.1145/1999676.1999690.
20. Feblowitz JC, Wright A, Singh H, et al. Summarization of clinical information:
a conceptual model. J Biomed Inform 2011; 44(4): 688–699.
21. Hirsch JS, Tanenbaum JS, Lipsky Gorman S, et al. HARVEST, a longitudinal
patient record summarizer. J Am Med Inform Assoc 2015; 22(2): 263–274.
22. Hamilton W. The CAPER studies: five case-control studies aimed at identifying
and quantifying the risk of cancer in symptomatic primary care patients.
Br J Cancer 2009; 101(Suppl 2): 80–86.
23. Nurek M, Kostopoulou O, Delaney BC, Esmail A. Reducing diagnostic errors
in primary care. A systematic meta-review of computerized diagnostic
decision support systems by the LINNEAUS collaboration on patient safety in
primary care. Eur J Gen Pract 2015; 21(Suppl): 8–13.
24. Mitchell TM. Machine learning. Boston: WCB/McGraw-Hill, 1997.
25. Darcy AM, Louie AK, Roberts LW. Machine learning and the profession of
medicine. JAMA 2016; 315(6): 551–552.