{"paper_id":"4dc3c6a9-957d-450d-98a9-0bfc337a7b34","body_text":"INTRODUCTION\nEndometriosis is a common gynaecological condition in which there is often a long time between first primary care consultation and diagnosis. 1–4 A longer time to diagnosis is associated with prolonged symptoms, particularly pain and subfertility, 5,6 along with patient frustration and demoralisation. 7\nEndometriosis can be difficult to diagnose clinically; its symptoms are both common 8 and non-specific, so they are often considered by GPs to be part of the normal menstrual experience 9 or are attributed to other conditions. 5 The use of very detailed questions about symptoms can increase diagnostic accuracy. 10 However, current biomarkers 11 and imaging 12 are of limited benefit, and guideline recommendations for the diagnosis and management of this condition vary substantially. 13\nMost research on the clinical features of endometriosis in primary care has focused on features present at a single point in time, typically the time of diagnosis. 5,14 However, with endometriosis, the symptoms at any single point in time have only limited predictive value, 2 and addressing delays in diagnosis requires an understanding of when symptoms first appear. Although electronic records contain many single data items, experienced practitioners typically recognise composite patterns that involve combinations of items. For example, experienced clinicians recognise repeated episodes of dysmenorrhoea, except when the patient is taking hormonal contraception, 15 as having diagnostic value in endometriosis. Although such knowledge-derived features 16 are not directly present in electronic records, they can be constructed from them. 17 However, the authors are not aware of any studies that have attempted to do this using primary care data or for endometriosis. 
\nThis study aimed to: (a) construct enriched datasets from electronic health records, containing conventional and composite features potentially predictive of endometriosis; (b) examine the association of these features with a subsequent diagnosis of endometriosis in a nested case-control study; and (c) examine the relationship of these features to diagnosis at different time periods before the date of diagnosis.\nResearch\nC Burton, MD, professor of primary medical care, Academic Unit of Primary Medical Care, University of Sheffield, Sheffield; Institute of Applied Health Sciences, University of Aberdeen, Aberdeen. D Ayansina, MBBS, research fellow; S Bhattacharya, PhD, senior lecturer, Institute of Applied Health Sciences; L Iversen, PhD, research fellow, Institute of Applied Health Sciences; D Sleeman, PhD, emeritus professor, Computing Sciences, Natural and Computing Sciences, University of Aberdeen, Aberdeen. L Saraswat, PhD, consultant gynaecologist, Aberdeen Royal Infirmary, Aberdeen.\nAddress for correspondence\nChristopher Burton, Academic Unit of Primary Medical Care, University of Sheffield, Samuel Fox House, Sheffield, S5 7AU, UK.\nE-mail: chris.burton@sheffield.ac.uk\nSubmitted: 16 May 2017; Editor’s response: 9 June 2017; final acceptance: 11 August 2017.\n©British Journal of General Practice\nThis is the full-length article (published online 7 Nov 2017) of an abridged version published in print. Cite this version as: Br J Gen Pract 2017; DOI: https://doi.org/10.3399/bjgp17X693497\nChristopher Burton, Lisa Iversen, Sohinee Bhattacharya, Dolapo Ayansina, Lucky Saraswat and Derek Sleeman\nPointers to earlier diagnosis of endometriosis: a nested case-control study using primary care electronic health records\nAbstract\nBackground\nEndometriosis is a condition with relatively non-specific symptoms, and in some cases a long time elapses from first-symptom presentation to diagnosis.\nAim\nTo develop and test new composite pointers to a diagnosis of endometriosis in primary care electronic records.\nDesign and setting\nThis is a nested case-control study using the Practice Team Information database of anonymised primary care electronic health records from Scotland. Data were analysed from 366 cases of endometriosis between 1994 and 2010, and two sets of controls matched for age and GP practice: (a) 1453 randomly selected females and (b) 610 females whose records contained codes indicating consultation for gynaecological symptoms.\nMethod\nComposite pointers comprised patterns of symptoms, prescribing, or investigations, in combination or over time. Conditional logistic regression was used to examine the presence of both new and established pointers during the 3 years before diagnosis of endometriosis and to identify their time of appearance.\nResults\nSeveral composite pointers that were strongly predictive of endometriosis were observed. These included pain and menstrual symptoms occurring within the same year (odds ratio [OR] 6.5, 95% confidence interval [CI] = 3.9 to 10.6), and lower gastrointestinal symptoms occurring within 90 days of gynaecological pain (OR 6.1, 95% CI = 3.6 to 10.6). Although the association of infertility with endometriosis was only detectable in the year before diagnosis, several pain-related features were associated with endometriosis several years earlier.\nConclusion\nUseful composite pointers to a diagnosis of endometriosis in GP records were identified. Some of these were present several years before the diagnosis and may be valuable targets for diagnostic support systems.\nKeywords\ndiagnosis; electronic health records; endometriosis; primary care.\ne816 British Journal of General Practice, December 2017\nMETHOD\nData source\nData were obtained from the Practice Team Information (PTI) database, a subset of the Primary Care Clinical Informatics Unit Research database held by the University of Aberdeen. It includes anonymised data from the primary care electronic health records of approximately 224 000 patients registered with a primary care physician, and is broadly representative of the Scottish population with regard to age, sex, deprivation, and urban/rural mix. It includes data collected annually between 2004 and 2010. Practices in the PTI project were expected to record every clinical encounter using Read Codes for clinical diagnoses and/or main reasons for consultation. All GP prescriptions were recorded automatically. The coding of investigations and therapeutic procedures changed over time, increasing towards the end of the database period.\nPopulations\nThis study was a nested case-control study. Cases were females with a diagnosis of endometriosis who were born after 1 January 1974 and were, therefore, ≤36 years old on 1 January 2010. This enabled us to capture teenage menstrual symptoms for the majority of females. It also avoided the possibility that an apparent new diagnosis in an older female was actually a historical diagnosis being recorded for the first time when computerised record summaries were created.\nPopulation controls were randomly selected for each case and individually matched by age and GP practice, with up to four controls per case (subject to availability).\nA second control group comprised females with codes for gynaecological symptoms (pain, menstrual symptoms, or infertility) but no recorded diagnosis of endometriosis. 
These controls were also randomly selected for each case and individually matched by age and GP practice, with up to four symptomatic controls per case. The index date for cases was defined as the date of diagnosis of endometriosis. For controls, it was the date of diagnosis of endometriosis in the matched case. All cases and controls were required to have been registered with their GP practice for at least 1 year before the index date.\nData extraction and preparation\nBox 1 lists the key data extracted and the categories into which related items were grouped. Most items were allocated to a single time point. Contraception prescriptions, however, commonly lasted for 6 months or longer. For these, details of each prescription were used to estimate the onset and offset of contraception, using methods previously employed to ascertain the continuity of prescribing. 18\nThe data were enriched by introducing composite features. These were based on the clinical experience of the investigators and on interviews with 10 experts (six gynaecologists, two specialists in reproductive health, and two representatives of a lay support organisation). The interviews sought to identify tacit patterns in symptoms that clinicians thought may be predictive of a diagnosis; they were audio recorded, transcribed, and analysed thematically.\nComposite features were specified according to one of five relationships: proximity, following, separated, during, and exclusive. These are summarised in Box 2.\nThe presence of each feature (single and composite) was ascertained in the record of each individual at any time. It was also ascertained during a series of overlapping 3-year time windows set at different intervals from the index date (for diagnosis or matching). The windows were defined using intervals of 0, 3, 6, 12, 18, 24, and 36 months between the end of the window and the index date. The emergence of statistical associations between the information available in the record and diagnosis was examined over time by comparing the same measure in different windows. The purpose of this was to differentiate between two kinds of feature. Some were present long before diagnosis, and may thus indicate missed diagnostic opportunities. Others appeared only shortly before diagnosis, and may thus have triggered referral.\nHow this fits in\nEndometriosis is a relatively common condition, but the time from first presentation to diagnosis is often longer than ideal as symptoms are non-specific. This study used anonymised GP record data to construct new pointers to diagnosis, which identified patterns of symptoms over time. Distinct episodes of gynaecological pain appear to be useful pointers to endometriosis. So do combinations of gynaecological pain on one occasion with menstrual symptoms or lower gastrointestinal symptoms on another. Patterns such as these make sense to clinicians and could be integrated into electronic diagnostic support systems.\nBox 1. Categories of data grouped by data type\nData type | Data description | Included data categories\nSpecific features | Classical features of endometriosis (pelvic pain, dysmenorrhoea, dyspareunia, and infertility) 2,5,9,14 | Pain (pelvic pain, dyspareunia, dysmenorrhoea); Menstrual (flow); Infertility; Ovarian (for example, cysts)\nNon-specific symptoms | Abdominal pain and gastrointestinal symptoms, fatigue, urinary symptoms; additional diagnoses, including irritable bowel syndrome 5 | Menstrual (timing); Genital/other gynaecological; Urinary; Lower GI; Upper GI; Fatigue\nDiagnostic tests and procedures | Primary care tests, referred investigations such as diagnostic ultrasound, and specialist procedures such as laparoscopy | Full blood count; Genital swabs; Laparoscopy; Abdominal or pelvic ultrasound; Thyroid function\nTreatments | Hormonal treatment for endometriosis (for example, gonadotropin-releasing hormone agonists); prescriptions for contraception; analgesic drugs; antidepressant drugs | Hormonal treatment; Contraception; NSAID; Codeine or other opioids; Tricyclic; SSRI and related antidepressants\nLower GI = pain, bloating, irritable bowel syndrome. NSAID = non-steroidal anti-inflammatory drugs. SSRI = selective serotonin reuptake inhibitor. Upper GI = dyspepsia, reflux, nausea.\nAnalysis of association of features and patterns with diagnosis\nConditional logistic regression was carried out to examine the association between each feature (conventional or composite) and the diagnosis of endometriosis. Each feature was coded as either present or absent within the time period. The ‘separated’ composite variables were used to indicate multiple episodes, rather than counts of how often a feature occurred. Conditional logistic regression was conducted for all features present in at least 10 individuals (cases or controls). Results are reported as odds ratios (ORs) with 95% confidence intervals (CIs). All analyses were conducted in R version 3.3.2 (2016).\nThe analysis was conducted separately with the population and symptomatic control groups. For the population comparison, all cases and their matched controls were included. For the symptomatic comparison, only cases that had recorded symptoms and their matched controls were included. For the time window analysis, the data were limited to females who had been registered with their practice for at least 1 year before the beginning of the gap. 
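The window scheme above can be sketched in Python. This is a minimal illustration and not the authors' code (their analyses were run in R). Dates are assumed to be `datetime.date` values, and each window is assumed to include its start date but exclude its end date. The registration rule is read as registration at least 12 months before the end of the window, where the gap begins. All helper names are hypothetical.

```python
import calendar
from datetime import date
from typing import Iterable, List, Tuple

# Gaps (months) between the end of each 3-year window and the index date, as listed in the Method
GAPS_MONTHS = (0, 3, 6, 12, 18, 24, 36)
WINDOW_MONTHS = 36


def months_before(d: date, months: int) -> date:
    # Step back whole calendar months, clamping the day (e.g. 31 March 2010 -> 28 February 2010)
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def window(index: date, gap_months: int) -> Tuple[date, date]:
    # Hypothetical helper: (start, end) of the 3-year window ending gap_months before the index date
    end = months_before(index, gap_months)
    return months_before(end, WINDOW_MONTHS), end


def present_by_gap(events: Iterable[date], index: date) -> List[bool]:
    # Binary presence of one feature in each window (assumed half-open: start <= event < end)
    events = list(events)
    result = []
    for gap in GAPS_MONTHS:
        start, end = window(index, gap)
        result.append(any(start <= e < end for e in events))
    return result


def eligible(registered: date, index: date, gap_months: int) -> bool:
    # Assumed reading of the rule: registered at least 12 months before the gap begins
    gap_start = months_before(index, gap_months)
    return registered <= months_before(gap_start, 12)


index_date = date(2010, 6, 15)
print(present_by_gap([date(2009, 1, 1)], index_date))  # [True, True, True, True, False, False, False]
```

Applying the same function to every feature yields one present/absent vector per person. Each position can then be analysed separately, which is what the plots of OR against gap summarise.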
The ORs for each feature at each time gap were plotted to visualise the emergence of predictive features over time.\nRESULTS\nPatient characteristics\nData from 366 cases and 1453 matched population controls were obtained. Of these cases, 243 had gynaecological symptoms (pain, menstrual symptoms, or infertility) and were matched to a further 610 controls with comparable symptoms. The median age at diagnosis was 25 years (interquartile range 22–28 years), and age at diagnosis was <20 years in 47 (12.8%) cases.\nData quality\nIn total, 191 cases (52.2%) had been registered with the same GP practice for at least 5 years before diagnosis and, therefore, had continuous records in the PTI database; 114 (31.2%) had been registered for at least 8 years before diagnosis. Similar proportions were seen for population controls (746 [51.3%] and 469 [32.3%], respectively). More of the symptomatic controls had been registered for these time periods (414/610 [67.9%] and 273/610 [44.8%]). A recorded code for laparoscopy was found in only 47 (12.8%) cases, despite this being the commonest diagnostic procedure for endometriosis. This probably reflects a preference for recording the diagnosis rather than the procedure by which it was made, although instances of a clinical diagnosis being entered without any confirmatory tests cannot be excluded. Likewise, there were few coded surgical procedures. For example, 13 cases (3.5%) had a recorded operation for tubal or ovarian problems, excluding diagnostic laparoscopy. These procedures were therefore excluded from the analysis, which focused instead on clinical features, investigations, and medical treatments.\nOccurrence of diagnostic features\nA total of 145 cases (39.6%) had a code recorded for gynaecological pain (dysmenorrhoea, pelvic pain) during the 3 years before diagnosis, and 39 (10.7%) had a code for infertility. 
Overall, 198 cases (54.1%) had neither of these during the 3 years before diagnosis.\nThe numbers and proportions of females with at least one instance of each feature, either in the 3 years before the index date or at any time, are shown in Table 1 (all cases [N = 366] and population controls) and Table 2 (symptomatic cases [N = 261] and controls).\nBox 2. Types of composite features used in constructing predictors\nRelationship | Specification | Example\nProximity | An occurrence of one feature within a given number of days of the other, with no specification of which should come first | Pain and fatigue within 90 days of each other\nFollowing | An occurrence of one feature within a given number of days of the other, with specification of which should come first | Pain occurring within 90 days of estimated cessation of contraception\nSeparated | Two consecutive recordings of a single feature occurring at least a given number of days apart (this permits separate episodes to be differentiated from repeated consultation during the same episode) | Two consecutive episodes of pain separated by at least 180 days\nDuring | An occurrence of a symptom or other feature after the onset, and before the expected offset, of a contraception prescription | Pain during the estimated duration of a prescription for contraception\nExclusive | A feature occurring only in the absence of another | Pain, but only outside estimated periods of prescribed contraception\n
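The five relationships in Box 2 can be expressed as simple predicates over dated events. The sketch below is an illustration only, not the study's implementation. It follows the definitions in the footnote to Table 3: 'follow' means 1 to N days after, and 'separated' uses strictly more than N days, as in the Table 3 feature name, although Box 2 says at least N days. It also assumes each contraception prescription has already been converted into an (onset, expected offset) interval. All function and variable names are hypothetical.

```python
from datetime import date
from typing import List, Tuple

# (onset, expected offset) of one contraception prescription; assumed to be precomputed
Interval = Tuple[date, date]


def proximity(xs: List[date], ys: List[date], n: int) -> bool:
    # X occurs between n days before and n days after some Y; order not specified
    return any(abs((x - y).days) <= n for x in xs for y in ys)


def following(xs: List[date], ys: List[date], n: int) -> bool:
    # X occurs 1 to n days after some Y; for contraception, pass expected offsets as ys
    return any(1 <= (x - y).days <= n for x in xs for y in ys)


def separated(xs: List[date], n: int) -> bool:
    # Two consecutive recordings of X more than n days apart (distinct episodes)
    s = sorted(xs)
    return any((b - a).days > n for a, b in zip(s, s[1:]))


def during(xs: List[date], spans: List[Interval]) -> bool:
    # X falls after the onset and before the expected offset of at least one prescription
    return any(onset < x < offset for x in xs for onset, offset in spans)


def exclusive(xs: List[date], spans: List[Interval]) -> bool:
    # X and contraception are both present, but the during criterion is never met
    return bool(xs) and bool(spans) and not during(xs, spans)


pain = [date(2005, 3, 1), date(2005, 4, 2), date(2006, 1, 10)]
lower_gi = [date(2005, 5, 20)]
pill = [(date(2004, 1, 1), date(2004, 12, 31))]
pill_offsets = [offset for _, offset in pill]

print(proximity(lower_gi, pain, 90))  # True: 48 days after the 2 April pain code
print(separated(pain, 180))  # True: 283 days between April 2005 and January 2006
print(during(pain, pill), exclusive(pain, pill))  # False True
print(following(pain, pill_offsets, 90))  # True: 1 March 2005 is 60 days after the offset
```

Because `during` requires the feature to fall strictly after the onset, a dysmenorrhoea code recorded on the same day as a single contraception prescription counts as exclusive rather than during. This matches the worked example in the Table 3 footnote.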
Table 1 and Table 2 also show the ORs, with 95% CIs, for the two comparisons. The first compares all cases with population controls. The second compares symptomatic cases (gynaecological pain, menstrual symptoms, or infertility) with matched symptomatic controls.\nAs expected, pain was more common in cases in both comparisons over the 3 years before the index date. The OR was 14.9 (95% CI = 10.1 to 21.9) versus population controls and 5.6 (95% CI = 3.9 to 8.1) versus symptomatic controls. Menstrual bleeding and timing symptoms were coded more commonly in cases than in population controls (OR 3.8, 95% CI = 2.8 to 5.0 and OR 2.1, 95% CI = 1.4 to 3.2, respectively). This was not the case in the comparison with symptomatic controls (OR 1.0, 95% CI = 0.7 to 1.4 and OR 1.2, 95% CI = 0.7 to 1.9). Non-specific clinical features such as fatigue, vulvo-vaginal problems, and lower gastrointestinal symptoms were all more common in cases than in population controls.\nAlthough simple tests such as full blood count were more common in cases than in population controls, there was no significant difference in the symptomatic comparison. Genitourinary swab tests were more common in cases than controls in both comparisons. These were presumably ordered because of the possibility that symptoms were due to pelvic inflammation.\nOccurrence of prescribed treatments\nIn both the population and the symptomatic group comparisons, analgesics and NSAIDs were more commonly prescribed to cases than to controls. For analgesics, the ORs over the 3 years before the index date were 3.0 (95% CI = 2.3 to 4.0) and 2.7 (95% CI = 1.9 to 3.9). For NSAIDs, they were 4.8 (95% CI = 3.6 to 6.4) and 3.0 (95% CI = 2.1 to 4.2). When comparing cases and symptomatic controls, there was no association with antidepressant drugs (either tricyclic or SSRI and related).\nTable 1. Numbers, proportions, and odds ratios (95% CI) for features in cases of endometriosis compared with population controls\nFeature | 3 years before index date a: cases (N = 366), n (%) | controls (N = 1453), n (%) | OR (95% CI) | Any time before index date a: cases (N = 366), n (%) | controls (N = 1453), n (%) | OR (95% CI)\nSpecific features\nSubfertility | 39 (10.7) | 24 (1.7) | 7.7 (4.4 to 13.3) | 41 (11.2) | 31 (2.1) | 5.9 (3.6 to 9.7)\nMenstrual: bleeding | 121 (33.1) | 179 (12.3) | 3.8 (2.8 to 5.0) | 151 (41.3) | 267 (18.4) | 3.3 (2.6 to 4.3)\nMenstrual: timing | 39 (10.7) | 80 (5.5) | 2.1 (1.4 to 3.2) | 45 (12.3) | 117 (8.1) | 1.6 (1.1 to 2.3)\nOvarian | 24 (6.6) | 7 (0.5) | 13.7 (5.9 to 31.8) | 25 (6.8) | 11 (0.8) | 9.8 (4.7 to 20.4)\nPain | 145 (39.6) | 79 (5.4) | 14.9 (10.1 to 21.9) | 169 (46.2) | 146 (10.1) | 9.9 (7.1 to 13.6)\nNon-specific symptoms\nFatigue | 56 (15.3) | 121 (8.3) | 2.0 (1.4 to 2.8) | 79 (21.6) | 178 (12.3) | 2.0 (1.5 to 2.7)\nGynaecological | 51 (13.9) | 47 (3.2) | 5.0 (3.3 to 7.7) | 77 (21.0) | 97 (6.7) | 4.0 (2.8 to 5.6)\nLower GI | 104 (28.4) | 144 (9.9) | 3.7 (2.8 to 5.0) | 126 (34.4) | 213 (14.7) | 3.3 (2.5 to 4.3)\nUpper GI | 27 (7.4) | 62 (4.3) | 1.8 (1.1 to 3.0) | 50 (13.7) | 107 (7.4) | 2.1 (1.4 to 3.0)\nUrinary | 25 (6.8) | 49 (3.4) | 2.1 (1.3 to 3.5) | 42 (11.5) | 80 (5.5) | 2.3 (1.5 to 3.5)\nTests and procedures\nFull blood count | 40 (10.9) | 102 (7.0) | 2.0 (1.2 to 3.2) | 50 (13.7) | 112 (7.7) | 2.6 (1.6 to 4.2)\nGenital swabs | 64 (17.5) | 77 (5.3) | 4.5 (3.0 to 6.7) | 73 (20.0) | 111 (7.6) | 3.5 (2.5 to 5.0)\nLaparoscopy | 42 (11.5) | 13 (0.9) | 14.6 (7.5 to 28.4) | 47 (12.8) | 15 (1.0) | 13.9 (7.5 to 25.7)\nThyroid function | 53 (14.5) | 112 (7.7) | 2.4 (1.6 to 3.5) | 67 (18.3) | 132 (9.1) | 2.8 (1.9 to 4.1)\nUltrasound | 14 (3.8) | 5 (0.3) | 12.3 (4.0 to 37.8) | 14 (3.8) | 11 (0.8) | 5.0 (2.2 to 11.4)\nTreatments\nContraception | 201 (54.9) | 716 (49.3) | 1.3 (1.0 to 1.6) | 234 (63.9) | 800 (55.1) | 1.5 (1.2 to 2.0)\nNSAID | 171 (46.7) | 276 (19.0) | 4.8 (3.6 to 6.4) | 191 (52.2) | 393 (27.1) | 3.8 (2.9 to 5.1)\nAnalgesic | 136 (37.2) | 254 (17.5) | 3.0 (2.3 to 4.0) | 156 (42.6) | 343 (23.6) | 2.7 (2.1 to 3.5)\nSSRI | 65 (17.8) | 188 (12.9) | 1.5 (1.1 to 2.0) | 85 (23.2) | 229 (15.8) | 1.7 (1.2 to 2.2)\nTricyclic | 29 (7.9) | 60 (4.1) | 2.2 (1.3 to 3.6) | 42 (11.5) | 82 (5.6) | 2.4 (1.6 to 3.6)\na Index date: date of diagnosis for cases, date of diagnosis of matched case for controls. CI = confidence interval. Gynaecological = vulvo-vaginal symptoms, pelvic inflammation. Lower GI = pain, bloating, irritable bowel syndrome. NSAID = non-steroidal anti-inflammatory drug. OR = odds ratio. Ovarian = coded diagnosis of ovarian cysts and related conditions. SSRI = selective serotonin reuptake inhibitor and related antidepressants. Upper GI = dyspepsia, reflux, nausea.\nComposite features\nTable 3 shows the number and proportion of patients with at least one instance of each composite feature over the 3 years before the date of diagnosis/matching. Several composite features had high ORs when cases were compared with symptomatic controls. Pain and menstrual symptoms within the same year (pain proximity menstrual [360]) had an OR of 6.5 (95% CI = 3.9 to 10.6). Lower gastrointestinal symptoms occurring within 90 days of gynaecological pain had an OR of 6.1 (95% CI = 3.6 to 10.6). Episodes of gynaecological pain separated by at least 180 days were also strongly associated with endometriosis in the comparison with symptomatic controls (OR 8.5, 95% CI = 4.3 to 16.9). Some of the experts suggested pain or analgesic use on stopping contraception as pointers. However, these composite features occurred in fewer than 10% of cases, with only moderate ORs of approximately 3. 
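Here is a minimal Newton-Raphson fit of the conditional likelihood, written in Python for illustration. It shows what conditional logistic regression estimates for a single present/absent feature in matched sets of one case with up to four controls. It is not the authors' R code. In R, a model of this kind is usually fitted with `clogit()` from the survival package. The Wald 95% CI and all names here are illustrative assumptions; the 10-individual threshold is taken from the Method.

```python
import math
from typing import List, Tuple

# One matched set: (case has feature 0/1, [each control has feature 0/1])
MatchedSet = Tuple[int, List[int]]


def enough_exposed(sets: List[MatchedSet], min_n: int = 10) -> bool:
    # Method: only analyse features present in at least 10 individuals (cases or controls)
    return sum(c + sum(ctrl) for c, ctrl in sets) >= min_n


def clogit_binary(sets: List[MatchedSet], max_iter: int = 50) -> Tuple[float, float, float]:
    # Conditional likelihood per set: exp(b * x_case) / sum over members of exp(b * x_j).
    # With k exposed members out of m, the model-expected exposure of the case is
    # k * e^b / (m - k + k * e^b).
    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for case_x, controls in sets:
            m = 1 + len(controls)
            k = case_x + sum(controls)
            p = k * math.exp(beta) / (m - k + k * math.exp(beta))
            score += case_x - p  # concordant sets (k == 0 or k == m) add nothing
            info += p * (1 - p)
        if info == 0:
            raise ValueError('no informative (discordant) matched sets')
        step = score / info  # Newton-Raphson update on log(OR)
        beta += step
        if abs(step) < 1e-10:
            break
    se = 1 / math.sqrt(info)
    return math.exp(beta), math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)


# 1:1 example: 10 pairs with only the case exposed, 4 with only the control exposed,
# plus concordant pairs, which carry no information about the OR
pairs = [(1, [0])] * 10 + [(0, [1])] * 4 + [(1, [1])] * 5 + [(0, [0])] * 30
if enough_exposed(pairs):
    or_, lo, hi = clogit_binary(pairs)
    print(round(or_, 2))  # 2.5
```

With 1:1 matching, this reproduces the familiar discordant-pair ratio (here 10/4 = 2.5). With up to four controls per case, as in this study, each set's contribution depends on how many of its members have the feature.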
\nTable 2. Numbers, proportions, and odds ratios (95% CI) for features in cases of endometriosis compared with symptomatic controls\nFeature | 3 years before index date a: cases (N = 261), n (%) | controls (N = 610), n (%) | OR (95% CI) | Any time before index date a: cases (N = 261), n (%) | controls (N = 610), n (%) | OR (95% CI)\nSpecific features\nSubfertility | 39 (16.1) | 52 (8.5) | 2.4 (1.4 to 3.9) | 41 (16.9) | 64 (10.5) | 1.9 (1.2 to 3.1)\nMenstrual: bleeding | 121 (49.8) | 304 (49.8) | 1.0 (0.7 to 1.4) | 151 (62.1) | 443 (72.6) | 0.7 (0.5 to 0.9)\nMenstrual: timing | 30 (12.4) | 64 (10.5) | 1.2 (0.7 to 1.9) | 34 (14.0) | 111 (18.2) | 0.7 (0.5 to 1.1)\nOvarian | 14 (5.8) | 3 (0.5) | 12.2 (3.5 to 42.7) | 15 (6.2) | 6 (1.0) | 7.0 (2.7 to 18.1)\nPain | 145 (59.7) | 148 (24.3) | 5.6 (3.9 to 8.1) | 169 (69.6) | 241 (39.5) | 4.0 (2.8 to 5.6)\nNon-specific symptoms\nFatigue | 45 (18.5) | 84 (13.8) | 1.4 (0.9 to 2.1) | 66 (27.2) | 138 (22.6) | 1.3 (0.9 to 1.9)\nGynaecological | 41 (16.9) | 34 (5.6) | 4.2 (2.4 to 7.4) | 64 (26.3) | 68 (11.2) | 3.6 (2.3 to 5.6)\nLower GI | 79 (32.5) | 109 (17.9) | 2.3 (1.6 to 3.2) | 95 (39.1) | 180 (29.5) | 1.7 (1.2 to 2.3)\nUpper GI | 24 (9.9) | 51 (8.4) | 1.3 (0.8 to 2.3) | 44 (18.1) | 87 (14.3) | 1.5 (1.0 to 2.3)\nUrinary | 20 (8.2) | 29 (4.8) | 1.8 (1.0 to 3.4) | 36 (14.8) | 64 (10.5) | 1.5 (1.0 to 2.4)\nTests and procedures\nFull blood count | 34 (14.0) | 82 (13.4) | 1.2 (0.7 to 2.2) | 42 (17.3) | 97 (15.9) | 1.4 (0.8 to 2.4)\nGenital swabs | 43 (17.7) | 71 (11.6) | 2.2 (1.3 to 3.5) | 50 (20.6) | 90 (14.8) | 1.9 (1.2 to 3.0)\nLaparoscopy | 31 (12.8) | 4 (0.7) | 20.0 (7.0 to 57.1) | 35 (14.4) | 13 (2.1) | 7.2 (3.7 to 14.1)\nThyroid function | 43 (17.7) | 86 (14.1) | 1.5 (0.9 to 2.4) | 53 (21.8) | 103 (16.9) | 1.7 (1.1 to 2.7)\nUltrasound | 11 (4.5) | 6 (1.0) | 5.2 (1.6 to 17.0) | 11 (4.5) | 7 (1.2) | 4.3 (1.4 to 13.0)\nTreatments\nContraception | 151 (62.1) | 373 (61.2) | 1.1 (0.8 to 1.5) | 178 (73.3) | 421 (69.0) | 1.3 (0.9 to 1.9)\nNSAID | 133 (54.7) | 185 (30.3) | 3.0 (2.1 to 4.2) | 150 (61.7) | 264 (43.3) | 2.6 (1.8 to 3.7)\nAnalgesic | 100 (41.2) | 142 (23.3) | 2.7 (1.9 to 3.9) | 116 (47.7) | 203 (33.3) | 2.3 (1.6 to 3.4)\nSSRI | 43 (17.7) | 115 (18.9) | 1.0 (0.7 to 1.5) | 57 (23.5) | 148 (24.3) | 1.1 (0.8 to 1.6)\nTricyclic | 20 (8.2) | 37 (6.1) | 1.5 (0.8 to 2.7) | 29 (11.9) | 58 (9.5) | 1.3 (0.8 to 2.1)\na Index date: date of diagnosis for cases, date of diagnosis of matched case for controls. CI = confidence interval. Gynaecological = vulvo-vaginal symptoms, pelvic inflammation. Lower GI = pain, bloating, irritable bowel syndrome. NSAID = non-steroidal anti-inflammatory drug. OR = odds ratio. Ovarian = coded diagnosis of ovarian cysts and related conditions. SSRI = selective serotonin reuptake inhibitor and related antidepressants. Upper GI = dyspepsia, reflux, nausea.\nOccurrence of diagnostic features over the time before diagnosis\nFigure 1 shows plots of the ORs for eight diagnostic features in 3-year time windows. The windows differ in the interval between the end of the 3-year window and the date of diagnosis/matching. Each plot compares cases with matched population controls, and symptomatic cases with their matched symptomatic controls. In all plots, 95% CIs are indicated. The plots show differing patterns.\nThe plot for infertility shows no association with a diagnosis of endometriosis until 1.5 years before diagnosis. From then, the OR increased until about 0.5 years before diagnosis, after which it remained elevated. This is interpreted as indicating that the delay from the occurrence of infertility to diagnosis is relatively short, presumably because infertility leads to referral, including diagnostic laparoscopy.\nThe plot for gynaecological pain shows that the OR was significantly elevated several years before diagnosis and that it increased in the year before diagnosis (at least in the population comparison). 
The two plots for non-specific symptoms (fatigue and lower gastrointestinal symptoms) show patterns of longstanding modest elevation.\nThe bottom row of plots in Figure 1 shows two composite features: lower GI symptoms within 90 days of gynaecological pain, and episodes of gynaecological pain >180 days apart. Although the CIs for these composites were wider, there was a suggestion of a trend over time in the lower GI plus pain combination.\nTable 3. Numbers, proportions, and odds ratios (95% CI) for composite features in the 3 years before diagnosis/matching a\nComposite feature | Population comparison: cases (N = 366), n (%) | controls (N = 1453), n (%) | OR (95% CI) | Symptomatic comparison: cases (N = 261), n (%) | controls (N = 610), n (%) | OR (95% CI)\nPain during contraception | 40 (10.9) | 24 (1.7) | 7.4 (4.3 to 12.7) | 40 (16.5) | 38 (6.2) | 3.0 (1.9 to 5.0)\nPain follow contraception (180) | 17 (4.6) | 8 (0.6) | 8.5 (3.7 to 19.7) | 17 (7.0) | 17 (2.8) | 3.1 (1.5 to 6.4)\nPain exclusive contraception | 105 (28.7) | 55 (3.8) | 14.2 (9.1 to 22.0) | 105 (43.2) | 110 (18.0) | 4.3 (2.9 to 6.2)\nMenstrual during contraception | 38 (10.4) | 65 (4.5) | 2.6 (1.7 to 4.1) | 38 (15.6) | 87 (14.3) | 1.1 (0.7 to 1.8)\nMenstrual follow contraception (180) | 14 (3.8) | 8 (0.6) | 7.0 (2.9 to 16.7) | 14 (5.8) | 17 (2.8) | 2.0 (1.0 to 4.2)\nAnalgesic during contraception | 51 (13.9) | 90 (6.2) | 2.5 (1.7 to 3.7) | 39 (16.1) | 59 (9.7) | 2.0 (1.3 to 3.1)\nAnalgesic follow contraception (180) | 27 (7.4) | 26 (1.8) | 4.5 (2.5 to 7.8) | 21 (8.6) | 21 (3.4) | 2.8 (1.5 to 5.3)\nAnalgesic exclusive contraception | 116 (31.7) | 68 (4.7) | 12.0 (8.1 to 17.8) | 116 (47.7) | 132 (21.6) | 3.9 (2.7 to 5.6)\nNSAID during contraception | 56 (15.3) | 92 (6.3) | 2.9 (2.0 to 4.2) | 48 (19.8) | 68 (11.2) | 2.0 (1.3 to 3.0)\nNSAID follow contraception (90) | 27 (7.4) | 28 (1.9) | 4.0 (2.3 to 6.8) | 21 (8.6) | 19 (3.1) | 3.0 (1.6 to 5.8)\nPain proximity menstrual (360) | 61 (16.7) | 23 (1.6) | 15.1 (8.5 to 26.6) | 61 (25.1) | 34 (5.6) | 6.5 (3.9 to 10.6)\nAnalgesic proximity menstrual (90) | 29 (7.9) | 19 (1.3) | 6.3 (3.5 to 11.4) | 29 (11.9) | 30 (4.9) | 2.6 (1.5 to 4.6)\nAnalgesic proximity pain (90) | 45 (12.3) | 15 (1.0) | 15.5 (8.0 to 30.1) | 45 (18.5) | 20 (3.3) | 7.1 (4.0 to 12.5)\nNSAID proximity pain (90) | 63 (17.2) | 28 (1.9) | 10.9 (6.7 to 17.7) | 63 (25.9) | 40 (6.6) | 6.0 (3.7 to 9.7)\nLower GI proximity pain (90) | 48 (13.1) | 12 (0.8) | 15.9 (8.4 to 29.9) | 48 (19.8) | 24 (3.9) | 6.1 (3.6 to 10.6)\nLower GI proximity menstrual (90) | 35 (9.6) | 23 (1.6) | 6.3 (3.7 to 10.7) | 35 (14.4) | 39 (6.4) | 2.6 (1.6 to 4.1)\nPain separated by > 180 days | 36 (9.8) | 14 (1.0) | 12.5 (6.3 to 24.6) | 36 (14.8) | 14 (2.3) | 8.5 (4.3 to 16.9)\na Composite feature names follow the format X relationship Y (N), where N is a number of days. The relationships are defined as follows.\nX during Y (only used where Y = contraception): X occurs at least once after the onset date and before the expected offset date of at least one contraceptive prescription.\nX follow Y (N): Y is a discrete time-point event and X occurs between 1 and N days after Y. Where Y = contraception, the N days are counted from the expected offset date.\nX proximity Y (N): X and Y are discrete time-point events, and X occurs between N days before and N days after Y.\nX exclusive Y (currently only used where Y = contraception): X and Y are both present, but the criteria for X during Y are never met. For example, a single prescription of contraception issued on the same day as a code for dysmenorrhoea meets the X exclusive Y criteria, because X during Y requires X to occur after the onset of contraception.\nX separated by > N days: two consecutive occurrences of X separated by more than N days.\nCI = confidence interval. GI = gastrointestinal. NSAID = non-steroidal anti-inflammatory drug. OR = odds ratio.\nFunding\nThis study was funded by the Chief Scientist Office of NHS Scotland through its first health informatics call (reference HICG/1/25). The funder played no role in conducting the research or in writing the article.\nEthical approval\nThe study involved analysis of anonymised data. Access to the data was approved by the Research Applications and Data Management Team at the University of Aberdeen.\nProvenance\nFreely submitted; externally peer reviewed.\nCompeting interests\nThe authors have declared no competing interests.\nDISCUSSION\nSummary\nThis study has two important new findings. First, several composite features were predictive of a subsequent diagnosis of endometriosis in routine records. Second, it demonstrated, for the first time, different time trends in the appearance of recorded clinical features of endometriosis.\nStrengths and limitations\nThe choice of features as pointers used principles of feature selection based on expert input. 19 It also drew on methods of data consolidation and aggregation that have been developed for use with clinical data sources other than GP records. 17,20 This sequence of steps is broadly comparable with other recent approaches to the summarisation of clinical data. 20,21 An established anonymised GP record set was used that contained both diagnostic and symptom codes in the Read Code format. This means that the method is transferable to other research datasets and potentially to clinical use.\nThere were limitations relating to the data. The data came from standalone primary care records with no linkage to secondary care records, so the reliability of GPs’ diagnoses of endometriosis could not be assessed. However, in the authors’ experience, GP practices tend not to code such diagnoses without specialist opinion. The data were sparser than anticipated, with only around half of cases having cardinal clinical features of endometriosis recorded before diagnosis. This probably reflects the limited use of symptom codes by GPs, even in this database, where a reason for consultation was meant to be given for each attendance. The rates of coding of procedures such as laparoscopy were surprisingly low. The authors suspect this is because GP practices had coded the findings of the laparoscopy rather than the procedure itself. Finally, the duration of the database was shorter than a female’s reproductive period. A decision was therefore made to exclude some females aged >35 years who were diagnosed with endometriosis. This maintained a focus on females whose electronic health records were more likely to contain data about earlier menstrual and related symptoms.\nComparison with existing literature\nThe authors are not aware of other studies that have looked for combinations of features in time as predictors of diagnoses in GP records. 
Although combinations of symptoms are commonly used in cancer prediction tools, they are usually recorded simply as present or absent.22 In this study, by contrast, temporal relationships were specified to increase the specificity of the pointers. Other studies of endometriosis have reported only single items.5\nImplications for research and practice\nThe composite variables predicting a diagnosis of endometriosis reflect the patterns that clinicians observe. For the first time, they have been tested over time using data in routine GP records. These combinations are likely to be clinically useful as pointers to a diagnosis in their own right. They include pain and menstrual symptoms in the same year, pain and lower GI symptoms within the same 90 days, and episodes of pain separated by at least 6 months. Moreover, because they can be derived from existing data, they could be included in diagnostic support software within GP records.23 This study did not have enough cases to split the data into derivation and test sets. Future studies could test the predictive value of these composite features in larger, better-linked datasets. Machine learning techniques may also be valuable for feature reduction and model selection.24,25 Ultimately, the aim must be to apply these observations within predictive models for earlier referral and diagnosis of endometriosis. 
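The composite features discussed above can be derived mechanically from dated codes, following the relationship definitions in the footnote on feature names. The Python below is a minimal sketch, not the authors’ implementation. It assumes that each feature has already been reduced to a list of `datetime.date` values and that each contraceptive prescription is an (onset, expected offset) pair. The function names, input shapes and example dates are hypothetical. Treating both bounds of X during Y as strict is also an assumption. The sketch implements the proximity, follow, separated-by, during and exclusive relationships and applies them to two of the combinations shown in Figure 1.

```python
from datetime import date

# Hypothetical input shapes: each feature is a list of the dates on which it was coded;
# contraception is a list of (onset, expected_offset) date pairs.


def proximity(x_days, y_days, n):
    # X proximity Y (N): some X falls between N days before and N days after some Y.
    return any(abs((dx - dy).days) <= n for dx in x_days for dy in y_days)


def follows(x_days, y_days, n):
    # X follow Y (N): some X falls between 1 and N days after some Y.
    # For contraception, pass the expected offset dates as y_days.
    return any(1 <= (dx - dy).days <= n for dx in x_days for dy in y_days)


def separated_by(x_days, n):
    # X separated by >(N) days: two consecutive occurrences of X more than N days apart.
    xs = sorted(x_days)
    return any((later - earlier).days > n for earlier, later in zip(xs, xs[1:]))


def during(x_days, courses):
    # X during Y: some X falls after the onset and before the expected offset of at
    # least one contraceptive course. Strict bounds at both ends are an assumption;
    # the footnote only confirms that X on the onset day does not count.
    return any(onset < dx < offset for dx in x_days for onset, offset in courses)


def exclusive(x_days, courses):
    # X exclusive Y: X and Y are both present, but X during Y is never met.
    return bool(x_days) and bool(courses) and not during(x_days, courses)


# Hypothetical record for one patient.
gyn_pain = [date(2010, 3, 1), date(2010, 11, 20)]
lower_gi = [date(2010, 4, 15)]
dysmenorrhoea = [date(2011, 1, 10)]
contraception = [(date(2011, 1, 10), date(2011, 4, 10))]

print(proximity(gyn_pain, lower_gi, 90))        # True: the codes are 45 days apart
print(separated_by(gyn_pain, 180))              # True: the episodes are 264 days apart
print(exclusive(dysmenorrhoea, contraception))  # True: a same-day code is not during
```

Each function compares every pair of dates so that it stays close to its written definition. For long records, sorting both date lists and scanning them together would avoid the quadratic cost.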
\n[Figure 1: eight panels (Infertility; Gynaecological pain; Fatigue; Lower GI symptoms; Analgesic prescription; NSAID prescription; Gynaecological pain and lower GI symptoms within 90 days; Episodes of gynaecological pain >180 days apart). Each panel shows the odds ratio in relation to the years gap before the index date, for OR versus population controls and OR versus symptomatic controls.]\nFigure 1. Plots of OR for individual features over 3 years, by gap between the end of the 3-year window and the date of diagnosis/matching. Dotted lines indicate 95% CI for ORs.\nCI = confidence interval. OR = odds ratio.\nAcknowledgements\nThe authors thank the expert clinicians and representatives of Endometriosis UK for their interviews.\n\nREFERENCES\n1. Ballard K, Lowton K, Wright J. What’s the delay? 
A qualitative study of women’s experiences of reaching a diagnosis of endometriosis. Fertil Steril 2006; 86(5): 1296–1301.\n2. Dunselman GA, Vermeulen N, Becker C, et al. ESHRE guideline: management of women with endometriosis. Hum Reprod 2014; 29(3): 400–412.\n3. Pugsley Z, Ballard K. Management of endometriosis in general practice: the pathway to diagnosis. Br J Gen Pract 2007; 57(539): 470–476.\n4. Staal AH, van der Zanden M, Nap AW. Diagnostic delay of endometriosis in the Netherlands. Gynecol Obstet Invest 2016; 81(4): 321–324.\n5. Ballard KD, Seaman HE, de Vries CS, Wright JT. Can symptomatology help in the diagnosis of endometriosis? Findings from a national case-control study — Part 1. BJOG 2008; 115(11): 1382–1391.\n6. Simoens S, Dunselman G, Dirksen C, et al. The burden of endometriosis: costs and quality of life of women with endometriosis and treated in referral centres. Hum Reprod 2012; 27(5): 1292–1299.\n7. Culley L, Law C, Hudson N, et al. The social and psychological impact of endometriosis on women’s lives: a critical narrative review. Hum Reprod Update 2013; 19(6): 625–639.\n8. Abbas S, Ihle P, Köster I, Schubert I. Prevalence and incidence of diagnosed endometriosis and risk of endometriosis in patients with endometriosis-related symptoms: findings from a statutory health insurance-based cohort in Germany. Eur J Obstet Gynecol Reprod Biol 2012; 160(1): 79–83.\n9. Lemaire GS. More than just menstrual cramps: symptoms and uncertainty among women with endometriosis. J Obstet Gynecol Neonatal Nurs 2004; 33(1): 71–79.\n10. Nnoaham KE, Hummelshoj L, Kennedy SH, et al. Developing symptom-based predictive models of endometriosis as a clinical screening tool: results from a multicenter study. Fertil Steril 2012; 98(3): 692–701.\n11. Gupta D, Hull ML, Fraser I, et al. Endometrial biomarkers for the non-invasive diagnosis of endometriosis. Cochrane Database Syst Rev 2016; (4): CD012165.\n12. 
Nisenblat V, Bossuyt PM, Farquhar C, et al. Imaging modalities for the non-invasive diagnosis of endometriosis. Cochrane Database Syst Rev 2016; (2): CD009591.\n13. Hirsch M, Begum MR, Paniz E, et al. Diagnosis and management of endometriosis: a systematic review of international and national guidelines. BJOG 2017; Jul 29. DOI: 10.1111/1471-0528.14838.\n14. Ballard K, Lane H, Hudelist G, et al. Can specific pain symptoms help in the diagnosis of endometriosis? A cohort study of women with chronic pelvic pain. Fertil Steril 2010; 94(1): 20–27.\n15. Chapron C, Souza C, Borghese B, et al. Oral contraceptives and endometriosis: the past use of oral contraceptives for treating severe primary dysmenorrhea is associated with endometriosis, especially deep infiltrating endometriosis. Hum Reprod 2011; 26(8): 2028–2035.\n16. Sleeman D, Moss L, Aiken A, et al. Detecting and resolving inconsistencies between domain experts’ different perspectives on (classification) tasks. Artif Intell Med 2012; 55(2): 71–86.\n17. Reis BY, Kohane IS, Mandl KD. Longitudinal histories as predictors of future diagnoses of domestic abuse: modelling study. BMJ 2009; 339: b3677.\n18. Burton C, Cochran AJ, Cameron IM. Restarting antidepressant treatment following early discontinuation — a primary care database study. Fam Pract 2015; 32(5): 520–524.\n19. Sleeman D, Moss L, Sim M, Kinsella J. Predicting adverse events: detecting myocardial damage in intensive care unit (ICU) patients. KCAP 2011, the Sixth International Conference on Knowledge Capture. Banff, Alberta, Canada: 2011. New York: ACM Press: 73–79. DOI: 10.1145/1999676.1999690.\n20. Feblowitz JC, Wright A, Singh H, et al. Summarization of clinical information: a conceptual model. J Biomed Inform 2011; 44(4): 688–699.\n21. Hirsch JS, Tanenbaum JS, Lipsky Gorman S, et al. HARVEST, a longitudinal patient record summarizer. J Am Med Inform Assoc 2015; 22(2): 263–274.\n22. Hamilton W. 
The CAPER studies: five case-control studies aimed at identifying and quantifying the risk of cancer in symptomatic primary care patients. Br J Cancer 2009; 101(Suppl 2): 80–86.\n23. Nurek M, Kostopoulou O, Delaney BC, Esmail A. Reducing diagnostic errors in primary care. A systematic meta-review of computerized diagnostic decision support systems by the LINNEAUS collaboration on patient safety in primary care. Eur J Gen Pract 2015; 21(Suppl): 8–13.\n24. Mitchell TM. Machine learning. Boston: WCB/McGraw-Hill, 1997.\n25. Darcy AM, Louie AK, Roberts LW. Machine learning and the profession of medicine. JAMA 2016; 315(6): 551–552.","source_license":"CC0","license_restricted":false}