Algorithm Training and Testing for a Nonendoscopic Barrett's Esophagus Detection Test in Prospective Multicenter Cohorts.

doi:10.1016/j.cgh.2024.03.003

2024 · doi:10.1016/j.cgh.2024.03.003 · PMID:38513982 · PMC11272429

Methods

Patients for the training set were recruited at 6 US medical centers (NCT02560623): Mayo Clinic, Rochester; Mayo Clinic, Arizona; Mayo Clinic, Florida; Mayo Clinic Health System Hospital, Austin, Minnesota; Northwell Health, New York; and Baylor University, Dallas, Texas, from January 2018 to April 2021. Patients for the test set were recruited from 3 US medical centers (NCT03060642): Mayo Clinic, Rochester; University of Pennsylvania, Philadelphia; and Columbia University, New York City, from February 2018 to April 2021. Both studies were approved by Institutional Review Boards at all recruitment sites. All authors had access to the study data and reviewed and approved the final manuscript. BE/EAC cases were patients with a BE segment ≥1 cm and histology demonstrating intestinal metaplasia (IM). Patients with a history of ablation were excluded, as ablation can alter the tissue DNA methylation profile 12. Participants with a history of esophageal/gastric neoplasia were excluded to avoid methylation changes influenced by residual field cancerization effects 13. To avoid mechanical injury during CCD withdrawal, patients with a history of eosinophilic esophagitis, untreated achalasia or uninvestigated dysphagia, or cirrhosis, and those currently taking anticoagulants, were excluded. Participants were classified as cases, controls, or indeterminate based on predefined criteria. Test and training set cases included patients undergoing clinical endoscopy for BE-related dysplasia/neoplasia, including BE-EAC, BE-high grade dysplasia (HGD), BE-low grade dysplasia (LGD), indefinite for dysplasia (IND), long segment (≥3 cm) non-dysplastic BE (LS-NDBE), and short segment (≥1 cm to 2 cm) non-dysplastic BE (SS-NDBE).
Controls were patients undergoing clinical endoscopy, without endoscopic evidence of esophageal columnar metaplasia, and those with gastro-esophageal reflux disease (GERD) and one or more BE risk factors (age ≥50 years, male sex, Caucasian race, prior or current history of smoking, obesity, family history of first degree relative with BE or EAC). Participants classified as indeterminate were those likely to be encountered in a population at risk for BE but not meeting a priori specified diagnostic criteria for cases or controls. Patients were placed in the indeterminate category under the following criteria: i) Los Angeles classification C and D esophagitis (as these patients have a 10–15% BE prevalence on subsequent endoscopy); ii) BE cases with <1 cm noncircumferential segments with IM without dysplasia; iii) CCD device failure, defined as dwell time <5 minutes or tether detachment; iv) gastroesophageal junction (GEJ) cancers without visible BE; vi) eosinophilic/infectious esophagitis or eosinophilic gastritis; and/or, vii) meeting endoscopic criteria for BE but lacking histologic IM. Indeterminate cases were excluded from algorithm development but were tested and reported separately. Participants swallowed the CCD (EsophaCap; Lucid Diagnostics, New York, NY) with sips of water following topical pharyngeal anesthesia. The CCD was pulled 6–8 minutes later, following shell dissolution and expansion of the polyurethane foam sphere (25 mm diameter, 10 pores/inch). Patients in the training set completed a tolerability assessment survey (rating pain, choking, gagging and anxiety separately, and overall tolerability on a 0–10 visual Likert scale) immediately after CCD retrieval ( Table 4 ). All patients underwent endoscopy within 24 hours. At endoscopy, a determination of the presence or absence of endoscopically suspected BE was made, with biopsy samples taken for histologic evaluation. 
A mucosal injury score (ranging from 1, no trauma, to 5, bleeding requiring endoscopic therapy) was also assessed (Table 4) in the training set. All pathology was reviewed by an expert GI pathologist. Seattle protocol biopsies were obtained in BE cases undergoing clinical endoscopy for surveillance. A research coordinator called all participants 7 days post-endoscopy to assess for adverse events. During this call, participants were asked whether they preferred the CCD procedure or endoscopy and whether they would undergo the CCD procedure again if indicated. These procedures were as previously described 10. DNA extraction and MDM analysis procedures are described in Supplementary Methods. Laboratory personnel were blinded to case/control status. Formalin-fixed paraffin-embedded tissue blocks of Mayo Clinic patients identified as false negatives by algorithm calls (BE cases by endoscopy and histology) were retrieved. Five-micron sections were cut and BE tissue was macro-dissected. DNA was extracted using a column-based method (QIAamp DNA FFPE Tissue Kit, Hilden, Germany). In blinded fashion, DNA was bisulfite converted and methylation analysis was performed by the long-probe quantitative amplified signal (LQAS) assay in an automated fashion, as outlined in Supplementary Data. Sample size calculations for the training set were based on the recommendation of Riley, et al 14. The overall sensitivity target was ≥80% at a specificity of ≥85%. From previous work by our group, we estimated the maximum apparent Cox-Snell R² that can be obtained to be approximately 0.75, assuming a 1:1 case:control mix, and the degree of overfitting (optimism) to be approximately 0.025, giving an estimated shrinkage value of 0.97 (i.e., estimated coefficients in the model would be shrunk by 3%). With five candidate markers, a shrinkage factor of 0.97, and an anticipated Cox-Snell R² of 0.60 for the training set, the minimum target for total sample size was estimated to be N=173.
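The sample size reasoning above can be sketched numerically. The block below assumes the shrinkage-based criteria of Riley et al. as they are commonly stated, S = R²/(R² + δ·max R²) and n = p / ((S − 1) ln(1 − R²/S)); whether exactly these formulas were applied in the study is an assumption, and the function names are hypothetical. With the values quoted in the text, the two formulas give a shrinkage of 0.97 and a minimum total sample size of 173.

```python
import math


def expected_shrinkage(r2_cs: float, optimism: float, max_r2_cs: float) -> float:
    """Shrinkage implied by a tolerated optimism in apparent Cox-Snell R^2.

    Assumed form: S = R2 / (R2 + optimism * max R2).
    """
    return r2_cs / (r2_cs + optimism * max_r2_cs)


def min_sample_size(n_params: int, shrinkage: float, r2_cs: float) -> int:
    """Smallest n meeting the target shrinkage for n_params candidate predictors.

    Assumed form: n = p / ((S - 1) * ln(1 - R2 / S)), rounded up.
    """
    n = n_params / ((shrinkage - 1.0) * math.log(1.0 - r2_cs / shrinkage))
    return math.ceil(n)


# Values quoted in the text: anticipated R^2 0.60, optimism 0.025, max R^2 0.75,
# five candidate markers.
shrinkage = round(expected_shrinkage(0.60, 0.025, 0.75), 2)
n_min = min_sample_size(5, shrinkage, 0.60)
print(shrinkage, n_min)
```

The agreement with the reported figures suggests, but does not prove, that this is the calculation that was performed.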
However, the minimum target of 150 controls produces a one-sided 95% confidence interval (CI) with a lower bound of 80% assuming a target specificity of 85%. With 150 cases the lower bound of the one-sided 95% CI would be no lower than 73% based on the target sensitivity of 80%. Based on the number of test set samples available, we calculated that with 81 cases and 44 controls, the width of the 95% CI would be no larger than ±9%, assuming a sensitivity of 80%, and ±12% at a specificity of 85%. Clinical comparisons between the BE cases and controls were evaluated using Fisher’s exact test for categorical variables and Wilcoxon rank-sum tests for continuous variables. Sensitivity and specificity were calculated with 2-sided 95% CIs using the Wilson score (with continuity correction) method. Differences between the areas under the receiver operating characteristic curves were tested using the nonparametric approach of DeLong, et al 15. Analyses were performed using JMP PRO 16.0 (SAS Institute, Cary, NC) and R (version 4.12) software. Initially, five MDMs (NDRG4, VAV3, ZNF682, ZNF568, BMP3) and two model types were considered (random forest and logistic regression). A three-MDM panel, based on previous experiments, was investigated and compared to the five-MDM panel (Supplementary Figure 1). Both random forest 16 and logistic regression models were investigated and different cut-off values were evaluated. For the logistic regression modeling, in silico cross-validation was utilized to minimize overfitting. The data were randomly split into a model derivation set (80%) and a model validation set (20%), on which accuracy metrics were calculated. The random partitioning of the entire training set was performed 5000 times. The random forest approach uses 500 bootstrap samples (in-bag samples) of the entire training set to build 500 decision trees for model derivation.
For model validation, accuracy metrics are summarized by aggregating out-of-bag tree predictions (from trees in the forest that do not include data from a given patient) across all patients in the training set. Model performances were compared using area under the receiver operating characteristic curve (AUC) values and sensitivity (at a target specificity of 90%) (Supplementary Table 1). The final model was a 3-MDM logistic model (NDRG4, VAV3, and ZNF682); for additional details, see Supplementary Methods. The final algorithm panel was assessed for BE/EAC detection performance in a prospectively collected, independent cohort. Sub-group analyses were performed in both the training set and independent cohort for each of the categories: SS-NDBE, LS-NDBE, BE-LGD, BE-IND, BE-HGD, and BE-EAC. Indeterminate samples in the training set were separately tested to evaluate positivity rates. The three MDMs in the final algorithm panel were assayed on tissue DNA extracted from 14 cases that were classified as (false) negatives by the algorithm. Log strand counts were quantified from serially diluted universally methylated DNA standards. The National Institutes of Health and Exact Sciences Corporation funded this study. Exact Sciences contributed to study design, data collection and analysis, and was involved in data interpretation.
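The Wilson score interval with continuity correction named in the statistical analysis can be sketched as follows. This is a minimal illustration using the form of the interval usually attributed to Newcombe, not the JMP or R routine used in the study; the function name is hypothetical, and the counts in the example (65 of 81) are chosen only to approximate the 80% sensitivity assumed in the precision calculation.

```python
import math
from statistics import NormalDist


def wilson_cc_interval(successes: int, n: int, confidence: float = 0.95):
    """Two-sided Wilson score interval with continuity correction."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # 1.96 for 95%
    p = successes / n
    q = 1.0 - p
    denom = 2.0 * (n + z * z)
    lower_root = z * z - 2.0 - 1.0 / n + 4.0 * p * (n * q + 1.0)
    upper_root = z * z + 2.0 - 1.0 / n + 4.0 * p * (n * q - 1.0)
    lower = (2.0 * n * p + z * z - 1.0 - z * math.sqrt(lower_root)) / denom
    upper = (2.0 * n * p + z * z + 1.0 + z * math.sqrt(upper_root)) / denom
    # By convention the bounds are pinned at 0 and 1 for the extreme proportions.
    if successes == 0:
        lower = 0.0
    if successes == n:
        upper = 1.0
    return max(0.0, lower), min(1.0, upper)


# Hypothetical counts: 65 of 81 cases positive (about 80% sensitivity).
lo, hi = wilson_cc_interval(65, 81)
print(f"{lo:.3f} to {hi:.3f}")
```

For these illustrative counts the interval is roughly 70% to 88%, which is consistent with the ±9% precision quoted for 81 test set cases.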

Results

A total of 555 participants were initially consented for the training set. Of these, 510 attempted to swallow, and 461 (90%) successfully swallowed the CCD. Using the criteria described above, 154 BE cases and 198 controls were included in the training set (Figure 1). The test set consisted of 81 BE cases and 44 controls. Training and test sets did not significantly differ with respect to age, sex, body mass index (BMI), smoking history or BE length (Table 1). The majority of cases in the training and test sets (55–64%) had non-dysplastic BE. The remaining patients had varying degrees of dysplasia: IND (12–17%), LGD (4–12%), HGD (12–14%) and EAC (2–8%). Median (IQR) length of hiatal hernia (present in 66%) was 2 (2–3) cm in those with SS-NDBE. Baseline characteristics stratified by site of recruitment are outlined in Supplementary Table 1. During the algorithm development process, it was observed that two MDMs (ZNF568 and BMP3) did not contribute significantly to performance metrics (Supplementary Table 2). While both models performed similarly when assessing AUC, sensitivity, and specificity, logistic regression was chosen for its conceptually simpler implementation and was used to establish the final algorithm. Cross-validation of the three-MDM panel demonstrated an overall sensitivity of 82% (CI, 68% to 94%) for BE detection at 90% (CI, 79% to 98%) specificity in the training set and 88% (CI, 78% to 94%) sensitivity at 84% (CI, 70% to 93%) specificity in the test set (Table 2). The areas under the receiver operating characteristic curves (AUROCs) for detection of all BE cases with and without dysplasia/EAC were 0.92 (CI, 0.89 to 0.95) and 0.94 (CI, 0.90 to 0.98) in the training and test sets, respectively (Figure 2). Notably, sensitivity for BE-HGD and BE-EAC was 100% in both the training and test sets. The overall sensitivity for NDBE was 82% (CI, 67% to 91%) in the test set.
Sensitivity in NDBE test set cases was influenced by BE length, with a lower sensitivity of 63% in SS-NDBE compared to 96% for LS-NDBE. A similar pattern was seen in the training set NDBE cases. Of note, sensitivity for SS-dysplastic BE was substantially higher at 84% in the training set. The algorithm was not influenced by age, sex, or smoking history (Supplementary Table 3). Algorithm specificity remained high (89%) for GERD participants with two or more additional risk factors in the training set (Supplementary Table 4). Table 3 shows the percent positivity of the algorithm for 57 indeterminate participant samples from the training set. The model was positive in 33% of those with <1 cm noncircumferential BE and 23% of those negative for IM on histology despite endoscopically suspected BE. CCD administration and withdrawal were well tolerated (Table 4). The injury score was 1 or 2 in all except two cases and controls. A total of 310 (88.1%) participants stated that they would choose the CCD procedure again for BE/EAC detection and preferred the CCD procedure to endoscopy. Two participants had incomplete expansion of the CCD at withdrawal after 8 minutes. One CCD detachment was seen in each of the training set (retrieved endoscopically at clinical endoscopy) and the test set (passed spontaneously in 24 hours). Levels of MDMs were elevated in the DNA extracted from the BE tissue for 13 of the 14 participants with a false negative classification based on the algorithm (Supplementary Figure 2). This suggests that lack of contact of the CCD with the BE epithelium may be a potential reason for the false negative test results. A post-hoc analysis was conducted to assess the model performance in the 53 indeterminate category patients who completed all study procedures but in whom BE could not be definitively confirmed (Table 3). This also included those with gastric intestinal metaplasia and advanced erosive esophagitis.
The proportion of indeterminate samples, if included in the controls, was ~21% (53/251) and the specificity within the indeterminates was approximately 80% (11 of 53 tested positive). Since the prevalence of indeterminates within the target population is unknown, considering the two extremes (0% indeterminates and 100% indeterminates in the control group) implies that the specificity ranges between 80% and 90%. A reasonable assumption is that the total false positive rate in the training set would be only 2% higher than in those used in the cross-validated model performance report (31/251 vs 20/198).
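The arithmetic behind this sensitivity analysis can be checked with a short sketch. It uses only the fractions quoted above (20/198 and 31/251); the count of 11 positive indeterminate samples is derived from their difference, and the function name is hypothetical.

```python
def pooled_specificity(spec_controls: float, spec_indeterminate: float,
                       frac_indeterminate: float) -> float:
    """Specificity of a control group in which a given fraction are indeterminates."""
    return ((1.0 - frac_indeterminate) * spec_controls
            + frac_indeterminate * spec_indeterminate)


fp_controls, n_controls = 20, 198       # false positives among defined controls
fp_pooled, n_pooled = 31, 251           # false positives with indeterminates added
fp_indet = fp_pooled - fp_controls      # 11 positive indeterminate samples
n_indet = n_pooled - n_controls         # 53 indeterminate samples

spec_controls = 1.0 - fp_controls / n_controls   # about 0.90
spec_indet = 1.0 - fp_indet / n_indet            # about 0.79
fpr_increase = fp_pooled / n_pooled - fp_controls / n_controls  # about 0.02

# Extremes discussed in the text: no indeterminates vs. only indeterminates.
best_case = pooled_specificity(spec_controls, spec_indet, 0.0)
worst_case = pooled_specificity(spec_controls, spec_indet, 1.0)
observed_mix = pooled_specificity(spec_controls, spec_indet, n_indet / n_pooled)
print(f"{worst_case:.3f} {observed_mix:.3f} {best_case:.3f} {fpr_increase:.3f}")
```

Under these assumptions the pooled specificity is a weighted average of the two groups, so it must lie between the two extremes whatever the true prevalence of indeterminates turns out to be.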

Discussion

Utilizing two multicenter, prospective cohorts, a three-MDM panel algorithm was trained and tested for the non-endoscopic detection of BE (with and without dysplasia/EAC) using CCD samples. This panel showed high sensitivity and specificity, particularly for high-risk categories (HGD/EAC and LS-NDBE) in training and test sets. The CCD was well tolerated, safe, and administered by trained nurses, demonstrating the feasibility of implementation as an office-based, non-physician administered test. All BE subtypes were included in the training and test sets to explore assay performance across different subtypes. Since endoscopic therapy is recommended for BE with dysplasia and intramucosal EAC, detection of these subtypes is critical for a non-endoscopic test to have an impact on cancer prevention and treatment. Additionally, including all histological groups (particularly HGD and EAC with BE) in studies assessing the accuracy of non-endoscopic detection tools and demonstrating adequate performance is likely an important criterion in determining coverage from potential payors. On the other hand, given the high proportion of NDBE in screening populations, we also ensured that the majority of cases had no dysplasia and a priori included a substantial number of SS-NDBE to assess test performance in this category. Sensitivity was 100% in BE cases with HGD and EAC in both training and test sets. Test set sensitivity was 96% in LS-NDBE but lower at 63% in SS-NDBE cases. Notably, sensitivity in SS-dysplastic BE was higher at 84%, raising the possibility that this non-endoscopic test could detect SSBE cases if dysplasia develops. In another trial, the sensitivity for SS-NDBE was also lower: 43% (BE <2 cm) to 61% (for BE 1–3 cm), compared to the overall sensitivity of 76% 17. Lower sensitivity in SS-NDBE may be due to inadequate sampling by the CCD. Alternatively, the absence of hypermethylation in 10% of BE samples was reported by Yu, et al 18.
Elevation of tissue methylation levels in the false negative BE cases suggests that inadequate contact of the CCD with the BE mucosa in SS-NDBE is the likely reason for false negative test results. Importantly, annual rates of progression are substantially lower in SS-NDBE (0.06%) compared to LS-NDBE (0.31%) 19. Consequently, recent guidelines recommend a 5-year surveillance interval in SS-NDBE compared to 3 years for LS-NDBE 20. Potential etiologies for false positive results include erosive esophagitis, intestinal metaplasia of the cardia/gastroesophageal junction (IMGEJ), concomitant gastric metaplasia, and smoking. In the 20 training set participants with a false positive test, 2 had erosive esophagitis, while none had IMGEJ or gastric metaplasia. We compared the specificity and sensitivity of participants with and without esophagitis and did not see a statistically significant difference in either. Though another study reported increased methylation of VIM and CCNA1 in patients with IMGEJ 12, we did not observe a similar phenomenon. It is possible that those with markers positive for BE without endoscopic evidence of BE may be at risk for developing BE in the future. Clinical and endoscopic follow-up of these participants could be considered. Investigators have tested other biomarkers such as trefoil factor 3 (TFF3), a protein biomarker, and other candidate MDMs 8, 17, 21, 22. The TFF3 assay requires a pathologist to perform immunohistochemistry (IHC), making this approach difficult to scale up and susceptible to interobserver variability. In contrast, quantitative PCR or next generation sequencing based MDM assays can be conducted in central laboratories with largely automated methodology 23. In this study, a five-MDM panel was initially studied but subsequent exploratory analyses revealed that comparable performance could be obtained with a three-MDM panel, which increases assay efficiency and decreases assay cost.
Other investigators have also used two- to four-MDM panels for BE detection using different CCD sampling devices, with somewhat similar performance characteristics 7, 12, 24. Current and ex-smokers have high reported rates of proximal squamous epithelium methylation for the two MDMs (VIM and CCNA1) used in the balloon-based assay 12; therefore, the balloon must be deflated and inverted in the mid-esophagus to prevent squamous contamination and reduced assay specificity. This study has several strengths, including training and test sets from geographically diverse sites with the inclusion of the entire spectrum of BE cases. We used rigorous statistical methodology to set cut-offs and applied the locked algorithm to an independent sample set, an important contribution to the evidence base in the detection of BE using MDMs. The multicenter nature of the training and test sets further enhances the generalizability of these results. Assay specificity remained robust in controls with GERD and 1–3 additional risk factors. However, the case-control study design precluded assessment of the positive and negative predictive value of the test. Further study of assay performance in a prospective screening population is currently underway in a primary care population in Minnesota and Wisconsin (NCT03961945) utilizing the locked algorithm. Finally, sensitivity for SS-NDBE could be further improved, either by increasing the size of the CCD to improve sampling or by further optimizing biomarker assay platforms. While sensitivity for BE-IND and BE-LGD was numerically lower in the training set (with relatively small sample sizes in these categories), the sensitivity in the same categories was reassuringly excellent in the test set. In conclusion, we have trained and tested an algorithm for the accurate non-endoscopic detection of BE/EAC using endoscopic confirmation as the criterion standard. Sensitivity for dysplastic and treatment-eligible subsets was high.
This approach was safe, well-tolerated and suitable for office-based implementation.

Introduction

Esophageal adenocarcinoma (EAC) incidence continues to increase and 5-year survival remains poor 1 . Barrett’s esophagus (BE) is the precursor of most EACs and progresses to EAC via the development of dysplasia 2 . Despite the recommendation that endoscopic BE screening be considered in those with risk factors, only a minority of prevalent BE cases are diagnosed and in surveillance programs. Esophagogastroduodenoscopy (EGD) screening is invasive, expensive 3 , and difficult to access 4 . Consequently, almost 90% of EACs are diagnosed outside a surveillance program, at advanced stages 5 . Non-endoscopic BE/EAC detection is anticipated to make screening more accessible: it can be safely completed in outpatient settings by trained nurses, is likely to be less expensive, and is more cost-effective than endoscopic screening 6 . Several case control studies have shown this approach to be accurate, well-tolerated, and safe 7 – 11 . A recent pragmatic trial increased not only BE detection 10-fold, but also potentially increased detection of dysplastic BE and endoscopically treatable early stage EAC, compared to conventional care 8 . In prior case control studies, we reported the discovery of promising methylated DNA markers (MDMs) assayed on samples obtained with a swallowed encapsulated sponge collection device (CCD) 9 . Subsequent marker elimination and selection studies developed a five MDM panel ( NDRG4, VAV3, ZNF682, ZNF568, BMP3) with excellent performance characteristics (AUC 0.96) for BE/EAC detection 10 . In the marker selection study, a three MDM panel which could simplify and streamline the MDM assay, was assessed in an exploratory post-hoc analysis, but not by training and testing of optimal cut-offs enabling the determination of test positivity or negativity 10 . 
In the current study, we aimed to 1) train an algorithm using a large training sample set from a multi-site case control study to establish cutoff values for a final MDM panel to adjudicate CCD samples as positive or negative for BE, and 2) test the performance of the locked algorithm in an independent test cohort.
