Artificial intelligence for breast cancer screening in mammography (AI-STREAM): Preliminary analysis of a prospective multicenter cohort study

doi:10.21203/rs.3.rs-4640159/v1


2024 · doi:10.21203/rs.3.rs-4640159/v1

preprint OA: closed


Yun Woo Chang, Jung Kyu Ryu, Jin Kyung An, Nami Choi, Kyung Hee Ko, and 2 more

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-4640159/v1

This work is licensed under a CC BY 4.0 License.

Status: Published. Journal publication published 06 Mar, 2025 (Nature Communications). Version 1 posted.

Abstract

Several studies have shown that artificial intelligence (AI) improves mammography screening accuracy. However, prospective evidence, particularly in a single-read setting, is lacking.
This study aimed to compare the diagnostic accuracy of breast radiologists with and without AI-based computer-aided detection (AI-CAD) when interpreting screening mammograms in a real-world, single-read setting. A prospective multicenter cohort study was conducted in six academic hospitals participating in South Korea’s national breast cancer screening program, in which women aged ≥40 years were eligible for enrollment between February 2021 and December 2022. The primary outcome was screen-detected breast cancer diagnosed at one-year follow-up. The primary analysis compared cancer detection rates (CDRs) and recall rates (RRs) of radiologists specialized in breast imaging, with and without AI assistance. The exploratory secondary analysis compared CDRs and RRs of general radiologists, with and without AI, as well as radiologists versus standalone AI. Of 25,008 women who were eligible for enrollment, 24,543 were included in the final cohort (median age 61 years [IQR 51-68]), with 140 (0.57%) screen-detected breast cancers. The CDR was significantly higher, by 13.8%, for breast radiologists with AI-CAD (n=140 [5.70 ‰]) than for those without AI (n=123 [5.01 ‰]; p<0.001), with no significant difference in RRs (p=0.564). Similar trends were observed for general radiologists, with a significantly higher CDR, by 26.4%, with AI-CAD (n=120 [4.89 ‰]) than without AI (n=95 [3.87 ‰]; p<0.001). The CDR of standalone AI (n=128 [5.21 ‰]) was also significantly higher than that of general radiologists without AI (p=0.027), with no significant difference in RRs (p=0.809). These preliminary results from a prospective, multicenter cohort study provide evidence of a significant improvement in CDRs, without affecting RRs, for breast radiologists using AI-CAD compared with not using AI-CAD when interpreting screening mammograms in the standard single-reading setting.
Furthermore, AI-CAD assistance could potentially improve radiologists’ reading performance regardless of experience (ClinicalTrials.gov: NCT0524591).

Health sciences/Medical research/Clinical trial design/Clinical trials/Adaptive clinical trial
Health sciences/Diseases/Cancer/Breast cancer

Introduction

Worldwide, breast cancer is the most commonly diagnosed cancer in women and the leading cause of cancer-related mortality among them. Randomized controlled trials, systematic reviews, and observational studies have demonstrated that including mammography in breast cancer screening can reduce breast cancer-related mortality rates by approximately 20–50% 1–3. Accordingly, several countries are currently implementing screening programs that incorporate mammography for early detection and treatment of breast cancer, aiming to reduce mortality and morbidity. While mammography screening has proven effective in detecting early cancer, it can lead to undesirable false-positive recalls that may require additional imaging evaluation or biopsies, and to overdiagnosis. Conversely, although early detection is the key to screening, breast cancer diagnosis can be delayed by false-negative interpretation of mammograms: failure to detect a lesion because of low sensitivity in dense breasts, misinterpretation because abnormal features are not recognized, or incorrect interpretation because a lesion shows no change from the previous study, lies at a previous biopsy site, or has benign-appearing characteristics 4–6. Roughly 20–33% of breast cancer cases are diagnosed based on symptoms between consecutive screening rounds (ie, interval cancers) or discovered through other imaging modalities 5. To overcome these shortcomings, significant efforts have been made, including double reading by two radiologists, increasing the frequency of screening examinations, and adding supplementary imaging modalities 4, 7.
As population-based breast cancer screening programs become more widespread, the demand for daily breast cancer screening examinations is also progressively rising. However, medical resources remain limited and fall well short of this demand. Therefore, there is not only an inevitable need for assistance in screening interpretation to reduce interobserver variability, but it is also important to optimize and streamline current screening workflows 4, 8, 9. Artificial intelligence (AI) has been developed and validated for mammography screening with promising results, and it can support computer-assisted detection (CAD) by marking suspicious findings according to the risk of malignancy. Multiple retrospective studies have found that the diagnostic accuracy of AI was similar to or better than that of breast radiologists 8, 10–12. However, retrospective studies have inherent limitations, highlighting the need for prospective trials to assess the real-world effectiveness of AI-supported screening. A few prospective studies from Europe, which uses a double reading system, have assessed AI as either an independent reader or a triage tool. Knowledge gaps remain regarding how AI-CAD affects radiologists’ performance in a single reading strategy. This prospective, multicenter cohort study, named AI-STREAM (Artificial Intelligence for Breast Cancer Screening in Mammography), aimed to assess the diagnostic accuracy of radiologists, with and without AI-CAD, for interpretation of screening mammograms in a single reading strategy.

METHODS

Participants

The AI-STREAM study, described in more detail elsewhere 13, is a prospective, population-based study that aims to compare the diagnostic accuracy of breast radiologists when interpreting screening mammograms with or without AI-CAD for breast cancer (Fig. 1).
The trial enrolled women aged ≥40 years from six academic hospitals that participated in the national breast cancer screening program in South Korea. All women participating in the study read a participant information sheet and provided consent by completing an informed consent form. Of all eligible women, those with a history of breast cancer or mammoplasty were excluded because the AI software used was not validated for these subgroups. During breast cancer screening, mammograms comprising the two standard craniocaudal and mediolateral oblique views of each breast, acquired with digital mammography, were interpreted by a single radiologist (ie, single-reading strategy), which is the standard procedure for interpreting mammograms in South Korea.

Study Design

The study received approval from the Institutional Review Board (IRB) of all participating centers, and written consent for data publication was obtained from all participants. In this study, breast radiologists (BRs), defined as radiologists specializing in breast imaging with >10 years of experience at a university hospital, were responsible for interpreting the mammograms. In the standard single reading strategy, the subsequent clinical decision about whether the participant requires further diagnostic workup is made by a single radiologist. Generally, if a previous mammogram is available, the radiologist performs a comparative reading with it to determine recall. In this study, mammograms were read by radiologists first without, and then with, AI-CAD. As part of the standard single screening procedure, even though AI assistance followed the radiologist’s reading without AI-CAD, the final recall decision for further diagnostic workup was based on the radiologist’s comprehensive decision (set 1 in Fig. 1).
For exploratory purposes, results from stand-alone AI-CAD were also collected and compared with the diagnostic performance of radiologists with and without AI-CAD. Moreover, a separate simulation study was conducted (set 2 in Fig. 1) to compare the cancer detection rate (CDR) and recall rate (RR) of general radiologists (GRs), who did not specialize in breast imaging, with versus without AI-CAD. Note that the results from GRs did not affect any real-world clinical decision-making about further work-up of study participants, because those decisions were based on the BRs’ readings of the same mammograms. The main rationale behind this additional study was that GRs make up the majority of readers in Korea’s breast cancer screening program owing to a shortage of BRs. Thus, gaining insight into the potential effect of AI-CAD on GRs’ performance would be highly reflective of, and valuable to, real-world screening practice. All radiologists who participated in the study, including both BRs and GRs, had no prior experience with the AI-CAD program, minimizing bias.

Procedures

For image acquisition and data management, a cloud-based imaging data management platform (IRM’s BEST Image) was used (eFigure 1, Appendix 1). Mammograms were processed by a Snupi program, which performed various operations (eg, search, inquiry, de-identification, separation) and transmission functions on Digital Imaging and Communication in Medicine (DICOM) files. After de-identifying the participant information and assigning identifiers, mammograms from participants were exported to the platform to record reading results. If participants had mammograms within the past four years, these were also exported to the study platform for comparison, to replicate the real-world reading procedure of each site.
As part of the routine screening process, BRs interpreted the mammograms without AI-CAD and recorded the findings (Test 1), followed by the automatic presentation of AI-CAD results (abnormality score and marks) for review and recording. The final recorded results were based on a comprehensive assessment that considered the interpretations both with and without AI-CAD (Test 2). Results of Test 1 (without AI-CAD) could not be modified or adjusted once the AI-CAD results were reviewed. Likewise, results of Test 2 could not be corrected after reading (eFigure 1). For variables recorded per ‘Test’, the radiologist first recorded the breast density according to the Breast Imaging Reporting and Data System (BI-RADS) 5th edition (A, B, C, D). Second, the radiologist assessed malignancy using a 7-point scale (1, definitely normal; 2, benign; 3, probably benign [0–2%]; 4, low suspicion for malignancy [2–10%]; 5, moderate suspicion for malignancy [10–50%]; 6, high suspicion for malignancy [50–95%]; 7, highly suggestive of malignancy [≥95%]). For cases that were not recalled, the radiologist could choose a malignancy assessment score of 1 or 2, whereas for cases that were recalled, the radiologist could choose a score between 3 and 7. Scores of 1 or 2 were considered negative, corresponding to BI-RADS 1 or 2, while scores of 3 and higher were considered positive, corresponding to BI-RADS 3 to 5 and indicating the need for a recall. When a recall decision was made, the side of the recall (left, right, both) was also recorded. The recall of participants for further diagnostic work-up was the BR’s comprehensive decision, informed by the paired reading results with and without AI-CAD. If a participant was recalled and visited the same hospital where the screening mammography was performed, additional diagnostic work-up (e.g. special mammography views, tomosynthesis, ultrasonography) was conducted.
If needed, a biopsy was performed, and if a pathologist subsequently diagnosed breast cancer (screen-detected), the participant's information was recorded separately. If surgery was performed, the final pathology was confirmed. For participants diagnosed with breast cancer, other breast imaging and pathologic features were reviewed from electronic medical records and pathology reports (if available). As a secondary and exploratory objective, a separate reading set (Set 2) was designed and conducted as a simulation study (Fig. 1). In set 2, five general radiologists (GRs) who did not specialize in breast imaging interpreted the same participants’ mammograms; GRs had variable experience as radiologists and in interpreting mammography. The mammograms were interpreted with and without AI-CAD, and the corresponding results were recorded, on the same study platform. All participating radiologists, including both BRs and GRs, had no prior experience with the AI-CAD program. The study used a commercial AI-CAD (Lunit INSIGHT MMG, available at https://insight.lunit.io, version 1.1.7.1), which has been validated through various studies 8, 14 (Appendix 3). In brief, Lunit INSIGHT MMG AI-based CAD improves radiologists’ performance and provides results equivalent or superior to those from radiologists alone 14. It has also shown superior performance compared with two other commercial AI-based software products 8. The AI provides abnormality scores ranging from 0 to 100, rounded to two decimal places, per breast based on mammograms. These scores can also be presented as a heatmap or grayscale map. AI results were considered positive if the abnormality score was above a predefined cutoff of 10.

Outcomes

The primary outcomes were CDRs and RRs of BRs with and without AI-CAD in mammography reading for screen-detected breast cancer, including invasive cancer, ductal carcinoma in situ, or both.
The secondary outcomes were CDRs and RRs of mammography reading in the following comparisons: 1) BRs without AI-CAD vs AI standalone, 2) BRs with AI-CAD vs AI standalone, 3) GRs without AI-CAD vs GRs with AI-CAD, 4) GRs without AI-CAD vs AI standalone, and 5) GRs with AI-CAD vs AI standalone.

Statistical analysis

The sample size was estimated using McNemar’s test to detect differences in the CDR between groups of radiologists with and without AI-CAD, with a two-sided test at a significance level of 0.05 and 80% power. The assumed cancer prevalence was 3.21 per 1000 examinations, determined from data in a previous retrospective study, and the target sample size was chosen based on this expected cancer prevalence 15. The target sample size was 32,714 participants, corresponding to approximately 16,000 participants per year. The total number of expected participants was, however, adjusted from the initial study design because of the COVID-19 pandemic, with no effect on the primary study endpoint. Assuming the same cancer prevalence of 3.21 per 1000 examinations, it was calculated that 80% power could be maintained if approximately 24,000 people were recruited and more than 90 cancers were detected (Appendix 4). Descriptive statistics were used for continuous and categorical variables, as appropriate. Logistic regression analysis using generalized estimating equations was used to estimate 95% CIs and for comparative analysis. Pairwise comparisons were performed among BRs with AI-CAD, BRs without AI-CAD, AI standalone, GRs with AI-CAD, and GRs without AI-CAD. Correction for multiple comparisons would be necessary for confirmatory inference, but no corrections were made considering the preliminary nature of the study.
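The paired design (the same mammogram read without and then with AI-CAD) is what makes McNemar's test a natural basis for the sample size calculation. The sketch below is only an illustration of the exact two-sided McNemar test on discordant pairs; it is not the study's analysis, which used logistic regression with generalized estimating equations. The function name `mcnemar_exact` and the split of discordant pairs (17 cancers detected only with AI-CAD, 0 detected only without) are assumptions made for illustration; the preprint reports 17 additional detections but does not report the discordant-pair table.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b, c are the two discordant-pair counts (e.g. cases positive only
    with AI-CAD and cases positive only without it). Under the null
    hypothesis each discordant pair falls on either side with
    probability 0.5, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


# Hypothetical discordant counts, for illustration only (see lead-in).
only_with_ai = 17
only_without_ai = 0
p_value = mcnemar_exact(only_with_ai, only_without_ai)
print(f"exact McNemar p = {p_value:.2e}")
```

Only discordant pairs enter the test, which is why the expected number of detected cancers, rather than the total cohort size alone, drives the power of a paired comparison of detection.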
Prespecified subgroup analyses were performed to examine results by age group (40–49, 50–59, 60–69, ≥70 years), mammographic density (four BI-RADS categories of the American College of Radiology), and malignancy assessment on the 7-point scale (defined above). For breast cancers, subgroups were analyzed by cancer characteristics including invasiveness, tumor size category (<20 mm, ≥20 mm), presence of axillary lymph node metastasis, and molecular subtype (luminal A, non-luminal A [luminal B, human epidermal growth factor receptor 2 (HER2) enriched, or triple negative]) (Appendix 5).

RESULTS

Participant Selection Criteria and Characteristics

Between February 1, 2021, and December 31, 2022, 25,008 women aged ≥40 years underwent regular mammography screening as part of the national breast cancer screening program in South Korea and were eligible for enrollment into the study. After exclusion of participants with parenchymal change due to a previous procedure, mammoplasty, or insertion of a foreign substance (n=144), individuals who withdrew consent (n=267), and data errors on the cloud server (n=54), 24,543 participants were included in the final cohort. Among the participants who underwent additional assessment at the hospital where the recall was made, pathologically diagnosed breast cancers were analyzed at 1-year follow-up after enrollment of the last participant. There were 148 pathologically confirmed cancers, of which 140 (0.57%) were screen-detected cancers, including 2 cases of bilateral breast cancer. We analyzed 24,545 mammograms from 24,543 participants, counting both sides for the 2 cases of bilateral breast cancer (Figure 2). The median age of the study cohort was 61 years (IQR, 51-68). Of all participants, 67.5% had dense breasts, and 80.7% of participants diagnosed with breast cancer had dense breasts (Table 1).
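The rates quoted in this preprint follow directly from the reported counts, and a short calculation makes the units explicit (CDR per 1000, RR per 100). The sketch below is a rough check, not the study's method: it assumes the 24,543 participants as the denominator and uses a simple Wald interval, whereas the preprint's intervals come from logistic regression with generalized estimating equations, so small differences in the last digit are possible. The helper names `rate`, `wald_ci`, and `relative_increase` are hypothetical.

```python
from math import sqrt

N_PARTICIPANTS = 24_543  # final cohort size reported in the preprint


def rate(events: int, n: int, per: int = 1000) -> float:
    """Events per `per` examinations (per=1000 gives per mille, per=100 gives %)."""
    return per * events / n


def wald_ci(events: int, n: int, per: int = 1000, z: float = 1.96):
    """Approximate 95% Wald interval for a proportion, scaled by `per`.

    This is an assumption for illustration; the study estimated its
    intervals with generalized estimating equations.
    """
    p = events / n
    half_width = z * sqrt(p * (1 - p) / n)
    return per * (p - half_width), per * (p + half_width)


def relative_increase(new_events: int, old_events: int) -> float:
    """Relative increase in percent when both arms share one denominator."""
    return 100 * (new_events - old_events) / old_events


# Counts for breast radiologists (BRs) as reported in the preprint.
cdr_with_ai = rate(140, N_PARTICIPANTS)               # cancers per 1000
cdr_without_ai = rate(123, N_PARTICIPANTS)            # cancers per 1000
rr_with_ai = rate(1113, N_PARTICIPANTS, per=100)      # recalls, %
rr_without_ai = rate(1100, N_PARTICIPANTS, per=100)   # recalls, %

low, high = wald_ci(140, N_PARTICIPANTS)
print(f"CDR with AI-CAD:    {cdr_with_ai:.2f} per 1000 ({low:.2f}, {high:.2f})")
print(f"CDR without AI-CAD: {cdr_without_ai:.2f} per 1000")
print(f"Relative increase:  {relative_increase(140, 123):.1f}%")
print(f"RR with / without:  {rr_with_ai:.2f}% / {rr_without_ai:.2f}%")
```

Because both reading conditions are applied to the same cohort, the relative increase in CDR reduces to the ratio of detected cancers, independent of the denominator chosen.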
Performance of BRs with and without AI-CAD

Overall, BRs screen-detected 123 breast cancers without AI-CAD and 140 cancers with AI-CAD, so 17 more cancers were detected with AI-CAD (Figure 3). The CDR of BRs with AI-CAD (n=140, 5.70 ‰ [95% CI 4.76, 6.65]) was significantly higher, by 13.8%, than that of BRs without AI (n=123, 5.01 ‰ [95% CI 4.13, 5.89]) (p<0.001), while no significant change in RRs was observed between BRs with AI-CAD (n=1113, 4.53% [4.27, 4.79]) and BRs without AI-CAD (n=1100, 4.48% [4.22, 4.74]) (p=0.564) (Table 2). In cancer characteristic-specific subgroup analyses, the median tumor size was 16 mm (IQR 11–25). Of the total 140 cases, 46 (32.9%) were ductal carcinoma in situ (DCIS) and 94 (67.1%) were invasive cancers; among 84 cases examined for lymph node metastases, 70 (83.3%) were negative for lymph node metastasis. For molecular subtypes of invasive ductal carcinoma (IDC), out of 68 cases, 50 (73.5%) were classified as luminal A and 18 (26.5%) as non-luminal A [8 (11.8%) as luminal B, 4 (5.9%) as HER2 overexpressing, and 6 (8.8%) as basal]. Interpreting mammograms with AI-CAD detected 6 additional cases of DCIS and 11 additional cases of invasive cancer, a notable increase in detection for both DCIS (p=0.009) and invasive cancers (p<0.001). Furthermore, assistance with AI-CAD resulted in a significant increase in the detection of small cancers of less than 20 mm (p=0.002), node-negative cancers (p<0.001), the luminal A subtype (p=0.002), and lower-grade IDC not otherwise specified (NOS) (p=0.009) compared with reading without AI-CAD (Table 3).

Performance of GRs with and without AI-CAD

Results of the simulation study (exploratory analysis) showed trends in diagnostic performance for GRs similar to those for BRs, but with a greater improvement: the CDR for GRs with AI-CAD (n=120, 4.89 ‰ [95% CI 4.02, 5.76]) was significantly higher, by 26.4%, than that of GRs without AI-CAD (n=95, 3.87 ‰ [3.09, 4.65]; p<0.001), resulting in 25 more cancers detected with AI-CAD.
However, the RR was also significantly higher for GRs with AI-CAD (n=1690, 6.89% [6.57, 7.20]) than without AI-CAD (n=1548, 6.31% [6.00, 6.61]) (p<0.001) (Figure 3, Table 2).

Performance of standalone AI and Radiologists with and without AI-CAD

AI standalone detected 128 cancers (CDR 5.21 ‰ [95% CI 4.31, 6.12]), which showed no significant difference vs BRs without AI-CAD (p=0.752) or BRs with AI-CAD (p=0.462). However, AI standalone showed a significantly higher RR (n=1535, 6.25% [5.95, 6.56]) vs BRs with AI-CAD and without AI-CAD (both p<0.001). The CDR of standalone AI was significantly higher than that of GRs without AI-CAD (p=0.027), with no significant difference in RR (p=0.809). The CDR of GRs with AI-CAD improved to the point of showing no significant difference compared with AI alone (p=0.611), but the RR was significantly higher than that of AI alone (p=0.005) (Figure 3, Table 2).

DISCUSSION

The preliminary results of the AI-STREAM study, a population-based prospective study, provide real-world evidence that using AI-CAD for BRs’ interpretation of screening mammograms significantly increased the CDR (5.70 per 1000 participants) compared with not using AI-CAD (5.01 per 1000 participants). The improvement in CDRs with AI-CAD assistance did not affect RRs, providing reassurance to radiologists using AI-CAD in their routine practice. The interpretation process for mammography in breast cancer screening differs between countries and is thus tailored to local needs, practice, and cancer prevalence and/or incidence. Recent advancements in AI have shown substantial promise in reading screening mammograms, including in double reading systems, AI-aided reading, and AI standalone methods, as indicated by multiple retrospective studies 10, 16-18. A few prospective studies, limited to double-read European settings, have been and are being conducted to assess the clinical effectiveness of using AI-CAD 19-21.
There are, however, limitations in directly comparing our study with these studies because of an inherent difference in screening practice (single- versus double-reading strategy). The population-based, prospective ScreenTrustCAD study found that double reading by one radiologist plus AI resulted in a 4% increase in screen-detected cancers compared with standard double reading by two radiologists, and was non-inferior in cancer detection. A result favoring the radiologist-plus-AI arm was also observed for RRs, which were 4% lower than with standard human double reading following consensus discussions reviewing the mammograms. Consensus discussion, using medical history and AI information, has proven effective in preventing an increase in abnormal interpretations during double reading by AI and radiologists from translating into a higher recall rate 19. Despite this, many regions outside of Europe and some private practices in Europe adopt single reading as their main reading strategy, raising the need for prospective evidence on the effect of AI in a real-world, single-read setting. The preliminary analysis of AI-STREAM specifically addressed this knowledge gap, by utilizing real-world screening data collected prospectively from multiple centers and by showing increased CDRs and unaffected RRs in BRs with AI-CAD. In the single reading setting of this AI-STREAM study, the radiologist's decision to recall or not recall a participant was based on a comprehensive assessment of the paired results with and without AI-CAD. Despite the single reading strategy used in this study, comprehensive decisions including comparison with prior mammograms appear to have contributed to reducing false-positive recalls by BRs, while retaining the strengths of consensus reading in a double reading strategy.
Another prospective study worth noting, despite differences, is the Mammography Screening with Artificial Intelligence (MASAI) trial, in which the AI score was used to triage mammograms to either single or double reading; for instance, mammograms with low AI scores underwent single reading. The implementation of AI-supported screen reading yielded a significant 28% increase in cancer detection without a rise in false-positive rates in the AI interventional triage group compared with the control standard double-reading group. Furthermore, there was a 44% reduction in the screen-reading workload 20. In a real-world clinical environment, the setting of AI thresholds is an important factor in mammography reading using AI, and this threshold is set and calibrated differently in each individual study 10. In the AI-STREAM study, specifically, the AI score was considered positive if it was above the predefined cutoff of 10 based on the abnormality score per breast; if the AI score was 10 or higher, the abnormality score and CAD mark were displayed on the mammography reading screen, so that radiologists could readily see these markings. Thresholds could be varied depending on the purpose: for instance, an AI threshold with high sensitivity should be used if the final decision is made by radiologists 10, whereas higher thresholds favoring specificity may be essential when AI is used as an independent reader. Regardless, repeated calibration of the AI threshold will likely be necessary in actual clinical use to maintain the desired operating point. Moreover, determining AI thresholds based on retrospective data alone may not always be sufficient, raising the need for repeated calibration in a prospective manner. For example, the ScreenTrustCAD study aimed for a 2% increase in the true positive fraction but observed an actual 6% increase 19.
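The threshold trade-off discussed above can be made concrete with a small sketch that computes sensitivity and recall rate at several cutoffs. Everything in it is an assumption for illustration: the function name `operating_point`, the toy scores, and the toy cancer labels are hypothetical and are not study data. The sketch treats a score at or above the cutoff as positive; the preprint describes the rule both as "above" and as "10 or higher", so the boundary handling here is also an assumption.

```python
from typing import Sequence, Tuple


def operating_point(
    scores: Sequence[float], has_cancer: Sequence[bool], cutoff: float
) -> Tuple[float, float]:
    """Return (sensitivity, recall_rate) when score >= cutoff counts as positive."""
    if not scores or len(scores) != len(has_cancer):
        raise ValueError("scores and has_cancer must be non-empty and equal length")
    flagged = [s >= cutoff for s in scores]
    n_cancer = sum(has_cancer)
    true_pos = sum(1 for f, c in zip(flagged, has_cancer) if f and c)
    sensitivity = true_pos / n_cancer if n_cancer else float("nan")
    recall_rate = sum(flagged) / len(scores)
    return sensitivity, recall_rate


# Hypothetical abnormality scores (0 to 100) and outcomes; NOT study data.
toy_scores = [2.10, 8.75, 12.40, 35.00, 91.30, 4.00, 15.20, 60.00]
toy_cancer = [False, True, False, True, True, False, False, True]

for cutoff in (5, 10, 30):
    sens, rr = operating_point(toy_scores, toy_cancer, cutoff)
    print(f"cutoff {cutoff:>2}: sensitivity {sens:.2f}, recall rate {rr:.2f}")
```

Lowering the cutoff can only keep or raise both sensitivity and recall rate, which is why a sensitive threshold suits AI used as an aid to a radiologist who makes the final call, while a standalone reader may need a stricter one.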
Currently, the lack of quality assurance protocols to detect and correct data drift, which affects the performance of AI systems, stands as a major barrier to the practical implementation of AI. Therefore, further evaluation of AI thresholds is required to better understand how best to address these unresolved issues. One prior study of differences in screening mammography interpretation performance by radiologist experience found that RRs for specialist radiologists were significantly lower than those of general radiologists, whereas their biopsy-confirmed CDR was significantly higher 22. These differences in performance, in which specialist radiologists make more true-positive and fewer false-positive interpretations of screening mammograms, may be related to more initial and continuing education in mammography, as well as accumulated experience. Additionally, consideration may be given to reducing the differences by using multiple rather than single reading systems 22. The AI-STREAM study was additionally designed to evaluate the impact of using AI-CAD when radiologists differ in their experience in mammography interpretation (eg, BRs and GRs). A positive impact of AI-CAD on screening mammography interpretation, similar to that for BRs, was also observed for GRs, but to a much larger extent, as CDRs significantly increased by 26.4%. GRs may see only a few cases of breast cancer in an entire year, which limits their self-confidence in interpreting mammograms. GRs also showed a modest increase in RRs with AI-CAD, unlike BRs, who showed no significant difference in RRs. Although comparison with prior mammograms was possible in this simulation analysis, this could be explained by GRs relying more on AI-CAD results, given their relatively lower self-confidence in interpreting mammography compared with BRs, which seems to have increased false-positive recalls.
However, when the results of standalone AI were compared with those of GRs without AI-CAD, standalone AI had a significantly higher CDR and no difference in RR, suggesting that AI could surpass the reading capabilities of less experienced radiologists, or GRs. Based on these results, AI-CAD as an aid in mammography interpretation can be expected to particularly benefit those with less experience in interpreting mammograms, producing an effect similar to that of multiple readings. Results of standalone AI showed no significant differences in CDRs compared with BRs with or without AI-CAD, indicating that the CDR of standalone AI is comparable to that of experienced, expert radiologists. These findings are in line with a previous meta-analysis, which reported that standalone AI in digital mammography is equivalent or superior to interpretation by radiologists 10. However, RRs from standalone AI were significantly higher than those from BRs with and without AI-CAD. This could be due to the uniformly applied AI-CAD abnormality score threshold used to determine the AI's recall or no recall, as well as the fact that AI-CAD could not consider and compare with prior mammograms. Furthermore, the use of standalone AI as a mammography reader, without any human involvement, presents many challenges given current ethical and medicolegal uncertainties. Hence, the importance of leaving the ultimate clinical decision to radiologists must be emphasized, not only to meet established medicolegal requirements but also to minimize false-positive results, and future discussions specifically addressing this topic are needed. With AI assistance, BRs demonstrated an approximate 11-13% increase in cancer detection for both DCIS and IDC. AI assistance increased the detection of small (<20 mm), node-negative, low-grade IDC, and prognostically favorable luminal A cancers.
Although this is a preliminary analysis and the number of cancers is small, this suggests that AI assistance can improve the early detection of breast cancer with relevant prognostic features, with minimal unnecessary recalls. However, a two-year follow-up is needed to evaluate the true impact of AI-CAD use on interval cancers detected after two years (the biennial screening interval in South Korea) and whether there is an increase in interval cancers with poor prognosis. The final results of the AI-STREAM study reflecting these outcomes will be announced after 2026, based on analysis of data linked to the National Cancer Registry (eFigure 2).

Strengths and limitations

The AI-STREAM study has several strengths. First, as a multicenter prospective study conducted on participants in national cancer screening, it used various mammography devices (GE and Hologic digital mammography devices). Second, the radiologists who participated in the analysis were proven experts in the interpretation of mammography with many years of experience, and the results were evaluated separately according to experience. Third, to the best of our knowledge, this study is the first clinical trial conducted as a prospective multicenter study evaluating diagnostic accuracy between radiologists with and without AI-CAD under the standard procedure of screening mammography interpretation by a single radiologist. Although only one AI system was used for mammography interpretation, it was verified as the best-performing algorithm compared with others used in previous research 8. There were several limitations in our study. First, the study was an observational trial, although a randomized controlled trial would be ideal for direct comparison between screening with and without AI.
The study was designed to evaluate the effect of AI assistance in a single reading strategy by a single radiologist, because it was not easy to design a multicenter randomized trial protocol for applying AI in real clinical practice. However, with AI assistance in a single reading strategy, it is hard to assess the direct effect of AI-CAD, because it is unclear which information led the radiologists to change their recall decisions. Second, the interim analysis was planned for 2026, upon linkage of screen-detected cancers to the National Cancer Registry data and review of those data. This preliminary analysis instead focused on pathologically proven cases at least one year after the last participant's enrollment. Although this is a preliminary analysis, all participant data had been collected and cleaned and the database was locked. Despite the short follow-up for evaluating cumulative effectiveness, this does not affect the false-positive results, and the analysis adheres to the pre-planned Statistical Analysis Plan (SAP) (Appendix): according to the SAP, assuming a cancer prevalence of 3.21 per 1000 tests, 80% power is maintained with approximately 24,000 participants recruited and more than 90 cancers detected. This preliminary analysis maintained the 80% power of the study and evaluated screen-detected cancers, including the recall rates of radiologists with and without AI-CAD, and can therefore substitute for the interim analysis of AI-STREAM. The final results, including interval cancers, will be reviewed and analyzed with data linked to the National Cancer Registry, expected to be available after 2026. Third, only the area with the highest AI score was included in the analysis; abnormal scores on both sides were included only for the 2 cases diagnosed as bilateral cancer. Therefore, not all non-cancer cases with bilateral abnormal AI scores were evaluated.
Lastly, the conclusive results for breast cancer were obtained through additional diagnostic work-up following recall after mammography screening, as well as from electronic medical records and pathology reports from the same hospital where the surgery was conducted. The sample size was relatively small because we could only analyze cases with available results for lesion size, nodal metastasis, and molecular subtype.

In conclusion, given the diverse mammography interpretation procedures across countries worldwide, there is a need to demonstrate the true positive impact of increased cancer detection rates when AI is applied in various ways in real-world clinical environments. The preliminary results of this prospective AI-STREAM study demonstrated the positive potential of AI assistance in radiologists’ interpretation, which was beneficial for both BRs and GRs in a single reading strategy. With the assistance of AI-CAD, BRs improved CDR and increased early cancer detection without affecting RR in a single reading strategy.

Declarations

Contributors: Y-WC and KH conceptualized the design of the trial with input from Y-WC. KH did the statistical analysis. Y-WC, JKA, NC, KHK, YMP, and JKR directly assessed and verified the underlying data reported in the manuscript. Y-WC and KH interpreted the results of the validation study. Y-WC wrote the first draft of the report with input from KH. All authors subsequently edited the report. JKR and Y-WC supervised the project. All authors approved the final version of the manuscript and had final responsibility for the decision to submit for publication.

Funding: Korea Health Industry Development Institute with its third Korea Medical Device Development Fund in 2020.

Declaration of interests: All authors declare no competing interests.

Data sharing: Individual patient data will be shared to the extent that anonymity can be maintained, that the recipient has ethical approval to conduct the research, and with a data transfer agreement.
A request to obtain study data can be discussed with the committee comprising researchers associated with the study hospital, to ensure compliance with General Data Protection Regulations and other legal agreements.

Acknowledgments: This study received a grant from the Korea Health Industry Development Institute with its third Korea Medical Device Development Fund in 2020. We thank the trial participants, the trial support nurses at each hospital, the radiologists who took part in the simulation mammography reading (KWR, THN, JYL, DYY), and Lunit for their support. We would like to give special thanks to Dr. Ki Hwan Kim for management, information, and organizational contributions and to Dr. Han Eol Jeong for research support.

References

1. Myers ER, Moorman P, Gierisch JM, et al. Benefits and Harms of Breast Cancer Screening: A Systematic Review. JAMA. 2015;314(15):1615-1634. doi:10.1001/jama.2015.13183
2. The benefits and harms of breast cancer screening: an independent review. Lancet. 2012;380(9855):1778-1786. doi:10.1016/s0140-6736(12)61611-0
3. Gøtzsche PC, Jørgensen KJ. Screening for breast cancer with mammography. Cochrane Database Syst Rev. 2013;2013(6):CD001877. doi:10.1002/14651858.CD001877.pub5
4. Yoon JH, Kim EK. Deep Learning-Based Artificial Intelligence for Mammography. Korean J Radiol. 2021;22(8):1225-1239. doi:10.3348/kjr.2020.1210
5. Hovda T, Tsuruda K, Hoff SR, Sahlberg KK, Hofvind S. Radiological review of prior screening mammograms of screen-detected breast cancer. Eur Radiol. 2021;31(4):2568-2579. doi:10.1007/s00330-020-07130-y
6. Lamb LR, Mohallem Fonseca M, Verma R, Seely JM. Missed Breast Cancer: Effects of Subconscious Bias and Lesion Characteristics. Radiographics. 2020;40(4):941-960. doi:10.1148/rg.2020190090
7. Taylor-Phillips S, Stinton C. Double reading in breast cancer screening: considerations for policy-making. Br J Radiol. 2020;93(1106):20190610. doi:10.1259/bjr.20190610
8. Salim M, Wåhlin E, Dembrower K, et al. External Evaluation of 3 Commercial Artificial Intelligence Algorithms for Independent Assessment of Screening Mammograms. JAMA Oncol. 2020;6(10):1581-1588. doi:10.1001/jamaoncol.2020.3321
9. Lee CS, Moy L, Hughes D, et al. Radiologist Characteristics Associated with Interpretive Performance of Screening Mammography: A National Mammography Database (NMD) Study. Radiology. 2021;300(3):518-528. doi:10.1148/radiol.2021204379
10. Yoon JH, Strand F, Baltzer PAT, et al. Standalone AI for Breast Cancer Detection at Screening Digital Mammography and Digital Breast Tomosynthesis: A Systematic Review and Meta-Analysis. Radiology. 2023;307(5):e222639. doi:10.1148/radiol.222639
11. Rodriguez-Ruiz A, Lång K, Gubern-Merida A, et al. Stand-Alone Artificial Intelligence for Breast Cancer Detection in Mammography: Comparison With 101 Radiologists. J Natl Cancer Inst. 2019;111(9):916-922. doi:10.1093/jnci/djy222
12. Freeman K, Geppert J, Stinton C, et al. Use of artificial intelligence for image analysis in breast cancer screening programmes: systematic review of test accuracy. BMJ. 2021;374:n1872. doi:10.1136/bmj.n1872
13. Chang YW, An JK, Choi N, et al. Artificial Intelligence for Breast Cancer Screening in Mammography (AI-STREAM): A Prospective Multicenter Study Design in Korea Using AI-Based CADe/x. J Breast Cancer. 2022;25(1):57-68. doi:10.4048/jbc.2022.25.e4
14. Kim HE, Kim HH, Han BK, et al. Changes in cancer detection and false-positive recall in mammography using artificial intelligence: a retrospective, multireader study. Lancet Digit Health. 2020;2(3):e138-e148. doi:10.1016/s2589-7500(20)30003-0
15. Hong S, Song SY, Park B, et al. Effect of Digital Mammography for Breast Cancer Screening: A Comparative Study of More than 8 Million Korean Women. Radiology. 2020;294(2):247-255. doi:10.1148/radiol.2019190951
16. Hickman SE, Woitek R, Le EPV, et al. Machine Learning for Workflow Applications in Screening Mammography: Systematic Review and Meta-Analysis. Radiology. 2022;302(1):88-104. doi:10.1148/radiol.2021210391
17. Larsen M, Aglen CF, Lee CI, et al. Artificial Intelligence Evaluation of 122 969 Mammography Examinations from a Population-based Screening Program. Radiology. 2022;303(3):502-511. doi:10.1148/radiol.212381
18. Romero-Martín S, Elías-Cabot E, Raya-Povedano JL, Gubern-Mérida A, Rodríguez-Ruiz A, Álvarez-Benito M. Stand-Alone Use of Artificial Intelligence for Digital Mammography and Digital Breast Tomosynthesis Screening: A Retrospective Evaluation. Radiology. 2022;302(3):535-542. doi:10.1148/radiol.211590
19. Dembrower K, Crippa A, Colón E, Eklund M, Strand F. Artificial intelligence for breast cancer detection in screening mammography in Sweden: a prospective, population-based, paired-reader, non-inferiority study. Lancet Digit Health. 2023;5(10):e703-e711. doi:10.1016/s2589-7500(23)00153-x
20. Lång K, Josefsson V, Larsson AM, et al. Artificial intelligence-supported screen reading versus standard double reading in the Mammography Screening with Artificial Intelligence trial (MASAI): a clinical safety analysis of a randomised, controlled, non-inferiority, single-blinded, screening accuracy study. Lancet Oncol. 2023;24(8):936-944. doi:10.1016/s1470-2045(23)00298-x
21. Ng AY, Oberije CJG, Ambrózay É, et al. Prospective implementation of AI-assisted screen reading to improve early detection of breast cancer. Nat Med. 2023. doi:10.1038/s41591-023-02625-9
22. Sickles EA, Wolverton DE, Dee KE. Performance parameters for screening and diagnostic mammography: specialist and general radiologists. Radiology. 2002;224(3):861-869. doi:10.1148/radiol.2243011482

Tables

Tables 1 to 3 are available in the Supplementary Files section.

Additional Declarations

There is NO Competing Interest.
Supplementary Files supplementaryappendix20240626Naturecommunications.docx supplementary appendix Tables.docx Cite Share Download PDF Status: Published Journal Publication published 06 Mar, 2025 Read the published version in Nature Communications → Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-4640159","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":322802624,"identity":"2afe1c4f-36d0-47ac-bac3-81ec15615a68","order_by":0,"name":"Yun Woo Chang","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA10lEQVRIiWNgGAWjYHACNhAhx8DAA+Xz4FGLrMWYdC2JDURrMbiR/OzBzx216fPbzx7d8IPBTp6B5+wDAlrSzA17zxzP3XAmL+1mD0OyYQNvuwF+Lbdz2CR4247lbpDgMbvNwMCcwMDPRsBhQC2Sf9uOpcvPAGupJ06LNG9bTQLDDbCWwwkMvG34tUjef2YmLdt2wHDDmRyzmz0Gxw3beI7h18J35vAzybdtdfLy7WfMbvyoqJbn50nDr0XhAJg6DHMnNJrwAfkGMFVHSN0oGAWjYBSMZAAA3CJBpFEoo5oAAAAASUVORK5CYII=","orcid":"https://orcid.org/0000-0001-9704-8112","institution":"Soonchunhyang 
University Hospital Seoul","correspondingAuthor":true,"prefix":"","firstName":"Yun","middleName":"Woo","lastName":"Chang","suffix":""},{"id":322802625,"identity":"b84ef8fd-f88a-4c67-ae75-695b04ea223f","order_by":1,"name":"Jung Kyu Ryu","email":"","orcid":"","institution":"Department of radiology, Kyung hee University Hospital at Gangdong","correspondingAuthor":false,"prefix":"","firstName":"Jung","middleName":"Kyu","lastName":"Ryu","suffix":""},{"id":322802626,"identity":"6d0dce83-8dbc-4ee4-86f8-f8312cc0483c","order_by":2,"name":"Jin Kyung An","email":"","orcid":"","institution":"department of Radiology, Nowon Eulgi University Hostpial","correspondingAuthor":false,"prefix":"","firstName":"Jin","middleName":"Kyung","lastName":"An","suffix":""},{"id":322802627,"identity":"b033f520-d7b4-4bbb-9d5b-1a1614c172ba","order_by":3,"name":"Nami Choi","email":"","orcid":"","institution":"Department of Radiology, Konkuk University Medical center","correspondingAuthor":false,"prefix":"","firstName":"Nami","middleName":"","lastName":"Choi","suffix":""},{"id":322802628,"identity":"4552e4d5-0bf4-486a-b2de-697744ca7e1b","order_by":4,"name":"Kyung Hee Ko","email":"","orcid":"","institution":"Department of Radiology, CHA Bundang Medical center, Yongin Severance Hospital, Yonsei University College of medicine","correspondingAuthor":false,"prefix":"","firstName":"Kyung","middleName":"Hee","lastName":"Ko","suffix":""},{"id":322802629,"identity":"65f91552-e002-47ab-8311-57d5d0ed7b49","order_by":5,"name":"Kyunghwa Han","email":"","orcid":"https://orcid.org/0000-0002-5687-7237","institution":"","correspondingAuthor":false,"prefix":"","firstName":"Kyunghwa","middleName":"","lastName":"Han","suffix":""},{"id":322802630,"identity":"6e57c012-98f7-4c2b-826f-b6bd0a816961","order_by":6,"name":"Young Mi Park","email":"","orcid":"","institution":"Department of radiology, Inje University Busan Paik 
Hospital","correspondingAuthor":false,"prefix":"","firstName":"Young","middleName":"Mi","lastName":"Park","suffix":""}],"badges":[],"createdAt":"2024-06-26 05:50:06","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-4640159/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-4640159/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1038/s41467-025-57469-3","type":"published","date":"2025-03-06T05:00:00+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":59963516,"identity":"ff1cf11b-fca0-49cc-bef2-5cc50c5b8c4f","added_by":"auto","created_at":"2024-07-10 01:30:59","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":170651,"visible":true,"origin":"","legend":"\u003cp\u003eOverview of the study design\u003c/p\u003e\n\u003cp\u003e\u003csup\u003e*\u003c/sup\u003eExperts in breast imaging with more than \u0026gt;10 years of experience\u003c/p\u003e\n\u003cp\u003e\u003csup\u003e†\u003c/sup\u003eRadiologists not specializing in breast imaging\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eNote\u003c/strong\u003e: AI, artificial intelligence; BI-RADS, Breast Imaging-Reporting and Data System; BR, breast radiologist; CAD, computer-aided detection; GR, general radiologist.\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-4640159/v1/14320076085d9d3045bc8a47.png"},{"id":59963909,"identity":"3712b88d-827e-49af-a50d-625a82e09454","added_by":"auto","created_at":"2024-07-10 01:38:59","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":105152,"visible":true,"origin":"","legend":"\u003cp\u003eStudy participant flow chart.\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-4640159/v1/ff5647d77ca71c9963ffbe72.png"},{"id":59963210,"identity":"a918ec3f-1712-43b7-9797-a25037828065","added_by":"auto","created_at":"2024-07-10 
01:22:59","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":61280,"visible":true,"origin":"","legend":"\u003cp\u003eNumber of screen-detected breast cancers in breast or general radiologist with and without AI-CAD, AI standalone, and biopsy-proven true positives.\u003c/p\u003e\n\u003cp\u003e\u003csup\u003e*\u003c/sup\u003eExperts in breast imaging with more than 10 years of experience\u003c/p\u003e\n\u003cp\u003e\u003csup\u003e†\u003c/sup\u003eRadiologists not specializing in breast imaging\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eNote\u003c/strong\u003e: AI, artificial intelligence; CAD, computer-aided detection\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-4640159/v1/eb79f47be8b65738fbabb6ef.png"},{"id":77955100,"identity":"8e449d80-364a-44b6-a87d-2e99cdaa269b","added_by":"auto","created_at":"2025-03-07 08:06:37","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":980063,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4640159/v1/c4027e29-ef71-4a81-a1db-bab3c6884247.pdf"},{"id":59963213,"identity":"05fff7bd-5b4b-4186-bb53-468e6777a026","added_by":"auto","created_at":"2024-07-10 01:22:59","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":309253,"visible":true,"origin":"","legend":"supplementary appendix","description":"","filename":"supplementaryappendix20240626Naturecommunications.docx","url":"https://assets-eu.researchsquare.com/files/rs-4640159/v1/a544b660ddf97b32f94a2d94.docx"},{"id":59963209,"identity":"a5f7c691-9363-4be5-a82e-1bb3a3b45314","added_by":"auto","created_at":"2024-07-10 
01:22:59","extension":"docx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":44206,"visible":true,"origin":"","legend":"","description":"","filename":"Tables.docx","url":"https://assets-eu.researchsquare.com/files/rs-4640159/v1/410d16a990374d2ebc28aac2.docx"}],"financialInterests":"There is \u003cb\u003eNO\u003c/b\u003e Competing Interest.","formattedTitle":"Artificial intelligence for breast cancer screening in mammography (AI-STREAM): Preliminary analysis of a prospective multicenter cohort study","fulltext":[{"header":"Introduction","content":"\u003cp\u003eWorldwide, breast cancer is the most commonly diagnosed cancer in women and the leading cause of mortality among them. Randomized controlled trials, systemic reviews, and observational studies have demonstrated that including mammography in breast cancer screening can reduce breast cancer-related mortality rates by approximately 20\u0026ndash;50% \u003csup\u003e1\u0026ndash;3\u003c/sup\u003e. Accordingly, several countries are currently implementing screening programs that incorporate mammography for early detection and treatment of breast cancer, aiming to reduce mortality and morbidity. While mammography screening has proven effective in detecting early cancer, it can lead to undesirable false positive recalls that may require additional imaging evaluation or biopsies, that cause overdiagnosis. 
While early detection is the key to screening, breast cancer diagnosis can be alternatively delayed for inaccurate false negative interpretation of mammography; failure to recognize due to low sensitivity in dense breasts, misinterpretation due to lack of recognition of abnormal features, or a wrong interpretation due to no change compared to the previous study, previous biopsy site, or benign appearing lesion characteristics\u003csup\u003e\u003cspan additionalcitationids=\"CR5\" citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e. Roughly 20\u0026ndash;33% of breast cancer cases are diagnosed based on symptoms modalities between consecutive screening rounds (ie, interval cancers) or discovered through other imaging modalities \u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e. To overcome these shortcomings, significant efforts and improvements have been made by performing double readings with two radiologists, increasing the frequency of screening examinations, or implementing additional supplementary imaging modalities \u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e,\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eAs population-based breast cancer screening programs become more widespread, the demand for an increased number of daily breast cancer screening examinations is also progressively rising. However, medical resources remain limited, much short of this demand. 
Therefore, there is not only \u003cem\u003ean inevitable need for assistance\u003c/em\u003e in screening interpretations to reduce interobserver variability but it is also important to optimize and streamline current screening workflows \u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e,\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e,\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u003c/sup\u003e. With substantial promise in this space, artificial intelligence (AI) has been developed and validated with promising applications in mammography screening, which could support computer-assisted detection (CAD) to marks suspicious findings depending on the risk of malignancy. In support, multiple retrospective studies found that AI\u0026rsquo;s diagnostic accuracy was similar to or better than that of breast radiologists \u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e,\u003cspan additionalcitationids=\"CR11\" citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e. However retrospective studies have inherent limitations, surfacing the need to prospective trials to assess the real-world effectiveness of AI-supported screening. A few prospective studies from Europe, which use a double reading system, have assessed AI as either an independent reader or as triage tool. There is still knowledge gaps how AI-CAD can affect radiologist\u0026rsquo;s performance in a single reading strategy. 
This prospective, multicenter, cohort study, named AI-STREAM (Artificial Intelligence for Breast Cancer Screening in Mammography), aimed to assess the diagnostic accuracy of radiologists, with and without AI-CAD, for interpretation of screening mammograms in single reading strategy.\u003c/p\u003e"},{"header":"METHODS","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eParticipants\u003c/h2\u003e \u003cp\u003eThe AI-STREAM study, described in more detail elsewhere\u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e, is a prospective, population-based study aimed to compare the diagnostic accuracy of breast radiologists when interpreting screening mammograms with or without AI-CAD for breast cancer \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e\u003cb\u003e)\u003c/b\u003e. The trial enrolled women aged \u0026ge;40 years from six academic hospitals that participated in national breast cancer screening program in South Korea. All women participating in the study provided their consent by completing an informed consent form about taking part of the study and reading a participant information sheet. Of all eligible women, those with a history of breast cancer or mammoplasty were excluded as the used AI software was not validated for these subgroups. 
During breast cancer screening, mammography was performed by a single radiologist (ie, single-reading strategy) using two standard craniocaudal and mediolateral oblique views of each breast using digital mammography, which is the standard procedure for interpreting mammograms in South Korea.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003eStudy Design\u003c/h2\u003e \u003cp\u003e The study received approval from the Institutional Review Board (IRB) of all participating centers, and written consent for data publication was obtained from all participants. In this study, a radiologist specializing in breast imaging (i.e. BR) was responsible for interpreting the mammograms, who were defined as a radiologist with \u0026gt;\u0026thinsp;10 years of experience at an university hospital, possessing expertise in breast imaging. In the standard single reading strategy, the subsequent clinical decision about whether the participant requires further diagnostic workup is determined by a single radiologist. Generally, the radiologist performs a comparative reading with a previous mammogram to determine recall, if previous mammography is available. In this study, mammograms were read by radiologists without, and then with AI-CAD. As part of the standard single screening procedure, even when AI was assisted following the radiologist\u0026rsquo;s reading without AI-CAD, the final recall for further diagnostic workup was made based on the radiologist\u0026rsquo;s comprehensive decision \u003cb\u003e(set 1 from\u003c/b\u003e Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e\u003cb\u003e).\u003c/b\u003e\u003c/p\u003e \u003cp\u003eFor exploratory purposes, results from stand-alone AI-CAD were also collected and compared to the diagnostic performance of radiologists with and without the use of AI-CAD. 
Moreover, a separate simulation study was conducted \u003cb\u003e(set 2 from\u003c/b\u003e Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e\u003cb\u003e)\u003c/b\u003e to compare the cancer detection rare (CDR) and recall rate (RR) of general radiologists (i.e. GRs), who were not specialize in breast imaging, with versus without AI-CAD. Note these results from GRs did not impact any real-world clinical decisions making of study participants for further work-up, given that mammograms are those examined from BRs. The main rationale behind this additional study was that GRs comprise the majority in Korea\u0026rsquo;s breast cancer screening program due to shortage of BRs. Thus, gaining and insight into the potential effect of AI-CAD on GR\u0026rsquo;s performance would be highly reflective of and valuable to real-world screening practice. All radiologists who participated in the study, including both BRs and GRs, had no prior experience with the AI-CAD program, minimizing bias.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003eProcedures\u003c/h2\u003e \u003cp\u003eFor image acquisition and data management, a cloud-based imaging data management platform (IRM\u0026rsquo;s BEST Image) was used \u003cb\u003e(eFigure 1, Appendix 1)\u003c/b\u003e. Mammograms were processed by a Snupi program, which performed various operations (eg, search, inquiry, de-identification, separation) and transmission functions on Digital Imaging and Communication in Medicine (DICOM) files. After de-identifying the participant information and assigning identifications, mammograms from participants were exported to the platform to record reading results. 
If participants had mammograms within the past four years, these were also exported to the study platform for comparison to replicate the same procedure as real-world readings procedure of each site.\u003c/p\u003e \u003cp\u003eAs part of the routine screening process, BRs interpreted the mammography without AI-CAD and recorded the findings (Test 1), followed by the automatic presentation of AI-CAD results (abnormality score and marks) for review and recording. The final records results were based on comprehensive assessment that considered interpretations from both with and without AI-CAD (Test 2). Results of Test 1 (or without AI-CAD) could not be modified or adjusted once the AI-CAD results were reviewed. Likewise, results of Test 2 could also not be corrected after reading \u003cb\u003e(eFigure 1)\u003c/b\u003e.\u003c/p\u003e \u003cp\u003eFor variables recorded per \u0026lsquo;Test\u0026rsquo;, the radiologist first recorded the breast density according to Breast Imaging Reporting and Data System (BI-RADS) 5th edition (A, B, C, D). Second, the radiologist assessed malignancy using a 7-point scale (1, definitely normal; 2, benign; 3, probably benign [0\u0026ndash;2%]; 4, low suspicion for malignancy [2\u0026ndash;10%]; 5, moderate suspicion of malignancy [10\u0026ndash;50%]; 6, high suspicion for malignancy [50\u0026ndash;95%]; 7, highly suggestive of malignancy [\u0026ge;95%]). For cases were not recalled, the radiologist could choose from a malignant assessment score of 1 or 2, whereas for cases that were recalled, the radiologist could choose a score between 3 and 7. Score 1 or 2 were considered negative as BI-RADS 1 or 2, while scores of 3 and higher were considered positive as BI-RADS 3 to 5, indicating the need for a recall. In the case a recall decision was made, the location (left, right, both) of the recall was also recorded. 
The recall for further diagnostic work-up of participants was a BR\u0026rsquo;s comprehensive decision informed by the results from considering the paired reading resulting with and without AI-CAD.\u003c/p\u003e \u003cp\u003eIf a participant was recalled and visited the same hospital where the screening mammography was performed, additional diagnostic work-up (e.g. special mammography views, tomosynthesis, ultrasonography) was conducted. If needed, a biopsy was performed, and if a pathologist subsequently diagnosed breast cancer (screen-detected), the participant's information was recorded separately. If surgery was performed, the final pathology was confirmed. Participants diagnosed with breast cancer were reviewed for other breast imaging and pathologic features from electronic medical records and pathology reports (if available).\u003c/p\u003e \u003cp\u003eAs a secondary and exploratory objective, a separate reading set (Set 2) was designed and conducted as a simulation study \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e\u003cb\u003e)\u003c/b\u003e. In set 2, five general radiologists (GRs) who did not specialize in breast imaging interpreted the same participant\u0026rsquo;s mammography; GRs had variable experience as radiologists and interpreting mammography. The participant\u0026rsquo;s mammography was interpreted the same research platform with and without AI-CAD and the corresponding results were recorded on the same study platform. 
All participating radiologists, including both BRs and GRs, had no prior experience with the AI-CAD program.\u003c/p\u003e \u003cp\u003eThe study used a commercial AI-CAD (Lunit INSIGHT MMG, available at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://insight.lunit.io\u003c/span\u003e\u003cspan address=\"https://insight.lunit.io\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e, version 1.1.7.1), which has been validated through various studies \u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e,\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e \u003cb\u003e(Appendix 3)\u003c/b\u003e. In brief, Lunit INSIGHT MMG AI-based CAD improves radiologist\u0026rsquo;s performance and provides results equivalent or superior to those form radiologists alone \u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e. It also has shown superior performance compared with two other commercial AI-based software products \u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e. The AI provides abnormal scores ranging from 0 to 100, rounded to two decimal places, per breast based on mammograms. These scores can also be presented as a heatmap or grayscale map. AI results were considered positive if the abnormality score was above a predefined cutoff of 10.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003eOutcomes\u003c/h2\u003e \u003cp\u003eThe primary outcomes were CDRs and RRs of BRs with and without AI-CAD in mammography reading for screen-detected breast cancer, including invasive or ductal in situ (or both). 
The secondary outcomes were the CDRs and RRs of mammography reading in the following comparisons: 1) BRs without AI-CAD vs AI standalone, 2) BRs with AI-CAD vs AI standalone, 3) GRs without AI-CAD vs GRs with AI-CAD, 4) GRs without AI-CAD vs AI standalone, and 5) GRs with AI-CAD vs AI standalone.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003eStatistical analysis\u003c/h2\u003e \u003cp\u003eThe sample size was estimated using McNemar\u0026rsquo;s test to detect differences in the CDR between radiologists with and without AI-CAD, with a two-sided test at a significance level of 0.05 and 80% power. The assumed cancer prevalence was 3.21 per 1000 examinations, determined from data in a previous retrospective study, and the target sample size was chosen based on this expected cancer prevalence \u003csup\u003e\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u003c/sup\u003e. The target sample size was 32,714 participants, corresponding to approximately 16,000 participants per year. The expected number of participants was, however, adjusted from the initial study design because of the COVID-19 pandemic, but no effect on the primary study endpoint was observed. Assuming the same cancer prevalence of 3.21 per 1000 examinations, it was calculated that 80% power could be maintained if approximately 24,000 people were recruited and more than 90 cancers were detected (\u003cb\u003eAppendix 4\u003c/b\u003e).\u003c/p\u003e \u003cp\u003eDescriptive statistics were used for continuous and categorical variables, as appropriate. Logistic regression analysis using generalized estimating equations was used to estimate 95% CIs and for comparative analysis. Pairwise comparisons were performed among BRs with AI-CAD, BRs without AI-CAD, AI standalone, GRs with AI-CAD, and GRs without AI-CAD. 
In addition, corrections for multiple comparisons are necessary for confirmation, but none were made, given the preliminary nature of the study.\u003c/p\u003e \u003cp\u003ePrespecified subgroup analyses were performed to examine results by age group (40\u0026ndash;49, 50\u0026ndash;59, 60\u0026ndash;69, 70\u0026thinsp;+\u0026thinsp;years), mammographic density (four categories of the BI-RADS by the American College of Radiology), and malignant scale assessment using a 7-point scale (defined above). Among breast cancers, subgroups were analyzed by cancer characteristics, including invasiveness, tumor size category (\u0026lt;\u0026thinsp;20 mm, \u0026ge;\u0026thinsp;20 mm), presence of axillary lymph node metastasis, and molecular subtype (luminal A, non-luminal A [luminal B, human epidermal growth factor receptor 2 [HER2] enriched, or triple negative]) \u003cb\u003e(Appendix 5).\u003c/b\u003e\u003c/p\u003e \u003c/div\u003e"},{"header":"RESULTS","content":"\u003cp\u003e\u003cstrong\u003e\u003cem\u003eParticipant Selection Criteria and Characteristics\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eBetween February 1, 2021, and December 31, 2022, 25,008 women aged\u0026nbsp;\u0026ge;40 years underwent regular mammography screening as part of the national breast cancer screening program in South Korea and were eligible for enrollment into the study. After applying the exclusion criteria of parenchymal change due to a previous procedure, mammoplasty, or insertion of a foreign substance (n=144), withdrawal of consent (n=267), and data errors on the cloud server (n=54), 24,543 participants were included in the final cohort. Among the participants who underwent additional assessment at the hospital where the recall was made, pathologically diagnosed breast cancers were analyzed at the 1-year follow-up after enrollment of the last participant. 
There were 148 pathologically confirmed cancers, of which 140 (0.57%) were screen-detected, including 2 cases of bilateral breast cancer. We analyzed 24,545 mammograms, including 2 cases of bilateral breast cancer, for 24,543 participants \u003cstrong\u003e(Figure 2)\u003c/strong\u003e.\u0026nbsp;The median age of the study cohort was 61 years (IQR, 51-68). Of all participants, 67.5% had dense breasts, and 80.7% of those diagnosed with breast cancer had dense breasts \u003cstrong\u003e(Table 1)\u003c/strong\u003e.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003ePerformance of BRs with and without AI-CAD\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eOverall, BRs detected 123 breast cancers without AI-CAD and 140 with AI-CAD, that is, 17 more cancers with AI-CAD \u003cstrong\u003e(Figure 3)\u003c/strong\u003e. The CDR of BRs with AI-CAD (n=140, 5.70 ‰ [95% CI 4.76, 6.65]) was significantly higher, by 13.8%, than that of BRs without AI-CAD (n=123, 5.01 ‰ [95% CI 4.13, 5.89]) (p \u0026lt;0.001), while no significant change in RR was observed between BRs with AI-CAD (n=1113, 4.53% [4.27, 4.79]) and BRs without AI-CAD (n=1100, 4.48% [4.22, 4.74]) (p =0.564) (\u003cstrong\u003eTable 2\u003c/strong\u003e).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eIn cancer characteristic-specific subgroup analyses, the median tumor size was 16 mm (IQR 11–25). Of the total 140 cases, 46 (32.9%) were ductal carcinoma in situ (DCIS) and 94 (67.1%) were invasive cancers; among 84 cases examined for lymph node metastasis, 70 (83.3%) were node-negative. For molecular subtypes of invasive ductal carcinoma (IDC), out of 68 cases, 50 (73.5%) were classified as luminal A and 18 (26.5%) as non-luminal A [8 (11.8%) as luminal B, 4 (5.9%) as HER2 overexpressing, and 6 (8.8%) as basal]. 
Interpreting mammograms with AI-CAD detected 6 additional cases of DCIS and 11 additional cases of invasive cancer, a significant increase in detection for both DCIS (p=0.009) and invasive cancers (p\u0026lt;0.001). Furthermore, AI-CAD assistance resulted in a significant increase in the detection of small cancers of less than 20 mm (p=0.002), node-negative cancers (p\u0026lt;0.001), the luminal A subtype (p=0.002), and lower-grade IDC NOS (p =0.009) compared with reading without AI-CAD \u003cstrong\u003e(Table 3)\u003c/strong\u003e.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003ePerformance of GRs with and without AI-CAD\u0026nbsp;\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe simulation study (exploratory analysis) showed trends in diagnostic performance for GRs similar to those for BRs, but with a greater improvement: the CDR of GRs with AI-CAD (n=120, 4.89 ‰ [95% CI 4.02, 5.76]) was significantly higher, by 26.4%, than that of GRs without AI-CAD (n=95, 3.87 ‰ [3.09, 4.65]; p \u0026lt;0.001), corresponding to 25 more cancers detected with AI-CAD. However, the RR of GRs with AI-CAD (n=1690, 6.89% [6.57, 7.2]) was also significantly higher than that of GRs without AI-CAD (n=1548, 6.31% [6, 6.61]) (p \u0026lt;0.001) \u003cstrong\u003e(Figure 3, Table 2)\u003c/strong\u003e.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003ePerformance of standalone AI and Radiologists with and without AI-CAD\u0026nbsp;\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eStandalone AI detected 128 cancers (CDR 5.21 ‰ [95% CI 4.31, 6.12]), which was not significantly different from BRs without AI-CAD (p=0.752) or BRs with AI-CAD (p=0.462). However, standalone AI showed a significantly higher RR (n=1535, 6.25% [5.95, 6.56]) than BRs with and without AI-CAD (both, p \u0026lt;0.001). The CDR of standalone AI was significantly higher than that of GRs without AI-CAD (p=0.027), with no significant difference in RR (p=0.809). 
The CDR of GRs with AI-CAD improved and was not significantly different from that of AI alone (p=0.611), but the RR was significantly higher than that of AI alone (p=0.005) \u003cstrong\u003e(Figure 3, Table 2)\u003c/strong\u003e.\u003c/p\u003e"},{"header":"DISCUSSION","content":"\u003cp\u003eThe preliminary results of the AI-STREAM study, a population-based prospective study, provide real-world evidence that using AI-CAD for BRs’ interpretation of screening mammograms significantly increased the CDR (5.70 per 1000 participants) compared with radiologists not using AI-CAD (5.01 per 1000 participants). The AI-CAD assistance that improved CDRs did not affect RRs, providing reassurance to radiologists using AI-CAD in their routine practice.\u003c/p\u003e\n\u003cp\u003eStrategies for interpreting mammograms in breast cancer screening differ between countries and are tailored to local needs, practice, and cancer prevalence and/or incidence. Recent advancements in AI have shown substantial promise in reading screening mammograms, including in double-reading systems, AI-aided reading, and AI standalone methods, as indicated by multiple retrospective studies\u0026nbsp;\u003csup\u003e10,16-18\u003c/sup\u003e. A few prospective studies, limited to double-read European settings, have been or are being conducted to assess the clinical effectiveness of using AI-CAD\u0026nbsp;\u003csup\u003e19-21\u003c/sup\u003e. There are, however, limitations in directly comparing our study with these studies because of an inherent difference in screening practice (single- versus double-reading strategy). The population-based, prospective ScreenTrustCAD study found that double reading by one radiologist plus AI resulted in a 4% increase in screen-detected cancers compared with standard double reading by two radiologists, which was non-inferior in cancer detection. 
The radiologist plus AI arm was also favored for RRs, which were reduced by 4% compared with standard human double reading, after follow-up consensus discussions reviewing the mammograms. Consensus discussion, using medical history and AI information, proved effective in preventing an increase in abnormal interpretations during double reading by AI and radiologists from translating into a higher recall rate\u0026nbsp;\u003csup\u003e19\u003c/sup\u003e. Nevertheless, many regions outside Europe and some private practices in Europe adopt single reading as their main reading strategy, raising the need for prospective evidence on the effect of AI in a real-world, single-read setting. The preliminary analysis of AI-STREAM specifically addressed this knowledge gap, not only by using real-world screening data collected prospectively from multiple centers, but also by showing increased CDRs and unaffected RRs in BRs with AI-CAD. In the single-reading setting of this AI-STREAM study, the radiologist's decision to recall or not recall a participant was based on a comprehensive assessment of the paired results with and without AI-CAD. Despite the single-reading strategy used in this study, comprehensive decisions including the comparison with prior mammograms appear to have contributed to reducing false positive recalls by BRs, while maintaining the strengths of consensus reading from a double-reading strategy.\u003c/p\u003e\n\u003cp\u003eAnother prospective study worth noting, despite differences, is the Mammography Screening with Artificial Intelligence (MASAI) trial, where the AI score was used to triage mammograms to either single or double reading; for instance, mammograms with low AI scores underwent single reading. 
The implementation of AI-supported screen reading yielded a significant 28% increase in cancer detection without a rise in false positive rates in the AI interventional triage group compared with the control standard double-reading group. Furthermore, there was a 44% reduction in the screen-reading workload\u0026nbsp;\u003csup\u003e20\u003c/sup\u003e.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eIn a real-world clinical environment, the setting of AI thresholds is an important factor in mammography reading using AI, and this threshold is set and calibrated differently in each study\u003csup\u003e10\u003c/sup\u003e. In the AI-STREAM study, specifically, the AI result was considered positive if the abnormality score per breast was above the predefined cutoff of 10; if the AI score was 10 or higher, the abnormality score and CAD mark were displayed on the screen readout for mammography, so that radiologists could fully detect these markings. Thresholds could be varied depending on the purpose: for instance, a high-sensitivity AI threshold should be used if the final decision is made by radiologists\u003csup\u003e10\u003c/sup\u003e, whereas higher thresholds favoring specificity may be essential when AI is used as an independent reader. Regardless, repeated calibration of the AI threshold will likely be necessary in actual clinical use to maintain the desired operating point. Moreover, determining AI thresholds from retrospective data alone may not always be sufficient, raising the need for repeated calibration in a prospective manner. In support of this, the ScreenTrustCAD study aimed for a 2% increase in the true positive fraction but observed an actual 6% increase\u003csup\u003e19\u003c/sup\u003e. 
Currently, the lack of quality assurance protocols to detect and correct data drift, which impacts the performance of AI systems, stands as a major barrier to the practical implementation of AI. Therefore, further evaluation of AI thresholds is required to better understand how best to address these unresolved issues.\u003c/p\u003e\n\u003cp\u003eOne prior study of differences in screening mammography interpretation performance by radiologist experience\u0026nbsp;found that RRs for specialist radiologists were significantly lower than those of general radiologists, whereas the biopsy performed CDRs were significantly higher for the specialist radiologists\u0026nbsp;\u003csup\u003e22\u003c/sup\u003e. These differences in performance, in which specialist radiologists make more true positive and fewer false positive interpretations of screening mammography, may be related to greater amounts of initial and continuing education in mammography, as well as accumulated experience. Additionally, consideration may be given to reducing the differences by using multiple rather than single reading systems\u0026nbsp;\u003csup\u003e22\u003c/sup\u003e. The AI-STREAM study was additionally designed to evaluate the impact of using AI-CAD when radiologists differ in their experience in mammography interpretation (e.g., BRs and GRs). A positive impact of AI-CAD on screening mammography interpretation, similar to that for BRs, was also observed with GRs, but to a much larger extent, as CDRs significantly increased by 26.4%. GRs may see only a few cases of breast cancer in an entire year, which limits their self-confidence in interpreting mammography. GRs also showed a modest increase in RRs, unlike BRs, who had no significant difference in RRs. 
Although comparison with prior mammograms was possible in this simulation analysis, this could be explained by GRs relying more on AI-CAD results, given their relatively lower self-confidence in interpreting mammography compared with BRs, which seems to have increased false positive recalls. However, when comparing standalone AI with GRs without AI-CAD, standalone AI had a significantly higher CDR and no difference in RR, suggesting that AI could surpass the reading capabilities of less experienced radiologists, or GRs. Based on these results, AI-CAD as an aid in mammography interpretation can be expected to particularly benefit those with less experience in interpreting mammograms, and to produce positive results demonstrating the effect of multiple readings.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eStandalone AI showed no significant differences in CDRs compared with BRs either with or without AI-CAD, indicating comparable CDRs for standalone AI and experienced, expert radiologists. These findings are in line with a previous meta-analysis, which reported that standalone AI in digital mammography is equivalent or superior to interpretation by radiologists\u003csup\u003e10\u003c/sup\u003e. \u0026nbsp;However, RRs from standalone AI were significantly higher than those from BRs with and without AI-CAD. This could be due to the uniform AI-CAD abnormality score threshold used to determine AI's recall or no recall, as well as the fact that AI-CAD could not consider and compare with prior mammograms. Furthermore, the use of standalone AI as a mammography reader, without any human involvement, presents many challenges given current ethical and medicolegal uncertainties. 
Hence, the importance of leaving the ultimate clinical decision to radiologists must be emphasized, not only to meet established medicolegal requirements but also to minimize false positive results, and future discussions specifically addressing this topic are needed.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eWith AI assistance, BRs demonstrated an approximate 11-13% increase in cancer detection for both DCIS and IDC. AI assistance increased the detection of small (\u0026lt; 20 mm), node-negative, low-grade IDC, and better-prognosis luminal A subtype cancers. Although this is a preliminary analysis and the number of cancers is small, this suggests that AI assistance can improve the early detection of breast cancer with relevant prognostic features, with minimal unnecessary recalls. However, a two-year follow-up is needed to evaluate the true impact of AI-CAD use on interval cancers detected after two years (biennial screening interval in South Korea) and whether there is an increase in interval cancers with poor prognosis. The final results of the AI-STREAM study reflecting these findings will be announced after 2026, following analysis of data linked to the National Cancer Registry (\u003cstrong\u003eeFigure 2\u003c/strong\u003e).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eStrengths and limitations\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe AI-STREAM study has several strengths. First, as a multicenter prospective study conducted on participants in national cancer screening, it used various mammography devices (GE and Hologic digital mammography devices). 
Second, the radiologists who participated in the analysis were proven experts in the interpretation of mammography with many years of experience, and the results were evaluated separately according to experience.\u0026nbsp;Third, to the best of our knowledge, this study is the first prospective multicenter clinical trial to evaluate diagnostic accuracy between radiologists with and without AI-CAD under the standard procedure of a single radiologist interpreting screening mammograms.\u0026nbsp;Although only one AI system was used for mammography interpretation, it was verified as the best performing algorithm compared with others used in previous research\u003csup\u003e8\u003c/sup\u003e.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThere were several limitations in our study. First, the study was an observational trial, although a randomized controlled trial would be ideal for direct comparison between screening with and without AI. The study was performed to evaluate the effect of AI assistance in a single-reading strategy by a single radiologist, because it was not easy to design a multicenter randomized trial protocol for applying AI in real clinical practice. However, with AI assistance in a single-reading strategy, it is hard to assess the direct effect of AI-CAD because it is unclear which information changed the radiologist's mind about recall. Second, the interim analysis was planned for 2026, upon linkage of screen-detected cancers to the National Cancer Registry data and their review. However, this preliminary analysis focused on pathologically proven cases at least one year after the last participant's enrollment. 
Although this is a preliminary analysis, all participant data had been collected and cleaned and the database was locked; despite the short follow-up for evaluating cumulative effectiveness, this does not affect the false positive results, and the analysis maintains the pre-planned Statistical Analysis Plan (SAP) (Appendix). According to the SAP, assuming a cancer prevalence of 3.21 per 1000 tests, 80% power is maintained with approximately 24,000 participants recruited and more than 90 cancers detected. This preliminary analysis maintained the 80% power of the study and evaluated screen-detected cancers, including the recall rates of radiologists with and without AI-CAD, and can therefore serve as the interim analysis of AI-STREAM. The final results, including interval cancers, will be reviewed and analyzed with data linked to the National Cancer Registry, expected to be available after 2026. Third, the area with the highest AI score was included in the analysis, and only the 2 cases diagnosed as bilateral cancer with abnormal scores on both sides had both sides included in the analysis. Therefore, not all instances in which the AI score was abnormal on both sides in non-cancer cases were evaluated. Lastly, the conclusive results for breast cancer were obtained through additional diagnostic work-up following recall after mammography screening, as well as electronic medical and pathology reports from the same hospital where the surgery was conducted. The sample size was relatively small because we could only analyze data for cases with available results for lesion size, nodal metastasis, and molecular subtypes.\u003c/p\u003e\n\u003cp\u003eIn conclusion, given the diverse mammography interpretation procedures across countries worldwide, there is a need to demonstrate the true positive impact of increased cancer detection rates when AI is applied in various ways in real-world clinical environments. 
The preliminary results from this prospective AI-STREAM study suggest that AI assistance in radiologists’ interpretation is beneficial for both BRs and GRs in a single-reading strategy. With the assistance of AI-CAD, BRs improved the CDR and increased early cancer detection without affecting the RR in a single-reading strategy.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eContributors:\u0026nbsp;\u003c/strong\u003eY-WC and KH conceptualized the design of the trial with input from Y-WC. KH did the statistical analysis. Y-WC, JKA, NC, KHK, YMP, and JKR directly assessed and verified the underlying data reported in the manuscript. Y-WC and KH interpreted the results of the validation study. Y-WC wrote the first draft of the report with input from KH. All authors subsequently edited the report. JKR and Y-WC supervised the project. All authors approved the final version of the manuscript and had final responsibility for the decision to submit for publication.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding:\u003c/strong\u003e\u0026nbsp; Korea Health Industry Development Institute with its third Korea Medical Device Development Fund in 2020.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDeclaration of interests:\u003c/strong\u003e All authors declare no competing interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData sharing:\u0026nbsp;\u003c/strong\u003eIndividual patient data will be shared to the extent that anonymity can be maintained, that the recipient has ethical approval to conduct the research, and with a data transfer agreement.\u003c/p\u003e\n\u003cp\u003eA request to obtain study data can be discussed with the committee comprising researchers associated with the study hospital, to ensure compliance with General Data Protection Regulations and other legal agreements.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgments:\u0026nbsp;\u003c/strong\u003eThis study received a grant from 
the Korea Health Industry Development Institute with its third Korea Medical Device Development Fund in 2020. We thank the trial participants, trial support nurses at each hospital, radiologists at the simulation mammography reading (KWR, THN, JYL, DYY), and Lunit for their support. We give special thanks to Dr. Ki Hwan Kim for management, information, and organizational contributions and Dr. Han Eol Jeong for research support.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n \u003cli\u003eMyers ER, Moorman P, Gierisch JM, et al. Benefits and Harms of Breast Cancer Screening: A Systematic Review. \u003cem\u003eJAMA\u003c/em\u003e. 2015;314(15):1615-1634. doi:10.1001/jama.2015.13183\u003c/li\u003e\n \u003cli\u003eThe benefits and harms of breast cancer screening: an independent review. \u003cem\u003eLancet\u003c/em\u003e. 2012;380(9855):1778-1786. doi:10.1016/s0140-6736(12)61611-0\u003c/li\u003e\n \u003cli\u003eG\u0026oslash;tzsche PC, J\u0026oslash;rgensen KJ. Screening for breast cancer with mammography. \u003cem\u003eCochrane Database Syst Rev\u003c/em\u003e. 2013;2013(6):CD001877. doi:10.1002/14651858.CD001877.pub5\u003c/li\u003e\n \u003cli\u003eYoon JH, Kim EK. Deep Learning-Based Artificial Intelligence for Mammography. \u003cem\u003eKorean J Radiol\u003c/em\u003e. 2021;22(8):1225-1239. doi:10.3348/kjr.2020.1210\u003c/li\u003e\n \u003cli\u003eHovda T, Tsuruda K, Hoff SR, Sahlberg KK, Hofvind S. Radiological review of prior screening mammograms of screen-detected breast cancer. \u003cem\u003eEur Radiol\u003c/em\u003e. 2021;31(4):2568-2579. doi:10.1007/s00330-020-07130-y\u003c/li\u003e\n \u003cli\u003eLamb LR, Mohallem Fonseca M, Verma R, Seely JM. Missed Breast Cancer: Effects of Subconscious Bias and Lesion Characteristics. \u003cem\u003eRadiographics\u003c/em\u003e. 2020;40(4):941-960. doi:10.1148/rg.2020190090\u003c/li\u003e\n \u003cli\u003eTaylor-Phillips S, Stinton C. 
Double reading in breast cancer screening: considerations for policy-making. \u003cem\u003eBr J Radiol\u003c/em\u003e. 2020;93(1106):20190610. doi:10.1259/bjr.20190610\u003c/li\u003e\n \u003cli\u003eSalim M, W\u0026aring;hlin E, Dembrower K, et al. External Evaluation of 3 Commercial Artificial Intelligence Algorithms for Independent Assessment of Screening Mammograms. \u003cem\u003eJAMA Oncol\u003c/em\u003e. 2020;6(10):1581-1588. doi:10.1001/jamaoncol.2020.3321\u003c/li\u003e\n \u003cli\u003eLee CS, Moy L, Hughes D, et al. Radiologist Characteristics Associated with Interpretive Performance of Screening Mammography: A National Mammography Database (NMD) Study. \u003cem\u003eRadiology\u003c/em\u003e. 2021;300(3):518-528. doi:10.1148/radiol.2021204379\u003c/li\u003e\n \u003cli\u003eYoon JH, Strand F, Baltzer PAT, et al. Standalone AI for Breast Cancer Detection at Screening Digital Mammography and Digital Breast Tomosynthesis: A Systematic Review and Meta-Analysis. \u003cem\u003eRadiology\u003c/em\u003e. 2023;307(5):e222639. doi:10.1148/radiol.222639\u003c/li\u003e\n \u003cli\u003eRodriguez-Ruiz A, L\u0026aring;ng K, Gubern-Merida A, et al. Stand-Alone Artificial Intelligence for Breast Cancer Detection in Mammography: Comparison With 101 Radiologists. \u003cem\u003eJ Natl Cancer Inst\u003c/em\u003e. 2019;111(9):916-922. doi:10.1093/jnci/djy222\u003c/li\u003e\n \u003cli\u003eFreeman K, Geppert J, Stinton C, et al. Use of artificial intelligence for image analysis in breast cancer screening programmes: systematic review of test accuracy. \u003cem\u003eBMJ\u003c/em\u003e. 2021;374:n1872. doi:10.1136/bmj.n1872\u003c/li\u003e\n \u003cli\u003eChang YW, An JK, Choi N, et al. Artificial Intelligence for Breast Cancer Screening in Mammography (AI-STREAM): A Prospective Multicenter Study Design in Korea Using AI-Based CADe/x. \u003cem\u003eJ Breast Cancer\u003c/em\u003e. 2022;25(1):57-68. 
doi:10.4048/jbc.2022.25.e4\u003c/li\u003e\n \u003cli\u003eKim HE, Kim HH, Han BK, et al. Changes in cancer detection and false-positive recall in mammography using artificial intelligence: a retrospective, multireader study. \u003cem\u003eLancet Digit Health\u003c/em\u003e. 2020;2(3):e138-e148. doi:10.1016/s2589-7500(20)30003-0\u003c/li\u003e\n \u003cli\u003eHong S, Song SY, Park B, et al. Effect of Digital Mammography for Breast Cancer Screening: A Comparative Study of More than 8 Million Korean Women. \u003cem\u003eRadiology\u003c/em\u003e. 2020;294(2):247-255. doi:10.1148/radiol.2019190951\u003c/li\u003e\n \u003cli\u003eHickman SE, Woitek R, Le EPV, et al. Machine Learning for Workflow Applications in Screening Mammography: Systematic Review and Meta-Analysis. \u003cem\u003eRadiology\u003c/em\u003e. 2022;302(1):88-104. doi:10.1148/radiol.2021210391\u003c/li\u003e\n \u003cli\u003eLarsen M, Aglen CF, Lee CI, et al. Artificial Intelligence Evaluation of 122\u0026thinsp;969 Mammography Examinations from a Population-based Screening Program. \u003cem\u003eRadiology\u003c/em\u003e. 2022;303(3):502-511. doi:10.1148/radiol.212381\u003c/li\u003e\n \u003cli\u003eRomero-Mart\u0026iacute;n S, El\u0026iacute;as-Cabot E, Raya-Povedano JL, Gubern-M\u0026eacute;rida A, Rodr\u0026iacute;guez-Ruiz A, \u0026Aacute;lvarez-Benito M. Stand-Alone Use of Artificial Intelligence for Digital Mammography and Digital Breast Tomosynthesis Screening: A Retrospective Evaluation. \u003cem\u003eRadiology\u003c/em\u003e. 2022;302(3):535-542. doi:10.1148/radiol.211590\u003c/li\u003e\n \u003cli\u003eDembrower K, Crippa A, Col\u0026oacute;n E, Eklund M, Strand F. Artificial intelligence for breast cancer detection in screening mammography in Sweden: a prospective, population-based, paired-reader, non-inferiority study. \u003cem\u003eLancet Digit Health\u003c/em\u003e. 2023;5(10):e703-e711. 
doi:10.1016/s2589-7500(23)00153-x\u003c/li\u003e\n \u003cli\u003eL\u0026aring;ng K, Josefsson V, Larsson AM, et al. Artificial intelligence-supported screen reading versus standard double reading in the Mammography Screening with Artificial Intelligence trial (MASAI): a clinical safety analysis of a randomised, controlled, non-inferiority, single-blinded, screening accuracy study. \u003cem\u003eLancet Oncol\u003c/em\u003e. 2023;24(8):936-944. doi:10.1016/s1470-2045(23)00298-x\u003c/li\u003e\n \u003cli\u003eNg AY, Oberije CJG, Ambr\u0026oacute;zay \u0026Eacute;, et al. Prospective implementation of AI-assisted screen reading to improve early detection of breast cancer. \u003cem\u003eNat Med\u003c/em\u003e. 2023 doi:10.1038/s41591-023-02625-9\u003c/li\u003e\n \u003cli\u003eSickles EA, Wolverton DE, Dee KE. Performance parameters for screening and diagnostic mammography: specialist and general radiologists. \u003cem\u003eRadiology\u003c/em\u003e. 2002;224(3):861-869. doi:10.1148/radiol.2243011482\u003c/li\u003e\n\u003c/ol\u003e"},{"header":"Tables","content":"\u003cp\u003eTables 1 to 3 are available in the Supplementary Files section\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"nature-portfolio","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"","title":"Nature 
Portfolio","twitterHandle":"","acdcEnabled":false,"dfaEnabled":false,"editorialSystem":"ejp","reportingPortfolio":"","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-4640159/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4640159/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eSeveral studies have shown that artificial intelligence (AI) improves mammography screening accuracy. Meanwhile, prospective evidence, particularly in a single-read setting, is lacking. This study aimed to compare the diagnostic accuracy of breast radiologists, with and without AI-based computer-aided detection (AI-CAD), in interpreting screening mammograms in a real-world, single-read setting. A prospective multicenter cohort study was conducted in six academic hospitals participating in South Korea’s national breast cancer screening program, where women aged \u0026ge;40 years were eligible for enrollment between February 2021 and December 2022. The primary outcome was screen-detected breast cancer diagnosed at a one-year follow-up. The primary analysis compared cancer detection rates (CDRs) and recall rates (RRs) of breast imaging specialized radiologists, with and without AI assistance. The exploratory, secondary analysis compared CDRs and RRs of general radiologists, with and without AI, as well as radiologists versus standalone AI. Of 25,008 women who were eligible for enrollment, 24,543 women were included in the final cohort (median age 61 years [IQR 51-68]), with 140 (0.57%) screen-detected breast cancers. The CDR was significantly higher by 13.8% for breast radiologists with AI-CAD (n=140 [5.70 ‰]) versus those without AI (n=123 [5.01 ‰]; p \u0026lt;0.001), with no significant difference in RRs (p =0.564). 
Similar trends were observed for general radiologists, with a significant 26.4% higher CDR in those with AI-CAD (n=120 [4.89 ‰]) versus those without AI (n=95 [3.87 ‰]; p \u0026lt;0.001). The CDR of standalone AI (n=128 [5.21 ‰]) was also significantly higher than that of general radiologists without AI (p=0.027), with no significant differences in RRs (p =0.809). This preliminary result from a prospective, multicenter cohort study provided evidence of a significant improvement in the CDRs of breast radiologists, without affecting RRs, when using AI-CAD, as compared to not using AI-CAD, when interpreting screening mammograms in a radiologist’s standard single-reading setting. Furthermore, AI-CAD assistance could potentially improve radiologists’ reading performance, regardless of experience (ClinicalTrials.gov: NCT0524591).\u003c/p\u003e","manuscriptTitle":"Artificial intelligence for breast cancer screening in mammography (AI-STREAM): Preliminary analysis of a prospective multicenter cohort study","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-07-10 01:22:54","doi":"10.21203/rs.3.rs-4640159/v1","editorialEvents":[],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"nature-communications","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"NCOMMS","sideBox":"Learn more about [Nature Communications](http://www.nature.com/ncomms/)","snPcode":"","submissionUrl":"https://mts-ncomms.nature.com/","title":"Nature Communications","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"ejp","reportingPortfolio":"Nature Communications","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"46f66ccf-21ab-4ccc-93d5-2da242f9e059","owner":[],"postedDate":"July 10th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[{"id":34138602,"name":"Health 
sciences/Medical research/Clinical trial design/Clinical trials/Adaptive clinical trial"},{"id":34138603,"name":"Health sciences/Diseases/Cancer/Breast cancer"}],"tags":[],"updatedAt":"2025-03-07T08:06:31+00:00","versionOfRecord":{"articleIdentity":"rs-4640159","link":"https://doi.org/10.1038/s41467-025-57469-3","journal":{"identity":"nature-communications","isVorOnly":false,"title":"Nature Communications"},"publishedOn":"2025-03-06 05:00:00","publishedOnDateReadable":"March 6th, 2025"},"versionCreatedAt":"2024-07-10 01:22:54","video":"","vorDoi":"10.1038/s41467-025-57469-3","vorDoiUrl":"https://doi.org/10.1038/s41467-025-57469-3","workflowStages":[]},"version":"v1","identity":"rs-4640159","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4640159","identity":"rs-4640159","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

