Privacy-Preserving Information Extraction Framework for Diverse Imaging Reports using Large Language Models

doi:10.21203/rs.3.rs-6267208/v1

2025 · doi:10.21203/rs.3.rs-6267208/v1

preprint OA: closed CC-BY-4.0

Dabin Min, Soyeon Kim, Sangheum Hwang, Kwang Nam Jin, SangHeum Bang, and 4 more

This is a preprint; it has not been peer reviewed by a journal.

This work is licensed under a CC BY 4.0 License.

Status: Posted (Version 1)

Abstract

Efficient extraction of structured information from unstructured radiology reports remains a critical challenge in healthcare. We introduce the Radiology Report Information Extraction Framework (RRIEF), a privacy-preserving approach utilizing parameter-efficient fine-tuning of open-source large language models (LLMs).
We validated RRIEF across chest X-ray (CXR), mammography, and coronary CT angiography (CCTA) reports, evaluating its performance against specialized methods and proprietary LLMs (GPT-4o, Gemini-1.5-Flash, Claude-3.5-Sonnet). For CXR, RRIEF achieved F1 scores of 0.87 (LLaMA1-65B, internal test) and 0.85 (LLaMA3-70B, external test), significantly outperforming CheXpert Labeler (0.70 and 0.69, P < .001), CheXbert (0.72 and 0.69, P < .001), and all proprietary LLMs (Claude-3.5-Sonnet: 0.69 and 0.62, P < .001). For mammography, RRIEF reached F1 scores of 0.91 (LLaMA1-30B/65B, internal test) and 0.99 (LLaMA3-70B, external test), exceeding all proprietary LLMs (0.86 and 0.92, P = .002). For CCTA, using only 100 training reports, RRIEF-LLaMA3-8B significantly outperformed Gemini-1.5-Flash in stenosis severity (0.87 vs 0.83, P = .02), GPT-4o in external testing (0.83 vs 0.68, P < .001), and all proprietary models for modifiers in external testing (1.00 vs 0.93, P = .004). Notably, with only 200 training samples, RRIEF-LLaMA3-8B outperformed all CXR baselines, including CheXbert and proprietary LLMs (P < .001). Our locally deployable framework enables high-performance information extraction from different types of radiology reports, facilitating large-scale research and clinical practice. We provide our complete implementation code publicly to promote accessibility and adoption.

Biological sciences/Computational biology and bioinformatics/Data processing
Biological sciences/Computational biology and bioinformatics/Machine learning
Health sciences/Health care/Medical imaging

Radiology Report Information Extraction Large Language Models Privacy-Preserving Framework

Introduction

Radiology reports represent a vital component of healthcare data, containing detailed interpretations of medical images that are crucial for patient care, clinical research, and quality assurance.
However, an estimated 80% of this clinical information exists in unstructured free-text format, making it challenging to leverage for systematic analysis and clinical decision support (1). Automated information extraction from these reports, which converts unstructured text into structured, quantifiable data, has emerged as a critical need across various imaging modalities (2, 3). Making these unstructured reports computationally accessible could advance clinical research, enhance quality monitoring, and streamline billing processes while reducing the manual annotation burden on healthcare professionals (4–7).

Efforts for automated information extraction from radiology reports have evolved significantly over the past decades. Early approaches relied heavily on rule-based systems and traditional machine learning methods, which often required extensive manual feature engineering and were highly sensitive to variations in reporting styles and terminology (2, 5, 8). More recent deep learning-based methods have shown promise in specific settings, particularly for chest X-ray (CXR) reports, achieving notable accuracy in identifying common findings (6). However, these solutions typically require large annotated datasets for training and are often confined to specific imaging modalities or institutional reporting formats (3). For example, CheXpert and CheXbert have demonstrated success in information extraction from CXR reports, but their application remains limited to this single modality (5, 6). Similar constraints exist in other imaging domains, where the development of automated labeling systems has been hampered by the need for substantial manual annotations and the challenge of accommodating diverse reporting patterns across institutions and radiologists (7–11). Furthermore, these specialized approaches often lack the flexibility to adapt to evolving medical terminology and reporting practices, necessitating frequent retraining and updates (3).

The advent of large language models (LLMs) like ChatGPT and LLaMA has introduced new possibilities for medical information extraction, potentially addressing the limitations of traditional approaches (12–19). In healthcare applications, these models have demonstrated remarkable capabilities across tasks including medical licensing examinations, clinical decision support, and extraction of clinical information from various text sources (20–28). Specifically for extracting information from radiology reports, recent studies have explored zero-shot and few-shot approaches using these models, leveraging their pre-trained knowledge without requiring extensive task-specific training data (13, 14, 26, 29, 30). However, while these attempts highlight the potential of LLMs, their performance has generally remained comparable to or slightly below that of existing specialized methods, primarily due to reliance on zero/few-shot learning approaches (13, 29, 31). Moreover, proprietary LLMs raise significant privacy concerns when processing healthcare data, spurring interest in locally deployable open-source alternatives, though optimal training and implementation approaches for medical report labeling remain unclear (13, 14, 32–35).

Thus, the purpose of our study is to develop and validate an efficient framework for information extraction from radiology reports. Our approach focuses on three key objectives: First, to establish a privacy-preserving framework that outperforms existing methodologies. Second, to demonstrate the framework's effectiveness across different imaging modalities with distinct characteristics and reporting styles. Third, to facilitate widespread adoption in diverse clinical settings by analyzing training data requirements and providing our complete implementation as an open-source resource.

Material and Methods

This retrospective study was approved by the Institutional Review Board of Seoul National University Hospital (IRB No.
2303-155-1417) and the requirement for written informed consent was waived. To promote accessibility and facilitate adoption in diverse clinical settings, our complete implementation code is publicly available at https://github.com/reonaledo/report_labeler (currently under development).

Data

In our study, we utilized the following datasets: MIMIC-CXR (36) and Open-i (37) for CXR reports, a private dataset from Seoul National University Hospital (SiteA-MMG) and CDD-CESM (38) for mammography reports, and two private datasets from Seoul National University Bundang Hospital (SiteB-CCTA) and Chonnam National University Hwasun Hospital (SiteC-CCTA) for coronary computed tomography angiography (CCTA) reports. For CXR, 2,000 reports were sampled from MIMIC-CXR for training (n = 1,000) and internal testing (n = 1,000), with an additional 751 reports from Open-i for external testing. For mammography, 1,500 reports sampled from 224,136 screening cases collected by SiteA-MMG between January 2001 and December 2022 were used for training (n = 500) and internal testing (n = 1,000), with the entire CDD-CESM dataset (n = 326) used for external testing. For CCTA, we collected 150 reports from SiteB-CCTA for training (n = 100) and internal testing (n = 50), with an additional 51 reports from SiteC-CCTA for external testing. All CCTA reports were synthetically generated by cardiothoracic radiologists following their institutional reporting formats. Detailed descriptions of dataset selection and preparation are provided in Supplemental Text 1. Examples from each dataset, illustrating the variations in reporting styles, are shown in Table 1.

Table 1. Examples of Each Dataset.

Dataset: MIMIC-CXR (Modality: CXR)
Example 1: The lungs are clear. There is no focal consolidation, pleural effusion, or pneumothorax. The cardiomediastinal silhouette is normal. There is no free air under the hemidiaphragms. No pancreatic calcificaitons visualized. Osseous structures are intact.
Example 2: The cardiomediastinal and hilar contours are within normal limits. The lungs are well expanded and clear. There is no large pleural effusion, pneumothorax or focal consolidation concerning for pneumonia. There is no evidence of free air.

Dataset: Open-i (Modality: CXR)
Example 1: Heart is mildly enlarged stable. Mediastinal contour is normal. Pulmonary vascularity is normal. Lungs are hyperexpanded but clear. No pleural effusions or pneumothoraces.
Example 2: The cardiomediastinal silhouette is normal in size and contour. Stable right lower lobe calcified granuloma. No focal consolidation, pneumothorax or large pleural effusion. Spurring of the thoracic spine.

Dataset: SiteA-MMG (Modality: Mammography)
Example 1: Gr 3 Suspicious malignant microcal, RUO: Segmental/ fine linear pleomorphic/ductal extension to SA/ nipple retraction(+) --> C5 Suspicious malignant mass with calcifications, Rt inner --> C4c r/o metastatic LN, Rt axilla --> C4c
Example 2: Gr 2 and a few benign calci on Rt.

Dataset: CDD-CESM (Modality: Mammography)
Example 1: Right Breast: ACR C: Heterogeneously dense breasts. Upper outer quadrant benign macrocalifications. No suspicious microcalcifications. Multiple lower and central inner equal density rounded and oval shaped masses are seen, some of them show circumscribed margin and others show obscured margin. Normal skin thickness and contour of breast. Left Breast: Status postoperative with flap reconstruction showing no speculated mass lesions or suspicious microcalcifications. Normal skin thickness. OPINION: Right Breast: Multiple lower and central inner rounded and oval shaped benign looking homogenously enhancing masses with circumscribed margin (BIRADS 3). Upper outer quadrant benign macrocalifications (BIRADS 2). Left Breast: Status postoperative with flap reconstruction showing no evidence of recurrent lesions (BIRADS 2).
Example 2: Right Breast: Diffuse edematous changes evidenced by increased skin thickness and coarsened trabeculae. Associated upper outer clusters of pleomorphic microcalcifications are also seen. Left Breast: Central benign macrocalcifications are noted. No speculated mass lesions or suspicious microcalcifications. Normal skin thickness and contour of breast. OPINION: Right Breast: Diffuse edematous changes associated with upper outer suspicious microcalcifications (BIRADS 5). Left Breast: Central benign macrocalcifications (BIRADS 2).

Dataset: SiteB-CCTA (Modality: CCTA)
Example 1: Mild atherosclerosis with no significant stenosis in the coronary arteries. LM and LCx, unremarkable pLAD and pRCA, < 20–30% stenosis with calcified plaques No remarkable finding in the LV myocardium. Total coronary calcium score = 205.68
Example 2: Atherosclerosis with significant stenosis in the coronary arteries. LM RI LCx RCA < 20% stenosis with calcified plaques. pLAD focal 70–80% stenosis with mixed plaque. No remarkable finding in the LV myocardium. Total coronary calcium score = 550.04. Ascending aorta dilatation: 45 mm.

Dataset: SiteC-CCTA (Modality: CCTA)
Example 1: * Image quality: good quality * Calcium Scoring Total Agatston Score: 245.0 Coronary calcium volume: 210.0 * Dominancy: Right dominancy * Coronary Anomaly or variant; Absent. Coronary artery stenosis 1) LM : No plaque, no stenosis 2) LAD with branches : pLAD ; mixed plaque with mild stenosis mLCA: small calcified plaque with mild stenosis 3) LCX with branches : Hypoplastic LCX, No plaque, no stenosis 4) RCA with branches : pRCA, mRCA, dRCA ; small calcified plaque with mild stenosis 5) Ramus Intermedius : Absent Vulnerable plaque : None. Other cardiac finding : S/P MVR, AVR Hypertrabeculation at LV apex. Extracardiac finding : Within normal limits. 1. Moderate coronary calcification (Total Agatston Score: 245.0) (1) pRCA, mRCA, dRCA ; small calcified plaque with mild stenosis (2) pLAD ; mixed plaque with mild stenosis mLAD: small calcified plaque with mild stenosis 2. S/P MVR, AVR Hypertrabeculation at LV apex.
Example 2: * Image quality: limitation due to severe coronary calcification. * Calcium Scoring Total Agatston Score: 996.0 Coronary calcium volume: 802.1 * Dominancy: Right dominancy * Coronary Anomaly or variant; Absent. Coronary artery stenosis (1) proximal to distal RCA ; multifocal calcified plaque with mild stenosis (2) Proximal LAD ; calcified/mixed plaque with mild to moderate stenosis ...suspicious moderate to severe stenosis at pLAD near 2nd diagonal branch/proximal 2nd diagonal. mid LAD ; shallow myocardial bridging distal LAD ; calcified plaque with mild stenosis (3) proximal LCX ; calcified plaque with mild stenosis Vulnerable plaque : None. Other cardiac finding : Within normal limits. Extracardiac finding : Within normal limits Conclusion 1. severe coronary calcification (Total Agatston Score: 996.0). (1) proximal to distal RCA ; multifocal calcified plaque with mild stenosis (2) Proximal LAD ; calcified/mixed plaque with mild to moderate stenosis ...suspicious moderate to severe stenosis at LAD near 2nd diagonal branch/proximal 2nd diagonal. mid LAD ; shallow myocardial bridging distal LAD ; calcified plaque with mild stenosis (3) proximal LCX ; calcified plaque with mild stenosis

Note: In CXR reports, both MIMIC-CXR and Open-i maintain a structured approach with minor differences in detail expression. For mammography reports, SiteA-MMG utilizes both common abbreviations (e.g., ‘LN’ for lymph node, ‘Rt’ for right) and unconventional shorthand (e.g., ‘microcal’ for microcalcification, ‘calci’ for calcification) in a list-like format, catering to specialist readers. In contrast, CDD-CESM adopts a narrative style, using minimal abbreviations and favoring standard terminology, providing detailed descriptions for each breast followed by an opinion section. For CCTA, SiteB-CCTA employs a concise narrative format focusing on core findings with minimal structural divisions, while SiteC-CCTA uses a comprehensive starred-section format with detailed anatomical categorization, explicit measurement values, and a separate conclusion section that systematically summarizes all findings. CXR = Chest X-ray, CCTA = Coronary CT Angiography.

Radiology Report Information Extraction Framework (RRIEF)

Figure 1 presents an overview of our framework, the Radiology Report Information Extraction Framework (RRIEF), for extracting structured information from radiology reports. The framework consists of four sequential steps: target definition, data annotation, prompt design, and LLM training. We detail each component below.

Target Definition

This step involves specifying the classes of interest and their possible label categories for the target dataset. Classes represent medical findings (e.g., mass, pneumothorax) or inferable information (e.g., severity ratings) from reports, while label categories define the possible values each class can take. These can be freely configured based on the intended use of the extracted information. For CXR, we used 13 finding classes following the CheXpert Labeler’s protocol, excluding the 'no finding' class as it is inferable from the others (5). The categorization of labels was based on the presence of these findings in the reports, classified as 'positive', 'negative', 'unsure', or 'not mentioned'. For mammography, given the absence of prior research, we established 11 finding classes derived from the Breast Imaging Lexicon (39), categorized according to their presence in the reports and specified laterality as 'right breast', 'left breast', 'bilateral breasts', 'unsure', or 'not mentioned'. For CCTA, following Coronary Artery Disease-Reporting and Data System (CAD-RADS) 2.0 guidelines (40), we defined classes and labels for stenosis severity categories (0, 1, 2, 3, 4A, 4B, or 5), plaque burden scores (None, P1, P2, P3, or P4), and six modifiers (E, I, N, G, HRP, and S, each labeled as 0 or 1). Details regarding the target findings and labels are provided in Supplemental Text 2.
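
As a minimal sketch of how such a target definition might be written down in code, the snippet below encodes the CXR and CCTA classes and label categories named above as a plain Python dictionary and checks one annotation against it. The dictionary layout, the key names (such as `stenosis_severity` and `modifier_E`), and the `validate_annotation` helper are illustrative assumptions, not the authors' implementation. The 13 CXR finding names are the standard CheXpert classes without 'No Finding'; the mammography classes are omitted because their full list is given in Supplemental Text 2.

```python
# Label categories are taken from the Target Definition text;
# the dictionary layout and key names are hypothetical.
CXR_LABELS = ("positive", "negative", "unsure", "not mentioned")
CXR_FINDINGS = (
    "Enlarged Cardiomediastinum", "Cardiomegaly", "Lung Lesion", "Lung Opacity",
    "Edema", "Consolidation", "Pneumonia", "Atelectasis", "Pneumothorax",
    "Pleural Effusion", "Pleural Other", "Fracture", "Support Devices",
)

TARGETS = {
    # Every CXR finding shares the same four label categories.
    "cxr": {finding: CXR_LABELS for finding in CXR_FINDINGS},
    # CAD-RADS 2.0 style targets: severity, plaque burden, six binary modifiers.
    "ccta": {
        "stenosis_severity": ("0", "1", "2", "3", "4A", "4B", "5"),
        "plaque_burden": ("None", "P1", "P2", "P3", "P4"),
        **{f"modifier_{m}": ("0", "1") for m in ("E", "I", "N", "G", "HRP", "S")},
    },
}


def validate_annotation(modality, annotation):
    """Return a list of problems; an empty list means the annotation fits the schema."""
    schema = TARGETS[modality]
    problems = []
    for cls, allowed in schema.items():
        if cls not in annotation:
            problems.append(f"missing class: {cls}")
        elif annotation[cls] not in allowed:
            problems.append(f"invalid label for {cls}: {annotation[cls]!r}")
    for cls in annotation:
        if cls not in schema:
            problems.append(f"unknown class: {cls}")
    return problems
```

A check of this kind could be applied both to human-reviewed annotations and to model outputs, so that a label outside the configured categories is caught before it reaches training or evaluation.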

Data Annotation

To expedite the annotation process, we first generated initial annotations using LLMs and then had human annotators review these preliminary labels, which substantially reduced the manual annotation effort compared with starting from scratch. For the initial annotation of the CXR and mammography datasets, we used LLaMA-13B (18) to generate initial annotations based on a small number of annotated examples (we used three fixed examples). These preliminary results were then refined by two graduate students under the guidance of a chest radiologist (D.H.K.; >5 years of experience) and a breast radiologist (J.M.J.; >20 years of experience). For CCTA reports, two board-certified cardiovascular radiologists (W.G.J.; >6 years of experience and B.S.H.; >5 years of experience) independently reviewed all reports and established ground truth annotations through consensus. The labels for all test datasets additionally underwent a comprehensive review by the radiologists, each specializing in their respective fields.

Prompt Design

Our prompts consist of three components: instruction, input, and output. The instruction provides detailed descriptions of the task, including definitions of target classes and their possible labels. The input comprises the radiology report text, while the output is structured as JSON containing the extracted information. We designed prompts to be explicit about task requirements while maintaining flexibility for different reporting styles. Examples of prompt design and corresponding outputs are shown in Fig. 2, with complete prompt templates detailed in Supplemental Text 3.

LLM Training

We trained LLaMA models (versions 1 and 3) independently for each imaging modality using different model sizes ranging from 7B to 70B parameters (16, 18). To minimize computational resources while maintaining performance, we employed Quantized Low-Rank Adaptation (QLoRA) (41), a parameter-efficient fine-tuning technique.
This approach reduced the trainable parameters to less than 0.1% of the total model parameters, enabling efficient training of even the largest model (LLaMA3-70B) on a single 48GB A6000 GPU within 6 hours. To ensure reproducibility and consistent label generation across different runs, we employed deterministic decoding by setting the temperature parameter to 0 (13). Detailed training configurations and hyperparameters are provided in Supplemental Text 4.

Evaluation

For comparative analysis, we tested RRIEF with various LLaMA models (LLaMA1: 7B, 13B, 30B and 65B; LLaMA3: 8B and 70B) against three comparison groups: their base versions without additional training, the open-source DeepSeek-R1-Distill-Qwen-14B model (a specialized model optimized for reasoning tasks) (42), and leading proprietary LLMs (GPT-4o, Gemini-1.5-Flash, Claude-3.5-Sonnet), all evaluated under zero/few-shot settings (minimal example-based inference where models perform tasks with no or few examples). We conducted these comparative experiments across all three modalities: CXR, mammography, and CCTA. For CXR reports, we additionally compared against specialized methods including CheXpert Labeler and CheXbert. Implementation details of zero/few-shot learning are provided in Supplemental Text 5.

To investigate the relationship between training data size and model performance, we conducted experiments using LLaMA3-8B on the MIMIC-CXR dataset. Training data sizes ranged from 50 to 1,000 reports (50, 100, 200, 500 and 1,000), all randomly sampled from the original training set of 1,000 reports without replacement. To ensure statistical reliability and reproducibility, we repeated each experiment five times using different random seeds.

Performance was evaluated using the Macro F1 score (rationale detailed in Supplemental Text 6), with results averaged across 10 bootstrap repetitions and presented with 95% confidence intervals.
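
The evaluation metric can be illustrated with a short sketch. The code below computes a macro F1 score over the label categories of one finding and summarizes it over 10 bootstrap resamples. It is an illustration under stated assumptions, not the authors' evaluation code: the function names are hypothetical, labels absent from both the reference and the prediction are skipped, and the 95% interval uses a normal approximation (mean ± 1.96 standard errors), which may differ from the procedure described in Supplemental Text 6.

```python
import random
import statistics


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-label F1 over labels that occur in y_true or y_pred."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        if denom:  # assumption: skip labels that never occur in this sample
            scores.append(2 * tp / denom)
    return sum(scores) / len(scores) if scores else 0.0


def bootstrap_macro_f1(y_true, y_pred, labels, n_boot=10, seed=0):
    """Mean macro F1 over n_boot resamples with a normal-approximation 95% interval."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        # Resample report indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(
            macro_f1([y_true[i] for i in idx], [y_pred[i] for i in idx], labels)
        )
    mean = statistics.mean(scores)
    half_width = 1.96 * statistics.stdev(scores) / len(scores) ** 0.5
    return mean, (mean - half_width, mean + half_width)
```

Macro averaging gives each label category equal weight regardless of how often it appears, which matters for report labeling because categories such as 'unsure' are typically rare.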
Statistical comparisons between methods used the one-sided Wilcoxon signed-rank test (P < .05 for significance). Cases failing to produce valid JSON outputs were excluded from analysis (Supplemental Text 7).

Results

Patient Characteristics

A flowchart illustrating the data selection process is provided in Fig. 3. For CXR, we utilized the MIMIC-CXR dataset, including 1,000 reports from 631 patients (median age 62 years [IQR 50–74 years]; 50% female) for training and 1,000 reports from 807 patients (median age 63 years [IQR 49–72 years]; 53% female) for internal testing. The Open-i dataset provided 751 reports from 751 patients (demographics unavailable) for external testing. For mammography, the SiteA-MMG dataset contributed 500 reports from 486 patients (median age 51 years [IQR 47–56 years]; all female) for training and 1,000 reports from 995 patients (median age 53 years [IQR 47–60 years]; all female) for internal testing. The CDD-CESM dataset provided 326 reports from 326 patients (median age 50 years [IQR 43–58 years]; all female) for external testing. For CCTA, we employed the SiteB-CCTA dataset with 100 reports for training and 50 reports for internal testing. The SiteC-CCTA dataset contributed 51 reports for external testing. Because the CCTA reports were synthetically generated, no patient demographics are available for them.

Overall Performance in Chest X-ray and Mammography Report Labeling

The overall performance results for CXR and mammography report labeling are summarized in Table 2. In our internal MIMIC-CXR test, RRIEF-LLaMA1-65B achieved a mean F1 score of 0.87 (95% CI: 0.86, 0.88), significantly outperforming CheXpert Labeler (0.70, 95% CI: 0.70, 0.71; P < .001) and CheXbert (0.72, 95% CI: 0.71, 0.72; P < .001). This performance surpassed all zero/few-shot LLMs including Claude-3.5-Sonnet (0.69, 95% CI: 0.68, 0.70; P < .001).
Even the smallest model, RRIEF-LLaMA1-7B, demonstrated superior performance (0.83, 95% CI: 0.81, 0.84; P < .001) compared to these benchmarks.

On the external Open-i dataset, RRIEF-LLaMA3-70B scored highest at 0.85 (95% CI: 0.83, 0.87), surpassing CheXpert Labeler (0.69, 95% CI: 0.68, 0.70; P < .001) and CheXbert (0.69, 95% CI: 0.68, 0.71; P < .001). RRIEF-LLaMA3-70B showed better generalizability than RRIEF-LLaMA1-65B (0.80, 95% CI: 0.77, 0.82; P = .002). Notably, smaller models such as RRIEF-LLaMA1-7B (0.80, 95% CI: 0.79, 0.81) and RRIEF-LLaMA3-8B (0.84, 95% CI: 0.82, 0.85) outperformed all zero/few-shot benchmarks including Claude-3.5-Sonnet (0.62, 95% CI: 0.60, 0.63; P < .001), aligning with their internal test results.

For mammography reports, internal testing on SiteA-MMG showed RRIEF-LLaMA1-30B and RRIEF-LLaMA1-65B both achieved 0.91 (95% CI: 0.89, 0.92 and 0.90, 0.92, respectively), exceeding zero/few-shot performances of LLaMA models (P < .001 for all comparisons) and proprietary LLMs including Gemini-1.5-Flash (0.86, 95% CI: 0.85, 0.87; P = .002). Even the smallest model, RRIEF-LLaMA1-7B (0.88, 95% CI: 0.86, 0.90), outperformed GPT-4o and Claude-3.5-Sonnet in few-shot settings (P < .001 for all comparisons). In the external CDD-CESM test, RRIEF-LLaMA3-70B achieved the highest score of 0.99 (95% CI: 0.98, 1.00), confirming its superior generalizability over RRIEF-LLaMA1-65B (P = .02) and outperforming all zero/few-shot configurations, including Claude-3.5-Sonnet (0.92, 95% CI: 0.90, 0.94; P = .002).

Table 2. Overall Performance on Chest X-ray and Mammography Reports.

| Method | Model | CXR: MIMIC-CXR (n = 1,000) | CXR: Open-i (n = 751) | Mammography: SiteA-MMG (n = 1,000) | Mammography: CDD-CESM (n = 326) |
|---|---|---|---|---|---|
| CheXpert Labeler | - | 0.70 (0.70, 0.71) | 0.69 (0.68, 0.70) | - | - |
| CheXbert | - | 0.72 (0.71, 0.72) | 0.69 (0.68, 0.71) | - | - |
| 0-shot | LLaMA1-65B | 0.47 (0.47, 0.47) | 0.45 (0.44, 0.46) | 0.40 (0.35, 0.44) | 0.32 (0.31, 0.34) |
| 0-shot | LLaMA3-70B | 0.36 (0.35, 0.37) | 0.31 (0.30, 0.32) | 0.65 (0.64, 0.66) | 0.84 (0.83, 0.86) |
| 0-shot | DeepSeek-R1-Distill-Qwen-14B | 0.54 (0.53, 0.55) | 0.47 (0.46, 0.48) | 0.71 (0.70, 0.72) | 0.45 (0.44, 0.47) |
| 0-shot | GPT-4o | 0.59 (0.58, 0.59) | 0.47 (0.45, 0.49) | 0.71 (0.71, 0.72) | 0.72 (0.71, 0.73) |
| 0-shot | Gemini-1.5-Flash | 0.57 (0.56, 0.58) | 0.60 (0.59, 0.62) | 0.71 (0.70, 0.72) | 0.72 (0.70, 0.75) |
| 0-shot | Claude-3.5-Sonnet | 0.67 (0.66, 0.68) | 0.47 (0.45, 0.49) | 0.72 (0.71, 0.72) | 0.79 (0.78, 0.81) |
| 3-shot | LLaMA1-65B | 0.47 (0.45, 0.48) | 0.45 (0.44, 0.47) | 0.64 (0.63, 0.66) | 0.60 (0.58, 0.61) |
| 3-shot | LLaMA3-70B | 0.60 (0.59, 0.61) | 0.56 (0.54, 0.58) | 0.87 (0.85, 0.88) | 0.87 (0.86, 0.88) |
| 3-shot | DeepSeek-R1-Distill-Qwen-14B | 0.57 (0.57, 0.58) | 0.53 (0.52, 0.53) | 0.73 (0.71, 0.74) | 0.85 (0.83, 0.88) |
| 3-shot | GPT-4o | 0.67 (0.66, 0.68) | 0.57 (0.55, 0.59) | 0.75 (0.74, 0.75) | 0.89 (0.88, 0.90) |
| 3-shot | Gemini-1.5-Flash | 0.62 (0.61, 0.63) | 0.55 (0.53, 0.56) | 0.86 (0.85, 0.87) | 0.86 (0.84, 0.88) |
| 3-shot | Claude-3.5-Sonnet | 0.69 (0.68, 0.70) | 0.62 (0.60, 0.63) | 0.75 (0.74, 0.75) | 0.92 (0.90, 0.94) |
| RRIEF (Ours) | LLaMA1-7B | 0.83 (0.81, 0.84) | 0.80 (0.79, 0.81) | 0.88 (0.86, 0.90) | 0.74 (0.72, 0.75) |
| RRIEF (Ours) | LLaMA1-13B | 0.83 (0.82, 0.83) | 0.79 (0.78, 0.81) | 0.90 (0.88, 0.91) | 0.79 (0.77, 0.80) |
| RRIEF (Ours) | LLaMA1-30B | 0.85 (0.85, 0.86) | 0.81 (0.79, 0.82) | 0.91 (0.89, 0.92) | 0.93 (0.91, 0.94) |
| RRIEF (Ours) | LLaMA1-65B | 0.87 (0.86, 0.88) | 0.80 (0.77, 0.82) | 0.91 (0.90, 0.92) | 0.97 (0.97, 0.98) |
| RRIEF (Ours) | LLaMA3-8B | 0.85 (0.85, 0.86) | 0.84 (0.82, 0.85) | 0.88 (0.86, 0.90) | 0.85 (0.83, 0.86) |
| RRIEF (Ours) | LLaMA3-70B | 0.85 (0.84, 0.86) | 0.85 (0.83, 0.87) | 0.90 (0.88, 0.92) | 0.99 (0.98, 1.00) |

Note: Data represent the average Macro F1 Score and 95% Confidence Interval from 10 Bootstrap Tests. For each dataset, the highest performance is shown in bold and the second-highest is underlined.
RRIEF = Radiology Report Information Extraction Framework, B = Billion, LLaMA = Large language model Meta AI.

Finding-specific Performance Advantages of RRIEF

Table 3 highlights RRIEF's average performance advantages in CXR report labeling across individual findings, comparing the average performance of all RRIEF models (Avg. RRIEF: RRIEF-LLaMA1-7B/13B/30B/65B, RRIEF-LLaMA3-8B/70B) against other methods. In labeling 'Support Devices', RRIEF models outperformed traditional methods (Avg. CheXpert & CheXbert) by F1 score differences of 0.32 (MIMIC-CXR) and 0.41 (Open-i). Compared to zero-shot and few-shot performances of LLaMA1-65B and LLaMA3-70B (Avg. LLaMA 0-shot and Avg. LLaMA 3-shot), RRIEF models demonstrated substantial improvements, particularly in 'Enlarged Cardiomediastinum' where F1 score differences reached 0.52–0.64 in both test sets. When compared against proprietary LLMs (Avg. Proprietary LLMs 0-shot and 3-shot: GPT-4o, Gemini-1.5-Flash, Claude-3.5-Sonnet), RRIEF models showed notable improvements (0.23–0.41 F1 score difference) in 'Enlarged Cardiomediastinum', 'Cardiomegaly', and 'Lung Lesion' across both internal and external test sets.

Table 3. Average Performance Comparison of Methods for Chest X-ray Report Labeling by Findings.

| Dataset and finding | Avg. RRIEF (Ours) | Avg. CheXpert & CheXbert | Avg. LLaMA 0-shot | Avg. LLaMA 3-shot | Avg. Proprietary LLMs 0-shot | Avg. Proprietary LLMs 3-shot |
|---|---|---|---|---|---|---|
| MIMIC-CXR (n = 1,000) | | | | | | |
| Enlarged Cardiom. | 0.91 ± .03 | 0.74 ± .01 (Δ-0.17) | 0.27 ± .04 (Δ-0.64) | 0.39 ± .17 (Δ-0.52) | 0.60 ± .02 (Δ-0.31) | 0.60 ± .02 (Δ-0.31) |
| Cardiomegaly | 0.93 ± .01 | 0.69 ± .02 (Δ-0.25) | 0.50 ± .08 (Δ-0.44) | 0.54 ± .07 (Δ-0.39) | 0.54 ± .01 (Δ-0.39) | 0.59 ± .02 (Δ-0.34) |
| Lung Lesion | 0.84 ± .06 | 0.70 ± .00 (Δ-0.14) | 0.30 ± .00 (Δ-0.54) | 0.46 ± .12 (Δ-0.38) | 0.51 ± .06 (Δ-0.33) | 0.61 ± .05 (Δ-0.23) |
| Lung Opacity | 0.81 ± .03 | 0.69 ± .09 (Δ-0.12) | 0.32 ± .11 (Δ-0.49) | 0.44 ± .01 (Δ-0.37) | 0.45 ± .06 (Δ-0.36) | 0.52 ± .03 (Δ-0.28) |
| Edema | 0.87 ± .02 | 0.72 ± .00 (Δ-0.15) | 0.42 ± .13 (Δ-0.46) | 0.55 ± .13 (Δ-0.32) | 0.67 ± .03 (Δ-0.20) | 0.72 ± .02 (Δ-0.16) |
| Consolidation | 0.89 ± .02 | 0.88 ± .04 (Δ-0.01) | 0.51 ± .04 (Δ-0.38) | 0.63 ± .07 (Δ-0.26) | 0.65 ± .18 (Δ-0.24) | 0.72 ± .12 (Δ-0.17) |
| Pneumonia | 0.82 ± .02 | 0.72 ± .09 (Δ-0.11) | 0.37 ± .07 (Δ-0.45) | 0.52 ± .17 (Δ-0.30) | 0.60 ± .13 (Δ-0.22) | 0.75 ± .04 (Δ-0.07) |
| Atelectasis | 0.75 ± .06 | 0.65 ± .01 (Δ-0.10) | 0.52 ± .35 (Δ-0.23) | 0.52 ± .07 (Δ-0.23) | 0.55 ± .01 (Δ-0.20) | 0.60 ± .01 (Δ-0.15) |
| Pneumothorax | 0.81 ± .05 | 0.75 ± .04 (Δ-0.06) | 0.56 ± .03 (Δ-0.25) | 0.69 ± .18 (Δ-0.12) | 0.81 ± .06 (Δ-0.00) | 0.81 ± .09 (Δ-0.00) |
| Pleural Effusion | 0.93 ± .01 | 0.87 ± .01 (Δ-0.06) | 0.65 ± .01 (Δ-0.28) | 0.70 ± .17 (Δ-0.23) | 0.78 ± .04 (Δ-0.14) | 0.82 ± .04 (Δ-0.10) |
| Pleural Other | 0.83 ± .05 | 0.62 ± .00 (Δ-0.21) | 0.24 ± .08 (Δ-0.59) | 0.44 ± .07 (Δ-0.39) | 0.62 ± .11 (Δ-0.21) | 0.62 ± .12 (Δ-0.21) |
| Fracture | 0.78 ± .04 | 0.72 ± .10 (Δ-0.06) | 0.42 ± .05 (Δ-0.36) | 0.58 ± .04 (Δ-0.20) | 0.55 ± .03 (Δ-0.23) | 0.57 ± .04 (Δ-0.21) |
| Support Devices | 0.85 ± .10 | 0.53 ± .01 (Δ-0.32) | 0.31 ± .08 (Δ-0.54) | 0.49 ± .04 (Δ-0.36) | 0.59 ± .09 (Δ-0.26) | 0.64 ± .06 (Δ-0.21) |
| Average | 0.85 ± .02 | 0.71 ± .01 (Δ-0.14) | 0.42 ± .08 (Δ-0.43) | 0.54 ± .09 (Δ-0.31) | 0.61 ± .05 (Δ-0.24) | 0.66 ± .04 (Δ-0.19) |
| Open-i (n = 751) | | | | | | |
| Enlarged Cardiom. | 0.87 ± .04 | 0.72 ± .04 (Δ-0.16) | 0.31 ± .03 (Δ-0.56) | 0.35 ± .11 (Δ-0.53) | 0.51 ± .05 (Δ-0.36) | 0.56 ± .03 (Δ-0.31) |
| Cardiomegaly | 0.91 ± .01 | 0.78 ± .05 (Δ-0.13) | 0.50 ± .06 (Δ-0.41) | 0.57 ± .03 (Δ-0.34) | 0.52 ± .03 (Δ-0.39) | 0.60 ± .01 (Δ-0.30) |
| Lung Lesion | 0.86 ± .07 | 0.81 ± .10 (Δ-0.05) | 0.28 ± .05 (Δ-0.59) | 0.44 ± .01 (Δ-0.43) | 0.45 ± .14 (Δ-0.41) | 0.60 ± .09 (Δ-0.26) |
| Lung Opacity | 0.69 ± .04 | 0.71 ± .11 (Δ+0.02) | 0.40 ± .06 (Δ-0.29) | 0.41 ± .01 (Δ-0.28) | 0.50 ± .06 (Δ-0.18) | 0.55 ± .04 (Δ-0.14) |
| Edema | 0.81 ± .09 | 0.48 ± .00 (Δ-0.33) | 0.26 ± .11 (Δ-0.55) | 0.43 ± .07 (Δ-0.38) | 0.47 ± .05 (Δ-0.34) | 0.52 ± .12 (Δ-0.29) |
| Consolidation | 0.82 ± .10 | 0.73 ± .06 (Δ-0.09) | 0.36 ± .04 (Δ-0.46) | 0.46 ± .03 (Δ-0.36) | 0.48 ± .27 (Δ-0.34) | 0.49 ± .16 (Δ-0.32) |
| Pneumonia | 0.89 ± .05 | 0.96 ± .04 (Δ+0.07) | 0.27 ± .02 (Δ-0.63) | 0.56 ± .28 (Δ-0.34) | 0.54 ± .12 (Δ-0.35) | 0.70 ± .10 (Δ-0.19) |
| Atelectasis | 0.89 ± .04 | 0.89 ± .01 (Δ-0.01) | 0.50 ± .35 (Δ-0.40) | 0.52 ± .05 (Δ-0.38) | 0.50 ± .02 (Δ-0.39) | 0.55 ± .05 (Δ-0.34) |
| Pneumothorax | 0.88 ± .04 | 0.61 ± .04 (Δ-0.27) | 0.69 ± .16 (Δ-0.19) | 0.81 ± .17 (Δ-0.07) | 0.57 ± .10 (Δ-0.31) | 0.63 ± .06 (Δ-0.25) |
| Pleural Effusion | 0.75 ± .05 | 0.80 ± .01 (Δ+0.06) | 0.53 ± .13 (Δ-0.22) | 0.59 ± .14 (Δ-0.16) | 0.64 ± .12 (Δ-0.11) | 0.72 ± .06 (Δ-0.03) |
| Pleural Other | 0.53 ± .10 | 0.46 ± .06 (Δ-0.07) | 0.21 ± .16 (Δ-0.32) | 0.43 ± .02 (Δ-0.10) | 0.45 ± .05 (Δ-0.07) | 0.45 ± .02 (Δ-0.08) |
| Fracture | 0.82 ± .06 | 0.57 ± .00 (Δ-0.25) | 0.37 ± .09 (Δ-0.45) | 0.44 ± .06 (Δ-0.38) | 0.45 ± .02 (Δ-0.37) | 0.46 ± .06 (Δ-0.36) |
| Support Devices | 0.88 ± .03 | 0.47 ± .04 (Δ-0.41) | 0.32 ± .13 (Δ-0.56) | 0.59 ± .01 (Δ-0.29) | 0.60 ± .04 (Δ-0.28) | 0.65 ± .19 (Δ-0.22) |
| Average | 0.82 ± .02 | 0.66 ± .00 (Δ-0.16) | 0.38 ± .10 (Δ-0.44) | 0.51 ± .08 (Δ-0.31) | 0.51 ± .08 (Δ-0.30) | 0.58 ± .04 (Δ-0.24) |

Note: Avg. RRIEF represents the average performance across all RRIEF-LLaMA models (LLaMA1: 7B, 13B, 30B, 65B; LLaMA3: 8B, 70B). Avg. CheXpert & CheXbert is the average performance of CheXpert Labeler and CheXbert. Avg.
LLaMA 0-shot and 3-shot represent the average performance of LLaMA1-65B and LLaMA3-70B in 0-shot and 3-shot settings, respectively. Avg. Proprietary LLMs 0-shot and 3-shot represent the average performance of GPT-4o, Gemini-1.5-Flash, and Claude-3.5-Sonnet in 0-shot and 3-shot settings, respectively. The performance metric used is macro F1 score, presented as mean ± standard deviation (e.g., 0.47 ± .04). The difference from Avg. RRIEF is expressed in parentheses using Δ. For each method, the chest X-ray finding with the lowest Δ value is highlighted in bold, while the finding with the highest Δ value is underlined. Detailed performance for each model is shown in Table S1 -S4 . RRIEF = Radiology Report Information Extraction Framework, B = Billion, Enlarged Cardiom. = Enlarged Cardiomediastinum, LLaMA = Large Language Model Meta AI. Table 4 demonstrates RRIEF's performance advantages in mammography report labeling across individual findings, using the same comparative approach. Compared to zero-shot and few-shot performances of LLaMA1-65B and LLaMA3-70B, RRIEF models showed notable improvements in the internal test for 'Nodule', 'Calcification', and 'Architectural Distortion' with macro F1 score differences of 0.21–0.50, while in the external test, 'Nodule' and 'Lymph Node Enlargement' showed improvements with score differences of 0.45–0.71. Against proprietary LLMs in both settings, RRIEF models demonstrated higher F1 scores for 'Architectural Distortion' and 'Lymph Node Enlargement' across both internal and external tests, with improvements ranging from 0.11 to 0.63. Additionally, we observed substantial performance variations among RRIEF models as detailed in Table 4 . 'Skin Thickening', 'Skin Retraction', and 'Trabecular Thickening' showed considerable variability with standard deviations ranging from 0.12 to 0.30 across both datasets. 
Tables S5 and S6 indicate this variability primarily stems from the lower performance of smaller models (RRIEF-LLaMA1-7B, RRIEF-LLaMA3-8B). This size-based performance disparity was most evident in the external test, where RRIEF-LLaMA1-7B scored 0.50 (95% CI: 0.49, 0.51), 0.60 (95% CI: 0.45, 0.74), and 0.24 (95% CI: 0.24, 0.24) for these three findings respectively, while RRIEF-LLaMA3-70B achieved perfect scores of 1.00 (95% CI: 1.00, 1.00) for all three findings.

Table 4 Average Performance Comparison of Methods for Mammography Report Labeling by Findings.

Dataset and finding | Avg. RRIEF (Ours) | Avg. LLaMA 0-shot | Avg. LLaMA 3-shot | Avg. Proprietary LLMs 0-shot | Avg. Proprietary LLMs 3-shot

SiteA-MMG (n = 1,000)
Nodule | 0.95 ± .04 | 0.45 ± .23 (Δ-0.50) | 0.69 ± .25 (Δ-0.26) | 0.77 ± .02 (Δ-0.18) | 0.77 ± .01 (Δ-0.19)
Mass | 0.87 ± .05 | 0.55 ± .17 (Δ-0.32) | 0.72 ± .21 (Δ-0.16) | 0.77 ± .02 (Δ-0.11) | 0.78 ± .00 (Δ-0.09)
Calcification | 0.97 ± .02 | 0.56 ± .23 (Δ-0.41) | 0.76 ± .06 (Δ-0.21) | 0.77 ± .02 (Δ-0.20) | 0.80 ± .01 (Δ-0.17)
Asymmetry | 0.90 ± .06 | 0.51 ± .19 (Δ-0.39) | 0.80 ± .06 (Δ-0.10) | 0.73 ± .03 (Δ-0.17) | 0.86 ± .12 (Δ-0.04)
Architectural Dist. | 0.97 ± .05 | 0.56 ± .17 (Δ-0.41) | 0.77 ± .25 (Δ-0.20) | 0.72 ± .03 (Δ-0.25) | 0.81 ± .14 (Δ-0.16)
Skin Thickening | 0.81 ± .13 | 0.55 ± .10 (Δ-0.26) | 0.63 ± .10 (Δ-0.18) | 0.76 ± .03 (Δ-0.05) | 0.80 ± .02 (Δ-0.01)
Lymph Node Enlarge. | 0.79 ± .03 | 0.49 ± .03 (Δ-0.30) | 0.67 ± .00 (Δ-0.12) | 0.56 ± .05 (Δ-0.23) | 0.63 ± .08 (Δ-0.16)
Intra. Lymph Node | 0.96 ± .05 | 0.46 ± .19 (Δ-0.50) | 0.84 ± .10 (Δ-0.12) | 0.73 ± .05 (Δ-0.23) | 0.75 ± .01 (Δ-0.20)
Nipple Retraction | 0.90 ± .05 | 0.53 ± .28 (Δ-0.38) | 0.91 ± .08 (Δ+0.01) | 0.71 ± .02 (Δ-0.19) | 0.82 ± .15 (Δ-0.08)
Skin Retraction | 0.95 ± .12 | 0.45 ± .30 (Δ-0.51) | 0.88 ± .18 (Δ-0.08) | 0.69 ± .05 (Δ-0.26) | 0.79 ± .08 (Δ-0.16)
Trabecular Thickening | 0.78 ± .19 | 0.67 ± .08 (Δ-0.12) | 0.67 ± .47 (Δ-0.12) | 0.62 ± .15 (Δ-0.16) | 0.82 ± .16 (Δ+0.04)
Average | 0.90 ± .01 | 0.53 ± .18 (Δ-0.37) | 0.76 ± .16 (Δ-0.14) | 0.71 ± .01 (Δ-0.18) | 0.79 ± .06 (Δ-0.11)

CDD-CESM (n = 326)
Nodule | 0.96 ± .06 | 0.46 ± .35 (Δ-0.51) | 0.52 ± .29 (Δ-0.45) | 0.83 ± .15 (Δ-0.13) | 0.97 ± .03 (Δ+0.01)
Mass | 0.95 ± .04 | 0.56 ± .25 (Δ-0.40) | 0.77 ± .24 (Δ-0.18) | 0.88 ± .12 (Δ-0.08) | 0.98 ± .01 (Δ+0.03)
Calcification | 0.92 ± .06 | 0.58 ± .26 (Δ-0.34) | 0.81 ± .12 (Δ-0.11) | 0.88 ± .12 (Δ-0.04) | 0.93 ± .06 (Δ+0.02)
Asymmetry | 0.95 ± .12 | 0.70 ± .41 (Δ-0.25) | 0.80 ± .08 (Δ-0.16) | 0.72 ± .04 (Δ-0.23) | 0.96 ± .08 (Δ+0.01)
Architectural Dist. | 0.96 ± .08 | 0.70 ± .43 (Δ-0.27) | 0.87 ± .19 (Δ-0.10) | 0.77 ± .04 (Δ-0.19) | 0.86 ± .13 (Δ-0.11)
Skin Thickening | 0.71 ± .30 | 0.62 ± .52 (Δ-0.09) | 0.81 ± .27 (Δ+0.10) | 0.77 ± .15 (Δ+0.07) | 0.73 ± .08 (Δ+0.03)
Lymph Node Enlarge. | 0.94 ± .14 | 0.24 ± .02 (Δ-0.71) | 0.25 ± .01 (Δ-0.70) | 0.31 ± .06 (Δ-0.63) | 0.53 ± .11 (Δ-0.42)
Intra. Lymph Node | 0.86 ± .22 | 0.62 ± .34 (Δ-0.24) | 0.76 ± .30 (Δ-0.10) | 0.86 ± .12 (Δ-0.00) | 0.98 ± .03 (Δ+0.13)
Nipple Retraction | 0.90 ± .15 | 0.71 ± .42 (Δ-0.19) | 0.95 ± .05 (Δ+0.05) | 0.83 ± .15 (Δ-0.07) | 1.00 ± .00 (Δ+0.10)
Skin Retraction | 0.85 ± .15 | 0.60 ± .57 (Δ-0.25) | 0.76 ± .34 (Δ-0.09) | 0.69 ± .08 (Δ-0.16) | 0.85 ± .27 (Δ-0.00)
Trabecular Thickening | 0.64 ± .30 | 0.66 ± .49 (Δ+0.01) | 0.83 ± .25 (Δ+0.18) | 0.86 ± .12 (Δ+0.22) | 1.00 ± .00 (Δ+0.36)
Average | 0.88 ± .10 | 0.58 ± .37 (Δ-0.30) | 0.74 ± .19 (Δ-0.14) | 0.76 ± .04 (Δ-0.11) | 0.89 ± .03 (Δ+0.01)

Note: Avg. RRIEF represents the average performance across all RRIEF-LLaMA models (LLaMA1: 7B, 13B, 30B, 65B; LLaMA3: 8B, 70B). Avg. LLaMA 0-shot and 3-shot represent the average performance of LLaMA1-65B and LLaMA3-70B in 0-shot and 3-shot settings, respectively. Avg. Proprietary LLMs 0-shot and 3-shot represent the average performance of GPT-4o, Gemini-1.5-Flash, and Claude-3.5-Sonnet in 0-shot and 3-shot settings, respectively. The performance metric used is macro F1 score, presented as mean ± standard deviation (e.g., 0.47 ± .04). The difference from Avg. RRIEF is expressed in parentheses using Δ. For each method, the mammography finding with the lowest Δ value is highlighted in bold, while the finding with the highest Δ value is underlined. Detailed performance for each model is shown in Tables S5-S8. RRIEF = Radiology Report Information Extraction Framework, B = Billion, Architectural Dist. = Architectural distortion, Lymph Node Enlarge. = Lymph node enlargement, Intra. Lymph Node = Intramammary lymph node, LLaMA = Large Language Model Meta AI.

Report Labeling Performance on Coronary CT Angiography

As shown in Table 5, RRIEF-LLaMA3-8B demonstrated significant performance improvements in both internal and external tests compared to its 3-shot counterpart and LLaMA3-70B across all categories: stenosis severity, plaque burden, and modifiers. When compared to DeepSeek-R1-Distill-Qwen-14B, RRIEF-LLaMA3-8B showed significant performance advantages in all categories with the sole exception of plaque burden assessment in the external test. Against proprietary LLMs, RRIEF-LLaMA3-8B was significantly better at stenosis severity labeling than Gemini-1.5-Flash in the internal test (F1 score 0.87 vs 0.83, P = .02) and than GPT-4o in the external test (0.83 vs 0.68, P < .001). Notably, for modifiers, RRIEF-LLaMA3-8B achieved significantly higher performance than all proprietary models in the external test, with an F1 score of 1.00 compared to 0.93 for the best-performing proprietary model (P = .004).

Table 5 Performance on Coronary CT Angiography Reports.
Method | Model | SiteB-CCTA (n = 50) Stenosis Severity | SiteB-CCTA (n = 50) Plaque Burden | SiteB-CCTA (n = 50) Modifiers (averaged) | SiteC-CCTA (n = 51) Stenosis Severity | SiteC-CCTA (n = 51) Plaque Burden | SiteC-CCTA (n = 51) Modifiers (averaged)
3-shot | LLaMA3-8B | 0.31* (0.29, 0.34) | 0.77* (0.72, 0.82) | 0.84* (0.81, 0.87) | 0.41* (0.35, 0.47) | 0.54* (0.48, 0.59) | 0.91* (0.89, 0.94)
3-shot | LLaMA3-70B | 0.61* (0.56, 0.65) | 0.85* (0.81, 0.90) | 0.99* (0.98, 1.00) | 0.74* (0.68, 0.81) | 0.44* (0.35, 0.54) | 0.94* (0.91, 0.97)
3-shot | DeepSeek-R1-Distill-Qwen-14B | 0.67* (0.63, 0.70) | 0.84* (0.79, 0.89) | 0.99* (0.98, 1.00) | 0.73* (0.67, 0.79) | 0.94 (0.93, 0.96) | 0.90* (0.87, 0.93)
3-shot | GPT-4o | 0.89 (0.85, 0.93) | 1.00 (1.00, 1.00) | 1.00 (1.00, 1.00) | 0.68* (0.63, 0.73) | 1.00 (1.00, 1.00) | 0.93* (0.91, 0.96)
3-shot | Gemini-1.5-Flash | 0.83* (0.77, 0.88) | 1.00 (1.00, 1.00) | 1.00 (1.00, 1.00) | 0.91 (0.88, 0.94) | 1.00 (1.00, 1.00) | 0.91* (0.89, 0.94)
3-shot | Claude-3.5-Sonnet | 0.99 (0.97, 1.00) | 1.00 (1.00, 1.00) | 0.92* (0.91, 0.92) | 0.94 (0.91, 0.98) | 0.93 (0.91, 0.95) | 0.93* (0.91, 0.95)
RRIEF (Ours) | LLaMA3-8B | 0.87 (0.83, 0.91) | 0.98 (0.96, 0.99) | 1.00 (1.00, 1.00) | 0.83 (0.77, 0.88) | 0.93 (0.90, 0.96) | 1.00 (1.00, 1.00)

Note: Data represent the macro F1 score and 95% confidence interval from 10 bootstrap tests. For each column, the highest performance is shown in bold and the second-highest is underlined. Values marked with an asterisk (*) indicate statistically significant improvements of RRIEF-LLaMA3-8B compared to each model. 'Modifiers (averaged)' represents the simple average of F1 scores for E, I, N, G, HRP, and S. Detailed performance for each model is shown in Tables S9 and S10. RRIEF = Radiology Report Information Extraction Framework, LLaMA = Large Language Model Meta AI.

Effect of Training Sample Size on Model Performance

Figure 4 demonstrates a strong positive correlation between training data size and performance of RRIEF-LLaMA3-8B on the MIMIC-CXR dataset (Pearson's correlation coefficient 0.75, P < .001).
Notably, with as few as 200 training reports, RRIEF-LLaMA3-8B achieved a macro F1 score of 0.79 (95% CI: 0.77, 0.82) and surpassed all baseline models including CheXbert and 3-shot configurations of Claude-3.5-Sonnet, LLaMA3-70B, and DeepSeek-R1-Distill-Qwen-14B (P < .001 for all comparisons). Performance began to plateau around 500 training reports, where RRIEF-LLaMA3-8B achieved an F1 score of 0.84 (95% CI: 0.82, 0.86), with only marginal improvement observed when increasing to 1,000 reports (0.85, 95% CI: 0.85, 0.86).

Discussion

In this study, we developed and validated an efficient framework for information extraction from different types of radiology reports. Our approach achieved three key objectives. First, we established a privacy-preserving framework that significantly outperformed existing methodologies, achieving F1 scores of 0.87 and 0.85 in internal and external tests for chest X-ray reports, surpassing CheXpert Labeler, CheXbert, and all proprietary models (P < .001). Second, we demonstrated the framework's effectiveness across different imaging modalities with distinct characteristics, showing high performance in mammography (F1 scores: 0.91 and 0.99 in internal and external tests) and coronary CT angiography reports (F1 scores: 0.87 for stenosis severity, 0.98 for plaque burden, and 1.00 for modifiers in internal testing). Third, to facilitate widespread adoption, we analyzed the relationship between training data size and performance, finding that 200–500 annotated reports were sufficient to achieve superior results compared to specialized methods and proprietary models (P < .001), while providing our implementation as an open-source resource.

Recent studies have highlighted the potential of LLMs for extracting information from radiology reports, yet most prior approaches relied on zero-shot or few-shot methods without additional training (13, 29, 31).
These attempts yielded performance levels inadequate for clinical implementation (22), as confirmed by our comparative analysis where even leading proprietary models like GPT-4o, Gemini-1.5-Flash, and Claude-3.5-Sonnet underperformed significantly. Additionally, previous methodologies have predominantly focused on reports from single modalities, particularly CXR (5, 6, 30), creating siloed solutions with limited transferability to different imaging types and reporting formats. RRIEF addresses these limitations through parameter-efficient fine-tuning of open-source LLMs, enabling local deployment while preserving patient privacy.

The performance variations observed across different findings in our mammography experiments highlight the impact of model size on generalizability when applying RRIEF. We found that smaller models like RRIEF-LLaMA1-7B showed considerable variability in handling certain findings such as 'Skin Thickening', 'Skin Retraction', and 'Trabecular Thickening', achieving F1 scores as low as 0.24 in the external test set. In contrast, larger models like RRIEF-LLaMA3-70B attained perfect scores of 1.00 for these same findings. This disparity reflects larger models' superior ability to bridge discrepancies between training and test datasets, particularly when faced with terminology variations or uncommon findings. For instance, certain findings had varying expressions across datasets (e.g., 'intramammary lymph node' versus 'IMLN'), while others were absent from the training set but present in the test set. The robust performance of larger models in these challenging scenarios demonstrates their enhanced capacity to generalize and adapt to novel or variant terms.
These findings underscore the advantages of our RRIEF framework when implemented with larger models, particularly for handling diverse medical terminologies and unseen concepts. This capability is crucial for real-world medical practice, where terminology and reporting styles vary substantially across institutions, radiologists, and over time.

Our analysis also reveals a positive correlation between training sample size and performance on the MIMIC-CXR dataset, with our approach outperforming all baselines using 200 training samples. While performance improves with more training data, we observed diminishing returns beyond 500 reports for CXR labeling. This efficiency extends across modalities, as demonstrated by RRIEF achieving comparable performance to leading proprietary models on CCTA datasets using only 100 training reports.

To implement RRIEF for new datasets, users need only follow a straightforward process. First, define appropriate findings and labels according to clinical requirements or specialty guidelines. This flexibility to freely set findings and labels underscores the method's extensive scalability, as demonstrated in our mammography implementation where we created specialized laterality labels (e.g., 'right breast', 'bilateral breasts') to capture this critical diagnostic information. Second, annotate a reasonable number of reports (200–500 based on our experiments), which can be expedited by using off-the-shelf LLMs for initial annotations followed by expert review. Finally, apply parameter-efficient fine-tuning using techniques like QLoRA (41), which dramatically reduces computational requirements while maintaining performance, enabling even the largest models (e.g., 70B parameters) to be fine-tuned on a single 48GB GPU. Our publicly shared implementation code allows easy adoption of this framework, enabling customized report labeling solutions across diverse clinical settings without specialized expertise.
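The three steps above (define findings and labels, annotate reports, fine-tune) all rest on a shared label schema and a JSON output format. The sketch below is a minimal illustration in Python and should not be read as the authors' implementation: the finding names, the label vocabulary, the instruction wording, and the helper names (`FINDINGS`, `LABELS`, `build_prompt`, `make_training_example`, `parse_model_output`) are all assumptions made for this example. It shows how one schema could drive both the prompt/completion pairs handed to a fine-tuning routine and the validation of model output at inference time.

```python
import json

# Hypothetical schema: replace with findings and labels from your own guideline.
FINDINGS = ["Mass", "Calcification", "Asymmetry"]
LABELS = ["negative", "right breast", "left breast", "bilateral breasts"]

# Hypothetical instruction text; the exact prompt wording from the paper is not reproduced.
INSTRUCTION = (
    "Label the radiology report below. Return one JSON object whose keys are "
    f"{FINDINGS} and whose values are each one of {LABELS}."
)


def validate_labels(labels):
    """Raise ValueError unless labels covers exactly FINDINGS with allowed values."""
    if not isinstance(labels, dict):
        raise ValueError("labels must be a JSON object")
    missing = [f for f in FINDINGS if f not in labels]
    unknown = [f for f in labels if f not in FINDINGS]
    invalid = {f: v for f, v in labels.items() if f in FINDINGS and v not in LABELS}
    if missing or unknown or invalid:
        raise ValueError(f"missing={missing} unknown={unknown} invalid={invalid}")


def build_prompt(report):
    """Combine the fixed instruction with one target report."""
    return f"{INSTRUCTION}\n\nReport:\n{report.strip()}\n\nLabels:\n"


def make_training_example(report, labels):
    """Turn one annotated report into a prompt/completion pair for fine-tuning."""
    validate_labels(labels)
    return {"prompt": build_prompt(report), "completion": json.dumps(labels)}


def parse_model_output(text):
    """Extract the JSON object from generated text and check it against the schema."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    try:
        labels = json.loads(text[start:end + 1])
    except json.JSONDecodeError as err:
        raise ValueError(f"model output is not valid JSON: {err}") from err
    validate_labels(labels)
    return labels


# Example usage with a made-up report and a made-up model response.
example = make_training_example(
    "Spiculated mass in the right breast. No suspicious calcifications.",
    {"Mass": "right breast", "Calcification": "negative", "Asymmetry": "negative"},
)
predicted = parse_model_output(
    'Labels: {"Mass": "negative", "Calcification": "bilateral breasts", "Asymmetry": "negative"}'
)
```

Rejecting any output that fails validation keeps malformed generations out of downstream statistics. The resulting prompt/completion pairs could then be passed to whichever parameter-efficient fine-tuning setup is in use, such as the QLoRA approach cited in the text.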
However, this study is limited by its relatively small test datasets. Each dataset contains 1,000 reports or fewer, with a large proportion (around 20%) comprising normal cases devoid of any abnormalities. As a result, there are numerous rare findings, each with fewer than ten positive cases, such as ‘Pneumothorax’ in CXR and ‘Skin Retraction’ in mammography. Furthermore, the diversity of labels assigned to each finding, which totals four for CXR and five for mammography, may somewhat diminish the statistical rigor of our reported performance for these rare findings.

In conclusion, RRIEF demonstrates high-performance information extraction across chest X-ray, mammography, and coronary CT angiography reports while preserving patient privacy through locally deployed fine-tuned LLMs. Our framework significantly outperformed existing specialized methods and proprietary LLMs with minimal training requirements. We anticipate that our method will serve as an effective automatic report labeling strategy for various imaging modalities and reporting styles, facilitating large-scale retrospective studies and offering a practical tool to advance radiological research and clinical practice.

Declarations

Data Availability

For the publicly available datasets (MIMIC-CXR, Open-i, CDD-CESM), we have shared the annotated labels for the radiology reports used in our training and testing sets. The SiteA-MMG dataset contains identifiable protected health information and therefore cannot be shared publicly. For the SiteB-CCTA dataset, we have made both the reports and annotated labels publicly available. The SiteC-CCTA dataset is not publicly available but can be shared upon request to the corresponding author. Data are available at https://github.com/reonaledo/report_labeler.
Code Availability

The analytical code for reproducing the results of this study and a user-friendly implementation that can be adapted for specific research purposes are currently in development. The complete implementation code will be made publicly available at https://github.com/reonaledo/report_labeler upon publication of this manuscript.

Acknowledgements

This study was supported by the National Research Foundation of Korea (NRF) grants funded by the Ministry of Science and ICT (MSIT) (Grant No. RS-2024-00354666) and the Seoul National University Hospital Research Fund (Grant No. 03-2023-0410). The funders played no role in study design, data collection, analysis and interpretation of data, or the writing of this manuscript.

Author Contribution

Conceptualization: D.M., C.M.P., J.M.C.
Supervision: C.M.P., J.M.C.
Writing: D.M.
Data acquisition: K.N.J., S.B., W.G.J., J.M.C., S.K.
Data analysis: D.M., S.K.
Critical review: S.H., J.C., J.M.C., C.M.P.
All authors read and approved the final manuscript and had final responsibility for the decision to submit it for publication.

References

1. Kong HJ. Managing Unstructured Big Data in Healthcare System. Healthc Inform Res. 2019;25(1):1–2.
2. Casey A, Davidson E, Poon M, Dong H, Duma D, Grivas A, et al. A systematic review of natural language processing applied to radiology reports. BMC Med Inform Decis Mak. 2021;21(1):179.
3. Reichenpfader D, Müller H, Denecke K. A scoping review of large language model based approaches for information extraction from radiology reports. Npj Digit Med. 2024;7(1):1–12.
4. Pivovarov R, Coppleson YJ, Gorman SL, Vawdrey DK, Elhadad N. Can Patient Record Summarization Support Quality Metric Abstraction? AMIA Annu Symp Proc AMIA Symp. 2016;2016:1020–9.
5. Irvin J, Rajpurkar P, Ko M, Yu Y, Ciurea-Ilcus S, Chute C, et al. CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison [Internet]. arXiv; 2019 [cited 2024 Jan 4]. Available from: http://arxiv.org/abs/1901.07031
6. Smit A, Jain S, Rajpurkar P, Pareek A, Ng AY, Lungren MP. CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT [Internet]. arXiv; 2020 [cited 2024 Jan 4]. Available from: http://arxiv.org/abs/2004.09167
7. Kuling G, Curpen B, Martel AL. BI-RADS BERT and Using Section Segmentation to Understand Radiology Reports. J Imaging. 2022;8(5):131.
8. Torres-Lopez VM, Rovenolt GE, Olcese AJ, Garcia GE, Chacko SM, Robinson A, et al. Development and Validation of a Model to Identify Critical Brain Injuries Using Natural Language Processing of Text Computed Tomography Reports. JAMA Netw Open. 2022;5(8):e2227109.
9. Wood DA, Lynch J, Kafiabadi S, Guilhem E, Busaidi AA, Montvila A, et al. Automated Labelling using an Attention model for Radiology reports of MRI scans (ALARM) [Internet]. arXiv; 2020 [cited 2024 Nov 20]. Available from: http://arxiv.org/abs/2002.06588
10. Wood DA, Kafiabadi S, Al Busaidi A, Guilhem EL, Lynch J, Townend MK, et al. Deep learning to automate the labelling of head MRI datasets for computer vision applications. Eur Radiol. 2022;32(1):725–36.
11. Zech J, Pain M, Titano J, Badgeley M, Schefflein J, Su A, et al. Natural Language–based Machine Learning Models for the Annotation of Clinical Radiology Reports. Radiology. 2018;287(2):570–80.
12. Adams LC, Truhn D, Busch F, Kader A, Niehues SM, Makowski MR, et al. Leveraging GPT-4 for Post Hoc Transformation of Free-text Radiology Reports into Structured Reporting: A Multilingual Feasibility Study. Radiology. 2023;307(4):e230725.
13. Mukherjee P, Hou B, Lanfredi RB, Summers RM. Feasibility of Using the Privacy-preserving Large Language Model Vicuna for Labeling Radiology Reports. Radiology. 2023;309(1):e231147.
14. Wiest IC, Ferber D, Zhu J, van Treeck M, Meyer SK, Juglan R, et al. Privacy-preserving large language models for structured medical information retrieval. Npj Digit Med. 2024;7(1):1–9.
15. Touvron H, Martin L, Stone K, Albert P, Almahairi A, Babaei Y, et al. Llama 2: Open Foundation and Fine-Tuned Chat Models [Internet]. arXiv; 2023 [cited 2024 Jan 5]. Available from: http://arxiv.org/abs/2307.09288
16. Dubey A, Jauhri A, Pandey A, Kadian A, Al-Dahle A, Letman A, et al. The Llama 3 Herd of Models [Internet]. arXiv; 2024 [cited 2024 Sep 11]. Available from: https://arxiv.org/abs/2407.21783v2
17. OpenAI, Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, et al. GPT-4 Technical Report [Internet]. arXiv; 2024 [cited 2024 Mar 14]. Available from: http://arxiv.org/abs/2303.08774
18. Touvron H, Lavril T, Izacard G, Martinet X, Lachaux MA, Lacroix T, et al. LLaMA: Open and Efficient Foundation Language Models [Internet]. arXiv; 2023 [cited 2024 Jan 5]. Available from: http://arxiv.org/abs/2302.13971
19. Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language Models are Few-Shot Learners. In: Advances in Neural Information Processing Systems [Internet]. Curran Associates, Inc.; 2020 [cited 2024 Mar 14]. p. 1877–901. Available from: https://papers.nips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html
20. Clusmann J, Kolbinger FR, Muti HS, Carrero ZI, Eckardt JN, Laleh NG, et al. The future landscape of large language models in medicine. Commun Med. 2023;3(1):1–8.
21. Li Y, Li Z, Zhang K, Dan R, Jiang S, Zhang Y. ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge [Internet]. arXiv; 2023 [cited 2024 Mar 15]. Available from: http://arxiv.org/abs/2303.14070
22. Bhayana R. Chatbots and Large Language Models in Radiology: A Practical Primer for Clinical and Research Applications. Radiology. 2024;310(1):e232756.
23. Ueda D, Mitsuyama Y, Takita H, Horiuchi D, Walston SL, Tatekawa H, et al. ChatGPT’s Diagnostic Performance from Patient History and Imaging Findings on the Diagnosis Please Quizzes. Radiology. 2023;308(1):e231040.
24. Amin KS, Davis MA, Doshi R, Haims AH, Khosla P, Forman HP. Accuracy of ChatGPT, Google Bard, and Microsoft Bing for Simplifying Radiology Reports. Radiology. 2023;309(2):e232561.
25. Bera K, O’Connor G, Jiang S, Tirumani SH, Ramaiya N. Analysis of ChatGPT publications in radiology: Literature so far. Curr Probl Diagn Radiol. 2024;53(2):215–25.
26. Fink MA, Bischoff A, Fink CA, Moll M, Kroschke J, Dulz L, et al. Potential of ChatGPT and GPT-4 for Data Mining of Free-Text CT Reports on Lung Cancer. Radiology. 2023;308(3):e231362.
27. Li D, Gupta K, Chong J. Evaluating Diagnostic Performance of ChatGPT in Radiology: Delving into Methods. Radiology. 2023;308(3):e232082.
28. Ramasamy SK. Response to Performance of ChatGPT on a Radiology Board-style Examination. Radiology. 2023;307(5):e231330.
29. Lehnen NC, Dorn F, Wiest IC, Zimmermann H, Radbruch A, Kather JN, et al. Data Extraction from Free-Text Reports on Mechanical Thrombectomy in Acute Ischemic Stroke Using ChatGPT: A Retrospective Analysis. Radiology. 2024;311(1):e232741.
30. Gu J, Cho HC, Kim J, You K, Hong EK, Roh B. CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling [Internet]. arXiv; 2024 [cited 2024 Apr 19]. Available from: http://arxiv.org/abs/2401.11505
31. Dorfner FJ, Jürgensen L, Donle L, Al Mohamad F, Bodenmann TR, Cleveland MC, et al. Comparing Commercial and Open-Source Large Language Models for Labeling Chest Radiograph Reports. Radiology. 2024;313(1):e241139.
32. Minssen T, Vayena E, Cohen IG. The Challenges for Regulating Medical Use of ChatGPT and Other Large Language Models. JAMA. 2023;330(4):315–6.
33. Meskó B, Topol EJ. The imperative for regulatory oversight of large language models (or generative AI) in healthcare. Npj Digit Med. 2023;6(1):1–6.
34. Wu C, Lin W, Zhang X, Zhang Y, Wang Y, Xie W. PMC-LLaMA: Towards Building Open-source Language Models for Medicine [Internet]. arXiv; 2023 [cited 2024 Mar 15]. Available from: http://arxiv.org/abs/2304.14454
35. Raeini M. Privacy-Preserving Large Language Models (PPLLMs) [Internet]. Rochester, NY: Social Science Research Network; 2023 [cited 2024 Nov 21]. Available from: https://papers.ssrn.com/abstract=4512071
36. Johnson AEW, Pollard TJ, Berkowitz SJ, Greenbaum NR, Lungren MP, Deng C ying, et al. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Sci Data. 2019;6(1):317.
37. Demner-Fushman D, Kohli MD, Rosenman MB, Shooshan SE, Rodriguez L, Antani S, et al. Preparing a collection of radiology examinations for distribution and retrieval. J Am Med Inform Assoc JAMIA. 2016;23(2):304–10.
38. Khaled R, Helal M, Alfarghaly O, Mokhtar O, Elkorany A, El Kassas H, et al. Categorized contrast enhanced mammography dataset for diagnostic and artificial intelligence research. Sci Data. 2022;9(1):122.
39. D’Orsi CJ, Sickles EA, Mendelson EB, Morris EA. 2013 ACR BI-RADS Atlas: Breast Imaging Reporting and Data System [Internet]. American College of Radiology; 2014. Available from: https://books.google.co.kr/books?id=nhWSjwEACAAJ
40. CAD-RADS™ 2.0–2022 Coronary Artery Disease-Reporting and Data System - Journal of Cardiovascular Computed Tomography [Internet]. [cited 2024 Nov 21]. Available from: https://www.journalofcardiovascularct.com/article/S1934-5925(22)00240-4/fulltext
41. Dettmers T, Pagnoni A, Holtzman A, Zettlemoyer L. QLoRA: Efficient Finetuning of Quantized LLMs [Internet]. arXiv; 2023 [cited 2024 Jan 5]. Available from: http://arxiv.org/abs/2305.14314
42. DeepSeek-AI, Guo D, Yang D, Zhang H, Song J, Zhang R, et al. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning [Internet]. arXiv; 2025 [cited 2025 Feb 14]. Available from: http://arxiv.org/abs/2501.12948

Additional Declarations

No competing interests reported.
Supplementary Files

Supplementary.docx
For example, CheXpert and CheXbert have demonstrated success in information extraction from CXR reports but their application remains limited to this single modality (\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e). Similar constraints exist in other imaging domains, where the development of automated labeling systems has been hampered by the need for substantial manual annotations and the challenge of accommodating diverse reporting patterns across institutions and radiologists (\u003cspan additionalcitationids=\"CR8 CR9 CR10\" citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e). Furthermore, these specialized approaches often lack the flexibility to adapt to evolving medical terminology and reporting practices, necessitating frequent retraining and updates (\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThe advent of large language models (LLMs) like ChatGPT and LLaMA has introduced new possibilities for medical information extraction, potentially addressing the limitations of traditional approaches (\u003cspan additionalcitationids=\"CR13 CR14 CR15 CR16 CR17 CR18\" citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e). In healthcare applications, these models have demonstrated remarkable capabilities across tasks including medical licensing examinations, clinical decision support, and extraction of clinical information from various text sources (\u003cspan additionalcitationids=\"CR21 CR22 CR23 CR24 CR25 CR26 CR27\" citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e). 
Specifically for extracting information from radiology reports, recent studies have explored zero-shot and few-shot approaches using these models, leveraging their pre-trained knowledge without requiring extensive task-specific training data (\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e, \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e, \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e, \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e). However, while these attempts highlight the potential of LLMs, their performance has generally remained comparable to or slightly below that of existing specialized methods, primarily due to reliance on zero/few-shot learning approaches (\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e, \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e, \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e). Moreover, proprietary LLMs raise significant privacy concerns when processing healthcare data, spurring interest in locally-deployable open-source alternatives, though optimal training and implementation approaches for medical report labeling remain unclear (\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e, \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan additionalcitationids=\"CR33 CR34\" citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThus, the purpose of our study is to develop and validate an efficient framework for information extraction from radiology reports. 
Our approach focuses on three key objectives: First, to establish a privacy-preserving framework that outperforms existing methodologies. Second, to demonstrate the framework's effectiveness across different imaging modalities with distinct characteristics and reporting styles. Third, to facilitate widespread adoption in diverse clinical settings by analyzing training data requirements and providing our complete implementation as an open-source resource.\u003c/p\u003e"},{"header":"Material and Methods","content":"\u003cp\u003e This retrospective study was approved by the Institutional Review Board of Seoul National University Hospital (IRB No. 2303-155-1417) and the requirement for written informed consent was waived. To promote accessibility and facilitate adoption in diverse clinical settings, our complete implementation code is publicly available at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/reonaledo/report_labeler\u003c/span\u003e\u003cspan address=\"https://github.com/reonaledo/report_labeler\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (currently under development).\u003c/p\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eData\u003c/h2\u003e \u003cp\u003eIn our study, we utilized the following datasets: MIMIC-CXR (\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e) and Open-i (\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e) for CXR reports, a private dataset from Seoul National University Hospital (SiteA-MMG) and CDD-CESM (\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e) for mammography reports, and two private datasets from Seoul National University Bundang Hospital (SiteB-CCTA) and Chonnam National University Hwasun Hospital (SiteC-CCTA) for coronary computed tomography angiography (CCTA) reports.\u003c/p\u003e \u003cp\u003eFor CXR, 2,000 reports were sampled from 
MIMIC-CXR for training (n\u0026thinsp;=\u0026thinsp;1,000) and internal testing (n\u0026thinsp;=\u0026thinsp;1,000), with an additional 751 reports from Open-i for external test. For mammography, 1,500 reports sampled from 224,136 screening cases collected by SiteA-MMG between January 2001 and December 2022 were used for training (n\u0026thinsp;=\u0026thinsp;500) and internal testing (n\u0026thinsp;=\u0026thinsp;1,000), with the entire CDD-CESM dataset (n\u0026thinsp;=\u0026thinsp;326) used for external test. For CCTA, we collected 150 reports from SiteB-CCTA for training (n\u0026thinsp;=\u0026thinsp;100) and internal testing (n\u0026thinsp;=\u0026thinsp;50), with an additional 51 reports from SiteC-CCTA for external testing. All CCTA reports were synthetically generated by cardiothoracic radiologists following their institutional reporting formats. Detailed descriptions of dataset selection and preparation are provided in \u003cb\u003eSupplemental Text 1\u003c/b\u003e. Examples from each dataset, illustrating the variations in reporting styles, are shown in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eExamples of Each Dataset.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003eDataset\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eModality\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eExample 1\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eExample 2\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMIMIC-CXR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eCXR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eThe lungs are clear. There is no focal consolidation, pleural effusion, or pneumothorax. The cardiomediastinal silhouette is normal. There is no free air under the hemidiaphragms. No pancreatic calcificaitons visualized. Osseous structures are intact.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eThe cardiomediastinal and hilar contours are within normal limits. The lungs are well expanded and clear. There is no large pleural effusion, pneumothorax or focal consolidation concerning for pneumonia. There is no evidence of free air.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eOpen-i\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eHeart is mildly enlarged stable. Mediastinal contour is normal. Pulmonary vascularity is normal. Lungs are hyperexpanded but clear. No pleural effusions or pneumothoraces.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eThe cardiomediastinal silhouette is normal in size and contour. Stable right lower lobe calcified granuloma. No focal consolidation, pneumothorax or large pleural effusion. 
Spurring of the thoracic spine.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSiteA-MMG\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eMammography\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eGr 3\u003c/p\u003e \u003cp\u003eSuspicious malignant microcal, RUO: Segmental/ fine linear pleomorphic/ductal extension to SA/ nipple retraction(+) --\u0026gt; C5\u003c/p\u003e \u003cp\u003eSuspicious malignant mass with calcifications, Rt inner --\u0026gt; C4c\u003c/p\u003e \u003cp\u003er/o metastatic LN, Rt axilla --\u0026gt; C4c\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eGr 2 and a few benign calci on Rt.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCDD-CESM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRight Breast: ACR C: Heterogeneously dense breasts. Upper outer quadrant benign macrocalifications. No suspicious microcalcifications. Multiple lower and central inner equal density rounded and oval shaped masses are seen, some of them show circumscribed margin and others show obscured margin. Normal skin thickness and contour of breast.\u003c/p\u003e \u003cp\u003eLeft Breast: Status postoperative with flap reconstruction showing no speculated mass lesions or suspicious microcalcifications. Normal skin thickness.\u003c/p\u003e \u003cp\u003eOPINION:\u003c/p\u003e \u003cp\u003eRight Breast: Multiple lower and central inner rounded and oval shaped benign looking homogenously enhancing masses with circumscribed margin (BIRADS 3). 
Upper outer quadrant benign macrocalifications (BIRADS 2).\u003c/p\u003e \u003cp\u003eLeft Breast: Status postoperative with flap reconstruction showing no evidence of recurrent lesions (BIRADS 2).\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eRight Breast: Diffuse edematous changes evidenced by increased skin thickness and coarsened trabeculae. Associated upper outer clusters of pleomorphic microcalcifications are also seen.\u003c/p\u003e \u003cp\u003eLeft Breast: Central benign macrocalcifications are noted. No speculated mass lesions or suspicious microcalcifications. Normal skin thickness and contour of breast.\u003c/p\u003e \u003cp\u003eOPINION:\u003c/p\u003e \u003cp\u003eRight Breast: Diffuse edematous changes associated with upper outer suspicious microcalcifications (BIRADS 5).\u003c/p\u003e \u003cp\u003eLeft Breast: Central benign macrocalcifications (BIRADS 2).\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSiteB-CCTA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eCCTA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMild atherosclerosis with no significant stenosis in the coronary arteries.\u003c/p\u003e \u003cp\u003eLM and LCx, unremarkable\u003c/p\u003e \u003cp\u003epLAD and pRCA, \u0026lt;\u0026thinsp;20\u0026ndash;30% stenosis with calcified plaques\u003c/p\u003e \u003cp\u003eNo remarkable finding in the LV myocardium.\u003c/p\u003e \u003cp\u003eTotal coronary calcium score\u0026thinsp;=\u0026thinsp;205.68\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAtherosclerosis with significant stenosis in the coronary arteries.\u003c/p\u003e \u003cp\u003eLM RI LCx RCA\u0026thinsp;\u0026lt;\u0026thinsp;20% stenosis with calcified plaques.\u003c/p\u003e \u003cp\u003epLAD focal 70\u0026ndash;80% 
stenosis with mixed plaque.\u003c/p\u003e \u003cp\u003eNo remarkable finding in the LV myocardium.\u003c/p\u003e \u003cp\u003eTotal coronary calcium score\u0026thinsp;=\u0026thinsp;550.04.\u003c/p\u003e \u003cp\u003eAscending aorta dilatation: 45 mm.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSiteC-CCTA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e* Image quality: good quality\u003c/p\u003e \u003cp\u003e* Calcium Scoring\u003c/p\u003e \u003cp\u003eTotal Agatston Score: 245.0\u003c/p\u003e \u003cp\u003eCoronary calcium volume: 210.0\u003c/p\u003e \u003cp\u003e* Dominancy: Right dominancy\u003c/p\u003e \u003cp\u003e* Coronary Anomaly or variant; Absent.\u003c/p\u003e \u003cp\u003eCoronary artery stenosis\u003c/p\u003e \u003cp\u003e1) LM : No plaque, no stenosis\u003c/p\u003e \u003cp\u003e2) LAD with branches :\u003c/p\u003e \u003cp\u003epLAD ; mixed plaque with mild stenosis\u003c/p\u003e \u003cp\u003emLCA: small calcified plaque with mild stenosis\u003c/p\u003e \u003cp\u003e3) LCX with branches :\u003c/p\u003e \u003cp\u003eHypoplastic LCX, No plaque, no stenosis\u003c/p\u003e \u003cp\u003e4) RCA with branches :\u003c/p\u003e \u003cp\u003epRCA, mRCA, dRCA ; small calcified plaque with mild stenosis\u003c/p\u003e \u003cp\u003e5) Ramus Intermedius : Absent\u003c/p\u003e \u003cp\u003eVulnerable plaque : None.\u003c/p\u003e \u003cp\u003eOther cardiac finding : S/P MVR, AVR\u003c/p\u003e \u003cp\u003eHypertrabeculation at LV apex.\u003c/p\u003e \u003cp\u003eExtracardiac finding : Within normal limits.\u003c/p\u003e \u003cp\u003e\"\"1. 
Moderate coronary calcification (Total Agatston Score: 245.0)\u003c/p\u003e \u003cp\u003e(1 ) pRCA, mRCA, dRCA ; small calcified plaque with mild stenosis\u003c/p\u003e \u003cp\u003e(2) pLAD ; mixed plaque with mild stenosis\u003c/p\u003e \u003cp\u003emLAD: small calcified plaque with mild stenosis\u003c/p\u003e \u003cp\u003e2. S/P MVR, AVR\u003c/p\u003e \u003cp\u003eHypertrabeculation at LV apex.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e* Image quality: limitation due to severe coronary calcification.\u003c/p\u003e \u003cp\u003e* Calcium Scoring\u003c/p\u003e \u003cp\u003eTotal Agatston Score: 996.0\u003c/p\u003e \u003cp\u003eCoronary calcium volume: 802.1\u003c/p\u003e \u003cp\u003e* Dominancy: Right dominancy\u003c/p\u003e \u003cp\u003e* Coronary Anomaly or variant; Absent.\u003c/p\u003e \u003cp\u003eCoronary artery stenosis\u003c/p\u003e \u003cp\u003e(1) proximal to distal RCA ; multifocal calcified plaque with mild stenosis\u003c/p\u003e \u003cp\u003e(2) Proximal LAD ; calcified/mixed plaque with mild to moderate stenosis\u003c/p\u003e \u003cp\u003e...suspicious moderate to severe stenosis at pLAD near 2nd diagonal branch/proximal 2nd diagonal.\u003c/p\u003e \u003cp\u003emid LAD ; shallow myocardial bridging\u003c/p\u003e \u003cp\u003edistal LAD ; calcified plaque with mild stenosis\u003c/p\u003e \u003cp\u003e(3) proximal LCX ; calcified plaque with mild stenosis\u003c/p\u003e \u003cp\u003eVulnerable plaque : None.\u003c/p\u003e \u003cp\u003eOther cardiac finding : Within normal limits.\u003c/p\u003e \u003cp\u003eExtracardiac finding : Within normal limits\u003c/p\u003e \u003cp\u003eConclusion\u003c/p\u003e \u003cp\u003e1. 
severe coronary calcification ( Total Agatston Score: 996.0).\u003c/p\u003e \u003cp\u003e(1) proximal to distal RCA ; multifocal calcified plaque with mild stenosis\u003c/p\u003e \u003cp\u003e(2) Proximal LAD ; calcified/mixed plaque with mild to moderate stenosis\u003c/p\u003e \u003cp\u003e...suspicious moderate to severe stenosis at LAD near 2nd diagonal branch/proximal 2nd diagonal.\u003c/p\u003e \u003cp\u003emid LAD ; shallow myocardial bridging\u003c/p\u003e \u003cp\u003edistal LAD ; calcified plaque with mild stenosis\u003c/p\u003e \u003cp\u003e(3) proximal LCX ; calcified plaque with mild stenosis\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"4\"\u003eNote: In CXR reports, both MIMIC-CXR and Open-i follow a structured approach, with minor differences in how details are expressed. For mammography reports, SiteA-MMG uses both common abbreviations (e.g., \u0026lsquo;LN\u0026rsquo; for lymph node, \u0026lsquo;Rt\u0026rsquo; for right) and unconventional shorthand (e.g., \u0026lsquo;microcal\u0026rsquo; for microcalcification, \u0026lsquo;calci\u0026rsquo; for calcification) in a list-like format aimed at specialist readers. In contrast, CDD-CESM adopts a narrative style with minimal abbreviations and standard terminology, providing detailed descriptions for each breast followed by an opinion section. For CCTA, SiteB-CCTA employs a concise narrative format focusing on core findings with minimal structural divisions, while SiteC-CCTA uses a comprehensive starred-section format with detailed anatomical categorization, explicit measurement values, and a separate conclusion section that systematically summarizes all findings. 
CXR\u0026thinsp;=\u0026thinsp;Chest X-ray, CCTA\u0026thinsp;=\u0026thinsp;Coronary CT Angiography.\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eRadiology Report Information Extraction Framework (RRIEF)\u003c/h3\u003e\n\u003cp\u003eFigure \u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e presents an overview of our framework, \u003cb\u003eRadiology Report Information Extraction Framework (RRIEF)\u003c/b\u003e, for extracting structured information from radiology reports. The framework consists of four sequential steps: target definition, data annotation, prompt design, and LLM training. We detail each component below.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\n\u003ch3\u003eTarget Definition\u003c/h3\u003e\n\u003cp\u003eThis step involves specifying the classes of interest and their possible label categories for the target dataset. Classes represent medical findings (e.g., mass, pneumothorax) or inferable information (e.g., severity ratings) from reports, while label categories define the possible values each class can take. These can be freely configured based on the intended use of the extracted information.\u003c/p\u003e \u003cp\u003eFor CXR, we used 13 finding classes following the CheXpert Labeler\u0026rsquo;s protocol, excluding the 'no finding' class as it's inferable from the others (\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e). The categorization of labels was based on the presence of these findings in the reports, classified as 'positive', 'negative', 'unsure', or 'not mentioned'. 
For mammography, given the absence of prior research, we established 11 finding classes derived from the Breast Imaging Lexicon (\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e), categorized according to their presence in the reports and specified laterality as 'right breast', 'left breast', 'bilateral breasts', 'unsure', or 'not mentioned'. For CCTA, following Coronary Artery Disease-Reporting and Data System (CAD-RADS) 2.0 guidelines (\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e), we defined classes and labels for stenosis severity categories (0, 1, 2, 3, 4A, 4B, or 5), plaque burden scores (None, P1, P2, P3, or P4), and six modifiers (E, I, N, G, HRP, and S, each labeled as 0 or 1). Details regarding the target findings and labels are provided in \u003cb\u003eSupplemental Text 2\u003c/b\u003e.\u003c/p\u003e\n\u003ch3\u003eData Annotation\u003c/h3\u003e\n\u003cp\u003eTo expedite the annotation process, we first generated initial annotations using an LLM and then had human annotators review these preliminary labels, which reduced the manual annotation effort compared with starting from scratch. For the initial annotation of the CXR and mammography datasets, we used LLaMA-13B (\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e) to generate labels from a small number of annotated examples (three fixed examples). These preliminary results were then refined by two graduate students under the guidance of a chest radiologist (D.H.K.; \u0026gt;5 years of experience) and a breast radiologist (J.M.J.; \u0026gt;20 years of experience). For CCTA reports, two board-certified cardiovascular radiologists (W.G.J.; \u0026gt;6 years of experience and B.S.H.; \u0026gt;5 years of experience) independently reviewed all reports and established ground truth annotations through consensus. 
The labels for all test datasets additionally underwent a comprehensive review by radiologists specializing in the respective fields.\u003c/p\u003e\n\u003ch3\u003ePrompt Design\u003c/h3\u003e\n\u003cp\u003eOur prompts consist of three components: instruction, input, and output. The instruction provides detailed descriptions of the task, including definitions of target classes and their possible labels. The input comprises the radiology report text, while the output is formatted as JSON containing the extracted information. We designed prompts to be explicit about task requirements while maintaining flexibility for different reporting styles. Examples of prompt design and corresponding outputs are shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, with complete prompt templates detailed in \u003cb\u003eSupplemental Text 3\u003c/b\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eLLM Training\u003c/h2\u003e \u003cp\u003eWe trained LLaMA models (versions 1 and 3) independently for each imaging modality using different model sizes ranging from 7B to 70B parameters (\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e, \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e). To reduce computational requirements while maintaining performance, we employed Quantized Low-Rank Adaptation (QLoRA) (\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e), a parameter-efficient fine-tuning technique. This approach reduced the trainable parameters to less than 0.1% of the total model parameters, enabling efficient training of even the largest model (LLaMA3-70B) on a single 48 GB A6000 GPU within 6 hours. 
To ensure reproducibility and consistent label generation across different runs, we employed deterministic decoding by setting the temperature parameter to 0 (\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e). Detailed training configurations and hyperparameters are provided in \u003cb\u003eSupplemental Text 4\u003c/b\u003e.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eEvaluation\u003c/h3\u003e\n\u003cp\u003eFor comparative analysis, we tested RRIEF with various LLaMA models (LLaMA1: 7B, 13B, 30B and 65B; LLaMA3: 8B and 70B) against three comparison groups: their base versions without additional training, the open-source DeepSeek-R1-Distill-Qwen-14B model (a specialized model optimized for reasoning tasks) (\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e), and leading proprietary LLMs (GPT-4o, Gemini-1.5-Flash, Claude-3.5-Sonnet), all evaluated under zero/few-shot settings (inference in which the model is given no or only a few labeled examples in the prompt). We conducted these comparative experiments across all three modalities: CXR, mammography, and CCTA. For CXR reports, we additionally compared against specialized methods including CheXpert Labeler and CheXbert. Implementation details of the zero/few-shot settings are provided in \u003cb\u003eSupplemental Text 5\u003c/b\u003e.\u003c/p\u003e \u003cp\u003eTo investigate the relationship between training data size and model performance, we conducted experiments using LLaMA3-8B on the MIMIC-CXR dataset. Training data sizes ranged from 50 to 1,000 reports (50, 100, 200, 500 and 1,000), all randomly sampled from the original training set of 1,000 reports without replacement. 
To ensure statistical reliability and reproducibility, we repeated each experiment five times using different random seeds.\u003c/p\u003e \u003cp\u003ePerformance was evaluated using the Macro F1 score (rationale detailed in \u003cb\u003eSupplemental Text 6\u003c/b\u003e), with results averaged across 10 bootstrap repetitions and presented with 95% confidence intervals. Statistical comparisons between methods used the one-sided Wilcoxon Signed Rank test (P\u0026thinsp;\u0026lt;\u0026thinsp;.05 for significance). Cases failing to produce valid JSON outputs were excluded from analysis (\u003cb\u003eSupplemental Text 7\u003c/b\u003e).\u003c/p\u003e"},{"header":"Result","content":"\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003ePatient Characteristics\u003c/h2\u003e \u003cp\u003eA flowchart illustrating the data selection process is provided in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e. For CXR, we utilized the MIMIC-CXR dataset, including 1,000 reports from 631 patients (median age 62 years [IQR 50\u0026ndash;74 years]; 50% female) for training and 1,000 reports from 807 patients (median age 63 years [IQR 49\u0026ndash;72 years]; 53% female) for internal testing. The Open-i dataset provided 751 reports from 751 patients (demographics unavailable) for external testing. For mammography, the SiteA-MMG dataset contributed 500 reports from 486 patients (median age 51 years [IQR 47\u0026ndash;56 years]; all female) for training and 1,000 reports from 995 patients (median age 53 years [IQR 47\u0026ndash;60 years]; all female) for internal testing. The CDD-CESM dataset provided 326 reports from 326 patients (median age 50 years [IQR 43\u0026ndash;58 years]; all female) for external testing. For CCTA, we employed the SiteB-CCTA dataset with 100 reports for training and 50 reports for internal testing. The SiteC-CCTA dataset contributed 51 reports for external testing. 
Because synthetic data were used for CCTA, patient demographics could not be reported for these datasets.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eOverall Performance in Chest X-ray and Mammography Report Labeling\u003c/h2\u003e \u003cp\u003eThe overall performance results for CXR and mammography report labeling are summarized in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. In our internal MIMIC-CXR test, RRIEF-LLaMA1-65B achieved a mean Macro F1 score of 0.87 (95% CI: 0.86, 0.88), significantly outperforming CheXpert Labeler (0.70, 95% CI: 0.70, 0.71; P\u0026thinsp;\u0026lt;\u0026thinsp;.001) and CheXbert (0.72, 95% CI: 0.71, 0.72; P\u0026thinsp;\u0026lt;\u0026thinsp;.001). This performance surpassed all zero/few-shot LLMs including Claude-3.5-Sonnet (0.69, 95% CI: 0.68, 0.70; P\u0026thinsp;\u0026lt;\u0026thinsp;.001). Even the smallest model, RRIEF-LLaMA1-7B, demonstrated superior performance (0.83, 95% CI: 0.81, 0.84; P\u0026thinsp;\u0026lt;\u0026thinsp;.001) compared to these benchmarks. On the external Open-i dataset, RRIEF-LLaMA3-70B scored highest at 0.85 (95% CI: 0.83, 0.87), surpassing CheXpert Labeler (0.69, 95% CI: 0.68, 0.70; P\u0026thinsp;\u0026lt;\u0026thinsp;.001) and CheXbert (0.69, 95% CI: 0.68, 0.71; P\u0026thinsp;\u0026lt;\u0026thinsp;.001). RRIEF-LLaMA3-70B showed better generalizability than RRIEF-LLaMA1-65B (0.80, 95% CI: 0.77, 0.82; P\u0026thinsp;=\u0026thinsp;.002). 
Notably, smaller models such as RRIEF-LLaMA1-7B (0.80, 95% CI: 0.79, 0.81) and RRIEF-LLaMA3-8B (0.84, 95% CI: 0.82, 0.85) outperformed all zero/few-shot benchmarks including Claude-3.5-Sonnet (0.62, 95% CI: 0.60, 0.63; P\u0026thinsp;\u0026lt;\u0026thinsp;.001), aligning with their internal test results.\u003c/p\u003e \u003cp\u003eFor mammography reports, internal testing on SiteA-MMG showed RRIEF-LLaMA1-30B and RRIEF-LLaMA1-65B both achieved 0.91 (95% CI: 0.89, 0.92 and 0.90, 0.92, respectively), exceeding zero/few-shot performances of LLaMA models (P\u0026thinsp;\u0026lt;\u0026thinsp;.001 for all comparisons) and proprietary LLMs including Gemini-1.5-Flash (0.86, 95% CI: 0.85, 0.87; P\u0026thinsp;=\u0026thinsp;.002). Even the smallest model, RRIEF-LLaMA1-7B (0.88, 95% CI: 0.86, 0.90), outperformed GPT-4o and Claude-3.5-Sonnet in few-shot settings (P\u0026thinsp;\u0026lt;\u0026thinsp;.001 for all comparisons). In the external CDD-CESM test, RRIEF-LLaMA3-70B achieved the highest score of 0.99 (95% CI: 0.98, 1.00), confirming its superior generalizability over RRIEF-LLaMA1-65B (P\u0026thinsp;=\u0026thinsp;.02) and outperforming all zero/few-shot configurations, including Claude-3.5-Sonnet (0.92, 95% CI: 0.90, 0.94; P\u0026thinsp;=\u0026thinsp;.002).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eOverall Performance on Chest X-ray and Mammography Reports.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" 
class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eMethod\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c4\" namest=\"c3\"\u003e \u003cp\u003eChest X-ray\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c6\" namest=\"c5\"\u003e \u003cp\u003eMammography\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMIMIC-CXR (n\u0026thinsp;=\u0026thinsp;1,000)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eOpen-i\u003c/p\u003e \u003cp\u003e(n\u0026thinsp;=\u0026thinsp;751)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSiteA-MMG\u003c/p\u003e \u003cp\u003e(n\u0026thinsp;=\u0026thinsp;1,000)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eCDD-CESM\u003c/p\u003e \u003cp\u003e(n\u0026thinsp;=\u0026thinsp;326)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCheXpert Labeler\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.70\u003c/p\u003e \u003cp\u003e(0.70, 0.71)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.69 \u003c/p\u003e \u003cp\u003e(0.68, 0.70)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c5\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCheXbert\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.72\u003c/p\u003e \u003cp\u003e(0.71, 0.72)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.69\u003c/p\u003e \u003cp\u003e(0.68, 0.71)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"5\" rowspan=\"6\"\u003e \u003cp\u003e0-shot\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLLaMA1-65B\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.47\u003c/p\u003e \u003cp\u003e(0.47, 0.47)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.45\u003c/p\u003e \u003cp\u003e(0.44, 0.46)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.40\u003c/p\u003e \u003cp\u003e(0.35, 0.44)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.32\u003c/p\u003e \u003cp\u003e(0.31, 0.34)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLLaMA3-70B\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.36\u003c/p\u003e \u003cp\u003e(0.35, 0.37)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.31\u003c/p\u003e \u003cp\u003e(0.30, 0.32)\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.65\u003c/p\u003e \u003cp\u003e(0.64, 0.66)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.84\u003c/p\u003e \u003cp\u003e(0.83, 0.86)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDeepSeek-R1-Distill-Qwen-14B\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.54\u003c/p\u003e \u003cp\u003e(0.53, 0.55)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.47\u003c/p\u003e \u003cp\u003e(0.46, 0.48)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.71\u003c/p\u003e \u003cp\u003e(0.70, 0.72)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.45\u003c/p\u003e \u003cp\u003e(0.44, 0.47)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eGPT-4o\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.59\u003c/p\u003e \u003cp\u003e(0.58, 0.59)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.47\u003c/p\u003e \u003cp\u003e(0.45, 0.49)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.71\u003c/p\u003e \u003cp\u003e(0.71, 0.72)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.72\u003c/p\u003e \u003cp\u003e(0.71, 0.73)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eGemini-1.5-Flash\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.57\u003c/p\u003e \u003cp\u003e(0.56, 0.58)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.60\u003c/p\u003e \u003cp\u003e(0.59, 
0.62)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.71\u003c/p\u003e \u003cp\u003e(0.70, 0.72)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.72\u003c/p\u003e \u003cp\u003e(0.70, 0.75)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eClaude-3.5-Sonnet\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.67\u003c/p\u003e \u003cp\u003e(0.66, 0.68)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.47\u003c/p\u003e \u003cp\u003e(0.45, 0.49)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.72\u003c/p\u003e \u003cp\u003e(0.71, 0.72)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.79\u003c/p\u003e \u003cp\u003e(0.78, 0.81)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"5\" rowspan=\"6\"\u003e \u003cp\u003e3-shot\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLLaMA1-65B\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.47\u003c/p\u003e \u003cp\u003e(0.45, 0.48)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.45\u003c/p\u003e \u003cp\u003e(0.44, 0.47)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.64\u003c/p\u003e \u003cp\u003e(0.63, 0.66)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.60\u003c/p\u003e \u003cp\u003e(0.58, 0.61)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLLaMA3-70B\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.60\u003c/p\u003e \u003cp\u003e(0.59, 
0.61)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.56\u003c/p\u003e \u003cp\u003e(0.54, 0.58)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.87\u003c/p\u003e \u003cp\u003e(0.85, 0.88)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.87\u003c/p\u003e \u003cp\u003e(0.86, 0.88)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDeepSeek-R1-Distill-Qwen-14B\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.57\u003c/p\u003e \u003cp\u003e(0.57, 0.58)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.53\u003c/p\u003e \u003cp\u003e(0.52, 0.53)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.73\u003c/p\u003e \u003cp\u003e(0.71, 0.74)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.85\u003c/p\u003e \u003cp\u003e(0.83, 0.88)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eGPT-4o\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.67\u003c/p\u003e \u003cp\u003e(0.66, 0.68)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.57\u003c/p\u003e \u003cp\u003e(0.55, 0.59)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.75\u003c/p\u003e \u003cp\u003e(0.74, 0.75)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.89\u003c/p\u003e \u003cp\u003e(0.88, 0.90)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eGemini-1.5-Flash\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.62\u003c/p\u003e 
\u003cp\u003e(0.61, 0.63)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.55\u003c/p\u003e \u003cp\u003e(0.53, 0.56)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.86\u003c/p\u003e \u003cp\u003e(0.85, 0.87)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.86\u003c/p\u003e \u003cp\u003e(0.84, 0.88)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eClaude-3.5-Sonnet\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.69\u003c/p\u003e \u003cp\u003e(0.68, 0.70)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.62\u003c/p\u003e \u003cp\u003e(0.60, 0.63)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.75\u003c/p\u003e \u003cp\u003e(0.74, 0.75)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.92\u003c/p\u003e \u003cp\u003e(0.90, 0.94)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"5\" rowspan=\"6\"\u003e \u003cp\u003e\u003cb\u003eRRIEF\u003c/b\u003e\u003c/p\u003e \u003cp\u003e\u003cb\u003e(Ours)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLLaMA1-7B\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.83\u003c/p\u003e \u003cp\u003e(0.81, 0.84)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.80\u003c/p\u003e \u003cp\u003e(0.79, 0.81)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003cp\u003e(0.86, 0.90)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.74 \u003c/p\u003e \u003cp\u003e(0.72, 0.75)\u003c/p\u003e \u003c/td\u003e 
\u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLLaMA1-13B\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.83\u003c/p\u003e \u003cp\u003e(0.82, 0.83)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.79\u003c/p\u003e \u003cp\u003e(0.78, 0.81)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.90\u003c/span\u003e\u003c/p\u003e \u003cp\u003e(0.88, 0.91)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.79\u003c/p\u003e \u003cp\u003e(0.77, 0.80)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLLaMA1-30B\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.85\u003c/span\u003e\u003c/p\u003e \u003cp\u003e(0.85, 0.86)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.81\u003c/p\u003e \u003cp\u003e(0.79, 0.82)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.91\u003c/b\u003e\u003c/p\u003e \u003cp\u003e(0.89, 0.92)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003cp\u003e(0.91, 0.94)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLLaMA1-65B\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.87\u003c/b\u003e\u003c/p\u003e \u003cp\u003e(0.86, 0.88)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.80\u003c/p\u003e \u003cp\u003e(0.77, 0.82)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e 
\u003cp\u003e\u003cb\u003e0.91\u003c/b\u003e\u003c/p\u003e \u003cp\u003e(0.90, 0.92)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.97\u003c/span\u003e \u003c/p\u003e \u003cp\u003e(0.97, 0.98)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLLaMA3-8B\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.85\u003c/span\u003e\u003c/p\u003e \u003cp\u003e(0.85, 0.86)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.84\u003c/span\u003e\u003c/p\u003e \u003cp\u003e(0.82, 0.85)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003cp\u003e(0.86, 0.90)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.85\u003c/p\u003e \u003cp\u003e(0.83, 0.86)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLLaMA3-70B\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.85\u003c/span\u003e\u003c/p\u003e \u003cp\u003e(0.84, 0.86)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.85\u003c/b\u003e\u003c/p\u003e \u003cp\u003e(0.83, 0.87)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.90\u003c/span\u003e\u003c/p\u003e \u003cp\u003e(0.88, 0.92)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e 
\u003cp\u003e\u003cb\u003e0.99\u003c/b\u003e\u003c/p\u003e \u003cp\u003e(0.98, 1.00)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"6\"\u003eNote: Data represent the average Macro F1 Score and 95% Confidence Interval from 10 Bootstrap Tests. For each dataset, the highest performance is shown in bold and the second-highest is underlined. RRIEF\u0026thinsp;=\u0026thinsp;Radiology Report Information Extraction Framework, B\u0026thinsp;=\u0026thinsp;Billion, LLaMA\u0026thinsp;=\u0026thinsp;Large language model Meta AI.\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eFinding-specific Performance Advantages of RRIEF\u003c/h2\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e highlights RRIEF's average performance advantages in CXR report labeling across individual findings, comparing the average performance of all RRIEF models (Avg. RRIEF: RRIEF-LLaMA1-7B/13B/30B/65B, RRIEF-LLaMA3-8B/70B) against other methods. In labeling 'Support Devices', RRIEF models outperformed traditional methods (Avg. CheXpert \u0026amp; CheXbert) by F1 score differences of 0.32 (MIMIC-CXR) and 0.41 (Open-i). Compared to zero-shot and few-shot performances of LLaMA1-65B and LLaMA3-70B (Avg. LLaMA 0-shot and Avg. LLaMA 3-shot), RRIEF models demonstrated substantial improvements, particularly in 'Enlarged Cardiomediastinum' where F1 score differences reached 0.52\u0026ndash;0.64 in both test sets. When compared against proprietary LLMs (Avg. 
Proprietary LLMs 0-shot and 3-shot: GPT-4o, Gemini-1.5-Flash, Claude-3.5-Sonnet), RRIEF models showed notable improvements (0.23\u0026ndash;0.41 F1 score difference) in 'Enlarged Cardiomediastinum', 'Cardiomegaly', and 'Lung Lesion' across both internal and external test sets.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eAverage Performance Comparison of Methods for Chest X-ray Report Labeling by Findings.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDataset and finding\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAvg.\u003c/p\u003e \u003cp\u003eRRIEF (Ours)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAvg.\u003c/p\u003e \u003cp\u003eCheXpert\u003c/p\u003e \u003cp\u003e\u0026amp; CheXbert\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAvg.\u003c/p\u003e \u003cp\u003eLLaMA\u003c/p\u003e \u003cp\u003e0-shot\u003c/p\u003e \u003c/th\u003e \u003cth 
align=\"left\" colname=\"c5\"\u003e \u003cp\u003eAvg.\u003c/p\u003e \u003cp\u003eLLaMA\u003c/p\u003e \u003cp\u003e3-shot\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eAvg.\u003c/p\u003e \u003cp\u003eProprietary LLMs 0-shot\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eAvg.\u003c/p\u003e \u003cp\u003eProprietary LLMs 3-shot\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"7\" nameend=\"c7\" namest=\"c1\"\u003e \u003cp\u003e\u003cem\u003eMIMIC-CXR (n\u0026thinsp;=\u0026thinsp;1,000)\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEnlarged Cardiom.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.91\u0026thinsp;\u0026plusmn;\u0026thinsp;.03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.74\u0026thinsp;\u0026plusmn;\u0026thinsp;.01 (Δ-0.17)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.27\u0026thinsp;\u0026plusmn;\u0026thinsp;.04 (Δ-0.64)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.39\u0026thinsp;\u0026plusmn;\u0026thinsp;.17 (Δ-0.52)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.60\u0026thinsp;\u0026plusmn;\u0026thinsp;.02 (Δ-0.31)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.60\u0026thinsp;\u0026plusmn;\u0026thinsp;.02 (Δ-0.31)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCardiomegaly\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.93\u0026thinsp;\u0026plusmn;\u0026thinsp;.01\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.69\u0026thinsp;\u0026plusmn;\u0026thinsp;.02 (Δ-0.25)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.50\u0026thinsp;\u0026plusmn;\u0026thinsp;.08 (Δ-0.44)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.54\u0026thinsp;\u0026plusmn;\u0026thinsp;.07 (Δ-0.39)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e0.54\u0026thinsp;\u0026plusmn;\u0026thinsp;.01 (Δ-0.39)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.59\u0026thinsp;\u0026plusmn;\u0026thinsp;.02 (Δ-0.34)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLung Lesion\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.84\u0026thinsp;\u0026plusmn;\u0026thinsp;.06\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.70\u0026thinsp;\u0026plusmn;\u0026thinsp;.00 (Δ-0.14)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.30\u0026thinsp;\u0026plusmn;\u0026thinsp;.00 (Δ-0.54)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.46\u0026thinsp;\u0026plusmn;\u0026thinsp;.12 (Δ-0.38)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.51\u0026thinsp;\u0026plusmn;\u0026thinsp;.06 (Δ-0.33)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.61\u0026thinsp;\u0026plusmn;\u0026thinsp;.05 (Δ-0.23)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLung Opacity\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003e0.81\u0026thinsp;\u0026plusmn;\u0026thinsp;.03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.69\u0026thinsp;\u0026plusmn;\u0026thinsp;.09 (Δ-0.12)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.32\u0026thinsp;\u0026plusmn;\u0026thinsp;.11 (Δ-0.49)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.44\u0026thinsp;\u0026plusmn;\u0026thinsp;.01 (Δ-0.37)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.45\u0026thinsp;\u0026plusmn;\u0026thinsp;.06 (Δ-0.36)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.52\u0026thinsp;\u0026plusmn;\u0026thinsp;.03 (Δ-0.28)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEdema\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.87\u0026thinsp;\u0026plusmn;\u0026thinsp;.02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.72\u0026thinsp;\u0026plusmn;\u0026thinsp;.00 (Δ-0.15)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.42\u0026thinsp;\u0026plusmn;\u0026thinsp;.13 (Δ-0.46)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.55\u0026thinsp;\u0026plusmn;\u0026thinsp;.13 (Δ-0.32)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.67\u0026thinsp;\u0026plusmn;\u0026thinsp;.03 (Δ-0.20)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.72\u0026thinsp;\u0026plusmn;\u0026thinsp;.02 (Δ-0.16)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConsolidation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003e0.89\u0026thinsp;\u0026plusmn;\u0026thinsp;.02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.88\u0026thinsp;\u0026plusmn;\u0026thinsp;.04 (Δ-0.01)\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.51\u0026thinsp;\u0026plusmn;\u0026thinsp;.04 (Δ-0.38)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.63\u0026thinsp;\u0026plusmn;\u0026thinsp;.07 (Δ-0.26)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.65\u0026thinsp;\u0026plusmn;\u0026thinsp;.18 (Δ-0.24)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.72\u0026thinsp;\u0026plusmn;\u0026thinsp;.12 (Δ-0.17)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePneumonia\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.82\u0026thinsp;\u0026plusmn;\u0026thinsp;.02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.72\u0026thinsp;\u0026plusmn;\u0026thinsp;.09 (Δ-0.11)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.37\u0026thinsp;\u0026plusmn;\u0026thinsp;.07 (Δ-0.45)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.52\u0026thinsp;\u0026plusmn;\u0026thinsp;.17 (Δ-0.30)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.60\u0026thinsp;\u0026plusmn;\u0026thinsp;.13 (Δ-0.22)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.75\u0026thinsp;\u0026plusmn;\u0026thinsp;.04 (Δ-0.07)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAtelectasis\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.75\u0026thinsp;\u0026plusmn;\u0026thinsp;.06\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.65\u0026thinsp;\u0026plusmn;\u0026thinsp;.01 (Δ-0.10)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.52\u0026thinsp;\u0026plusmn;\u0026thinsp;.35 (Δ-0.23)\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.52\u0026thinsp;\u0026plusmn;\u0026thinsp;.07 (Δ-0.23)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.55\u0026thinsp;\u0026plusmn;\u0026thinsp;.01 (Δ-0.20)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.60\u0026thinsp;\u0026plusmn;\u0026thinsp;.01 (Δ-0.15)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePneumothorax\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.81\u0026thinsp;\u0026plusmn;\u0026thinsp;.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.75\u0026thinsp;\u0026plusmn;\u0026thinsp;.04 (Δ-0.06)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.56\u0026thinsp;\u0026plusmn;\u0026thinsp;.03 (Δ-0.25)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.69\u0026thinsp;\u0026plusmn;\u0026thinsp;.18 (Δ-0.12)\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.81\u0026thinsp;\u0026plusmn;\u0026thinsp;.06 (Δ-0.00)\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.81\u0026thinsp;\u0026plusmn;\u0026thinsp;.09 (Δ-0.00)\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePleural Effusion\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.93\u0026thinsp;\u0026plusmn;\u0026thinsp;.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.87\u0026thinsp;\u0026plusmn;\u0026thinsp;.01 (Δ-0.06)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.65\u0026thinsp;\u0026plusmn;\u0026thinsp;.01 (Δ-0.28)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.70\u0026thinsp;\u0026plusmn;\u0026thinsp;.17 (Δ-0.23)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.78\u0026thinsp;\u0026plusmn;\u0026thinsp;.04 (Δ-0.14)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.82\u0026thinsp;\u0026plusmn;\u0026thinsp;.04 (Δ-0.10)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePleural Other\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.83\u0026thinsp;\u0026plusmn;\u0026thinsp;.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.62\u0026thinsp;\u0026plusmn;\u0026thinsp;.00 (Δ-0.21)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.24\u0026thinsp;\u0026plusmn;\u0026thinsp;.08 (Δ-0.59)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.44\u0026thinsp;\u0026plusmn;\u0026thinsp;.07 (Δ-0.39)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e 
\u003cp\u003e0.62\u0026thinsp;\u0026plusmn;\u0026thinsp;.11 (Δ-0.21)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.62\u0026thinsp;\u0026plusmn;\u0026thinsp;.12 (Δ-0.21)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFracture\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.78\u0026thinsp;\u0026plusmn;\u0026thinsp;.04\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.72\u0026thinsp;\u0026plusmn;\u0026thinsp;.10 (Δ-0.06)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.42\u0026thinsp;\u0026plusmn;\u0026thinsp;.05 (Δ-0.36)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.58\u0026thinsp;\u0026plusmn;\u0026thinsp;.04 (Δ-0.20)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.55\u0026thinsp;\u0026plusmn;\u0026thinsp;.03 (Δ-0.23)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.57\u0026thinsp;\u0026plusmn;\u0026thinsp;.04 (Δ-0.21)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSupport Devices\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.85\u0026thinsp;\u0026plusmn;\u0026thinsp;.10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.53\u0026thinsp;\u0026plusmn;\u0026thinsp;.01 (Δ-0.32)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.31\u0026thinsp;\u0026plusmn;\u0026thinsp;.08 (Δ-0.54)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.49\u0026thinsp;\u0026plusmn;\u0026thinsp;.04 (Δ-0.36)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e 
\u003cp\u003e0.59\u0026thinsp;\u0026plusmn;\u0026thinsp;.09 (Δ-0.26)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.64\u0026thinsp;\u0026plusmn;\u0026thinsp;.06 (Δ-0.21)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAverage\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.85\u0026thinsp;\u0026plusmn;\u0026thinsp;.02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.71\u0026thinsp;\u0026plusmn;\u0026thinsp;.01 (Δ-0.14)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.42\u0026thinsp;\u0026plusmn;\u0026thinsp;.08 (Δ-0.43)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.54\u0026thinsp;\u0026plusmn;\u0026thinsp;.09 (Δ-0.31)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.61\u0026thinsp;\u0026plusmn;\u0026thinsp;.05 (Δ-0.24)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.66\u0026thinsp;\u0026plusmn;\u0026thinsp;.04 (Δ-0.19)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"7\" nameend=\"c7\" namest=\"c1\"\u003e \u003cp\u003e\u003cem\u003eOpen-i (n\u0026thinsp;=\u0026thinsp;751)\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEnlarged Cardiom.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.87\u0026thinsp;\u0026plusmn;\u0026thinsp;.04\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.72\u0026thinsp;\u0026plusmn;\u0026thinsp;.04 (Δ-0.16)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.31\u0026thinsp;\u0026plusmn;\u0026thinsp;.03 (Δ-0.56)\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.35\u0026thinsp;\u0026plusmn;\u0026thinsp;.11 (Δ-0.53)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.51\u0026thinsp;\u0026plusmn;\u0026thinsp;.05 (Δ-0.36)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.56\u0026thinsp;\u0026plusmn;\u0026thinsp;.03 (Δ-0.31)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCardiomegaly\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.91\u0026thinsp;\u0026plusmn;\u0026thinsp;.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.78\u0026thinsp;\u0026plusmn;\u0026thinsp;.05 (Δ-0.13)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.50\u0026thinsp;\u0026plusmn;\u0026thinsp;.06 (Δ-0.41)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.57\u0026thinsp;\u0026plusmn;\u0026thinsp;.03 (Δ-0.34)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.52\u0026thinsp;\u0026plusmn;\u0026thinsp;.03 (Δ-0.39)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.60\u0026thinsp;\u0026plusmn;\u0026thinsp;.01 (Δ-0.30)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLung Lesion\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.86\u0026thinsp;\u0026plusmn;\u0026thinsp;.07\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.81\u0026thinsp;\u0026plusmn;\u0026thinsp;.10 (Δ-0.05)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.28\u0026thinsp;\u0026plusmn;\u0026thinsp;.05 (Δ-0.59)\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.44\u0026thinsp;\u0026plusmn;\u0026thinsp;.01 (Δ-0.43)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e0.45\u0026thinsp;\u0026plusmn;\u0026thinsp;.14 (Δ-0.41)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.60\u0026thinsp;\u0026plusmn;\u0026thinsp;.09 (Δ-0.26)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLung Opacity\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.69\u0026thinsp;\u0026plusmn;\u0026thinsp;.04\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.71\u0026thinsp;\u0026plusmn;\u0026thinsp;.11 (Δ\u0026thinsp;+\u0026thinsp;0.02)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.40\u0026thinsp;\u0026plusmn;\u0026thinsp;.06 (Δ-0.29)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.41\u0026thinsp;\u0026plusmn;\u0026thinsp;.01 (Δ-0.28)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.50\u0026thinsp;\u0026plusmn;\u0026thinsp;.06 (Δ-0.18)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.55\u0026thinsp;\u0026plusmn;\u0026thinsp;.04 (Δ-0.14)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEdema\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.81\u0026thinsp;\u0026plusmn;\u0026thinsp;.09\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.48\u0026thinsp;\u0026plusmn;\u0026thinsp;.00 (Δ-0.33)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.26\u0026thinsp;\u0026plusmn;\u0026thinsp;.11 
(Δ-0.55)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.43\u0026thinsp;\u0026plusmn;\u0026thinsp;.07 (Δ-0.38)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.47\u0026thinsp;\u0026plusmn;\u0026thinsp;.05 (Δ-0.34)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.52\u0026thinsp;\u0026plusmn;\u0026thinsp;.12 (Δ-0.29)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConsolidation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.82\u0026thinsp;\u0026plusmn;\u0026thinsp;.10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.73\u0026thinsp;\u0026plusmn;\u0026thinsp;.06 (Δ-0.09)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.36\u0026thinsp;\u0026plusmn;\u0026thinsp;.04 (Δ-0.46)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.46\u0026thinsp;\u0026plusmn;\u0026thinsp;.03 (Δ-0.36)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.48\u0026thinsp;\u0026plusmn;\u0026thinsp;.27 (Δ-0.34)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.49\u0026thinsp;\u0026plusmn;\u0026thinsp;.16 (Δ-0.32)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePneumonia\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.89\u0026thinsp;\u0026plusmn;\u0026thinsp;.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.96\u0026thinsp;\u0026plusmn;\u0026thinsp;.04 (Δ\u0026thinsp;+\u0026thinsp;0.07)\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.27\u0026thinsp;\u0026plusmn;\u0026thinsp;.02 (Δ-0.63)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.56\u0026thinsp;\u0026plusmn;\u0026thinsp;.28 (Δ-0.34)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.54\u0026thinsp;\u0026plusmn;\u0026thinsp;.12 (Δ-0.35)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.70\u0026thinsp;\u0026plusmn;\u0026thinsp;.10 (Δ-0.19)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAtelectasis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.89\u0026thinsp;\u0026plusmn;\u0026thinsp;.04\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.89\u0026thinsp;\u0026plusmn;\u0026thinsp;.01 (Δ-0.01)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.50\u0026thinsp;\u0026plusmn;\u0026thinsp;.35 (Δ-0.40)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.52\u0026thinsp;\u0026plusmn;\u0026thinsp;.05 (Δ-0.38)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.50\u0026thinsp;\u0026plusmn;\u0026thinsp;.02 (Δ-0.39)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.55\u0026thinsp;\u0026plusmn;\u0026thinsp;.05 (Δ-0.34)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePneumothorax\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.88\u0026thinsp;\u0026plusmn;\u0026thinsp;.04\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.61\u0026thinsp;\u0026plusmn;\u0026thinsp;.04 (Δ-0.27)\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.69\u0026thinsp;\u0026plusmn;\u0026thinsp;.16 (Δ-0.19)\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.81\u0026thinsp;\u0026plusmn;\u0026thinsp;.17 (Δ-0.07)\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.57\u0026thinsp;\u0026plusmn;\u0026thinsp;.10 (Δ-0.31)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.63\u0026thinsp;\u0026plusmn;\u0026thinsp;.06 (Δ-0.25)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePleural Effusion\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.75\u0026thinsp;\u0026plusmn;\u0026thinsp;.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.80\u0026thinsp;\u0026plusmn;\u0026thinsp;.01 (Δ\u0026thinsp;+\u0026thinsp;0.06)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.53\u0026thinsp;\u0026plusmn;\u0026thinsp;.13 (Δ-0.22)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.59\u0026thinsp;\u0026plusmn;\u0026thinsp;.14 (Δ-0.16)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.64\u0026thinsp;\u0026plusmn;\u0026thinsp;.12 (Δ-0.11)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.72\u0026thinsp;\u0026plusmn;\u0026thinsp;.06 (Δ-0.03)\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePleural Other\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.53\u0026thinsp;\u0026plusmn;\u0026thinsp;.10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.46\u0026thinsp;\u0026plusmn;\u0026thinsp;.06 (Δ-0.07)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.21\u0026thinsp;\u0026plusmn;\u0026thinsp;.16 (Δ-0.32)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.43\u0026thinsp;\u0026plusmn;\u0026thinsp;.02 (Δ-0.10)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.45\u0026thinsp;\u0026plusmn;\u0026thinsp;.05 (Δ-0.07)\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.45\u0026thinsp;\u0026plusmn;\u0026thinsp;.02 (Δ-0.08)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFracture\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.82\u0026thinsp;\u0026plusmn;\u0026thinsp;.06\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.57\u0026thinsp;\u0026plusmn;\u0026thinsp;.00 (Δ-0.25)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.37\u0026thinsp;\u0026plusmn;\u0026thinsp;.09 (Δ-0.45)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.44\u0026thinsp;\u0026plusmn;\u0026thinsp;.06 (Δ-0.38)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.45\u0026thinsp;\u0026plusmn;\u0026thinsp;.02 (Δ-0.37)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.46\u0026thinsp;\u0026plusmn;\u0026thinsp;.06 (Δ-0.36)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd 
align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSupport Devices\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.88\u0026thinsp;\u0026plusmn;\u0026thinsp;.03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.47\u0026thinsp;\u0026plusmn;\u0026thinsp;.04 (Δ-0.41)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.32\u0026thinsp;\u0026plusmn;\u0026thinsp;.13 (Δ-0.56)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.59\u0026thinsp;\u0026plusmn;\u0026thinsp;.01 (Δ-0.29)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.60\u0026thinsp;\u0026plusmn;\u0026thinsp;.04 (Δ-0.28)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.65\u0026thinsp;\u0026plusmn;\u0026thinsp;.19 (Δ-0.22)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAverage\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.82\u0026thinsp;\u0026plusmn;\u0026thinsp;.02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.66\u0026thinsp;\u0026plusmn;\u0026thinsp;.00 (Δ-0.16)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.38\u0026thinsp;\u0026plusmn;\u0026thinsp;.10 (Δ-0.44)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.51\u0026thinsp;\u0026plusmn;\u0026thinsp;.08 (Δ-0.31)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.51\u0026thinsp;\u0026plusmn;\u0026thinsp;.08 (Δ-0.30)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.58\u0026thinsp;\u0026plusmn;\u0026thinsp;.04 (Δ-0.24)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e 
\u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"7\"\u003eNote: Avg. RRIEF represents the average performance across all RRIEF-LLaMA models (LLaMA1: 7B, 13B, 30B, 65B; LLaMA3: 8B, 70B). Avg. CheXpert \u0026amp; CheXbert is the average performance of CheXpert Labeler and CheXbert. Avg. LLaMA 0-shot and 3-shot represent the average performance of LLaMA1-65B and LLaMA3-70B in 0-shot and 3-shot settings, respectively. Avg. Proprietary LLMs 0-shot and 3-shot represent the average performance of GPT-4o, Gemini-1.5-Flash, and Claude-3.5-Sonnet in 0-shot and 3-shot settings, respectively. The performance metric is the macro F1 score, reported as mean\u0026thinsp;\u0026plusmn;\u0026thinsp;standard deviation (e.g., 0.47\u0026thinsp;\u0026plusmn;\u0026thinsp;.04). The difference from Avg. RRIEF (method score minus Avg. RRIEF) is given in parentheses as Δ. For each method, the chest X-ray finding with the lowest Δ value (the largest deficit relative to Avg. RRIEF) is shown in bold, and the finding with the highest Δ value is underlined. Detailed performance for each model is shown in \u003cb\u003eTables \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e\u0026ndash;S4\u003c/b\u003e. RRIEF\u0026thinsp;=\u0026thinsp;Radiology Report Information Extraction Framework, B\u0026thinsp;=\u0026thinsp;Billion, Enlarged Cardiom.\u0026thinsp;=\u0026thinsp;Enlarged Cardiomediastinum, LLaMA\u0026thinsp;=\u0026thinsp;Large Language Model Meta AI.\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e shows RRIEF's per-finding performance advantages in mammography report labeling, using the same comparative approach. 
Compared with the zero-shot and few-shot performance of LLaMA1-65B and LLaMA3-70B, RRIEF models showed notable improvements in the internal test for 'Nodule', 'Calcification', and 'Architectural Distortion', with macro F1 score differences of 0.21\u0026ndash;0.50; in the external test, 'Nodule' and 'Lymph Node Enlargement' improved by 0.45\u0026ndash;0.71. Against proprietary LLMs in both the zero-shot and few-shot settings, RRIEF models achieved higher macro F1 scores for 'Architectural Distortion' and 'Lymph Node Enlargement' in both the internal and external tests, with improvements ranging from 0.11 to 0.63.\u003c/p\u003e \u003cp\u003eWe also observed substantial performance variation among RRIEF models, as detailed in Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e. 'Skin Thickening', 'Skin Retraction', and 'Trabecular Thickening' showed considerable variability, with standard deviations ranging from 0.12 to 0.30 across both datasets. \u003cb\u003eTables S5 and S6\u003c/b\u003e indicate that this variability stems primarily from the lower performance of the smaller models (RRIEF-LLaMA1-7B, RRIEF-LLaMA3-8B). 
This size-based performance disparity was most evident in the external test, where RRIEF-LLaMA1-7B scored 0.50 (95% CI: 0.49, 0.51), 0.60 (95% CI: 0.45, 0.74), and 0.24 (95% CI: 0.24, 0.24) for these three findings respectively, while RRIEF-LLaMA3-70B achieved perfect scores of 1.00 (95% CI: 1.00, 1.00) for all three findings.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eAverage Performance Comparison of Methods for Mammography Report Labeling by Findings.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDataset and finding\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAvg.\u003c/p\u003e \u003cp\u003eRRIEF (Ours)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAvg.\u003c/p\u003e \u003cp\u003eLLaMA\u003c/p\u003e \u003cp\u003e0-shot\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAvg.\u003c/p\u003e \u003cp\u003eLLaMA\u003c/p\u003e \u003cp\u003e3-shot\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e 
\u003cp\u003eAvg.\u003c/p\u003e \u003cp\u003eProprietary LLMs 0-shot\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eAvg.\u003c/p\u003e \u003cp\u003eProprietary LLMs 3-shot\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"6\" nameend=\"c6\" namest=\"c1\"\u003e \u003cp\u003e\u003cem\u003eSiteA-MMG (n\u0026thinsp;=\u0026thinsp;1,000)\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNodule\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.95\u0026thinsp;\u0026plusmn;\u0026thinsp;.04\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.45\u0026thinsp;\u0026plusmn;\u0026thinsp;.23 (Δ-0.50)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.69\u0026thinsp;\u0026plusmn;\u0026thinsp;.25 (Δ-0.26)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.77\u0026thinsp;\u0026plusmn;\u0026thinsp;.02 (Δ-0.18)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.77\u0026thinsp;\u0026plusmn;\u0026thinsp;.01 (Δ-0.19)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMass\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.87\u0026thinsp;\u0026plusmn;\u0026thinsp;.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.55\u0026thinsp;\u0026plusmn;\u0026thinsp;.17 (Δ-0.32)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.72\u0026thinsp;\u0026plusmn;\u0026thinsp;.21 (Δ-0.16)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e 
\u003cp\u003e0.77\u0026thinsp;\u0026plusmn;\u0026thinsp;.02 (Δ-0.11)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.78\u0026thinsp;\u0026plusmn;\u0026thinsp;.00 (Δ-0.09)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCalcification\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.97\u0026thinsp;\u0026plusmn;\u0026thinsp;.02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.56\u0026thinsp;\u0026plusmn;\u0026thinsp;.23 (Δ-0.41)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.76\u0026thinsp;\u0026plusmn;\u0026thinsp;.06 (Δ-0.21)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.77\u0026thinsp;\u0026plusmn;\u0026thinsp;.02 (Δ-0.20)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.80\u0026thinsp;\u0026plusmn;\u0026thinsp;.01 (Δ-0.17)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAsymmetry\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.90\u0026thinsp;\u0026plusmn;\u0026thinsp;.06\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.51\u0026thinsp;\u0026plusmn;\u0026thinsp;.19 (Δ-0.39)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.80\u0026thinsp;\u0026plusmn;\u0026thinsp;.06 (Δ-0.10)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.73\u0026thinsp;\u0026plusmn;\u0026thinsp;.03 (Δ-0.17)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.86\u0026thinsp;\u0026plusmn;\u0026thinsp;.12 (Δ-0.04)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003eArchitectural Dist.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.97\u0026thinsp;\u0026plusmn;\u0026thinsp;.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.56\u0026thinsp;\u0026plusmn;\u0026thinsp;.17 (Δ-0.41)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.77\u0026thinsp;\u0026plusmn;\u0026thinsp;.25 (Δ-0.20)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.72\u0026thinsp;\u0026plusmn;\u0026thinsp;.03 (Δ-0.25)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.81\u0026thinsp;\u0026plusmn;\u0026thinsp;.14 (Δ-0.16)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSkin Thickening\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.81\u0026thinsp;\u0026plusmn;\u0026thinsp;.13\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.55\u0026thinsp;\u0026plusmn;\u0026thinsp;.10 (Δ-0.26)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.63\u0026thinsp;\u0026plusmn;\u0026thinsp;.10 (Δ-0.18)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.76\u0026thinsp;\u0026plusmn;\u0026thinsp;.03 (Δ-0.05)\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.80\u0026thinsp;\u0026plusmn;\u0026thinsp;.02 (Δ-0.01)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLymph Node Enlarge.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003e0.79\u0026thinsp;\u0026plusmn;\u0026thinsp;.03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.49\u0026thinsp;\u0026plusmn;\u0026thinsp;.03 (Δ-0.30)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.67\u0026thinsp;\u0026plusmn;\u0026thinsp;.00 (Δ-0.12)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.56\u0026thinsp;\u0026plusmn;\u0026thinsp;.05 (Δ-0.23)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.63\u0026thinsp;\u0026plusmn;\u0026thinsp;.08 (Δ-0.16)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eIntra. Lymph Node\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.96\u0026thinsp;\u0026plusmn;\u0026thinsp;.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.46\u0026thinsp;\u0026plusmn;\u0026thinsp;.19 (Δ-0.50)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.84\u0026thinsp;\u0026plusmn;\u0026thinsp;.10 (Δ-0.12)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.73\u0026thinsp;\u0026plusmn;\u0026thinsp;.05 (Δ-0.23)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e0.75\u0026thinsp;\u0026plusmn;\u0026thinsp;.01 (Δ-0.20)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNipple Retraction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.90\u0026thinsp;\u0026plusmn;\u0026thinsp;.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.53\u0026thinsp;\u0026plusmn;\u0026thinsp;.28 (Δ-0.38)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c4\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.91\u0026thinsp;\u0026plusmn;\u0026thinsp;.08 (Δ\u0026thinsp;+\u0026thinsp;0.01)\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.71\u0026thinsp;\u0026plusmn;\u0026thinsp;.02 (Δ-0.19)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.82\u0026thinsp;\u0026plusmn;\u0026thinsp;.15 (Δ-0.08)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSkin Retraction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.95\u0026thinsp;\u0026plusmn;\u0026thinsp;.12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.45\u0026thinsp;\u0026plusmn;\u0026thinsp;.30 (Δ-0.51)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.88\u0026thinsp;\u0026plusmn;\u0026thinsp;.18 (Δ-0.08)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.69\u0026thinsp;\u0026plusmn;\u0026thinsp;.05 (Δ-0.26)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.79\u0026thinsp;\u0026plusmn;\u0026thinsp;.08 (Δ-0.16)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTrabecular Thickening\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.78\u0026thinsp;\u0026plusmn;\u0026thinsp;.19\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.67\u0026thinsp;\u0026plusmn;\u0026thinsp;.08 (Δ-0.12)\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e 
\u003cp\u003e0.67\u0026thinsp;\u0026plusmn;\u0026thinsp;.47 (Δ-0.12)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.62\u0026thinsp;\u0026plusmn;\u0026thinsp;.15 (Δ-0.16)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.82\u0026thinsp;\u0026plusmn;\u0026thinsp;.16 (Δ\u0026thinsp;+\u0026thinsp;0.04)\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAverage\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.90\u0026thinsp;\u0026plusmn;\u0026thinsp;.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.53\u0026thinsp;\u0026plusmn;\u0026thinsp;.18 (Δ-0.37)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.76\u0026thinsp;\u0026plusmn;\u0026thinsp;.16 (Δ-0.14)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.71\u0026thinsp;\u0026plusmn;\u0026thinsp;.01 (Δ-0.18)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.79\u0026thinsp;\u0026plusmn;\u0026thinsp;.06 (Δ-0.11)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"6\" nameend=\"c6\" namest=\"c1\"\u003e \u003cp\u003e\u003cem\u003eCDD-CESM (n\u0026thinsp;=\u0026thinsp;326)\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNodule\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.96\u0026thinsp;\u0026plusmn;\u0026thinsp;.06\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.46\u0026thinsp;\u0026plusmn;\u0026thinsp;.35 (Δ-0.51)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c4\"\u003e \u003cp\u003e0.52\u0026thinsp;\u0026plusmn;\u0026thinsp;.29 (Δ-0.45)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.83\u0026thinsp;\u0026plusmn;\u0026thinsp;.15 (Δ-0.13)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.97\u0026thinsp;\u0026plusmn;\u0026thinsp;.03 (Δ\u0026thinsp;+\u0026thinsp;0.01)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMass\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.95\u0026thinsp;\u0026plusmn;\u0026thinsp;.04\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.56\u0026thinsp;\u0026plusmn;\u0026thinsp;.25 (Δ-0.40)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.77\u0026thinsp;\u0026plusmn;\u0026thinsp;.24 (Δ-0.18)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.88\u0026thinsp;\u0026plusmn;\u0026thinsp;.12 (Δ-0.08)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.98\u0026thinsp;\u0026plusmn;\u0026thinsp;.01 (Δ\u0026thinsp;+\u0026thinsp;0.03)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCalcification\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.92\u0026thinsp;\u0026plusmn;\u0026thinsp;.06\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.58\u0026thinsp;\u0026plusmn;\u0026thinsp;.26 (Δ-0.34)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.81\u0026thinsp;\u0026plusmn;\u0026thinsp;.12 (Δ-0.11)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.88\u0026thinsp;\u0026plusmn;\u0026thinsp;.12 (Δ-0.04)\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.93\u0026thinsp;\u0026plusmn;\u0026thinsp;.06 (Δ\u0026thinsp;+\u0026thinsp;0.02)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAsymmetry\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.95\u0026thinsp;\u0026plusmn;\u0026thinsp;.12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.70\u0026thinsp;\u0026plusmn;\u0026thinsp;.41 (Δ-0.25)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.80\u0026thinsp;\u0026plusmn;\u0026thinsp;.08 (Δ-0.16)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.72\u0026thinsp;\u0026plusmn;\u0026thinsp;.04 (Δ-0.23)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.96\u0026thinsp;\u0026plusmn;\u0026thinsp;.08 (Δ\u0026thinsp;+\u0026thinsp;0.01)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eArchitectural Dist.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.96\u0026thinsp;\u0026plusmn;\u0026thinsp;.08\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.70\u0026thinsp;\u0026plusmn;\u0026thinsp;.43 (Δ-0.27)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.87\u0026thinsp;\u0026plusmn;\u0026thinsp;.19 (Δ-0.10)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.77\u0026thinsp;\u0026plusmn;\u0026thinsp;.04 (Δ-0.19)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.86\u0026thinsp;\u0026plusmn;\u0026thinsp;.13 (Δ-0.11)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSkin 
Thickening\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.71\u0026thinsp;\u0026plusmn;\u0026thinsp;.30\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.62\u0026thinsp;\u0026plusmn;\u0026thinsp;.52 (Δ-0.09)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.81\u0026thinsp;\u0026plusmn;\u0026thinsp;.27 (Δ\u0026thinsp;+\u0026thinsp;0.10)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.77\u0026thinsp;\u0026plusmn;\u0026thinsp;.15 (Δ\u0026thinsp;+\u0026thinsp;0.07)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.73\u0026thinsp;\u0026plusmn;\u0026thinsp;.08 (Δ\u0026thinsp;+\u0026thinsp;0.03)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLymph Node Enlarge.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.94\u0026thinsp;\u0026plusmn;\u0026thinsp;.14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.24\u0026thinsp;\u0026plusmn;\u0026thinsp;.02 (Δ-0.71)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.25\u0026thinsp;\u0026plusmn;\u0026thinsp;.01 (Δ-0.70)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.31\u0026thinsp;\u0026plusmn;\u0026thinsp;.06 (Δ-0.63)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e0.53\u0026thinsp;\u0026plusmn;\u0026thinsp;.11 (Δ-0.42)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eIntra. 
Lymph Node\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.86\u0026thinsp;\u0026plusmn;\u0026thinsp;.22\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.62\u0026thinsp;\u0026plusmn;\u0026thinsp;.34 (Δ-0.24)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.76\u0026thinsp;\u0026plusmn;\u0026thinsp;.30 (Δ-0.10)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.86\u0026thinsp;\u0026plusmn;\u0026thinsp;.12 (Δ-0.00)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.98\u0026thinsp;\u0026plusmn;\u0026thinsp;.03 (Δ\u0026thinsp;+\u0026thinsp;0.13)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNipple Retraction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.90\u0026thinsp;\u0026plusmn;\u0026thinsp;.15\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.71\u0026thinsp;\u0026plusmn;\u0026thinsp;.42 (Δ-0.19)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.95\u0026thinsp;\u0026plusmn;\u0026thinsp;.05 (Δ\u0026thinsp;+\u0026thinsp;0.05)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.83\u0026thinsp;\u0026plusmn;\u0026thinsp;.15 (Δ-0.07)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.00\u0026thinsp;\u0026plusmn;\u0026thinsp;.00 (Δ\u0026thinsp;+\u0026thinsp;0.10)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSkin Retraction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.85\u0026thinsp;\u0026plusmn;\u0026thinsp;.15\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e 
\u003cp\u003e0.60\u0026thinsp;\u0026plusmn;\u0026thinsp;.57 (Δ-0.25)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.76\u0026thinsp;\u0026plusmn;\u0026thinsp;.34 (Δ-0.09)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.69\u0026thinsp;\u0026plusmn;\u0026thinsp;.08 (Δ-0.16)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.85\u0026thinsp;\u0026plusmn;\u0026thinsp;.27 (Δ-0.00)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTrabecular Thickening\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.64\u0026thinsp;\u0026plusmn;\u0026thinsp;.30\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.66\u0026thinsp;\u0026plusmn;\u0026thinsp;.49 (Δ\u0026thinsp;+\u0026thinsp;0.01)\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.83\u0026thinsp;\u0026plusmn;\u0026thinsp;.25 (Δ\u0026thinsp;+\u0026thinsp;0.18)\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.86\u0026thinsp;\u0026plusmn;\u0026thinsp;.12 (Δ\u0026thinsp;+\u0026thinsp;0.22)\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e1.00\u0026thinsp;\u0026plusmn;\u0026thinsp;.00 (Δ\u0026thinsp;+\u0026thinsp;0.36)\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAverage\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c2\"\u003e \u003cp\u003e0.88\u0026thinsp;\u0026plusmn;\u0026thinsp;.10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.58\u0026thinsp;\u0026plusmn;\u0026thinsp;.37 (Δ-0.30)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.74\u0026thinsp;\u0026plusmn;\u0026thinsp;.19 (Δ-0.14)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.76\u0026thinsp;\u0026plusmn;\u0026thinsp;.04 (Δ-0.11)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.89\u0026thinsp;\u0026plusmn;\u0026thinsp;.03 (Δ\u0026thinsp;+\u0026thinsp;0.01)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"6\"\u003eNote: Avg. RRIEF represents the average performance across all RRIEF-LLaMA models (LLaMA1: 7B, 13B, 30B, 65B; LLaMA3: 8B, 70B). Avg. LLaMA 0-shot and 3-shot represent the average performance of LLaMA1-65B and LLaMA3-70B in 0-shot and 3-shot settings, respectively. Avg. Proprietary LLMs 0-shot and 3-shot represent the average performance of GPT-4o, Gemini-1.5-Flash, and Claude-3.5-Sonnet in 0-shot and 3-shot settings, respectively. The performance metric used is macro F1 score, presented as mean\u0026thinsp;\u0026plusmn;\u0026thinsp;standard deviation (e.g., 0.47\u0026thinsp;\u0026plusmn;\u0026thinsp;.04). The difference from Avg. RRIEF is expressed in parentheses using Δ. For each method, the mammography finding with the lowest Δ value is highlighted in bold, while the finding with the highest Δ value is underlined. Detailed performance for each model is shown in \u003cb\u003eTable S5-S8\u003c/b\u003e. RRIEF\u0026thinsp;=\u0026thinsp;Radiology Report Information Extraction Framework, B\u0026thinsp;=\u0026thinsp;Billion, Architectural Dist. = Architectural distortion, Lymph Node Enlarge. = Lymph node enlargement, Intra. 
Lymph Node\u0026thinsp;=\u0026thinsp;Intramammary lymph node, LLaMA\u0026thinsp;=\u0026thinsp;Large Language Model Meta AI.\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003eReport Labeling Performance on Coronary CT Angiography\u003c/h2\u003e \u003cp\u003eAs shown in Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e, RRIEF-LLaMA3-8B demonstrated significant performance improvements in both internal and external tests compared to its 3-shot counterpart and LLaMA3-70B across all categories: stenosis severity, plaque burden, and modifiers. Compared to DeepSeek-R1-Distill-Qwen-14B, RRIEF-LLaMA3-8B performed significantly better in all categories, with the sole exception of plaque burden assessment in the external test. Against proprietary LLMs, RRIEF-LLaMA3-8B achieved significantly higher F1 scores in stenosis severity labeling than Gemini-1.5-Flash in the internal test (0.87 vs 0.83, P\u0026thinsp;=\u0026thinsp;.02) and than GPT-4o in the external test (0.83 vs 0.68, P\u0026thinsp;\u0026lt;\u0026thinsp;.001). 
Notably, for modifiers, RRIEF-LLaMA3-8B achieved significantly higher performance than all proprietary models in the external test, with an F1 score of 1.00 compared to 0.93 for the best-performing proprietary model (P\u0026thinsp;=\u0026thinsp;.004).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003ePerformance on Coronary CT Angiography Reports.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"8\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eMethod\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c5\" namest=\"c3\"\u003e \u003cp\u003eSiteB-CCTA (n\u0026thinsp;=\u0026thinsp;50)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c8\" namest=\"c6\"\u003e \u003cp\u003eSiteC-CCTA (n\u0026thinsp;=\u0026thinsp;51)\u003c/p\u003e 
\u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eStenosis Severity\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003ePlaque Burden\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eModifiers\u003c/p\u003e \u003cp\u003e(averaged)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eStenosis Severity\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003ePlaque Burden\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eModifiers\u003c/p\u003e \u003cp\u003e(averaged)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"5\" rowspan=\"6\"\u003e \u003cp\u003e3-shot\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLLaMA3-8B\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.31*\u003c/p\u003e \u003cp\u003e(0.29, 0.34)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.77*\u003c/p\u003e \u003cp\u003e(0.72, 0.82)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.84*\u003c/p\u003e \u003cp\u003e(0.81, 0.87)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.41*\u003c/p\u003e \u003cp\u003e(0.35, 0.47)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.54*\u003c/p\u003e \u003cp\u003e(0.48, 0.59)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.91*\u003c/p\u003e \u003cp\u003e(0.89, 0.94)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLLaMA3-70B\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c3\"\u003e \u003cp\u003e0.61*\u003c/p\u003e \u003cp\u003e(0.56, 0.65)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.85*\u003c/p\u003e \u003cp\u003e(0.81, 0.90)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.99*\u003c/span\u003e\u003c/p\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e(0.98, 1.00)\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.74*\u003c/p\u003e \u003cp\u003e(0.68, 0.81)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.44*\u003c/p\u003e \u003cp\u003e(0.35, 0.54)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.94*\u003c/span\u003e\u003c/p\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e(0.91, 0.97)\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDeepSeek-R1-Distill-Qwen-14B\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.67*\u003c/p\u003e \u003cp\u003e(0.63, 0.70)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.84*\u003c/p\u003e \u003cp\u003e(0.79, 0.89)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.99*\u003c/span\u003e\u003c/p\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e(0.98, 1.00)\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.73*\u003c/p\u003e \u003cp\u003e(0.67, 0.79)\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.94\u003c/span\u003e\u003c/p\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e(0.93, 0.96)\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.90*\u003c/p\u003e \u003cp\u003e(0.87, 0.93)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eGPT-4o\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.89\u003c/span\u003e\u003c/p\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e(0.85, 0.93)\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e1.00\u003c/b\u003e\u003c/p\u003e \u003cp\u003e\u003cb\u003e(1.00, 1.00)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e1.00\u003c/b\u003e\u003c/p\u003e \u003cp\u003e\u003cb\u003e(1.00, 1.00)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.68*\u003c/p\u003e \u003cp\u003e(0.63, 0.73)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e1.00\u003c/b\u003e\u003c/p\u003e \u003cp\u003e\u003cb\u003e(1.00, 1.00)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.93*\u003c/p\u003e \u003cp\u003e(0.91, 0.96)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eGemini-1.5-Flash\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.83*\u003c/p\u003e \u003cp\u003e(0.77, 0.88)\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e1.00\u003c/b\u003e\u003c/p\u003e \u003cp\u003e\u003cb\u003e(1.00, 1.00)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e1.00\u003c/b\u003e\u003c/p\u003e \u003cp\u003e\u003cb\u003e(1.00, 1.00)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.91\u003c/span\u003e\u003c/p\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e(0.88, 0.94)\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e1.00\u003c/b\u003e\u003c/p\u003e \u003cp\u003e\u003cb\u003e(1.00, 1.00)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.91*\u003c/p\u003e \u003cp\u003e(0.89, 0.94)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eClaude-3.5-Sonnet\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.99\u003c/b\u003e\u003c/p\u003e \u003cp\u003e\u003cb\u003e(0.97, 1.00)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e1.00\u003c/b\u003e\u003c/p\u003e \u003cp\u003e\u003cb\u003e(1.00, 1.00)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.92*\u003c/p\u003e \u003cp\u003e(0.91, 0.92)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e0.94\u003c/b\u003e\u003c/p\u003e \u003cp\u003e\u003cb\u003e(0.91, 0.98)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003cp\u003e(0.91, 0.95)\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.93*\u003c/p\u003e \u003cp\u003e(0.91, 0.95)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eRRIEF\u003c/b\u003e\u003c/p\u003e \u003cp\u003e\u003cb\u003e(Ours)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLLaMA3-8B\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.87\u003c/p\u003e \u003cp\u003e(0.83, 0.91)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e0.98\u003c/span\u003e\u003c/p\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e(0.96, 0.99)\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e1.00\u003c/b\u003e\u003c/p\u003e \u003cp\u003e\u003cb\u003e(1.00, 1.00)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.83\u003c/p\u003e \u003cp\u003e(0.77, 0.88)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003cp\u003e(0.90, 0.96)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e1.00\u003c/b\u003e\u003c/p\u003e \u003cp\u003e\u003cb\u003e(1.00, 1.00)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"8\"\u003eNote: Data represent the \u003cb\u003eMacro F1\u003c/b\u003e Score and 95% Confidence Interval from 10 Bootstrap Tests. For each column, the highest performance is shown in bold and the second-highest is underlined. Values marked with an asterisk (*) indicate statistically significant improvements of RRIEF-LLaMA3-8B compared to each model. 
\u0026lsquo;Modifiers (averaged)\u0026rsquo; represents the simple average of F1 scores for E, I, N, G, HRP, and S. Detailed performance for each model is shown in \u003cb\u003eTables S9\u003c/b\u003e and \u003cb\u003eS10\u003c/b\u003e. RRIEF\u0026thinsp;=\u0026thinsp;Radiology Report Information Extraction Framework, LLaMA\u0026thinsp;=\u0026thinsp;Large Language Model Meta AI.\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eEffect of Training Sample Size on Model Performance\u003c/h2\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e shows a strong positive correlation between training data size and performance of RRIEF-LLaMA3-8B on the MIMIC-CXR dataset (Pearson's correlation coefficient 0.75, P\u0026thinsp;\u0026lt;\u0026thinsp;.001). Notably, with as few as 200 training reports, RRIEF-LLaMA3-8B achieved a macro F1 score of 0.79 (95% CI: 0.77, 0.82) and surpassed all baseline models including CheXbert and 3-shot configurations of Claude-3.5-Sonnet, LLaMA3-70B, and DeepSeek-R1-Distill-Qwen-14B (P\u0026thinsp;\u0026lt;\u0026thinsp;.001 for all comparisons). Performance began to plateau around 500 training reports, where RRIEF-LLaMA3-8B achieved an F1 score of 0.84 (95% CI: 0.82, 0.86), with only marginal improvement observed when increasing to 1,000 reports (0.85, 95% CI: 0.85, 0.86).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eIn this study, we developed and validated an efficient framework for information extraction from different types of radiology reports. Our approach achieved three key objectives. 
First, we established a privacy-preserving framework that significantly outperformed existing methodologies, achieving F1 scores of 0.87 and 0.85 in internal and external tests for chest X-ray reports, surpassing CheXpert Labeler, CheXbert, and all proprietary models (P\u0026thinsp;\u0026lt;\u0026thinsp;.001). Second, we demonstrated the framework's effectiveness across different imaging modalities with distinct characteristics, showing high performance in mammography (F1 scores: 0.91 and 0.99 in internal and external tests) and coronary CT angiography reports (F1 scores: 0.87 for stenosis severity, 0.98 for plaque burden, and 1.00 for modifiers in internal testing). Third, to facilitate widespread adoption, we analyzed the relationship between training data size and performance, finding that 200\u0026ndash;500 annotated reports were sufficient to achieve superior results compared to specialized methods and proprietary models (P\u0026thinsp;\u0026lt;\u0026thinsp;.001), and we provide our implementation as an open-source resource.\u003c/p\u003e \u003cp\u003eRecent studies have highlighted the potential of LLMs for extracting information from radiology reports, yet most prior approaches relied on zero-shot or few-shot methods without additional training (\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e, \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e, \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e). These approaches yielded performance levels inadequate for clinical implementation (\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e), as confirmed by our comparative analysis where even leading proprietary models like GPT-4o, Gemini-1.5-Flash, and Claude-3.5-Sonnet underperformed significantly. 
Additionally, previous methodologies have predominantly focused on reports from single modalities, particularly CXR (\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e, \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e), creating siloed solutions with limited transferability to different imaging types and reporting formats. RRIEF addresses these limitations through parameter-efficient fine-tuning of open-source LLMs, enabling local deployment while preserving patient privacy.\u003c/p\u003e \u003cp\u003eThe performance variations observed across different findings in our mammography experiments highlight the impact of model size on generalizability when applying RRIEF. We found that smaller models like RRIEF-LLaMA1-7B showed considerable variability in handling certain findings such as 'Skin Thickening', 'Skin Retraction', and 'Trabecular Thickening', achieving F1 scores as low as 0.24 in the external test set. In contrast, larger models like RRIEF-LLaMA3-70B attained perfect scores of 1.00 for these same findings. This disparity reflects larger models' superior ability to bridge discrepancies between training and test datasets, particularly when faced with terminology variations or uncommon findings. For instance, certain findings had varying expressions across datasets (e.g., 'intramammary lymph node' versus 'IMLN'), while others were absent from the training set but present in the test set. The robust performance of larger models in these challenging scenarios demonstrates their enhanced capacity to generalize and adapt to novel or variant terms. 
These findings underscore the advantages of our RRIEF framework when implemented with larger models, particularly for handling diverse medical terminologies and unseen concepts\u0026mdash;a crucial capability for real-world medical practices where terminology and reporting styles vary substantially across institutions, radiologists, and over time.\u003c/p\u003e \u003cp\u003eOur analysis also reveals a positive correlation between training sample size and performance on the MIMIC-CXR dataset, with our approach outperforming all baselines using 200 training samples. While performance improves with more training data, we observed diminishing returns beyond 500 reports for CXR labeling. This efficiency extends across modalities, as demonstrated by RRIEF achieving comparable performance to leading proprietary models on CCTA datasets using only 100 training reports.\u003c/p\u003e \u003cp\u003e To implement RRIEF for new datasets, users need only follow a straightforward process: first, define appropriate findings and labels according to clinical requirements or specialty guidelines. This flexibility to freely set findings and labels underscores the method's extensive scalability, as demonstrated in our mammography implementation where we created specialized laterality labels (e.g., 'right breast', 'bilateral breasts') to capture this critical diagnostic information. Second, annotate a reasonable number of reports (200\u0026ndash;500 based on our experiments), which can be expedited by using off-the-shelf LLMs for initial annotations followed by expert review. Finally, apply parameter-efficient fine-tuning using techniques like QLoRA (\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e), which dramatically reduces computational requirements while maintaining performance\u0026mdash;enabling even the largest models (e.g., 70B parameters) to be fine-tuned on a single 48GB GPU. 
Our publicly shared implementation code allows easy adoption of this framework, enabling customized report labeling solutions across diverse clinical settings without specialized expertise.\u003c/p\u003e \u003cp\u003eHowever, this study is limited by its relatively small test datasets. Each dataset contains 1,000 reports or fewer, with a large proportion (around 20%) comprising normal cases without any abnormalities. As a result, many findings are rare, with fewer than ten positive cases each, such as \u0026lsquo;Pneumothorax\u0026rsquo; in CXR and \u0026lsquo;Skin Retraction\u0026rsquo; in mammography. Furthermore, because each finding can take one of several labels (four for CXR and five for mammography), the statistical rigor of our reported performance for these rare findings may be somewhat reduced.\u003c/p\u003e \u003cp\u003eIn conclusion, RRIEF demonstrates high-performance information extraction across chest X-ray, mammography, and coronary CT angiography reports while preserving patient privacy through locally deployed fine-tuned LLMs. Our framework significantly outperformed existing specialized methods and proprietary LLMs with minimal training requirements. We anticipate that our method will serve as an effective automatic report labeling strategy for various imaging modalities and reporting styles, facilitating large-scale retrospective studies and offering a practical tool to advance radiological research and clinical practice.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eData Availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFor the publicly available datasets (MIMIC-CXR, Open-i, CDD-CESM), we have shared the annotated labels for the radiology reports used in our training and testing sets. 
The SiteA-MMG dataset contains identifiable protected health information and therefore cannot be shared publicly. For the SiteB-CCTA, we have made both the reports and annotated labels publicly available. The SiteC-CCTA dataset is not publicly available but can be shared upon request to the corresponding author. Data are available at https://github.com/reonaledo/report_labeler.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCode Availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe analytical code for reproducing the results of this study and a user-friendly implementation that can be adapted for specific research purposes are currently in development. The complete implementation code will be made publicly available at https://github.com/reonaledo/report_labeler upon publication of this manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study was supported by the National Research Foundation of Korea (NRF) grants funded by the Ministry of Science and ICT (MSIT) (Grant No. RS-2024-00354666) and the Seoul National University Hospital Research Fund (Grant No. 03-2023-0410). The funders played no role in study design, data collection, analysis and interpretation of data, or the writing of this manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contribution\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eConceptualization: D.M, C.M.P, J.M.C. Supervision: C.M.P, J.M.C. Writing: D.M. Data acquisition: K.N.J, S.B, W.G.J, J.M.C, S.K. Data analysis: D.M, S.K. Critical review: S.H, J.C, J.M.C, C.M.P. All authors read and approved the final manuscript and had final responsibility for the decision to submit it for publication.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eKong HJ. Managing Unstructured Big Data in Healthcare System. Healthc Inform Res. 
2019;25(1):1\u0026ndash;2.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCasey A, Davidson E, Poon M, Dong H, Duma D, Grivas A, et al. A systematic review of natural language processing applied to radiology reports. BMC Med Inform Decis Mak. 2021;21(1):179.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eReichenpfader D, M\u0026uuml;ller H, Denecke K. A scoping review of large language model based approaches for information extraction from radiology reports. Npj Digit Med. 2024;7(1):1\u0026ndash;12.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePivovarov R, Coppleson YJ, Gorman SL, Vawdrey DK, Elhadad N. Can Patient Record Summarization Support Quality Metric Abstraction? AMIA Annu Symp Proc AMIA Symp. 2016;2016:1020\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eIrvin J, Rajpurkar P, Ko M, Yu Y, Ciurea-Ilcus S, Chute C, et al. CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison [Internet]. arXiv; 2019 [cited 2024 Jan 4]. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://arxiv.org/abs/1901.07031\u003c/span\u003e\u003cspan address=\"http://arxiv.org/abs/1901.07031\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSmit A, Jain S, Rajpurkar P, Pareek A, Ng AY, Lungren MP. CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT [Internet]. arXiv; 2020 [cited 2024 Jan 4]. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://arxiv.org/abs/2004.09167\u003c/span\u003e\u003cspan address=\"http://arxiv.org/abs/2004.09167\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKuling G, Curpen B, Martel AL. 
BI-RADS BERT and Using Section Segmentation to Understand Radiology Reports. J Imaging. 2022;8(5):131.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTorres-Lopez VM, Rovenolt GE, Olcese AJ, Garcia GE, Chacko SM, Robinson A, et al. Development and Validation of a Model to Identify Critical Brain Injuries Using Natural Language Processing of Text Computed Tomography Reports. JAMA Netw Open. 2022;5(8):e2227109.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWood DA, Lynch J, Kafiabadi S, Guilhem E, Busaidi AA, Montvila A, et al. Automated Labelling using an Attention model for Radiology reports of MRI scans (ALARM) [Internet]. arXiv; 2020 [cited 2024 Nov 20]. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://arxiv.org/abs/2002.06588\u003c/span\u003e\u003cspan address=\"http://arxiv.org/abs/2002.06588\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWood DA, Kafiabadi S, Al Busaidi A, Guilhem EL, Lynch J, Townend MK, et al. Deep learning to automate the labelling of head MRI datasets for computer vision applications. Eur Radiol. 2022;32(1):725\u0026ndash;36.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZech J, Pain M, Titano J, Badgeley M, Schefflein J, Su A, et al. Natural Language\u0026ndash;based Machine Learning Models for the Annotation of Clinical Radiology Reports. Radiology. 2018;287(2):570\u0026ndash;80.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAdams LC, Truhn D, Busch F, Kader A, Niehues SM, Makowski MR, et al. Leveraging GPT-4 for Post Hoc Transformation of Free-text Radiology Reports into Structured Reporting: A Multilingual Feasibility Study. Radiology. 2023;307(4):e230725.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMukherjee P, Hou B, Lanfredi RB, Summers RM. 
Feasibility of Using the Privacy-preserving Large Language Model Vicuna for Labeling Radiology Reports. Radiology. 2023;309(1):e231147.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWiest IC, Ferber D, Zhu J, van Treeck M, Meyer SK, Juglan R, et al. Privacy-preserving large language models for structured medical information retrieval. Npj Digit Med. 2024;7(1):1\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTouvron H, Martin L, Stone K, Albert P, Almahairi A, Babaei Y, et al. Llama 2: Open Foundation and Fine-Tuned Chat Models [Internet]. arXiv; 2023 [cited 2024 Jan 5]. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://arxiv.org/abs/2307.09288\u003c/span\u003e\u003cspan address=\"http://arxiv.org/abs/2307.09288\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDubey A, Jauhri A, Pandey A, Kadian A, Al-Dahle A, Letman A, et al. The Llama 3 Herd of Models [Internet]. arXiv; 2024 [cited 2024 Sep 11]. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://arxiv.org/abs/2407.21783v2\u003c/span\u003e\u003cspan address=\"https://arxiv.org/abs/2407.21783v2\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOpenAI, Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, et al. GPT-4 Technical Report [Internet]. arXiv; 2024 [cited 2024 Mar 14]. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://arxiv.org/abs/2303.08774\u003c/span\u003e\u003cspan address=\"http://arxiv.org/abs/2303.08774\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTouvron H, Lavril T, Izacard G, Martinet X, Lachaux MA, Lacroix T, et al. 
LLaMA: Open and Efficient Foundation Language Models [Internet]. arXiv; 2023 [cited 2024 Jan 5]. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://arxiv.org/abs/2302.13971\u003c/span\u003e\u003cspan address=\"http://arxiv.org/abs/2302.13971\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBrown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language Models are Few-Shot Learners. In: Advances in Neural Information Processing Systems [Internet]. Curran Associates, Inc.; 2020 [cited 2024 Mar 14]. p. 1877\u0026ndash;901. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://papers.nips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html\u003c/span\u003e\u003cspan address=\"https://papers.nips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eClusmann J, Kolbinger FR, Muti HS, Carrero ZI, Eckardt JN, Laleh NG, et al. The future landscape of large language models in medicine. Commun Med. 2023;3(1):1\u0026ndash;8.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi Y, Li Z, Zhang K, Dan R, Jiang S, Zhang Y. ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge [Internet]. arXiv; 2023 [cited 2024 Mar 15]. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://arxiv.org/abs/2303.14070\u003c/span\u003e\u003cspan address=\"http://arxiv.org/abs/2303.14070\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBhayana R. Chatbots and Large Language Models in Radiology: A Practical Primer for Clinical and Research Applications. Radiology. 
2024;310(1):e232756.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eUeda D, Mitsuyama Y, Takita H, Horiuchi D, Walston SL, Tatekawa H, et al. ChatGPT\u0026rsquo;s Diagnostic Performance from Patient History and Imaging Findings on the Diagnosis Please Quizzes. Radiology. 2023;308(1):e231040.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAmin KS, Davis MA, Doshi R, Haims AH, Khosla P, Forman HP. Accuracy of ChatGPT, Google Bard, and Microsoft Bing for Simplifying Radiology Reports. Radiology. 2023;309(2):e232561.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBera K, O\u0026rsquo;Connor G, Jiang S, Tirumani SH, Ramaiya N. Analysis of ChatGPT publications in radiology: Literature so far. Curr Probl Diagn Radiol. 2024;53(2):215\u0026ndash;25.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFink MA, Bischoff A, Fink CA, Moll M, Kroschke J, Dulz L, et al. Potential of ChatGPT and GPT-4 for Data Mining of Free-Text CT Reports on Lung Cancer. Radiology. 2023;308(3):e231362.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi D, Gupta K, Chong J. Evaluating Diagnostic Performance of ChatGPT in Radiology: Delving into Methods. Radiology. 2023;308(3):e232082.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRamasamy SK. Response to Performance of ChatGPT on a Radiology Board-style Examination. Radiology. 2023;307(5):e231330.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLehnen NC, Dorn F, Wiest IC, Zimmermann H, Radbruch A, Kather JN, et al. Data Extraction from Free-Text Reports on Mechanical Thrombectomy in Acute Ischemic Stroke Using ChatGPT: A Retrospective Analysis. Radiology. 2024;311(1):e232741.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGu J, Cho HC, Kim J, You K, Hong EK, Roh B. CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling [Internet]. arXiv; 2024 [cited 2024 Apr 19]. 
Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://arxiv.org/abs/2401.11505\u003c/span\u003e\u003cspan address=\"http://arxiv.org/abs/2401.11505\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDorfner FJ, J\u0026uuml;rgensen L, Donle L, Al Mohamad F, Bodenmann TR, Cleveland MC, et al. Comparing Commercial and Open-Source Large Language Models for Labeling Chest Radiograph Reports. Radiology. 2024;313(1):e241139.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMinssen T, Vayena E, Cohen IG. The Challenges for Regulating Medical Use of ChatGPT and Other Large Language Models. JAMA. 2023;330(4):315\u0026ndash;6.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMesk\u0026oacute; B, Topol EJ. The imperative for regulatory oversight of large language models (or generative AI) in healthcare. Npj Digit Med. 2023;6(1):1\u0026ndash;6.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWu C, Lin W, Zhang X, Zhang Y, Wang Y, Xie W. PMC-LLaMA: Towards Building Open-source Language Models for Medicine [Internet]. arXiv; 2023 [cited 2024 Mar 15]. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://arxiv.org/abs/2304.14454\u003c/span\u003e\u003cspan address=\"http://arxiv.org/abs/2304.14454\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRaeini M. Privacy-Preserving Large Language Models (PPLLMs) [Internet]. Rochester, NY: Social Science Research Network; 2023 [cited 2024 Nov 21]. 
Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://papers.ssrn.com/abstract=4512071\u003c/span\u003e\u003cspan address=\"https://papers.ssrn.com/abstract=4512071\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJohnson AEW, Pollard TJ, Berkowitz SJ, Greenbaum NR, Lungren MP, Deng C ying, et al. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Sci Data. 2019;6(1):317.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDemner-Fushman D, Kohli MD, Rosenman MB, Shooshan SE, Rodriguez L, Antani S, et al. Preparing a collection of radiology examinations for distribution and retrieval. J Am Med Inform Assoc JAMIA. 2016;23(2):304\u0026ndash;10.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKhaled R, Helal M, Alfarghaly O, Mokhtar O, Elkorany A, El Kassas H, et al. Categorized contrast enhanced mammography dataset for diagnostic and artificial intelligence research. Sci Data. 2022;9(1):122.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eD\u0026rsquo;Orsi CJ, Sickles EA, Mendelson EB, Morris EA. 2013 ACR BI-RADS Atlas: Breast Imaging Reporting and Data System [Internet]. American College of Radiology; 2014. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://books.google.co.kr/books?id=nhWSjwEACAAJ\u003c/span\u003e\u003cspan address=\"https://books.google.co.kr/books?id=nhWSjwEACAAJ\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCAD-RADS\u003csup\u003e\u0026trade;\u003c/sup\u003e 2.0\u0026ndash;2022 Coronary Artery Disease-Reporting and Data System - Journal of Cardiovascular Computed Tomography [Internet]. [cited 2024 Nov 21]. 
Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.journalofcardiovascularct.com/article/S1934-5925(22)00240-4/fulltext\u003c/span\u003e\u003cspan address=\"https://www.journalofcardiovascularct.com/article/S1934-5925(22)00240-4/fulltext\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDettmers T, Pagnoni A, Holtzman A, Zettlemoyer L. QLoRA: Efficient Finetuning of Quantized LLMs [Internet]. arXiv; 2023 [cited 2024 Jan 5]. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://arxiv.org/abs/2305.14314\u003c/span\u003e\u003cspan address=\"http://arxiv.org/abs/2305.14314\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDeepSeek-AI, Guo D, Yang D, Zhang H, Song J, Zhang R, et al. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning [Internet]. arXiv; 2025 [cited 2025 Feb 14]. 
Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://arxiv.org/abs/2501.12948\u003c/span\u003e\u003cspan address=\"http://arxiv.org/abs/2501.12948\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Radiology Report Information Extraction, Large Language Models, Privacy-Preserving Framework","lastPublishedDoi":"10.21203/rs.3.rs-6267208/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6267208/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eEfficient extraction of structured information from unstructured radiology reports remains a critical challenge in healthcare. We introduce the Radiology Report Information Extraction Framework (RRIEF), a privacy-preserving approach utilizing parameter-efficient fine-tuning of open-source large language models (LLMs). 
We validated RRIEF across chest X-ray (CXR), mammography, and coronary CT angiography (CCTA) reports, evaluating its performance against specialized methods and proprietary LLMs (GPT-4o, Gemini-1.5-Flash, Claude-3.5-Sonnet). For CXR, RRIEF-LLaMA1-65B achieved F1 scores of 0.87 and 0.85 in internal and external tests, significantly outperforming CheXpert Labeler (0.70 and 0.69, P\u0026thinsp;\u0026lt;\u0026thinsp;.001), CheXbert (0.72 and 0.69, P\u0026thinsp;\u0026lt;\u0026thinsp;.001), and all proprietary LLMs (Claude-3.5-Sonnet: 0.69 and 0.62, P\u0026thinsp;\u0026lt;\u0026thinsp;.001). For mammography, RRIEF-LLaMA1-30B/65B reached F1 scores of 0.91 and 0.99 in internal and external tests, exceeding all proprietary LLMs (0.86 and 0.92, P\u0026thinsp;=\u0026thinsp;.002). For CCTA, using only 100 training reports, RRIEF-LLaMA3-8B significantly outperformed Gemini-1.5-Flash in stenosis severity (0.87 vs 0.83, P\u0026thinsp;=\u0026thinsp;.02), GPT-4o in external testing (0.83 vs 0.68, P\u0026thinsp;\u0026lt;\u0026thinsp;.001), and all proprietary models for modifiers in external testing (1.00 vs 0.93, P\u0026thinsp;=\u0026thinsp;.004). Notably, RRIEF-LLaMA3-8B achieved superior performance on CXR with only 200 training samples compared to all baselines including CheXbert and proprietary LLMs (P\u0026thinsp;\u0026lt;\u0026thinsp;.001). Our locally deployable framework enables high-performance information extraction from different types of radiology reports, facilitating large-scale research and clinical practice. 
We provide our complete implementation code publicly to promote accessibility and adoption.\u003c/p\u003e","manuscriptTitle":"Privacy-Preserving Information Extraction Framework for Diverse Imaging Reports using Large Language Models","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-05-06 14:01:05","doi":"10.21203/rs.3.rs-6267208/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"dd1a7905-a906-4081-bac6-410b8cdda661","owner":[],"postedDate":"May 6th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":47098360,"name":"Biological sciences/Computational biology and bioinformatics/Data processing"},{"id":47098361,"name":"Biological sciences/Computational biology and bioinformatics/Machine learning"},{"id":47098362,"name":"Health sciences/Health care/Medical imaging"}],"tags":[],"updatedAt":"2025-08-07T02:53:38+00:00","versionOfRecord":[],"versionCreatedAt":"2025-05-06 14:01:05","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6267208","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6267208","identity":"rs-6267208","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
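
The results above are reported as macro F1 scores, and the table footnote defines 'Modifiers (average)' as the simple average of the F1 scores for E, I, N, G, HRP, and S. The following is a minimal sketch of both computations, not the authors' evaluation code. It assumes one-vs-rest F1 per label, and that a label with no true or predicted cases scores 0.0; the function names, label names, and toy values are hypothetical and chosen only for illustration.

```python
def f1_per_label(y_true, y_pred, labels):
    """Return {label: F1}, scoring each label one-vs-rest.

    Assumption: a label with no true and no predicted cases gets 0.0.
    The paper does not state how such labels are handled.
    """
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-label F1 scores."""
    per_label = f1_per_label(y_true, y_pred, labels)
    return sum(per_label.values()) / len(per_label)


def simple_average(f1_by_name):
    """Plain arithmetic mean, as in the 'Modifiers (average)' footnote."""
    return sum(f1_by_name.values()) / len(f1_by_name)


if __name__ == "__main__":
    # Hypothetical label names and toy annotations for a single finding.
    labels = ["positive", "negative", "uncertain"]
    y_true = ["positive", "negative", "positive", "uncertain"]
    y_pred = ["positive", "negative", "negative", "uncertain"]
    print(f1_per_label(y_true, y_pred, labels))
    print(macro_f1(y_true, y_pred, labels))

    # Illustrative (made-up) per-modifier F1 values keyed by the
    # modifier abbreviations used in the table footnote.
    toy_modifiers = {"E": 1.0, "I": 0.5, "N": 1.0, "G": 0.5, "HRP": 1.0, "S": 0.5}
    print(simple_average(toy_modifiers))
```

Because macro averaging weights every label equally, a rare label with only a handful of positive cases can shift the score noticeably, which is the concern raised in the limitations paragraph.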

Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00
unpaywall: last seen: 2026-05-23T02:00:01.238055+00:00

License: CC-BY-4.0