Diagnostic Performance of Expert Physicians Versus General-Purpose Artificial Intelligence Using Standardized Static Coronary CT Images: A Dual-Reference Validation Study

Research Square | Research Article

Sefa Okar, Ziya Gökalp BİLGEL, İsa Göktürk BALCI, Gürcan ERBAY, and 1 more

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-9135888/v1
This work is licensed under a CC BY 4.0 License. Status: Posted. Version 1.

Abstract

Background: Coronary CT angiography (CCTA) is a first-line diagnostic modality for coronary artery disease (CAD), yet its interpretation requires significant expert experience.
Although general-purpose multimodal artificial intelligence (GP-AI) models have shown promise in text-based medical tasks, their visual diagnostic performance in evaluating complex CCTA data remains poorly defined. Methods: This single-center retrospective study included 63 patients (252 vessel-based image sets) who underwent both CCTA and invasive coronary angiography. Expert physician consensus and four frontier GP-AI models (GPT-4o, Gemini 2.5, Claude 3.5 Sonnet, and Grok 4) evaluated identical standardized static images using a zero-shot approach with default generation parameters. Obstructive disease was defined as ≥ 50% luminal stenosis. Diagnostic performance was validated against expert consensus for plaque characterization and quantitative coronary angiography (QCA) for stenosis severity. Results: Expert consensus demonstrated robust agreement with QCA across all coronary territories (κ = 0.774–0.933, p < 0.001). In contrast, a marked performance disparity was observed for the GP-AI models; none achieved statistically significant agreement with QCA in the prognostically critical left anterior descending (LAD) or left main coronary arteries (LMCA) (p > 0.05). While Gemini 2.5 showed a moderate correlation in the right coronary artery (ICC = 0.515), overall continuous stenosis assessment and plaque characterization remained uniformly limited and clinically unreliable across all models. Conclusion: Expert physician interpretation remains the reference standard for CCTA. Current frontier GP-AI models are not suitable for independent clinical interpretation of coronary imaging, particularly in anatomically complex segments. These findings emphasize that general visual reasoning cannot yet replace domain-specific cardiovascular AI solutions or expert clinical judgment in specialized radiological tasks. 
Keywords: Coronary CT Angiography (CCTA); Artificial Intelligence; Large Language Models (LLM); Quantitative Coronary Angiography (QCA); Diagnostic Accuracy; Machine Learning

1. INTRODUCTION

Coronary artery disease (CAD) remains the leading cause of morbidity and mortality globally [1]. Due to its ability to provide non-invasive imaging of coronary atherosclerosis and stenosis, recent guidelines have established Coronary CT Angiography (CCTA) as the "first-line test" [1–3] and the "cornerstone" of diagnostic management [4]. Beyond merely identifying lumen stenosis, CCTA provides critical data regarding coronary anatomy and plaque characterization [4, 5]. Currently, the interpretation of these images relies heavily on the expertise of specialized cardiologists and radiologists [6]. Recent reviews highlight the growing potential of AI-based clinical decision support systems in optimizing cardiovascular disease management and risk assessment [7]. However, as reported in previous studies, visual assessment of CCTA is time-consuming, highly dependent on reader experience, and marked by significant interobserver variability [4, 6, 8–10].

To address these limitations, artificial intelligence (AI) technologies are increasingly being integrated into cardiovascular imaging [6]. "Narrow" AI tools specifically trained for CCTA, such as Cleerly and HeartFlow, have entered clinical workflows, offering automated vessel segmentation and stenosis grading [4, 11, 12]. Extensive clinical trials have demonstrated that these purpose-built platforms enhance diagnostic consistency and reduce reading times. In contrast, the role of general-purpose multimodal artificial intelligence (GP-AI) models such as ChatGPT, Gemini, Grok, and Claude in medical imaging remains a subject of intense debate. The vast majority of existing studies have evaluated these models solely on text-based tasks [13].
The only notable study testing visual diagnostic capability was limited to pediatric chest radiographs, where models performed at a "chance level," failing to demonstrate consistent radiological reasoning [ 14 ]. To date, there is virtually no data rigorously evaluating the ability of these multimodal models to interpret coronary angiography. This study aims to bridge that specific gap. Moving beyond text-based benchmarking, we directly tested the visual diagnostic performance of four frontier GP-AI models (Gemini 2.5, Grok 4, Claude 3.5 Sonnet, and ChatGPT-4o). In our study design, both the AI models and the expert physician consensus were presented with the exact same standardized static images (curved multiplanar reconstruction (cMPR)). We then evaluated the diagnostic accuracy of their interpretations against two rigorous reference standards: the full clinical CCTA reports and invasive quantitative coronary angiography (QCA). 2. MATERIALS AND METHODS 2.1. Study Design and Study Population This study was designed as a retrospective diagnostic accuracy investigation conducted at a single tertiary referral center between May 2022 and December 2024. Consecutive patients who underwent Coronary Computed Tomography Angiography (CCTA) for clinical indications and subsequently received invasive coronary angiography (ICA) within 30 days were screened for inclusion (n = 66). To ensure a methodologically rigorous evaluation focused exclusively on native coronary artery anatomy, patients with a history of prior coronary stent implantation or coronary artery bypass grafting (CABG) were excluded (n = 3). Following the application of these exclusion criteria, a total of 63 patients constituted the final study cohort. All procedures were performed in accordance with the principles of the Declaration of Helsinki. This study was approved by Baskent University Institutional Review Board (Project no: KA25/407) and supported by Baskent University Research Fund. 
Given the retrospective nature of the study, the requirement for written informed consent was waived by the institutional review board.

2.2. Imaging Protocol and Coronary Computed Tomography Angiography Acquisition

To create a standardized visual dataset for a fair comparison between human experts and GP-AI models, representative 2D cMPR keyframes were selected by a dedicated 'preparation team' of two cardiovascular imaging specialists. This team was unblinded, with full access to patients' complete 3D CCTA volumetric datasets, clinical histories, original radiology reports, and reference invasive coronary angiography (ICA) results. Their objective was to deliberately isolate the optimal static frame demonstrating the most severe lesion or characteristic pathology for each vessel. Subsequently, these optimized static images were evaluated by two independent, blinded expert readers. Having no access to the clinical context, 3D CCTA volumes, or ICA results, these physicians reviewed the images under the same constrained visual conditions as the GP-AI models. This two-stage design was intended to remove clinical and spatial context as confounding variables.

2.3. Image Standardization and Selection of Representative Keyframes

To enable a fair and objective comparison between human experts and general-purpose artificial intelligence (GP-AI) models, a 'Representative Keyframe Selection Protocol' was implemented to create a standardized visual dataset (Fig. 1). In the preparatory phase of this process, a dedicated preparation team consisting of a Level-3 cardiovascular radiologist and a Level-2 cardiologist reviewed the scans with full (unblinded) access to all 3D CCTA volumetric datasets, clinical histories, original radiology reports, and reference invasive coronary angiography (ICA) results of the patients.
The primary goal of this team was to intentionally isolate two high-resolution static cMPR frames that optimally showed the most severe lesion or characteristic pathology for each major epicardial coronary artery (LAD, LCx, and RCA). These selected images (JPEG) were completely anonymized by removing all patient identifiers and radiological markings. The main diagnostic evaluation phase of the study was performed in a blinded manner by two independent expert readers (a radiologist and a cardiologist). These physicians, having no access to the patients' clinical context, 3D CCTA volumes, or reference ICA results, evaluated the images under completely identical and restricted visual conditions, just like the GP-AI models. This two-stage design successfully eliminated the influence of clinical and spatial context as confounding variables and ensured that the pure visual recognition capabilities of both human experts and GP-AI models were tested against a standard reference. In total, 252 vessel-specific image sets (four vessels per patient across 63 patients) were prepared and subjected to evaluation by both GP-AI models and expert readers. 2.4. GP-AI Models and Evaluation Methodology In this study, four frontier general-purpose multimodal artificial intelligence (GP-AI) models were evaluated using their most advanced versions available as of February 2026: GPT-4o (OpenAI), Gemini 2.5 (Google), Claude 3.5 Sonnet (Anthropic), and Grok 4 (xAI) [ 15 – 17 ]. All models were accessed through their official web-based interfaces using default generation parameters. To ensure methodological consistency, minimize stochastic variability, and address reproducibility concerns, each image was processed in a new and independent chat session with only a single iteration to prevent cross-contamination of data (zero-shot evaluation). 
For each major epicardial coronary artery, both GP-AI models and blinded expert readers were required to report predefined diagnostic parameters based on the standardized image sets (Fig. 2). These parameters included: (i) the presence or absence of obstructive coronary stenosis (defined as ≥ 50% luminal narrowing relative to reference vessel diameter), (ii) the estimated percentage of luminal narrowing, and (iii) the morphological classification of detected atherosclerotic plaques as calcified, non-calcified (soft), or mixed.

2.5. Reference Standards

To avoid subjective interpretation in the assessment of coronary artery lumen stenosis, Quantitative Coronary Angiography (QCA) was established as the reference standard. Invasive coronary angiography procedures were performed using the Siemens Artis zee angiography system. All angiographic images were analyzed by an independent interventional cardiologist blinded to CCTA and AI findings using syngo QCA software (Siemens Healthineers, Erlangen, Germany). In accordance with standard validation protocols [18], an automated edge-detection algorithm was used to determine arterial contours. Calibration was achieved using the contrast-filled catheter tip as a reference scaling tool. Minimal lumen diameter (MLD) and reference vessel diameter (RVD) were measured from the end-diastolic frame showing the most severe stenosis and the least foreshortening (Fig. 3). A stenosis threshold of ≥ 50% relative to the reference vessel diameter was defined as hemodynamically significant obstructive coronary artery disease. For plaque characterization, considering the limitations of conventional angiography in differentiating plaque components, expert physician consensus, validated by an experienced cardiologist and radiologist with full access to the patients' volumetric CCTA datasets, was accepted as the reference standard.

2.6. Statistical Analysis

The diagnostic performance of artificial intelligence models and human readers was evaluated by calculating sensitivity, specificity, positive predictive value, and negative predictive value. Inter-method agreement for categorical variables was assessed using Cohen's kappa (κ) coefficient, interpreted according to the criteria established by Landis and Koch [19]. All statistical analyses were performed using SPSS version 26.0 (IBM Corp., Armonk, NY, USA), and a two-sided p-value of < 0.05 was considered statistically significant.

3. RESULTS

3.1. Baseline Patient Characteristics

The final study cohort comprised 63 patients with a mean age of 60.25 ± 10.59 years. The population demonstrated a high prevalence of cardiovascular risk factors: hypertension was present in 58.7% of patients, diabetes mellitus in 38.1%, and a family history of coronary artery disease in 68.3%. Based on the reference invasive coronary angiography (ICA) results, obstructive coronary artery disease (CAD) was identified in 55.6% of the cohort, reflecting a significant disease burden. Regarding clinical management following ICA, 27 patients (42.9%) underwent percutaneous coronary intervention (PCI), while 8 patients (12.7%) were referred for coronary artery bypass grafting (CABG) (Table 1).

Table 1. Baseline characteristics of the study cohort

| Characteristic | Value |
| --- | --- |
| Age, years (mean ± SD) | 60.25 ± 10.59 |
| Sex (male), n (%) | 41 (65.1%) |
| BMI, kg/m² (median) | 27.12 |
| Hypertension, n (%) | 37 (58.7%) |
| Diabetes mellitus, n (%) | 24 (38.1%) |
| Family history of CAD, n (%) | 43 (68.3%) |
| No obstructive CAD (< 50% stenosis), n (%) | 28 (44.4%) |
| Obstructive CAD (≥ 50% stenosis), n (%) | 35 (55.6%) |
| Medical therapy, n (%) | 28 (44.4%) |
| PCI, n (%) | 27 (42.9%) |
| CABG, n (%) | 8 (12.7%) |

3.2. Reliability of Human Expert Consensus

Inter-reader reliability among human observers was assessed first.
For the detection of significant coronary stenosis, inter-observer agreement was moderate across the major epicardial coronary arteries, with Cohen's kappa (κ) values ranging from 0.477 to 0.519 for the LAD, LCx, and RCA (Table 2). In contrast, agreement for the LMCA was notably poor (κ = 0.096), reflecting the negligible prevalence of obstructive lesions in this segment and the high sensitivity of this region to image-related artifacts. The assessment's restriction to predefined static frames inherently reduced spatial and contextual information, thereby increasing susceptibility to variability from blooming artifacts and partial volume effects. This limited agreement even among experienced physicians underscores the intrinsic diagnostic challenge of interpreting CCTA solely from static 2D images. Consequently, a third-reader adjudication process was implemented to establish the final expert consensus, which served as the reference standard for the subsequent GP-AI performance analysis.

Table 2. Inter-Observer Agreement (Physician 1 vs. Physician 2)

| Vessel | Kappa (p) | ICC (95% CI) | ICC p-value |
| --- | --- | --- | --- |
| LMCA | 0.096 (0.029) | 0.553 (0.354–0.705) | < 0.001 |
| LAD | 0.501 (< 0.001) | 0.647 (0.475–0.771) | < 0.001 |
| CX | 0.519 (< 0.001) | 0.622 (0.442–0.754) | < 0.001 |
| RCA | 0.477 (< 0.001) | 0.790 (0.674–0.868) | < 0.001 |

3.3. Diagnostic Accuracy for Detection of Significant (≥ 50%) Stenosis

When compared with invasive quantitative coronary angiography (QCA) as the reference standard, the Expert Physician Consensus demonstrated consistently high diagnostic performance across all coronary territories. Agreement with QCA was robust, with Cohen's kappa (κ) values ranging from 0.774 to 0.933 (all p < 0.001), accompanied by sensitivity and specificity values generally exceeding 90%. These findings indicate that experienced readers maintain high concordance with invasive reference measurements despite the methodological reliance on constrained static image inputs.
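The agreement and accuracy statistics reported in this section (Cohen's κ, sensitivity, specificity, PPV, NPV, accuracy) can all be derived from one 2×2 table of reader calls against QCA. The sketch below is illustrative only and is not the authors' SPSS workflow. The class and function names are hypothetical, and the example counts (24/25 QCA-positive and 36/37 QCA-negative vessels) are an assumption reconstructed from the percentages reported for the expert consensus in the RCA row of Table 3; the paper does not give raw counts. The label cut-offs follow the Landis and Koch bands cited in Section 2.6.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Confusion:
    """2x2 counts of a reader's binary calls against the reference standard."""
    tp: int  # reader positive, reference positive
    fp: int  # reader positive, reference negative
    fn: int  # reader negative, reference positive
    tn: int  # reader negative, reference negative

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def _ratio(num: int, den: int) -> Optional[float]:
    """Return num/den, or None when the denominator is zero (metric undefined)."""
    return num / den if den else None


def diagnostic_metrics(c: Confusion) -> dict:
    """Sensitivity, specificity, PPV, NPV and accuracy as fractions in [0, 1]."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
        "accuracy": _ratio(c.tp + c.tn, c.n),
    }


def cohen_kappa(c: Confusion) -> Optional[float]:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if c.n == 0:
        return None
    p_obs = (c.tp + c.tn) / c.n
    # Chance agreement from the marginal totals of reader and reference.
    p_exp = ((c.tp + c.fp) * (c.tp + c.fn) + (c.fn + c.tn) * (c.fp + c.tn)) / c.n ** 2
    if p_exp == 1.0:
        return None  # both raters used a single category; kappa is undefined
    return (p_obs - p_exp) / (1.0 - p_exp)


def landis_koch(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch agreement label."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Assumed counts, reconstructed from the reported consensus RCA percentages.
rca_consensus = Confusion(tp=24, fp=1, fn=1, tn=36)
metrics = diagnostic_metrics(rca_consensus)
kappa = cohen_kappa(rca_consensus)
print({name: round(value, 3) for name, value in metrics.items()})
print(round(kappa, 3), landis_koch(kappa))
```

With these assumed counts the sketch returns κ ≈ 0.933 and an accuracy of about 96.8%, in line with the values reported for the consensus in the RCA. Undefined ratios (for example, sensitivity when no vessel is QCA-positive) are returned as `None` rather than forced to zero, which is one possible design choice for rows with an empty denominator.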
In contrast, a marked disparity was observed in the performance of the frontier GP-AI models, which exhibited substantial diagnostic limitations. Notably, none of the GP-AI models achieved statistically significant agreement with QCA in the LAD or LMCA, the most clinically critical coronary segments. For all four models, kappa values in these territories were negligible and statistically non-significant (p > 0.05). While marginal statistical significance was reached by selected models in the LCx and RCA, overall diagnostic accuracy, including sensitivity and specificity, remained well below the thresholds required for clinical reliability. For instance, in the LAD, Gemini 2.5 achieved a kappa value of only 0.160 (p = 0.153), while ChatGPT-4o demonstrated virtually no agreement (κ = 0.004, p = 0.963). Detailed diagnostic performance metrics for each coronary artery, including positive (PPV) and negative predictive values (NPV), are presented in Table 3.

Table 3. Diagnostic Performance for LMCA, LAD, CX, and RCA Stenosis (≥ 50%) vs. QCA

| Evaluator | Vessel | Kappa (κ) | p-value | Sensitivity | Specificity | PPV | NPV | Accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Gemini 2.5 | LMCA | −0.016 | 0.897 | 0% | 98.4% | 0% | 98.4% | 96.8% |
| Grok 4 | LMCA | 0.000 | 1.000 | - | 100% | - | 100% | 100% |
| Claude 3.5 | LMCA | −0.025 | 0.820 | 0% | 95.1% | 0% | 98.3% | 93.5% |
| ChatGPT-4o | LMCA | 0.000 | 1.000 | - | 100% | - | 100% | 100% |
| Consensus | LMCA | 1.000 | < 0.001 | 100% | 100% | 100% | 100% | 100% |
| Gemini 2.5 | LAD | 0.160 | 0.153 | 78.6% | 38.2% | 51.2% | 68.4% | 56.5% |
| Grok 4 | LAD | 0.109 | 0.290 | 82.1% | 29.4% | 48.9% | 66.7% | 53.2% |
| Claude 3.5 | LAD | 0.064 | 0.573 | 71.4% | 35.3% | 47.6% | 60.0% | 51.6% |
| ChatGPT-4o | LAD | 0.004 | 0.963 | 85.7% | 14.7% | 45.3% | 55.6% | 46.8% |
| Consensus | LAD | 0.774 | < 0.001 | 92.9% | 85.3% | 83.9% | 93.5% | 87.1% |
| Gemini 2.5 | CX | 0.342 | 0.007 | 50.0% | 82.9% | 58.8% | 77.3% | 72.1% |
| Grok 4 | CX | 0.037 | 0.771 | 42.9% | 61.0% | 36.0% | 67.6% | 54.8% |
| Claude 3.5 | CX | 0.136 | 0.285 | 42.9% | 70.7% | 42.9% | 70.7% | 61.3% |
| ChatGPT-4o | CX | 0.218 | 0.060 | 71.4% | 53.7% | 44.1% | 78.6% | 59.7% |
| Consensus | CX | 0.789 | < 0.001 | 90.5% | 90.2% | 82.6% | 94.9% | 90.5% |
| Gemini 2.5 | RCA | 0.338 | 0.008 | 64.0% | 70.3% | 59.3% | 74.3% | 67.7% |
| Grok 4 | RCA | 0.245 | 0.042 | 72.0% | 54.1% | 51.4% | 74.1% | 61.3% |
| Claude 3.5 | RCA | 0.214 | 0.090 | 48.0% | 73.0% | 54.5% | 67.5% | 62.9% |
| ChatGPT-4o | RCA | 0.281 | 0.025 | 64.0% | 64.9% | 55.2% | 72.7% | 64.5% |
| Consensus | RCA | 0.933 | < 0.001 | 96.0% | 97.3% | 96.0% | 97.3% | 96.8% |

3.4. Continuous Stenosis Assessment and Plaque Characterization

The analysis of continuous stenosis severity revealed a pronounced performance gap between expert readers and the frontier GP-AI models. The Expert Consensus demonstrated high concordance with QCA-derived stenosis degrees across all coronary territories. In contrast, the majority of the GP-AI models failed to show meaningful continuous correlation with invasive reference measurements. Gemini 2.5 represented a partial exception, achieving a moderate intraclass correlation coefficient (ICC = 0.515, p < 0.001) specifically in the right coronary artery (RCA). This was the only instance of moderate continuous agreement between a GP-AI model and QCA in the present study; the other GP-AI ICCs in Table 4 that reached statistical significance were all below 0.5.
However, none of the GP-AI models exhibited consistent or clinically acceptable continuous agreement across multiple coronary segments. Detailed ICC results are presented in Table 4. Regarding plaque characterization, performance across all GP-AI models was uniformly limited. Agreement with the Expert Consensus on plaque morphology (calcified, non-calcified, or mixed) was largely negligible. Only Gemini 2.5 showed slight-to-fair agreement in the LCx and RCA territories; however, these agreement levels remain insufficient for reliable clinical interpretation. Plaque characterization results are summarized in Table 5.

Table 4. Agreement Between AI Models and QCA-Derived Stenosis Grades

| QCA grade | Gemini ICC (95% CI) | p | Grok ICC (95% CI) | p | Claude ICC (95% CI) | p | ChatGPT ICC (95% CI) | p | Consensus ICC (95% CI) | p |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LMCA | 0.05 (−0.20 to 0.295) | 0.348 | 0.00 (−0.248 to 0.248) | 0.500 | 0.269 (0.022 to 0.485) | 0.017 | 0.206 (−0.044 to 0.432) | 0.053 | 0.612 (0.43 to 0.747) | < 0.001 |
| LAD | 0.159 (−0.092 to 0.392) | 0.106 | 0.187 (−0.064 to 0.416) | 0.071 | 0.091 (−0.16 to 0.332) | 0.239 | 0.111 (−0.141 to 0.349) | 0.194 | 0.908 (0.853 to 0.944) | < 0.001 |
| CX | 0.401 (0.168 to 0.592) | 0.001 | 0.143 (−0.109 to 0.378) | 0.131 | 0.274 (0.027 to 0.488) | 0.015 | 0.141 (−0.111 to 0.375) | 0.136 | 0.847 (0.758 to 0.905) | < 0.001 |
| RCA | 0.515 (0.306 to 0.677) | < 0.001 | 0.327 (0.086 to 0.532) | 0.004 | 0.235 (−0.013 to 0.457) | 0.032 | 0.305 (0.062 to 0.514) | 0.007 | 0.942 (0.906 to 0.965) | < 0.001 |

ICC: Intraclass Correlation Coefficient (95% Confidence Interval)

Table 5. Agreement on Plaque Characterization (Kappa)

| Vessel | AI Model | Kappa (κ) | Agreement Level |
| --- | --- | --- | --- |
| LMCA | All Models | Non-significant | None |
| LAD | All Models | Non-significant | None |
| CX | Gemini 2.5 | 0.150 (p = 0.025) | Slight |
| RCA | Gemini 2.5 | 0.203 (p = 0.009) | Fair |

4. DISCUSSION

This study provides an objective clinical evaluation of the diagnostic limitations of general-purpose multimodal artificial intelligence (GP-AI) models in complex cardiovascular imaging tasks. While these models excel in text-based and semantic reasoning, they exhibit significant deficiencies in areas requiring advanced visual-anatomical interpretation, such as coronary CT angiography (CCTA). Although GP-AI models are being heavily debated as the next generation of clinical decision support systems [20], our findings emphasize that their visual reasoning capacities are not yet mature enough for complex cardiovascular imaging. Our findings reveal a significant "semantic–visual gap," that is, a fundamental divergence between linguistic intelligence and reliable radiological assessment [21–23].

A critical methodological element in interpreting this study is that both human experts and GP-AI models performed assessments on identical standardized static JPEG images. Neither group was provided with the full volumetric CCTA datasets during the assessment process. Instead, two high-resolution curved MPR frames, intentionally selected to depict the most severe lesions or characteristic anatomy, were used for each major epicardial coronary artery. Thus, the comparison between human and artificial intelligence was performed on a symmetrical and methodologically fair basis in terms of data input.

Evaluations were based on two separate reference standards. Plaque morphology was compared with expert physician consensus based on all volumetric CCTA data; the degree of luminal stenosis and the presence of ≥ 50% obstructive disease were analyzed based on quantitative coronary angiography (QCA) results. This dual-reference approach provides a sound validation basis for the biological and technical nature of plaque characterization and hemodynamically significant stenosis detection [24–26].
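For the QCA arm of this dual reference, the obstructive call reduces to one ratio per lesion. Section 2.5 states that MLD and RVD were measured and that narrowing of ≥ 50% relative to RVD was treated as obstructive; it does not spell out the formula, so the sketch below assumes the conventional percent diameter stenosis, 100 × (1 − MLD/RVD). The function names and the example diameters are hypothetical and are not taken from the study data.

```python
OBSTRUCTIVE_THRESHOLD_PCT = 50.0  # threshold stated in the Methods (>= 50%)


def percent_diameter_stenosis(mld_mm: float, rvd_mm: float) -> float:
    """Percent diameter stenosis from minimal lumen and reference vessel diameters.

    Assumes the conventional definition 100 * (1 - MLD / RVD).
    """
    if rvd_mm <= 0:
        raise ValueError("reference vessel diameter must be positive")
    if not 0 <= mld_mm <= rvd_mm:
        raise ValueError("minimal lumen diameter must lie between 0 and RVD")
    return 100.0 * (1.0 - mld_mm / rvd_mm)


def is_obstructive(mld_mm: float, rvd_mm: float,
                   threshold_pct: float = OBSTRUCTIVE_THRESHOLD_PCT) -> bool:
    """Binary obstructive-disease call; the threshold itself counts as obstructive."""
    return percent_diameter_stenosis(mld_mm, rvd_mm) >= threshold_pct


# Hypothetical lesions in a 3.0 mm reference vessel.
for mld in (2.25, 1.5):
    pct = percent_diameter_stenosis(mld, 3.0)
    print(f"MLD {mld} mm -> {pct:.1f}% stenosis, obstructive={is_obstructive(mld, 3.0)}")
```

The comparison is inclusive because the study defines obstructive disease as ≥ 50%, so a lesion measured at exactly 50% is classified as obstructive.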
The significant performance difference between GP-AI models and human experts is largely related to the diagnostic challenges inherent in the static two-dimensional imaging approach. The moderate-to-low inter-observer agreement observed among human experts, specifically the markedly low agreement in the LMCA (κ = 0.096), highlights the intrinsic difficulty of this task. This low concordance in the LMCA is likely due to its short anatomical length, the relatively lower prevalence of isolated obstructive disease in this segment, and its high susceptibility to artifacts in static frames. These factors create a 'diagnostic blind spot' when volumetric scrolling is unavailable, demonstrating that static-image CCTA assessment is inherently constrained, even for highly trained specialists. Consequently, it is unsurprising that GP-AI models, which lack both contextual clinical knowledge and full three-dimensional spatial reasoning, failed in the LMCA and LAD. These 'small-target' segments require advanced anatomical contextual inference that current multimodal architectures lack. The models' inability to achieve statistical significance in these prognostically critical territories, where vascular overlaps, blooming artifacts, or calcium shadows are often misinterpreted as stenosis, represents a major barrier to their clinical reliability and increases the risk of severe misinterpretations [27–30]. Within this general picture of failure, the limited performance improvement observed in the right coronary artery (RCA) constitutes a notable exception. The more isolated anatomical course and relatively lower artifact load of the RCA may have allowed some models, particularly Gemini 2.5, to partially capture the change in stenosis severity on a continuous scale.
However, the discrepancy between continuous correlation and binary classification success reveals that the models struggle to define clinically meaningful decision limits and exhibit a tendency toward systematic overestimation.

Finally, while the GP-AI models used (GPT-4o, Gemini 2.5, Claude 3.5 Sonnet, and Grok 4) were the most current versions as of February 2026, the rapid evolution of AI technologies means that more advanced versions are quickly introduced. However, since this study aimed to reveal the fundamental competency limitations of general-purpose architectures in tasks requiring expert-level radiological evaluation, the findings retain their conceptual and methodological validity.

Limitations

Our study design carries specific inherent limitations that must be explicitly acknowledged. First, the unblinded selection of key frames by the preparation team introduces a potential selection bias, as it intentionally guides the presented images toward the most severe lesion. While this does not reflect real-world volumetric CCTA reading, where the clinician must actively search for pathology, it was a necessary methodological compromise to establish a standardized ground truth for evaluating pure visual diagnostic capacity. Second, because this study evaluates decision-making within a constrained framework that uses only a few standardized static images per vessel rather than actual clinical volumetric interpretation, these findings should be extrapolated to real-world, unconstrained clinical practice with caution.

5. CONCLUSION

This study demonstrates that consensus interpretation generated by experienced physicians exhibits strong agreement with invasive quantitative coronary angiography (QCA) when using a ≥ 50% stenosis threshold and remains the most reliable reference approach in coronary CT angiography (CCTA) assessment.
In contrast, current general-purpose multimodal artificial intelligence (GP-AI) models have shown significant limitations in terms of visual diagnostic performance. Despite their known competencies in text-based medical tasks, these systems have failed to achieve a consistent and clinically acceptable level of accuracy when applied to static two-dimensional CCTA images, particularly in anatomically and prognostically critical coronary segments. These findings indicate that direct transfer of general visual reasoning capabilities to highly specialized cardiovascular imaging tasks is not feasible under current conditions. Therefore, at the current stage, it is not medically appropriate for GP-AI systems to replace physicians in interpreting coronary CT angiography results or for patients to directly consult these systems for diagnostic decisions. CCTA assessment currently requires the clinical experience, anatomical knowledge, and contextual interpretation skills of expert cardiologists and radiologists. The limited performance signal observed with the Gemini 2.5 model in the right coronary artery (RCA) suggests that clinically relevant feature recognition capacity may be partially present in some pioneering architectures. However, this finding is insufficient to support the independent clinical use of GP-AI models, suggesting that the identified shortcomings are related to a lack of domain adaptation and input representation. In conclusion, current GP-AI models are not suitable for the independent clinical interpretation of coronary CT angiography. Importantly, this limitation applies specifically to broad, non-domain-specific architectures and does not detract from the proven clinical value of dedicated, task-specific cardiovascular AI platforms. 
Their potential as supportive tools in the future can only be achieved through the development of architectures incorporating domain-specific targeted training, radiological physics, and three-dimensional spatial relationships, and their use under expert physician supervision.

Declarations

Ethics approval and consent to participate: This study was approved by the Baskent University Institutional Review Board (Project no: KA25/407). All procedures were performed in accordance with the principles of the Declaration of Helsinki. Given the retrospective nature of the study, the requirement for written informed consent was waived by the Institutional Review Board.

Consent for publication: Not applicable.

Availability of data and materials: The datasets generated and/or analysed during the current study are not publicly available due to patient privacy and institutional regulations but are available from the corresponding author on reasonable request.

Competing interests: The authors declare that they have no competing interests.

Funding: This study was supported by the Baskent University Research Fund (Project no: KA25/407).

Authors' contributions: S.O. conceptualized and designed the study. S.O., İ.G.B., and G.E. collected the clinical data and performed the expert image evaluations. Z.G.B. and S.O. conducted the artificial intelligence model assessments and statistical analysis. S.O. wrote the main manuscript text. M.Y. supervised the research project and critically revised the manuscript. All authors reviewed and approved the final manuscript.

Acknowledgements: Not applicable.

References

1. Vrints CJM, Senior R, Crea F, et al. 2024 ESC Guidelines for the management of chronic coronary syndromes. Eur Heart J. 2024;45(36):3415–3537.
2. Kelion AD, Nicol ED. The rationale for the primacy of coronary CT angiography in the National Institute for Health and Care Excellence (NICE) guideline (CG95) for the investigation of chest pain of recent onset. J Cardiovasc Comput Tomogr. 2018;12:516–22.
3. Abbara S, Blanke P, Maroules CD, et al. SCCT guidelines for the performance and acquisition of coronary computed tomographic angiography: A report of the Society of Cardiovascular Computed Tomography Guidelines Committee. J Cardiovasc Comput Tomogr. 2016;10(6):435–449.
4. Conte E, Sala E. AI-assisted CCTA: supporting diagnosis across the CAD spectrum. Int J Cardiovasc Imaging. 2025;41:825–826.
5. D'Costa Z, Karlsberg RP, Cho GW. Artificial-intelligence-assisted CCTA quantifies sex differences in coronary atherosclerotic burden at low atheroma volumes. IJC Heart & Vasculature. 2025;60:101758.
6. Liao J, Huang L, Qu M, Chen B, Wang G. Artificial Intelligence in Coronary CT Angiography: Current Status and Future Prospects. Front Cardiovasc Med. 2022;9:896366.
7. Bozyel S, Şimşek E, Koçyiğit Burunkaya D, et al. Artificial intelligence-based clinical decision support systems in cardiovascular diseases. Anatol J Cardiol. 2024;28(2):74–86.
8. van Assen M, De Cecco CN, et al. Inter-observer variability of coronary artery calcium scoring and CCTA interpretation. J Cardiovasc Comput Tomogr. 2019;13(4):228–233.
9. Budoff MJ, et al. Interobserver variability among expert readers quantifying plaque volume on coronary CT angiography. J Cardiovasc Comput Tomogr. 2022;16(6):501–507.
10. Interobserver variability of coronary stenosis characterization and its relation to plaque composition. JACC Cardiovasc Imaging. 2024.
11. Chen M, Wang X, Hao G, et al. Diagnostic performance of deep learning-based vascular extraction and stenosis detection technique for coronary artery disease. Br J Radiol. 2020;93:20191028.
12. CathAI: fully automated coronary angiography interpretation and stenosis estimation. NPJ Digit Med. 2023;6(1):142.
13. Sarangi PK, Datta S, Panda BB, et al. Evaluating ChatGPT-4's Performance in Identifying Radiological Anatomy in FRCR Part 1 Examination Questions. Indian J Radiol Imaging. 2025;35:287–294.
14. Gillette J, Lu M, Heston TF. Large Language Models Perform at Chance Level in the Diagnosis of Pediatric Pneumonia Using Chest Radiographs. Cureus. 2025;17(9):e92596.
15. Gemini Team, Google. Gemini: A Family of Highly Capable Multimodal Models. arXiv preprint arXiv:2312.11805. 2023.
16. Anthropic. The Claude 3 Model Family: Opus, Sonnet, Haiku. Technical Report. 2024.
17. OpenAI. GPT-4 Technical Report. arXiv preprint arXiv:2303.08774. 2023.
18. Reiber JH, Serruys PW, Kooijman CJ, et al. Assessment of short-, medium-, and long-term variations in arterial dimensions from computer-assisted quantitation of coronary cineangiograms. Circulation. 1985;71(2):280–288.
19. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–174.
20. Güneş YC, Cesur T. Large Language Models: Could They Be the Next Generation of Clinical Decision Support Systems in Cardiovascular Diseases? Anatol J Cardiol. 2024;28(7):371–372.
21. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44–56.
22. Davenport T, Kalakota R. The potential for artificial intelligence in healthcare. Future Healthc J. 2019;6(2):94–98.
23. Large Language Models for disease diagnosis: a scoping review. NPJ Digit Med. 2025;8:11.
24. Hulten E, Villines TC, Cheezum MK, et al. The role of coronary CT angiography in the diagnosis and management of coronary artery disease. J Nucl Cardiol. 2017;24(5):1609–1624.
25. Knuuti J, Wijns W, Saraste A, et al. 2019 ESC Guidelines for the diagnosis and management of chronic coronary syndromes. Eur Heart J. 2020;41(3):407–477.
26. Clinical expert consensus document on quantitative coronary angiography. Cardiovasc Interv Ther. 2020;35(2):105–116.
27. Park SH. Artificial intelligence in radiology: practical issues and challenges. Radiology. 2018;287(3):749–772.
28. Large Language Models in Medical Image Analysis: A Systematic Review. Bioengineering (Basel). 2025;12(8):818.
29. Multimodal Large Language Models in Medical Imaging. Korean J Radiol. 2025;26(9):843–853.
30. Best Practices for the Safe Use of Large Language Models and Generative AI in Radiology. Radiology. 2025;312(3):e241516.

© Research Square 2026 | ISSN 2693-5015 (online)
\u003cp\u003e98.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e93.5%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eChatGPT-4o- LMCA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1.000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e100%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e100%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e100%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConsensus- LMCA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e1.000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e100%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e100%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e100%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e100%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e100%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGemini 2.5- LAD\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e 
\u003cp\u003e0.160\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.153\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e78.6%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e38.2%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e51.2%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e68.4%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e56.5%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGrok 4- LAD\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.109\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.290\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e82.1%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e29.4%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e48.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e66.7%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e53.2%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eClaude 3.5- LAD\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.064\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.573\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e71.4%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e35.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c6\"\u003e \u003cp\u003e47.6%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e60.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e51.6%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eChatGPT-4o- LAD\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.004\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.963\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e85.7%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e14.7%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e45.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e55.6%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e46.8%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConsensus- LAD\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.774\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e92.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e85.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e83.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e93.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e87.1%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003eGemini 2.5- CX\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.342\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.007\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e50.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e82.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e58.8%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e77.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e72.1%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGrok 4- CX\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.037\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.771\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e42.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e61.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e36.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e67.6%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e54.8%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eClaude 3.5- CX\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.136\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.285\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e42.9%\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c5\"\u003e \u003cp\u003e70.7%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e42.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e70.7%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e61.3%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eChatGPT-4o- CX\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.218\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.060\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e71.4%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e53.7%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e44.1%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e78.6%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e59.7%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConsensus- CX\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.789\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e90.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e90.2%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e82.6%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e94.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e90.5%\u003c/p\u003e 
\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGemini 2.5- RCA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.338\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.008\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e64.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e70.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e59.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e74.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e67.7%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGrok 4- RCA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.245\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.042\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e72.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e54.1%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e51.4%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e74.1%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e61.3%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eClaude 3.5- RCA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.214\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.090\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c4\"\u003e \u003cp\u003e48.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e73.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e54.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e67.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e62.9%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eChatGPT-4o- RCA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.281\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.025\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e64.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e64.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e55.2%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e72.7%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e64.5%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConsensus- RCA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.933\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e96.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e97.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e96.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e 
\u003cp\u003e97.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e96.8%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e3.4. Continuous Stenosis Assessment and Plaque Characterization\u003c/h2\u003e \u003cp\u003eThe analysis of continuous stenosis severity revealed a pronounced performance gap between expert readers and the frontier GP-AI models. The Expert Consensus demonstrated high concordance with QCA-derived stenosis degrees across all coronary territories. In contrast, the majority of the GP-AI models failed to show meaningful continuous correlation with invasive reference measurements.\u003c/p\u003e \u003cp\u003eGemini 2.5 represented a partial exception, achieving a moderate intraclass correlation coefficient (ICC\u0026thinsp;=\u0026thinsp;0.515, p\u0026thinsp;\u0026lt;\u0026thinsp;0.001) specifically in the right coronary artery (RCA). This was the strongest continuous agreement between a GP-AI model and QCA observed in the present study; the other model-territory pairs that reached statistical significance had lower ICC values (0.235\u0026ndash;0.401). However, none of the GP-AI models exhibited consistent or clinically acceptable continuous agreement across multiple coronary segments. Detailed ICC results are presented in Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eRegarding plaque characterization, performance across all GP-AI models was uniformly limited. Agreement with the Expert Consensus on plaque morphology (calcified, non-calcified, or mixed) was largely negligible. Only Gemini 2.5 showed slight-to-fair agreement in the LCx and RCA territories; however, these agreement levels remain insufficient for reliable clinical interpretation. 
Plaque characterization results are summarized in Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eAgreement Between AI Models and QCA-Derived Stenosis Grades\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"11\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c11\" colnum=\"11\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eParameter\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eGemini ICC (95% CI)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003ep\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" 
colname=\"c4\"\u003e \u003cp\u003eGrok ICC (95% CI)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003ep\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eClaude ICC (95% CI)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003ep\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eChatGPT ICC (95% CI)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003ep\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c10\"\u003e \u003cp\u003eConsensus ICC (95% CI)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c11\"\u003e \u003cp\u003ep\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eQCA LMCA grade\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.05 (\u0026minus;\u0026thinsp;0.20 to 0.295)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.348\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.00 (\u0026minus;\u0026thinsp;0.248 to 0.248)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.500\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.269 (0.022 to 0.485)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.017\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.206 (\u0026minus;\u0026thinsp;0.044 to 0.432)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e0.053\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.612 (0.43 to 0.747)\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eQCA LAD grade\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.159 (\u0026minus;\u0026thinsp;0.092 to 0.392)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.106\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.187 (\u0026minus;\u0026thinsp;0.064 to 0.416)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.071\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.091 (\u0026minus;\u0026thinsp;0.16 to 0.332)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.239\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.111 (\u0026minus;\u0026thinsp;0.141 to 0.349)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e0.194\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.908 (0.853 to 0.944)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eQCA CX grade\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.401 (0.168 to 0.592)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.143 (\u0026minus;\u0026thinsp;0.109 to 0.378)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e 
\u003cp\u003e0.131\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.274 (0.027 to 0.488)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.015\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.141 (\u0026minus;\u0026thinsp;0.111 to 0.375)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e0.136\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.847 (0.758 to 0.905)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eQCA RCA grade\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.515 (0.306 to 0.677)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.327 (0.086 to 0.532)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.004\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.235 (\u0026minus;\u0026thinsp;0.013 to 0.457)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.032\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.305 (0.062 to 0.514)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e0.007\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.942 (0.906 to 0.965)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e 
\u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"11\"\u003eICC: Intraclass Correlation Coefficient (95% Confidence Interval)\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eAgreement on Plaque Characterization (Kappa)\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVessel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAI Model\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eKappa (κ)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAgreement Level\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLMCA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAll Models\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNon-significant\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eNone\u003c/p\u003e \u003c/td\u003e 
\u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLAD\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAll Models\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNon-significant\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eNone\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCX\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eGemini 2.5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.150 (p\u0026thinsp;=\u0026thinsp;0.025)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eSlight\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRCA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eGemini 2.5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.203 (p\u0026thinsp;=\u0026thinsp;0.009)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eFair\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"4. DISCUSSION","content":"\u003cp\u003eThis study provides an objective clinical evaluation of the diagnostic limitations of general-purpose multimodal artificial intelligence (GP-AI) models in complex cardiovascular imaging tasks. While these models excel in text-based and semantic reasoning, they exhibit significant deficiencies in areas requiring advanced visual-anatomical interpretation, such as coronary CT angiography (CCTA). 
Although GP-AI models are widely discussed as the next generation of clinical decision support systems [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e], our findings indicate that their visual reasoning capacities are not yet mature enough for complex cardiovascular imaging. They reveal a significant \u0026ldquo;semantic\u0026ndash;visual gap,\u0026rdquo; a fundamental divergence between linguistic intelligence and reliable radiological assessment [\u003cspan additionalcitationids=\"CR22\" citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eA critical methodological element in interpreting this study is that both human experts and GP-AI models performed assessments on identical standardized static JPEG images. Neither group was provided with the full volumetric CCTA datasets during the assessment process. Instead, two high-resolution curved MPR frames, intentionally selected to depict the most severe lesions or characteristic anatomy, were used for each major epicardial coronary artery. Thus, the comparison between human readers and GP-AI models was performed on a symmetrical and methodologically fair basis in terms of data input.\u003c/p\u003e \u003cp\u003eEvaluations were based on two separate reference standards. Plaque morphology was compared with expert physician consensus based on all volumetric CCTA data, whereas the degree of luminal stenosis and the presence of \u0026ge;\u0026thinsp;50% obstructive disease were analyzed against quantitative coronary angiography (QCA) results. 
This dual-reference approach provides a sound validation basis for the biological and technical nature of plaque characterization and hemodynamically significant stenosis detection [\u003cspan additionalcitationids=\"CR25\" citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThe significant performance difference between GP-AI models and human experts is largely related to the diagnostic challenges inherent in the static two-dimensional imaging approach. The moderate-to-low inter-observer agreement observed among human experts, and specifically the markedly low agreement in the LMCA (\u003cem\u003eκ\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.096), highlights the intrinsic difficulty of this task. This low concordance in the LMCA is likely due to its short anatomical length, the relatively lower prevalence of isolated obstructive disease in this segment, and its high susceptibility to artifacts in static frames. These factors create a 'diagnostic blind spot' when volumetric scrolling is unavailable, indicating that static-image CCTA assessment is inherently constrained, even for highly trained specialists.\u003c/p\u003e \u003cp\u003eConsequently, it is unsurprising that GP-AI models, which lack both contextual clinical knowledge and full three-dimensional spatial reasoning, failed significantly in the LMCA and LAD. These 'small-target' segments require advanced anatomical contextual inference that current multimodal architectures lack. 
The models' inability to achieve statistical significance in these prognostically critical territories\u0026mdash;often misinterpreting vascular overlaps, blooming artifacts, or calcium shadows as stenosis\u0026mdash;represents a major barrier to their clinical reliability and increases the risk of severe misinterpretations [\u003cspan additionalcitationids=\"CR28 CR29\" citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eWithin this general picture of failure, the limited performance improvement observed in the right coronary artery (RCA) constitutes a notable exception. The more isolated anatomical course and relatively lower artifact load of the RCA may have allowed some models, particularly Gemini 2.5, to partially capture the change in stenosis severity on a continuous scale. However, the discrepancy between continuous correlation and binary classification success reveals that the models struggle to define clinically meaningful decision limits and exhibit a tendency toward systematic overestimation.\u003c/p\u003e \u003cp\u003eFinally, while the GP-AI models used (GPT-4o, Gemini 2.5, Claude 3.5 Sonnet, and Grok 4) were the most current versions as of February 2026, the rapid evolution of AI technologies means that more advanced versions are quickly introduced. However, since this study aimed to reveal the fundamental competency limitations of general-purpose architectures in tasks requiring expert-level radiological evaluation, the findings retain their conceptual and methodological validity.\u003c/p\u003e \u003cp\u003eLimitations\u003c/p\u003e \u003cp\u003eOur study design carries specific inherent limitations that must be explicitly acknowledged. 
First, the unblinded selection of key frames by the preparation team introduces a potential selection bias, as it intentionally guides the presented images toward the most severe lesion. While this does not reflect real-world volumetric CCTA reading, where the clinician must actively search for pathology, it was a necessary methodological compromise to establish a standardized ground truth for evaluating pure visual diagnostic capacity. Second, because this study evaluates decision-making within a constrained framework utilizing only a few standardized static images per vessel rather than actual clinical volumetric interpretation, these findings should be extrapolated to real-world, unconstrained clinical practice with caution.\u003c/p\u003e"},{"header":"5. CONCLUSION","content":"\u003cp\u003eThis study demonstrates that consensus interpretation generated by experienced physicians exhibits strong agreement with invasive quantitative coronary angiography (QCA) when using a \u003cb\u003e\u0026ge;\u003c/b\u003e\u0026thinsp;50% stenosis threshold and remains the most reliable reference approach in coronary CT angiography (CCTA) assessment.\u003c/p\u003e \u003cp\u003eIn contrast, current general-purpose multimodal artificial intelligence (GP-AI) models have shown significant limitations in terms of visual diagnostic performance. Despite their known competencies in text-based medical tasks, these systems have failed to achieve a consistent and clinically acceptable level of accuracy when applied to static two-dimensional CCTA images, particularly in anatomically and prognostically critical coronary segments. 
These findings indicate that direct transfer of general visual reasoning capabilities to highly specialized cardiovascular imaging tasks is not feasible under current conditions.\u003c/p\u003e \u003cp\u003eTherefore, at the current stage, it is not medically appropriate for GP-AI systems to replace physicians in interpreting coronary CT angiography results or for patients to directly consult these systems for diagnostic decisions. CCTA assessment currently requires the clinical experience, anatomical knowledge, and contextual interpretation skills of expert cardiologists and radiologists.\u003c/p\u003e \u003cp\u003eThe limited performance signal observed with the Gemini 2.5 model in the right coronary artery (RCA) suggests that clinically relevant feature recognition capacity may be partially present in some pioneering architectures. However, this finding is insufficient to support the independent clinical use of GP-AI models, suggesting that the identified shortcomings are related to a lack of domain adaptation and to limitations in input representation.\u003c/p\u003e \u003cp\u003eIn conclusion, current GP-AI models are not suitable for the independent clinical interpretation of coronary CT angiography. Importantly, this limitation applies specifically to broad, non-domain-specific architectures and does not detract from the proven clinical value of dedicated, task-specific cardiovascular AI platforms. The future potential of GP-AI models as supportive tools can be realized only through architectures that incorporate domain-specific targeted training, radiological physics, and three-dimensional spatial relationships, and only when they are used under expert physician supervision.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003eEthics approval and consent to participate: This study was approved by the Baskent University Institutional Review Board (Project no: KA25/407). All procedures were performed in accordance with the principles of the Declaration of Helsinki. 
Given the retrospective nature of the study, the requirement for written informed consent was waived by the Institutional Review Board.\u003c/p\u003e\n\u003cp\u003eConsent for publication: Not applicable.\u003c/p\u003e\n\u003cp\u003eAvailability of data and materials: The datasets generated and/or analysed during the current study are not publicly available due to patient privacy and institutional regulations but are available from the corresponding author on reasonable request.\u003c/p\u003e\n\u003cp\u003eCompeting interests: The authors declare that they have no competing interests.\u003c/p\u003e\n\u003cp\u003eFunding: This study was supported by the Baskent University Research Fund (Project no: KA25/407).\u003c/p\u003e\n\u003cp\u003eAuthors\u0026apos; contributions: S.O. conceptualized and designed the study. S.O., İ.G.B., and G.E. collected the clinical data and performed the expert image evaluations. Z.G.B. and S.O. conducted the artificial intelligence model assessments and statistical analysis. S.O. wrote the main manuscript text. M.Y. supervised the research project and critically revised the manuscript. All authors reviewed and approved the final manuscript.\u003c/p\u003e\n\u003cp\u003eAcknowledgements: Not applicable.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eVrints CJM, Senior R, Crea F, et al. 2024 ESC Guidelines for the management of chronic coronary syndromes. Eur Heart J. 2024;45(36):3415-3537.\u003c/li\u003e\n\u003cli\u003eKelion AD, Nicol ED. The rationale for the primacy of coronary CT angiography in the National Institute for Health and Care Excellence (NICE) guideline (CG95) for the investigation of chest pain of recent onset. J Cardiovasc Comput Tomogr. 2018;12:516\u0026ndash;22.\u003c/li\u003e\n\u003cli\u003eAbbara S, Blanke P, Maroules CD, et al. 
SCCT guidelines for the performance and acquisition of coronary computed tomographic angiography: A report of the society of Cardiovascular Computed Tomography Guidelines Committee. J Cardiovasc Comput Tomogr. 2016;10(6):435-449.\u003c/li\u003e\n\u003cli\u003eConte E, Sala E. AI-assisted CCTA: supporting diagnosis across the CAD spectrum. Int J Cardiovasc Imaging. 2025;41:825-826.\u003c/li\u003e\n\u003cli\u003eD\u0026rsquo;Costa Z, Karlsberg RP, Cho GW. Artificial-intelligence-assisted CCTA quantifies sex differences in coronary atherosclerotic burden at low atheroma volumes. IJC Heart \u0026amp; Vasculature. 2025;60:101758.\u003c/li\u003e\n\u003cli\u003eLiao J, Huang L, Qu M, Chen B, Wang G. Artificial Intelligence in Coronary CT Angiography: Current Status and Future Prospects. Front Cardiovasc Med. 2022;9:896366.\u003c/li\u003e\n\u003cli\u003eBozyel S, Şimşek E, Ko\u0026ccedil;yiğit Burunkaya D, et al. Artificial intelligence-based clinical decision support systems in cardiovascular diseases. Anatol J Cardiol. 2024;28(2):74-86.\u003c/li\u003e\n\u003cli\u003evan Assen M, De Cecco CN, et al. Inter-observer variability of coronary artery calcium scoring and CCTA interpretation. J Cardiovasc Comput Tomogr. 2019;13(4):228\u0026ndash;233.\u003c/li\u003e\n\u003cli\u003eBudoff MJ, et al. Interobserver variability among expert readers quantifying plaque volume on coronary CT angiography. J Cardiovasc Comput Tomogr. 2022;16(6):501\u0026ndash;507.\u003c/li\u003e\n\u003cli\u003eInterobserver variability of coronary stenosis characterization and its relation to plaque composition. JACC Cardiovasc Imaging. 2024.\u003c/li\u003e\n\u003cli\u003eChen M, Wang X, Hao G, et al. Diagnostic performance of deep learning-based vascular extraction and stenosis detection technique for coronary artery disease. Br J Radiol. 2020;93:20191028.\u003c/li\u003e\n\u003cli\u003eCathAI: fully automated coronary angiography interpretation and stenosis estimation. NPJ Digit Med. 
2023;6(1):142.\u003c/li\u003e\n\u003cli\u003eSarangi PK, Datta S, Panda BB, et al. Evaluating ChatGPT-4\u0026rsquo;s Performance in Identifying Radiological Anatomy in FRCR Part 1 Examination Questions. Indian J Radiol Imaging. 2025;35:287-294.\u003c/li\u003e\n\u003cli\u003eGillette J, Lu M, Heston TF. Large Language Models Perform at Chance Level in the Diagnosis of Pediatric Pneumonia Using Chest Radiographs. Cureus. 2025;17(9):e92596.\u003c/li\u003e\n\u003cli\u003eGemini Team, Google. Gemini: A Family of Highly Capable Multimodal Models. arXiv preprint arXiv:2312.11805. 2023.\u003c/li\u003e\n\u003cli\u003eAnthropic. The Claude 3 Model Family: Opus, Sonnet, Haiku. Technical Report. 2024.\u003c/li\u003e\n\u003cli\u003eOpenAI. GPT-4 Technical Report. arXiv preprint arXiv:2303.08774. 2023.\u003c/li\u003e\n\u003cli\u003eReiber JH, Serruys PW, Kooijman CJ, et al. Assessment of short-, medium-, and long-term variations in arterial dimensions from computer-assisted quantitation of coronary cineangiograms. Circulation. 1985;71(2):280-288.\u003c/li\u003e\n\u003cli\u003eLandis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159-174.\u003c/li\u003e\n\u003cli\u003eG\u0026uuml;neş YC, Cesur T. Large Language Models: Could They Be the Next Generation of Clinical Decision Support Systems in Cardiovascular Diseases? Anatol J Cardiol. 2024;28(7):371-372.\u003c/li\u003e\n\u003cli\u003eTopol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44\u0026ndash;56.\u003c/li\u003e\n\u003cli\u003eDavenport T, Kalakota R. The potential for artificial intelligence in healthcare. Future Healthc J. 2019;6(2):94\u0026ndash;98.\u003c/li\u003e\n\u003cli\u003eLarge Language Models for disease diagnosis: a scoping review. NPJ Digit Med. 2025;8:11.\u003c/li\u003e\n\u003cli\u003eHulten E, Villines TC, Cheezum MK, et al. 
The role of coronary CT angiography in the diagnosis and management of coronary artery disease. J Nucl Cardiol. 2017;24(5):1609\u0026ndash;1624.\u003c/li\u003e\n\u003cli\u003eKnuuti J, Wijns W, Saraste A, et al. 2019 ESC Guidelines for the diagnosis and management of chronic coronary syndromes. Eur Heart J. 2020;41(3):407\u0026ndash;477.\u003c/li\u003e\n\u003cli\u003eClinical expert consensus document on quantitative coronary angiography. Cardiovasc Interv Ther. 2020;35(2):105\u0026ndash;116.\u003c/li\u003e\n\u003cli\u003ePark SH. Artificial intelligence in radiology: practical issues and challenges. Radiology. 2018;287(3):749\u0026ndash;772.\u003c/li\u003e\n\u003cli\u003eLarge Language Models in Medical Image Analysis: A Systematic Review. Bioengineering (Basel). 2025;12(8):818.\u003c/li\u003e\n\u003cli\u003eMultimodal Large Language Models in Medical Imaging. Korean J Radiol. 2025;26(9):843\u0026ndash;853.\u003c/li\u003e\n\u003cli\u003eBest Practices for the Safe Use of Large Language Models and Generative AI in Radiology. Radiology. 2025;312(3):e241516.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Coronary CT Angiography (CCTA), Artificial Intelligence, Large Language Models (LLM), Quantitative Coronary Angiography (QCA), Diagnostic Accuracy, Machine Learning","lastPublishedDoi":"10.21203/rs.3.rs-9135888/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9135888/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eBackground: Coronary CT angiography (CCTA) is a first-line diagnostic modality for coronary artery disease (CAD), yet its interpretation requires significant expert experience. Although general-purpose multimodal artificial intelligence (GP-AI) models have shown promise in text-based medical tasks, their visual diagnostic performance in evaluating complex CCTA data remains poorly defined.\u003c/p\u003e\n\u003cp\u003eMethods: This single-center retrospective study included 63 patients (252 vessel-based image sets) who underwent both CCTA and invasive coronary angiography. Expert physician consensus and four frontier GP-AI models (GPT-4o, Gemini 2.5, Claude 3.5 Sonnet, and Grok 4) evaluated identical standardized static images using a zero-shot approach with default generation parameters. Obstructive disease was defined as ≥\u003cstrong\u003e \u003c/strong\u003e50% luminal stenosis. 
Diagnostic performance was validated against expert consensus for plaque characterization and quantitative coronary angiography (QCA) for stenosis severity.\u003c/p\u003e\n\u003cp\u003eResults: Expert consensus demonstrated robust agreement with QCA across all coronary territories (κ = 0.774–0.933, p \u0026lt; 0.001). In contrast, a marked performance disparity was observed for the GP-AI models; none achieved statistically significant agreement with QCA in the prognostically critical left anterior descending (LAD) or left main coronary arteries (LMCA) (p \u0026gt; 0.05). While Gemini 2.5 showed a moderate correlation in the right coronary artery (ICC = 0.515), overall continuous stenosis assessment and plaque characterization remained uniformly limited and clinically unreliable across all models.\u003c/p\u003e\n\u003cp\u003eConclusion: Expert physician interpretation remains the reference standard for CCTA. Current frontier GP-AI models are not suitable for independent clinical interpretation of coronary imaging, particularly in anatomically complex segments. These findings emphasize that general visual reasoning cannot yet replace domain-specific cardiovascular AI solutions or expert clinical judgment in specialized radiological tasks.\u003c/p\u003e","manuscriptTitle":"Diagnostic Performance of Expert Physicians Versus General-Purpose Artificial Intelligence Using Standardized Static Coronary CT Images: A Dual-Reference Validation Study","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-04-05 17:05:57","doi":"10.21203/rs.3.rs-9135888/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"423536f1-d038-4c7b-bb27-322764fbd38d","owner":[],"postedDate":"April 5th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2026-05-01T07:53:24+00:00","versionOfRecord":[],"versionCreatedAt":"2026-04-05 17:05:57","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-9135888","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-9135888","identity":"rs-9135888","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}