Diagnostic Performance of Expert Physicians Versus General-Purpose Artificial Intelligence Using Standardized Static Coronary CT Images: A Dual-Reference Validation | Research Square
Sefa Okar, ZİYA GÖKALP BİLGEL, İSA GÖKTÜRK BALCI, GÜRCAN ERBAY, and 1 more
This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-8997340/v1 This work is licensed under a CC BY 4.0 License. Status: Posted (Version 1).
Abstract Background Coronary CT angiography (CCTA) is a first-line diagnostic modality for coronary artery disease (CAD), yet its interpretation requires significant expert experience. Although general-purpose multimodal artificial intelligence (GP-AI) models have shown promise in text-based medical tasks, their visual diagnostic performance in evaluating complex CCTA data remains poorly defined.
Methods This single-center retrospective study included 63 patients (252 vessel-based image sets) who underwent both CCTA and invasive coronary angiography. Expert physician consensus and four frontier GP-AI models (GPT-4o, Gemini 2.5, Claude 3.5 Sonnet, and Grok 4) evaluated identical standardized static images using a zero-shot approach with default generation parameters. Obstructive disease was defined as ≥ 50% luminal stenosis. Diagnostic performance was validated against expert consensus for plaque characterization and quantitative coronary angiography (QCA) for stenosis severity. Results Expert consensus demonstrated robust agreement with QCA across all coronary territories (kappa = 0.774–0.933, p < 0.001). In contrast, a marked performance disparity was observed for the GP-AI models; none achieved statistically significant agreement with QCA in the prognostically critical left anterior descending (LAD) or left main coronary arteries (LMCA) (p > 0.05). While Gemini 2.5 showed a moderate correlation in the right coronary artery (ICC = 0.515), overall continuous stenosis assessment and plaque characterization remained uniformly limited and clinically unreliable across all models. Conclusion Expert physician interpretation remains the reference standard for CCTA. Current frontier GP-AI models are not suitable for independent clinical interpretation of coronary imaging, particularly in anatomically complex segments. These findings emphasize that general visual reasoning cannot yet replace domain-specific cardiovascular AI solutions or expert clinical judgment in specialized radiological tasks.
Subject areas: Health sciences/Cardiology; Biological sciences/Computational biology and bioinformatics; Health sciences/Diseases; Health sciences/Medical research
Keywords: Coronary CT Angiography (CCTA); Artificial Intelligence; Large Language Models (LLM); Quantitative Coronary Angiography (QCA); Diagnostic Accuracy; Machine Learning
1.
INTRODUCTION
Coronary artery disease (CAD) remains the leading cause of morbidity and mortality globally [ 1 ]. Because it provides non-invasive imaging of coronary atherosclerosis and stenosis, recent guidelines have established Coronary CT Angiography (CCTA) as the "first-line test" [ 1 , 2 , 9 ] and the "cornerstone" of diagnostic management [ 3 ]. Beyond identifying lumen stenosis, CCTA provides critical data on coronary anatomy and plaque characterization [ 3 , 4 ]. Currently, the interpretation of these images relies heavily on the expertise of specialized cardiologists and radiologists [ 5 ]. However, as reported in previous studies, visual assessment of CCTA is time-consuming, highly dependent on reader experience, and characterized by significant interobserver variability [ 3 , 5 , 21 , 22 ]. To address these limitations, artificial intelligence (AI) technologies are increasingly being integrated into cardiovascular imaging [ 5 ]. "Narrow" AI tools specifically trained for CCTA, such as Cleerly and HeartFlow, have entered clinical workflows, offering automated vessel segmentation and stenosis grading [ 3 , 6 , 28 ]. Extensive clinical trials have demonstrated that these purpose-built platforms enhance diagnostic consistency and reduce reading times. In contrast, the role of general-purpose multimodal artificial intelligence (GP-AI) such as ChatGPT, Gemini, Grok, and Claude in medical imaging remains a subject of intense debate. The vast majority of existing studies have evaluated these models solely on text-based tasks [ 7 ]. The only notable study testing visual diagnostic capability was limited to pediatric chest radiographs, where models performed at a "chance level," failing to demonstrate consistent radiological reasoning [ 8 ]. To date, there are virtually no data rigorously evaluating the ability of these multimodal models to interpret coronary angiography. This study aims to bridge that specific gap.
Moving beyond text-based benchmarking, we directly tested the visual diagnostic performance of four frontier GP-AI models (Gemini 2.5, Grok 4, Claude 3.5 Sonnet, and ChatGPT-4o). In our study design, both the AI models and the expert physician consensus were presented with the exact same standardized static images (curved multiplanar reconstruction (cMPR)). We then evaluated the diagnostic accuracy of their interpretations against two rigorous reference standards: the full clinical CCTA reports and invasive quantitative coronary angiography (QCA). 2. MATERIALS AND METHODS 2.1. Study Design and Study Population This study was designed as a retrospective diagnostic accuracy investigation conducted at a single tertiary referral center between May 2022 and December 2024. Consecutive patients who underwent Coronary Computed Tomography Angiography (CCTA) for clinical indications and subsequently received invasive coronary angiography (ICA) within 30 days were screened for inclusion (n = 66). To ensure a methodologically rigorous evaluation focused exclusively on native coronary artery anatomy, patients with a history of prior coronary stent implantation or coronary artery bypass grafting (CABG) were excluded (n = 3). Following the application of these exclusion criteria, a total of 63 patients constituted the final study cohort. All procedures were performed in accordance with the principles of the Declaration of Helsinki. This study was approved by Baskent University Institutional Review Board (Project no: KA25/407) and supported by Baskent University Research Fund. Given the retrospective nature of the study, the requirement for written informed consent was waived by the institutional review board. 2.2. Imaging Protocol and Coronary Computed Tomography Angiography Acquisition. 
To create a standardized visual dataset for a fair comparison between human experts and GP-AI models, representative 2D cMPR keyframes were selected by a dedicated 'preparation team' of two cardiovascular imaging specialists. This team was unblinded, with full access to patients' complete 3D CCTA volumetric datasets, clinical histories, original radiology reports, and reference invasive coronary angiography (ICA) results. Their objective was to deliberately isolate the optimal static frame demonstrating the most severe lesion or characteristic pathology for each vessel. Subsequently, these optimized static images were evaluated by two independent, blinded expert readers. Having no access to the clinical context, 3D CCTA volumes, or ICA results, these physicians reviewed the images under the same constrained visual conditions as the GP-AI models. This two-stage design effectively eliminated clinical and spatial context as confounding variables.
2.3. Image Standardization and Selection of Representative Keyframes
To enable a fair and objective comparison between human experts and general-purpose artificial intelligence (GP-AI) models, a 'Representative Keyframe Selection Protocol' was implemented to create a standardized visual dataset (Fig. 1). In the preparatory phase of this process, a dedicated preparation team consisting of a Level-3 cardiovascular radiologist and a Level-2 cardiologist reviewed the scans with full (unblinded) access to all 3D CCTA volumetric datasets, clinical histories, original radiology reports, and reference invasive coronary angiography (ICA) results of the patients. The primary goal of this team was to intentionally isolate two high-resolution static cMPR frames that optimally showed the most severe lesion or characteristic pathology for each major epicardial coronary artery (LAD, LCx, and RCA). These selected images (JPEG) were completely anonymized by removing all patient identifiers and radiological markings.
The main diagnostic evaluation phase of the study was performed in a blinded manner by two independent expert readers (a radiologist and a cardiologist). These physicians, having no access to the patients' clinical context, 3D CCTA volumes, or reference ICA results, evaluated the images under completely identical and restricted visual conditions, just like the GP-AI models. This two-stage design successfully eliminated the influence of clinical and spatial context as confounding variables and ensured that the pure visual recognition capabilities of both human experts and GP-AI models were tested against a standard reference. 2.4. GP-AI Models and Evaluation Methodology In this study, four frontier general-purpose multimodal artificial intelligence (GP-AI) models were evaluated using their most advanced versions available as of February 2026: GPT-4o (OpenAI), Gemini 2.5 (Google), Claude 3.5 Sonnet (Anthropic), and Grok 4 (xAI) [ 10 , 11 , 12 ]. All models were accessed through their official web-based interfaces using default generation parameters. To ensure methodological consistency, minimize stochastic variability, and address reproducibility concerns, each image was processed in a new and independent chat session with only a single iteration to prevent cross-contamination of data (zero-shot evaluation). For each major epicardial coronary artery, both GP-AI models and blinded expert readers were required to report predefined diagnostic parameters based on the standardized image sets ( Fig. 2 ) . These parameters included: (i) the presence or absence of obstructive coronary stenosis (defined as ≥ 50% luminal narrowing relative to reference vessel diameter), (ii) the estimated percentage of luminal narrowing, and (iii) the morphological classification of detected atherosclerotic plaques as calcified, non-calcified (soft), or mixed. 2.5. 
Reference Standards
To avoid subjective interpretation in the assessment of coronary artery lumen stenosis, Quantitative Coronary Angiography (QCA) was established as the reference standard. Invasive coronary angiography procedures were performed using the Siemens Artis zee angiography system. All angiographic images were analyzed by an independent interventional cardiologist blinded to CCTA and AI findings using syngo QCA software (Siemens Healthineers, Erlangen, Germany). In accordance with standard validation protocols [ 14 ], an automated edge-detection algorithm was used to determine arterial contours. Calibration was achieved using the contrast-filled catheter tip as a reference scaling tool. Minimal lumen diameter (MLD) and reference vessel diameter (RVD) were measured on the end-diastolic frame that showed the most severe stenosis with the least foreshortening (Fig. 3). A stenosis threshold of ≥ 50% relative to the reference vessel diameter was defined as hemodynamically significant obstructive coronary artery disease. For plaque characterization, considering the limitations of conventional angiography in differentiating plaque components, expert physician consensus, validated by an experienced cardiologist and radiologist with full access to the patients' volumetric CCTA datasets, was accepted as the reference standard.
2.6. Statistical Analysis
The diagnostic performance of artificial intelligence models and human readers was evaluated by calculating sensitivity, specificity, positive predictive value, and negative predictive value. Inter-method agreement for categorical variables was assessed using Cohen’s Kappa (κ) coefficient, interpreted according to the criteria established by Landis and Koch [ 13 ]. All statistical analyses were performed using SPSS version 26.0 (IBM Corp., Armonk, NY, USA), and a two-sided p-value of < 0.05 was considered statistically significant.
3. RESULTS
3.1.
Baseline Patient Characteristics
The final study cohort comprised 63 patients with a mean age of $60.25 \pm 10.59$ years. The population demonstrated a high prevalence of cardiovascular risk factors: hypertension was present in 58.7% of patients, diabetes mellitus in 38.1%, and a family history of coronary artery disease in 68.3%. Based on the reference invasive coronary angiography (ICA) results, obstructive coronary artery disease (CAD) was identified in 55.6% of the cohort, reflecting a significant disease burden. Regarding clinical management following ICA, 27 patients (42.9%) underwent percutaneous coronary intervention (PCI), while 8 patients (12.7%) were referred for coronary artery bypass grafting (CABG) (Table 1).

Table 1 Baseline characteristics of the study cohort

| Characteristic | Value |
|---|---|
| Age, years (Mean ± SD) | 60.25 ± 10.59 |
| Sex (Male), n (%) | 41 (65.1%) |
| BMI, kg/m² (Median) | 27.12 |
| Hypertension, n (%) | 37 (58.7%) |
| Diabetes Mellitus, n (%) | 24 (38.1%) |
| Family History of CAD, n (%) | 43 (68.3%) |
| No Obstructive CAD (< 50% stenosis), n (%) | 28 (44.4%) |
| Obstructive CAD (≥ 50% stenosis), n (%) | 35 (55.6%) |
| Medical Therapy, n (%) | 28 (44.4%) |
| PCI, n (%) | 27 (42.9%) |
| CABG, n (%) | 8 (12.7%) |

3.2. Reliability of Human Expert Consensus
Inter-reader reliability among human observers was first evaluated. For the detection of significant coronary stenosis, inter-observer agreement was generally moderate across the main epicardial coronary arteries. Cohen’s Kappa values ranged from 0.477 to 0.519 for the left anterior descending artery (LAD), left circumflex artery (LCx), and right coronary artery (RCA) (Table 2). In contrast, agreement for the left main coronary artery (LMCA) was markedly lower (κ = 0.096). The restriction of assessment to predefined static image frames may reduce spatial and contextual information, increasing variability related to blooming artifacts and partial volume effects.
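To make the agreement statistic used in this section concrete, the following is a minimal Python sketch of Cohen's kappa for two readers, together with the Landis and Koch bands cited in the Statistical Analysis section. It is an illustration only, not the authors' SPSS procedure: the function names `cohens_kappa` and `landis_koch` and the ten example reads are hypothetical and are not taken from the study data.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1.0 - expected)


def landis_koch(kappa):
    """Verbal agreement band according to Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical binary calls (1 = obstructive stenosis) for ten vessels.
reader_1 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
reader_2 = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
example_kappa = cohens_kappa(reader_1, reader_2)
print(round(example_kappa, 3), landis_koch(example_kappa))

# Bands for two kappa values reported in the text (LMCA and LCx).
print(landis_koch(0.096), landis_koch(0.519))
```

In this hypothetical example the readers agree on 8 of 10 vessels, yet kappa is noticeably lower than 0.8 because part of that agreement is expected by chance.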
Based on these findings, a third-reader adjudication process was applied to establish the final expert consensus reference.

Table 2 Inter-Observer Agreement (Physician 1 vs. Physician 2)

| Vessel | Kappa (p) | ICC (95% CI) | p-value |
|---|---|---|---|
| LMCA | 0.096 (0.029) | 0.553 (0.354–0.705) | < 0.001 |
| LAD | 0.501 (< 0.001) | 0.647 (0.475–0.771) | < 0.001 |
| CX | 0.519 (< 0.001) | 0.622 (0.442–0.754) | < 0.001 |
| RCA | 0.477 (< 0.001) | 0.790 (0.674–0.868) | […] |

3.3. […] (> 50%) Stenosis
When compared with invasive quantitative coronary angiography (QCA) as the reference standard, the Expert Physician Consensus demonstrated consistently high diagnostic performance across all coronary territories. Agreement with QCA was strong, with Cohen’s Kappa values ranging from 0.774 to 0.933 (all p < 0.001), accompanied by high sensitivity and specificity, generally exceeding 90%. These findings indicate that experienced readers maintain close concordance with invasive reference measurements despite reliance on standardized static image inputs. In contrast, all evaluated GP-AI models exhibited substantial diagnostic limitations. Notably, none of the GP-AI models achieved statistically significant agreement with QCA in the LAD or LMCA, the most clinically critical coronary segments. For example, in the LAD, Gemini 2.5 achieved a Kappa value of only 0.160 (p = 0.153), while ChatGPT-4o demonstrated virtually no agreement (κ = 0.004, p = 0.963). Although marginal statistical significance was observed for selected models in the LCx and RCA, overall diagnostic accuracy remained below thresholds required for clinical applicability. Detailed diagnostic performance metrics for each coronary artery are provided in Table 3.

Table 3 Diagnostic Performance for LMCA-LAD-CX-RCA Stenosis (> 50%) vs. QCA

| Vessel | Evaluator | Kappa (κ) | p-value | Sensitivity | Specificity | PPV | NPV | Accuracy |
|---|---|---|---|---|---|---|---|---|
| LMCA | Gemini 2.5 | -0.016 | 0.897 | 0% | 98.4% | 0% | 98.4% | 96.8% |
| LMCA | Grok 4 | 0.000 | 1.000 | - | 100% | - | 100% | 100% |
| LMCA | Claude 3.5 | -0.025 | 0.820 | 0% | 95.1% | 0% | 98.3% | 93.5% |
| LMCA | ChatGPT-4o | 0.000 | 1.000 | - | 100% | - | 100% | 100% |
| LMCA | Consensus | 1.000 | < 0.001 | 100% | 100% | 100% | 100% | 100% |
| LAD | Gemini 2.5 | 0.160 | 0.153 | 78.6% | 38.2% | 51.2% | 68.4% | 56.5% |
| LAD | Grok 4 | 0.109 | 0.290 | 82.1% | 29.4% | 48.9% | 66.7% | 53.2% |
| LAD | Claude 3.5 | 0.064 | 0.573 | 71.4% | 35.3% | 47.6% | 60.0% | 51.6% |
| LAD | ChatGPT-4o | 0.004 | 0.963 | 85.7% | 14.7% | 45.3% | 55.6% | 46.8% |
| LAD | Consensus | 0.774 | < 0.001 | 92.9% | 85.3% | 83.9% | 93.5% | 87.1% |
| CX | Gemini 2.5 | 0.342 | 0.007 | 50.0% | 82.9% | 58.8% | 77.3% | 72.1% |
| CX | Grok 4 | 0.037 | 0.771 | 42.9% | 61.0% | 36.0% | 67.6% | 54.8% |
| CX | Claude 3.5 | 0.136 | 0.285 | 42.9% | 70.7% | 42.9% | 70.7% | 61.3% |
| CX | ChatGPT-4o | 0.218 | 0.060 | 71.4% | 53.7% | 44.1% | 78.6% | 59.7% |
| CX | Consensus | 0.789 | < 0.001 | 90.5% | 90.2% | 82.6% | 94.9% | 90.5% |
| RCA | Gemini 2.5 | 0.338 | 0.008 | 64.0% | 70.3% | 59.3% | 74.3% | 67.7% |
| RCA | Grok 4 | 0.245 | 0.042 | 72.0% | 54.1% | 51.4% | 74.1% | 61.3% |
| RCA | Claude 3.5 | 0.214 | 0.090 | 48.0% | 73.0% | 54.5% | 67.5% | 62.9% |
| RCA | ChatGPT-4o | 0.281 | 0.025 | 64.0% | 64.9% | 55.2% | 72.7% | 64.5% |
| RCA | Consensus | 0.933 | < 0.001 | 96.0% | 97.3% | 96.0% | 97.3% | 96.8% |

3.4. Continuous Stenosis Assessment and GP-AI Plaque Characterization
Analysis of continuous stenosis severity revealed a pronounced performance gap between expert readers and GP-AI models. The Expert Consensus showed high concordance with QCA-derived stenosis grades across all coronary arteries. In contrast, the majority of GP-AI models failed to demonstrate meaningful continuous correlation with invasive reference measurements. Gemini 2.5 represented a partial exception, achieving moderate intraclass correlation in the right coronary artery (ICC = 0.515, p < 0.001). This was the highest continuous agreement between a GP-AI model and QCA observed in the present study; the other statistically significant GP-AI ICCs in Table 4 did not exceed 0.401.
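The binary metrics listed in Table 3 (sensitivity, specificity, PPV, NPV, and accuracy) all derive from a 2×2 table of reader calls against the QCA reference. The Python sketch below shows that arithmetic under stated assumptions: the function name `diagnostic_metrics` and the counts are hypothetical and are not the study's per-vessel counts, which the text does not report.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Binary diagnostic metrics from a 2x2 table against a reference standard.

    tp/fn: reference-positive vessels called positive/negative by the reader.
    fp/tn: reference-negative vessels called positive/negative by the reader.
    A metric whose denominator is zero is returned as None (undefined).
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }


# Hypothetical counts for 100 vessels (illustrative only).
example = diagnostic_metrics(tp=40, fp=5, fn=10, tn=45)
for name, value in example.items():
    print(f"{name}: {value:.3f}")

# A reader who never calls stenosis when the reference has no positives:
# sensitivity and PPV are undefined, while specificity and NPV are 1.0.
all_negative = diagnostic_metrics(tp=0, fp=0, fn=0, tn=62)
print(all_negative)
```

Returning `None` for an undefined ratio is one possible design choice; it keeps a zero denominator from being reported as 0%, which would be misleading.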
Nevertheless, no GP-AI model demonstrated consistent or clinically acceptable continuous agreement across more than one coronary territory. Detailed ICC results are presented in Table 4. With respect to plaque characterization, performance across all GP-AI models was uniformly limited. Agreement with the Expert Consensus regarding plaque morphology (calcified, non-calcified, or mixed) was largely non-significant. Only Gemini 2.5 demonstrated slight-to-fair agreement in the LCx and RCA territories; however, these levels of concordance remain insufficient for reliable clinical interpretation. Plaque characterization results are summarized in Table 5.

Table 4 Agreement Between AI Models and QCA-Derived Stenosis Grades

| Parameter | Gemini ICC (95% CI) | p | Grok ICC (95% CI) | p | Claude ICC (95% CI) | p | ChatGPT ICC (95% CI) | p | Consensus ICC (95% CI) | p |
|---|---|---|---|---|---|---|---|---|---|---|
| QCA LMCA grade | 0.05 (−0.20 to 0.295) | 0.348 | 0.00 (−0.248 to 0.248) | 0.500 | 0.269 (0.022 to 0.485) | 0.017 | 0.206 (−0.044 to 0.432) | 0.053 | 0.612 (0.43 to 0.747) | < 0.001 |
| QCA LAD grade | 0.159 (−0.092 to 0.392) | 0.106 | 0.187 (−0.064 to 0.416) | 0.071 | 0.091 (−0.16 to 0.332) | 0.239 | 0.111 (−0.141 to 0.349) | 0.194 | 0.908 (0.853 to 0.944) | < 0.001 |
| QCA CX grade | 0.401 (0.168 to 0.592) | 0.001 | 0.143 (−0.109 to 0.378) | 0.131 | 0.274 (0.027 to 0.488) | 0.015 | 0.141 (−0.111 to 0.375) | 0.136 | 0.847 (0.758 to 0.905) | < 0.001 |
| QCA RCA grade | 0.515 (0.306 to 0.677) | < 0.001 | 0.327 (0.086 to 0.532) | 0.004 | 0.235 (−0.013 to 0.457) | 0.032 | 0.305 (0.062 to 0.514) | 0.007 | 0.942 (0.906 to 0.965) | < 0.001 |

ICC: Intraclass Correlation Coefficient (95% Confidence Interval)

Table 5 Agreement on Plaque Characterization (Kappa)

| Vessel | AI Model | Kappa (κ) | Agreement Level |
|---|---|---|---|
| LMCA | All Models | Non-significant | None |
| LAD | All Models | Non-significant | None |
| CX | Gemini 2.5 | 0.150 (p = 0.025) | Slight |
| RCA | Gemini 2.5 | 0.203 (p = 0.009) | Fair |

4.
DISCUSSION This study provides a critical objective clinical evaluation revealing the true diagnostic limitations of general-purpose multimodal artificial intelligence (GP-AI) models in complex cardiovascular imaging tasks. While these models excel in text-based and semantic reasoning, they exhibit significant deficiencies in areas requiring advanced visual-anatomical interpretation, such as coronary CT angiography (CCTA). Our findings reveal a significant “semantic–visual gap,” defining the fundamental divergence between linguistic intelligence and reliable radiological assessment [ 15 – 17 ]. A critical methodological element in interpreting this study is that both human experts and GP-AI models performed assessments on identical standardized static JPEG images. Neither group was provided with the full volumetric CCTA datasets during the assessment process. Instead, two high-resolution curved MPR frames, intentionally selected to depict the most severe lesions or characteristic anatomy, were used for each major epicardial coronary artery. Thus, the comparison between human and artificial intelligence was performed on a fully symmetrical and methodologically fair basis in terms of data input. Evaluations were based on two separate reference standards. Plaque morphology was compared with expert physician consensus based on all volumetric CCTA data; the degree of luminal stenosis and the presence of $ \ge 50\% $ obstructive disease were analyzed based on quantitative coronary angiography (QCA) results. This dual-reference approach provides a sound validation basis for the biological and technical nature of plaque characterization and hemodynamically significant stenosis detection [ 18 – 20 ]. The significant performance difference between GP-AI models and human experts is largely related to the diagnostic challenges inherent in the static two-dimensional imaging approach. 
The moderate-to-low inter-observer agreement observed among human experts—specifically the markedly low agreement in the LMCA ( $ \kappa = 0.096 $ )—highlights the intrinsic difficulty of this task. This low concordance in the LMCA is likely due to its short anatomical length, the relatively lower prevalence of isolated obstructive disease in this segment, and its high susceptibility to artifacts in static frames. These factors create a 'diagnostic blind spot' when volumetric scrolling is unavailable, explicitly demonstrating that static-image CCTA assessment is inherently constrained, even for highly trained specialists. Consequently, it is entirely unsurprising that GP-AI models, which lack both contextual clinical knowledge and full three-dimensional spatial reasoning, failed significantly in the LMCA and LAD. These 'small-target' segments require advanced anatomical contextual inference that current multimodal architectures lack. The models' inability to achieve statistical significance in these prognostically critical territories—often misinterpreting vascular overlaps, blooming artifacts, or calcium shadows as stenosis—represents a major barrier to their clinical reliability and increases the risk of severe misinterpretations [ 23 , 25 , 26 ]. Within this general picture of failure, the limited performance improvement observed in the right coronary artery (RCA) constitutes a notable exception. The more isolated anatomical course and relatively lower artifact load of the RCA may have allowed some models, particularly Gemini 2.5, to partially capture the change in stenosis severity on a continuous scale. However, the discrepancy between continuous correlation and binary classification success reveals that the models struggle to define clinically meaningful decision limits and exhibit a tendency toward systematic overestimation. 
Finally, while the GP-AI models used (GPT-4o, Gemini 2.5, Claude 3.5 Sonnet, and Grok 4) were the most current versions as of February 2026, the rapid evolution of AI technologies means that more advanced versions are quickly introduced. However, since this study aimed to reveal the fundamental competency limitations of general-purpose architectures in tasks requiring expert-level radiological evaluation, the findings retain their conceptual and methodological validity. Limitations Our study design carries specific inherent limitations that must be explicitly acknowledged. First, the unblinded selection of key frames by the preparation team introduces a potential selection bias, as it intentionally guides the presented images toward the most severe lesion. While this does not reflect real-world volumetric CCTA reading where the clinician must actively search for pathology, it was a necessary methodological compromise to establish a standardized ground truth for evaluating pure visual diagnostic capacity. Second, because this study evaluates decision-making within a constrained framework utilizing only a few standardized static images per vessel rather than actual clinical volumetric interpretation, extrapolation of these findings to real-world, unconstrained clinical practice should be interpreted with caution. 5. CONCLUSION This study demonstrates that consensus interpretation generated by experienced physicians exhibits strong agreement with invasive quantitative coronary angiography (QCA) when using a ≥ 50% stenosis threshold and remains the most reliable reference approach in coronary CT angiography (CCTA) assessment. In contrast, current frontier GP-AI models have shown significant limitations in terms of visual diagnostic performance. 
Despite their known competencies in text-based medical tasks, these systems have failed to achieve a consistent and clinically acceptable level of accuracy when applied to static two-dimensional CCTA images, particularly in anatomically and prognostically critical coronary segments. These findings indicate that direct transfer of general visual reasoning capabilities to highly specialized cardiovascular imaging tasks is not feasible under current conditions. Therefore, at the current stage, it is not medically appropriate for GP-AI systems to replace physicians in interpreting CCTA results or for patients to directly consult these systems for diagnostic decisions. CCTA assessment currently requires the clinical experience, anatomical knowledge, and contextual interpretation skills of expert cardiologists and radiologists. The limited performance signal observed with the Gemini 2.5 model in the right coronary artery (RCA) suggests that clinically relevant feature recognition capacity may be partially present in some pioneering architectures. However, this finding is insufficient to support the independent clinical use of GP-AI models, suggesting that the identified shortcomings are related to a lack of domain adaptation and input representation. In conclusion, current GP-AI models are not suitable for the independent clinical interpretation of coronary CT angiography. Importantly, this limitation applies specifically to broad, non-domain-specific architectures and does not detract from the proven clinical value of dedicated, task-specific cardiovascular AI platforms. Their potential as supportive tools in the future can only be achieved through the development of architectures incorporating domain-specific targeted training, radiological physics, and three-dimensional spatial relationships, and their use under expert physician supervision. 
Declaration of Generative AI and AI-assisted technologies in the writing process
During the preparation of this work the authors used Google Gemini in order to assist with language editing, formatting, and citation placement. After using this tool/service, the authors reviewed and edited the content as needed and take full responsibility for the content of the published article.
An example of a Curved Multi-Planar Reformat (Curved MPR) image of the Right Coronary Artery (RCA) from two standardized sequences. This technique straightens the tortuous vessel along the centerline to clearly visualize the vessel trajectory and plaque burden and facilitate lumen assessment.
This structured command set was sent to all artificial intelligence models evaluated in the study. This template enabled the models to analyze visual data and report the presence of significant stenosis (> 50%), estimated stenosis percentage, plaque characterization, and confidence score in a standardized format.
Declarations
Competing Interests The authors declare no competing interests.
Funding This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Author Contribution S.O. conceptualized the study, developed the methodology, conducted the AI evaluations and formal statistical analysis, and wrote the original draft. Z.G.B. and I.G.B. acquired the clinical data, curated the CCTA datasets, and performed the expert evaluations. G.E. and M.Y. provided critical supervision, validated the reference standards, and guided the clinical interpretation. All authors reviewed and approved the manuscript.
Acknowledgement Declaration of AI-assisted technologies in the writing process: During the preparation of this work, the authors used generative AI (Gemini) to assist with language editing, formatting, and structural refinement.
After using this tool, the authors thoroughly reviewed and edited the content as needed and take full responsibility for the content and scientific integrity of the published article. We have no other specific acknowledgements to declare.

Data Availability
The datasets generated and analyzed during the current study are not publicly available due to institutional restrictions regarding patient privacy. De-identified data are available from the corresponding author upon reasonable request and subject to institutional ethical approval.

References
1. Vrints CJM, Senior R, Crea F, et al. 2024 ESC Guidelines for the management of chronic coronary syndromes. Eur Heart J. 2024;45(36):3415–3537.
2. Kelion AD, Nicol ED. The rationale for the primacy of coronary CT angiography in the National Institute for Health and Care Excellence (NICE) guideline (CG95) for the investigation of chest pain of recent onset. J Cardiovasc Comput Tomogr. 2018;12:516–22.
3. Conte E, Sala E. AI-assisted CCTA: supporting diagnosis across the CAD spectrum. Int J Cardiovasc Imaging. 2025;41:825–826.
4. D’Costa Z, Karlsberg RP, Cho GW. Artificial-intelligence-assisted CCTA quantifies sex differences in coronary atherosclerotic burden at low atheroma volumes. IJC Heart & Vasculature. 2025;60:101758.
5. Liao J, Huang L, Qu M, Chen B, Wang G. Artificial Intelligence in Coronary CT Angiography: Current Status and Future Prospects. Front Cardiovasc Med. 2022;9:896366.
6. Chen M, Wang X, Hao G, et al. Diagnostic performance of deep learning-based vascular extraction and stenosis detection technique for coronary artery disease. Br J Radiol. 2020;93:20191028.
7. Sarangi PK, Datta S, Panda BB, et al. Evaluating ChatGPT-4’s Performance in Identifying Radiological Anatomy in FRCR Part 1 Examination Questions. Indian J Radiol Imaging. 2025;35:287–294.
8. Gillette J, Lu M, Heston TF. Large Language Models Perform at Chance Level in the Diagnosis of Pediatric Pneumonia Using Chest Radiographs. Cureus. 2025;17(9):e92596.
9. Abbara S, Blanke P, Maroules CD, et al. SCCT guidelines for the performance and acquisition of coronary computed tomographic angiography: A report of the Society of Cardiovascular Computed Tomography Guidelines Committee. J Cardiovasc Comput Tomogr. 2016;10(6):435–449.
10. Gemini Team, Google. Gemini: A Family of Highly Capable Multimodal Models. arXiv preprint arXiv:2312.11805. 2023.
11. Anthropic. The Claude 3 Model Family: Opus, Sonnet, Haiku. Technical Report. 2024.
12. OpenAI. GPT-4 Technical Report. arXiv preprint arXiv:2303.08774. 2023.
13. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–174.
14. Reiber JH, Serruys PW, Kooijman CJ, et al. Assessment of short-, medium-, and long-term variations in arterial dimensions from computer-assisted quantitation of coronary cineangiograms. Circulation. 1985;71(2):280–288.
15. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44–56.
16. Davenport T, Kalakota R. The potential for artificial intelligence in healthcare. Future Healthc J. 2019;6(2):94–98.
17. Large Language Models for disease diagnosis: a scoping review. NPJ Digit Med. 2025;8:11.
18. Hulten E, Villines TC, Cheezum MK, et al. The role of coronary CT angiography in the diagnosis and management of coronary artery disease. J Nucl Cardiol. 2017;24(5):1609–1624.
19. Knuuti J, Wijns W, Saraste A, et al. 2019 ESC Guidelines for the diagnosis and management of chronic coronary syndromes. Eur Heart J. 2020;41(3):407–477.
20. Clinical expert consensus document on quantitative coronary angiography. Cardiovasc Interv Ther. 2020;35(2):105–116.
21. van Assen M, De Cecco CN, et al. Inter-observer variability of coronary artery calcium scoring and CCTA interpretation. J Cardiovasc Comput Tomogr. 2019;13(4):228–233.
22. Budoff MJ, et al. Interobserver variability among expert readers quantifying plaque volume on coronary CT angiography. J Cardiovasc Comput Tomogr. 2022;16(6):501–507.
23. Park SH. Artificial intelligence in radiology: practical issues and challenges. Radiology. 2018;287(3):749–772.
24. Large Language Models in Medical Image Analysis: A Systematic Review. Bioengineering (Basel). 2025;12(8):818.
25. Multimodal Large Language Models in Medical Imaging. Korean J Radiol. 2025;26(9):843–853.
26. Best Practices for the Safe Use of Large Language Models and Generative AI in Radiology. Radiology. 2025;312(3):e241516.
27. Interobserver variability of coronary stenosis characterization and its relation to plaque composition. JACC Cardiovasc Imaging. 2024.
28. CathAI: fully automated coronary angiography interpretation and stenosis estimation. NPJ Digit Med. 2023;6(1):142.

Additional Declarations
No competing interests reported.
\u003cp\u003e98.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e93.5%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eChatGPT-4o- LMCA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1.000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e100%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e100%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e100%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConsensus- LMCA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e1.000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e100%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e100%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e100%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e100%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e100%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGemini 2.5- LAD\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e 
\u003cp\u003e0.160\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.153\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e78.6%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e38.2%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e51.2%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e68.4%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e56.5%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGrok 4- LAD\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.109\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.290\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e82.1%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e29.4%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e48.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e66.7%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e53.2%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eClaude 3.5- LAD\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.064\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.573\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e71.4%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e35.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c6\"\u003e \u003cp\u003e47.6%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e60.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e51.6%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eChatGPT-4o- LAD\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.004\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.963\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e85.7%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e14.7%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e45.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e55.6%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e46.8%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConsensus- LAD\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.774\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e92.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e85.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e83.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e93.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e87.1%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003eGemini 2.5- CX\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.342\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.007\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e50.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e82.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e58.8%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e77.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e72.1%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGrok 4- CX\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.037\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.771\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e42.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e61.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e36.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e67.6%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e54.8%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eClaude 3.5- CX\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.136\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.285\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e42.9%\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c5\"\u003e \u003cp\u003e70.7%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e42.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e70.7%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e61.3%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eChatGPT-4o- CX\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.218\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.060\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e71.4%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e53.7%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e44.1%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e78.6%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e59.7%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConsensus- CX\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.789\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e90.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e90.2%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e82.6%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e94.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e90.5%\u003c/p\u003e 
\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGemini 2.5- RCA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.338\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.008\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e64.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e70.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e59.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e74.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e67.7%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGrok 4- RCA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.245\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.042\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e72.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e54.1%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e51.4%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e74.1%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e61.3%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eClaude 3.5- RCA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.214\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.090\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c4\"\u003e \u003cp\u003e48.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e73.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e54.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e67.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e62.9%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eChatGPT-4o- RCA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.281\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.025\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e64.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e64.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e55.2%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e72.7%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e64.5%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConsensus- RCA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.933\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e96.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e97.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e96.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e 
\u003cp\u003e97.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e96.8%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e3.4. Continuous Stenosis Assessment and GP-AI Plaque Characterization\u003c/h2\u003e \u003cp\u003eAnalysis of continuous stenosis severity revealed a pronounced performance gap between expert readers and GP-AI models. The Expert Consensus showed significant concordance with QCA-derived stenosis grades in all coronary arteries (ICC\u0026thinsp;=\u0026thinsp;0.612 in the LMCA and 0.847\u0026ndash;0.942 in the LAD, LCx, and RCA; all p\u0026thinsp;\u0026lt;\u0026thinsp;0.001). In contrast, the majority of GP-AI models failed to demonstrate meaningful continuous correlation with invasive reference measurements.\u003c/p\u003e \u003cp\u003eGemini 2.5 represented a partial exception, achieving moderate intraclass correlation in the right coronary artery (ICC\u0026thinsp;=\u0026thinsp;0.515, p\u0026thinsp;\u0026lt;\u0026thinsp;0.001). This was the only instance in which a GP-AI model reached a moderate level of continuous agreement with QCA; the other statistically significant ICC values were lower (0.235\u0026ndash;0.401). Nevertheless, no GP-AI model demonstrated consistent or clinically acceptable continuous agreement across more than one coronary territory. Detailed ICC results are presented in Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eWith respect to plaque characterization, performance across all GP-AI models was uniformly limited. Agreement with the Expert Consensus regarding plaque morphology (calcified, non-calcified, or mixed) was largely non-significant. Only Gemini 2.5 demonstrated slight-to-fair agreement, in the LCx (κ\u0026thinsp;=\u0026thinsp;0.150, p\u0026thinsp;=\u0026thinsp;0.025) and RCA (κ\u0026thinsp;=\u0026thinsp;0.203, p\u0026thinsp;=\u0026thinsp;0.009) territories; however, these levels of concordance remain insufficient for reliable clinical interpretation. 
Plaque characterization results are summarized in Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eAgreement Between AI Models and QCA-Derived Stenosis Grades\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"11\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c11\" colnum=\"11\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eParameter\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eGemini ICC (95% CI)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003ep\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" 
colname=\"c4\"\u003e \u003cp\u003eGrok ICC (95% CI)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003ep\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eClaude ICC (95% CI)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003ep\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eChatGPT ICC (95% CI)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003ep\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c10\"\u003e \u003cp\u003eConsensus ICC (95% CI)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c11\"\u003e \u003cp\u003ep\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eQCA LMCA grade\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.05 (\u0026minus;\u0026thinsp;0.20 to 0.295)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.348\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.00 (\u0026minus;\u0026thinsp;0.248 to 0.248)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.500\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.269 (0.022 to 0.485)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.017\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.206 (\u0026minus;\u0026thinsp;0.044 to 0.432)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e0.053\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.612 (0.43 to 0.747)\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eQCA LAD grade\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.159 (\u0026minus;\u0026thinsp;0.092 to 0.392)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.106\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.187 (\u0026minus;\u0026thinsp;0.064 to 0.416)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.071\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.091 (\u0026minus;\u0026thinsp;0.16 to 0.332)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.239\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.111 (\u0026minus;\u0026thinsp;0.141 to 0.349)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e0.194\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.908 (0.853 to 0.944)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eQCA CX grade\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.401 (0.168 to 0.592)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.143 (\u0026minus;\u0026thinsp;0.109 to 0.378)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e 
\u003cp\u003e0.131\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.274 (0.027 to 0.488)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.015\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.141 (\u0026minus;\u0026thinsp;0.111 to 0.375)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e0.136\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.847 (0.758 to 0.905)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eQCA RCA grade\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.515 (0.306 to 0.677)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.327 (0.086 to 0.532)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.004\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.235 (\u0026minus;\u0026thinsp;0.013 to 0.457)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.032\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.305 (0.062 to 0.514)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e0.007\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.942 (0.906 to 0.965)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e 
\u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"11\"\u003eICC: Intraclass Correlation Coefficient (95% Confidence Interval)\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eAgreement on Plaque Characterization (Kappa)\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVessel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAI Model\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eKappa (κ)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAgreement Level\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLMCA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAll Models\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNon-significant\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eNone\u003c/p\u003e \u003c/td\u003e 
\u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLAD\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAll Models\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNon-significant\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eNone\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCX\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eGemini 2.5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.150 (p\u0026thinsp;=\u0026thinsp;0.025)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eSlight\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRCA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eGemini 2.5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.203 (p\u0026thinsp;=\u0026thinsp;0.009)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eFair\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"4. DISCUSSION","content":"\u003cp\u003eThis study provides an objective clinical evaluation of the diagnostic limitations of general-purpose multimodal artificial intelligence (GP-AI) models in complex cardiovascular imaging tasks. While these models excel in text-based and semantic reasoning, they exhibited marked deficiencies in areas requiring advanced visual-anatomical interpretation, such as coronary CT angiography (CCTA). 
Our findings reveal a significant \u0026ldquo;semantic\u0026ndash;visual gap,\u0026rdquo; defining the fundamental divergence between linguistic intelligence and reliable radiological assessment [\u003cspan additionalcitationids=\"CR16\" citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eA critical methodological element in interpreting this study is that both human experts and GP-AI models performed assessments on identical standardized static JPEG images. Neither group was provided with the full volumetric CCTA datasets during the assessment process. Instead, two high-resolution curved MPR frames, intentionally selected to depict the most severe lesions or characteristic anatomy, were used for each major epicardial coronary artery. Thus, the comparison between human and artificial intelligence was performed on a fully symmetrical and methodologically fair basis in terms of data input.\u003c/p\u003e \u003cp\u003eEvaluations were based on two separate reference standards. Plaque morphology was compared with expert physician consensus based on the full volumetric CCTA datasets; the degree of luminal stenosis and the presence of obstructive disease (\u0026ge;\u0026thinsp;50% stenosis) were analyzed based on quantitative coronary angiography (QCA) results. 
This dual-reference approach provides a sound validation basis for the biological and technical nature of plaque characterization and hemodynamically significant stenosis detection [\u003cspan additionalcitationids=\"CR19\" citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThe significant performance difference between GP-AI models and human experts is largely related to the diagnostic challenges inherent in the static two-dimensional imaging approach. The moderate-to-low inter-observer agreement observed among human experts\u0026mdash;specifically the markedly low agreement in the LMCA (\u003cspan\u003e$\u003c/span\u003e\\kappa = 0.096\u003cspan\u003e$\u003c/span\u003e)\u0026mdash;highlights the intrinsic difficulty of this task. This low concordance in the LMCA is likely due to its short anatomical length, the relatively lower prevalence of isolated obstructive disease in this segment, and its high susceptibility to artifacts in static frames. These factors create a 'diagnostic blind spot' when volumetric scrolling is unavailable, demonstrating that static-image CCTA assessment is inherently constrained, even for highly trained specialists.\u003c/p\u003e \u003cp\u003eConsequently, it is unsurprising that GP-AI models, which lack both contextual clinical knowledge and full three-dimensional spatial reasoning, failed to reach statistically significant agreement in the LMCA and LAD. These 'small-target' segments require advanced anatomical contextual inference that current multimodal architectures lack. 
The models' inability to achieve statistical significance in these prognostically critical territories\u0026mdash;often misinterpreting vascular overlaps, blooming artifacts, or calcium shadows as stenosis\u0026mdash;represents a major barrier to their clinical reliability and increases the risk of severe misinterpretations [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e, \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eWithin this general picture of failure, the limited performance improvement observed in the right coronary artery (RCA) constitutes a notable exception. The more isolated anatomical course and relatively lower artifact load of the RCA may have allowed some models, particularly Gemini 2.5, to partially capture the change in stenosis severity on a continuous scale. However, the discrepancy between continuous correlation and binary classification success reveals that the models struggle to define clinically meaningful decision limits and exhibit a tendency toward systematic overestimation.\u003c/p\u003e \u003cp\u003eFinally, while the GP-AI models used (GPT-4o, Gemini 2.5, Claude 3.5 Sonnet, and Grok 4) were the most current versions as of February 2026, the rapid evolution of AI technologies means that more advanced versions are quickly introduced. However, since this study aimed to reveal the fundamental competency limitations of general-purpose architectures in tasks requiring expert-level radiological evaluation, the findings retain their conceptual and methodological validity.\u003c/p\u003e \u003cp\u003eLimitations\u003c/p\u003e \u003cp\u003eOur study design carries specific inherent limitations that must be explicitly acknowledged. 
First, the unblinded selection of key frames by the preparation team introduces a potential selection bias, as it intentionally guides the presented images toward the most severe lesion. While this does not reflect real-world volumetric CCTA reading, where the clinician must actively search for pathology, it was a necessary methodological compromise to establish a standardized ground truth for evaluating pure visual diagnostic capacity. Second, because this study evaluates decision-making within a constrained framework that uses only two standardized static images per vessel rather than actual clinical volumetric interpretation, these findings should be extrapolated to real-world, unconstrained clinical practice with caution.\u003c/p\u003e"},{"header":"5. CONCLUSION","content":"\u003cp\u003eThis study demonstrates that consensus interpretation generated by experienced physicians exhibits strong agreement with invasive quantitative coronary angiography (QCA) when using a\u0026thinsp;\u0026ge;\u0026thinsp;50% stenosis threshold and remains the most reliable reference approach in coronary CT angiography (CCTA) assessment.\u003c/p\u003e \u003cp\u003eIn contrast, current frontier GP-AI models showed significant limitations in visual diagnostic performance. Despite their known competencies in text-based medical tasks, these systems failed to achieve a consistent and clinically acceptable level of accuracy when applied to static two-dimensional CCTA images, particularly in anatomically and prognostically critical coronary segments. 
These findings indicate that direct transfer of general visual reasoning capabilities to highly specialized cardiovascular imaging tasks is not feasible under current conditions.\u003c/p\u003e \u003cp\u003eTherefore, at the current stage, it is not medically appropriate for GP-AI systems to replace physicians in interpreting CCTA results or for patients to directly consult these systems for diagnostic decisions. CCTA assessment currently requires the clinical experience, anatomical knowledge, and contextual interpretation skills of expert cardiologists and radiologists.\u003c/p\u003e \u003cp\u003eThe limited performance signal observed with the Gemini 2.5 model in the right coronary artery (RCA) suggests that clinically relevant feature recognition capacity may be partially present in some pioneering architectures. However, this finding is insufficient to support the independent clinical use of GP-AI models, suggesting that the identified shortcomings are related to a lack of domain adaptation and input representation.\u003c/p\u003e \u003cp\u003eIn conclusion, current GP-AI models are not suitable for the independent clinical interpretation of coronary CT angiography. Importantly, this limitation applies specifically to broad, non-domain-specific architectures and does not detract from the proven clinical value of dedicated, task-specific cardiovascular AI platforms. The potential of GP-AI models as supportive tools can be realized only through the development of architectures incorporating domain-specific targeted training, radiological physics, and three-dimensional spatial relationships, and through their use under expert physician supervision.\u003c/p\u003e \u003cp\u003eDeclaration of Generative AI and AI-assisted technologies in the writing process: During the preparation of this work, the authors used Google Gemini to assist with language editing, formatting, and citation placement. 
After using this tool/service, the authors reviewed and edited the content as needed and take full responsibility for the content of the published article.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eAn example of a Curved Multi-Planar Reformat (Curved MPR) image of the Right Coronary Artery (RCA) from two standardized sequences. This technique straightens the tortuous vessel along the centerline to clearly visualize the vessel trajectory and plaque burden and facilitate lumen assessment.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThis structured command set was sent to all artificial intelligence models evaluated in the study. This template enabled the models to analyze visual data and report the presence of significant stenosis (\u0026gt;\u0026thinsp;50%), estimated stenosis percentage, plaque characterization, and confidence score in a standardized format.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"Declarations","content":" \u003ch2\u003eCompeting Interests\u003c/h2\u003e \u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eFunding\u003c/h2\u003e \u003cp\u003eThis research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eS.O. conceptualized the study, developed the methodology, conducted the AI evaluations and formal statistical analysis, and wrote the original draft. Z.G.B. and I.G.B. acquired the clinical data, curated the CCTA datasets, and performed the expert evaluations. G.E. and M.Y. provided critical supervision, validated the reference standards, and guided the clinical interpretation. 
All authors reviewed and approved the manuscript.\u003c/p\u003e\u003ch2\u003eAcknowledgement\u003c/h2\u003e\u003cp\u003eDeclaration of AI-assisted technologies in the writing process: During the preparation of this work, the authors used generative AI (Gemini) to assist with language editing, formatting, and structural refinement. After using this tool, the authors thoroughly reviewed and edited the content as needed and take full responsibility for the content and scientific integrity of the published article. We have no other specific acknowledgements to declare.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe datasets generated and analyzed during the current study are not publicly available due to institutional restrictions regarding patient privacy. De-identified data are available from the corresponding author upon reasonable request and subject to institutional ethical approval.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eVrints CJM, Senior R, Crea F, et al. 2024 ESC Guidelines for the management of chronic coronary syndromes. \u003cem\u003eEur Heart J\u003c/em\u003e. 2024;45(36):3415\u0026ndash;3537.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKelion AD, Nicol ED. The rationale for the primacy of coronary CT angiography in the National Institute for Health and Care Excellence (NICE) guideline (CG95) for the investigation of chest pain of recent onset. \u003cem\u003eJ Cardiovasc Comput Tomogr\u003c/em\u003e. 2018;12:516\u0026ndash;22.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eConte E, Sala E. AI-assisted CCTA: supporting diagnosis across the CAD spectrum. \u003cem\u003eInt J Cardiovasc Imaging\u003c/em\u003e. 2025;41:825\u0026ndash;826.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eD\u0026rsquo;Costa Z, Karlsberg RP, Cho GW. 
Artificial-intelligence-assisted CCTA quantifies sex differences in coronary atherosclerotic burden at low atheroma volumes. \u003cem\u003eIJC Heart \u0026amp; Vasculature\u003c/em\u003e. 2025;60:101758.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiao J, Huang L, Qu M, Chen B, Wang G. Artificial Intelligence in Coronary CT Angiography: Current Status and Future Prospects. \u003cem\u003eFront Cardiovasc Med\u003c/em\u003e. 2022;9:896366.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen M, Wang X, Hao G, et al. Diagnostic performance of deep learning-based vascular extraction and stenosis detection technique for coronary artery disease. \u003cem\u003eBr J Radiol\u003c/em\u003e. 2020;93:20191028.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSarangi PK, Datta S, Panda BB, et al. Evaluating ChatGPT-4\u0026rsquo;s Performance in Identifying Radiological Anatomy in FRCR Part 1 Examination Questions. \u003cem\u003eIndian J Radiol Imaging\u003c/em\u003e. 2025;35:287\u0026ndash;294.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGillette J, Lu M, Heston TF. Large Language Models Perform at Chance Level in the Diagnosis of Pediatric Pneumonia Using Chest Radiographs. \u003cem\u003eCureus\u003c/em\u003e. 2025;17(9):e92596.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAbbara S, Blanke P, Maroules CD, et al. SCCT guidelines for the performance and acquisition of coronary computed tomographic angiography: A report of the society of Cardiovascular Computed Tomography Guidelines Committee. \u003cem\u003eJ Cardiovasc Comput Tomogr\u003c/em\u003e. 2016;10(6):435\u0026ndash;449.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGemini Team, Google. Gemini: A Family of Highly Capable Multimodal Models. \u003cem\u003earXiv preprint arXiv:2312.11805\u003c/em\u003e. 2023.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAnthropic. The Claude 3 Model Family: Opus, Sonnet, Haiku. 
\u003cem\u003eTechnical Report\u003c/em\u003e. 2024.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOpenAI. GPT-4 Technical Report. \u003cem\u003earXiv preprint arXiv:2303.08774\u003c/em\u003e. 2023.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLandis JR, Koch GG. The measurement of observer agreement for categorical data. \u003cem\u003eBiometrics\u003c/em\u003e. 1977;33(1):159\u0026ndash;174.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eReiber JH, Serruys PW, Kooijman CJ, et al. Assessment of short-, medium-, and long-term variations in arterial dimensions from computer-assisted quantitation of coronary cineangiograms. \u003cem\u003eCirculation\u003c/em\u003e. 1985;71(2):280\u0026ndash;288.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTopol EJ. High-performance medicine: the convergence of human and artificial intelligence. \u003cem\u003eNat Med\u003c/em\u003e. 2019;25(1):44\u0026ndash;56.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDavenport T, Kalakota R. The potential for artificial intelligence in healthcare. \u003cem\u003eFuture Healthc J\u003c/em\u003e. 2019;6(2):94\u0026ndash;98.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLarge Language Models for disease diagnosis: a scoping review. \u003cem\u003eNPJ Digit Med\u003c/em\u003e. 2025;8:11.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHulten E, Villines TC, Cheezum MK, et al. The role of coronary CT angiography in the diagnosis and management of coronary artery disease. \u003cem\u003eJ Nucl Cardiol\u003c/em\u003e. 2017;24(5):1609\u0026ndash;1624.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKnuuti J, Wijns W, Saraste A, et al. 2019 ESC Guidelines for the diagnosis and management of chronic coronary syndromes. \u003cem\u003eEur Heart J\u003c/em\u003e. 
2020;41(3):407\u0026ndash;477.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eClinical expert consensus document on quantitative coronary angiography. \u003cem\u003eCardiovasc Interv Ther\u003c/em\u003e. 2020;35(2):105\u0026ndash;116.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003evan Assen M, De Cecco CN, et al. Inter-observer variability of coronary artery calcium scoring and CCTA interpretation. \u003cem\u003eJ Cardiovasc Comput Tomogr\u003c/em\u003e. 2019;13(4):228\u0026ndash;233.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBudoff MJ, et al. Interobserver variability among expert readers quantifying plaque volume on coronary CT angiography. \u003cem\u003eJ Cardiovasc Comput Tomogr\u003c/em\u003e. 2022;16(6):501\u0026ndash;507.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePark SH. Artificial intelligence in radiology: practical issues and challenges. \u003cem\u003eRadiology\u003c/em\u003e. 2018;287(3):749\u0026ndash;772.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLarge Language Models in Medical Image Analysis: A Systematic Review. \u003cem\u003eBioengineering (Basel)\u003c/em\u003e. 2025;12(8):818.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMultimodal Large Language Models in Medical Imaging. \u003cem\u003eKorean J Radiol\u003c/em\u003e. 2025;26(9):843\u0026ndash;853.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBest Practices for the Safe Use of Large Language Models and Generative AI in Radiology. \u003cem\u003eRadiology\u003c/em\u003e. 2025;312(3):e241516.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eInterobserver variability of coronary stenosis characterization and its relation to plaque composition. \u003cem\u003eJACC Cardiovasc Imaging\u003c/em\u003e. 2024.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCathAI: fully automated coronary angiography interpretation and stenosis estimation. 
\u003cem\u003eNPJ Digit Med\u003c/em\u003e. 2023;6(1):142.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"keywords":"Coronary CT Angiography (CCTA), Artificial Intelligence, Large Language Models (LLM), Quantitative Coronary Angiography (QCA), Diagnostic Accuracy, Machine Learning","lastPublishedDoi":"10.21203/rs.3.rs-8997340/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8997340/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptTitle":"Diagnostic Performance of Expert Physicians Versus General-Purpose Artificial Intelligence Using Standardized Static Coronary CT Images: A Dual-Reference Validation","postedDate":"March 10th, 2026","published":true,"status":"posted"},"version":"v1","identity":"rs-8997340"}}}