The use of Artificial Intelligence in Diagnosis of Thymic Cancer - Systematic review and Meta Analysis

Systematic Review

Daniel Rodrigo Serbena, Isabela Luiza Fraron Cieslack, Augusto Philippus Lack, and 4 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7768386/v1
This work is licensed under a CC BY 4.0 License
Status: Posted, Version 1

Abstract

Background
Thymomas are rare mediastinal tumors with a broad clinical spectrum, making accurate diagnosis pivotal for treatment planning and prognostication. Conventional imaging and histopathological evaluation remain the gold standard; however, recent advances in artificial intelligence (AI) have introduced innovative methodologies to enhance diagnostic precision, reproducibility, and efficiency.
This systematic review aimed to evaluate the current evidence on the application of AI-based methods in the diagnosis of thymoma.

Methods
A systematic search was conducted across PubMed, Web of Science, Embase, Scopus, BioRxiv, IEEE Xplore, and the ACM Digital Library. Eligible studies included original research that investigated AI techniques, such as machine learning, deep learning, or radiomics, for diagnosing or classifying thymoma based on imaging, pathology, or multimodal data. Data were extracted on AI methodology, diagnostic performance metrics, number of participants, and country of origin. Methodological quality was assessed using APPRAISE-AI.

Results
26 studies met inclusion criteria. AI models outperformed radiologists and pathologists in all comparisons, although only in some metrics were the models significantly better than medical professionals. For all outcomes, the top-performing models achieved an Area Under the Curve (AUC) close to 0.95, while mean performance values were comparatively lower.

Conclusion
AI models typically exhibit diagnostic performance equivalent to that of radiologists, showing incremental advantages in selected applications. The most favorable outcomes have been observed in differential diagnosis, followed by pathology and risk stratification, with deep learning demonstrating particular effectiveness in pathology. Nevertheless, further investigations incorporating diverse imaging modalities, deep learning approaches, and strategies aimed at augmenting medical professionals’ performance are still required.

Keywords: Machine Learning, Deep Learning, Thymoma, Diagnosis

1 Introduction

Thymic epithelial tumors are rare tumors of the thymus gland and are classified as thymomas, thymic carcinomas, or thymic neuroendocrine tumors.
Thymomas are generally indolent and are the most common of these, accounting for approximately 20–25% of mediastinal tumors and up to ~50% of anterior mediastinal masses in some series (1, 2). Although frequently associated with autoimmune diseases, thymic carcinomas are most often diagnosed at Tumor–Node–Metastasis (TNM) classification stages III12, IVA, or IVB (3). Some cases of thymoma, however, can be detected earlier because of their co-occurrence with myasthenia gravis (4).

Artificial intelligence (AI), encompassing both machine learning and deep learning techniques, has been increasingly applied in oncologic imaging, pathology, and clinical decision support. In thymic epithelial tumors, AI-based tools hold promise for earlier detection, automated tumor segmentation, and radiomic analysis of Computed Tomography (CT) or Positron Emission Tomography (PET) scans, which can help distinguish thymomas from thymic carcinomas and predict stage or recurrence risk. Integrating multi-omics data (genomics, transcriptomics, and imaging) with AI-driven analytics could improve prognostic modeling and treatment planning, potentially reducing costs and enhancing outcomes for patients with these rare cancers (5, 6).

AI is a rapidly evolving field without established techniques. This study therefore aims to contribute to the field by addressing the question “What role do machine learning-based models play in the diagnostic process of thymic neoplasms?” This question is examined through a systematic review and meta-analysis of the literature.

2 Methods

A systematic review was performed according to the PRISMA 2020 statement (an updated guideline for reporting systematic reviews) and registered in PROSPERO (CRD42024602310). Seven databases (PubMed, Web of Science, Embase, Scopus, BioRxiv, IEEE Xplore, and the ACM Digital Library) were searched on 6 October 2024.
Mendeley and Rayyan were used for duplicate detection, and Rayyan was also used for title and abstract screening. Full searches can be accessed in Supplementary_Material_1.

To outline the review’s search strategy, the PICO strategy (7) was used:
(1) Population: Individuals with thymic neoplasms of any kind, or databases containing data on such neoplasms, including imaging exams (CT, MRI, and PET scans) of individuals with thymic neoplasms and digital slides of human thymic neoplasms.
(2) Intervention: Radiomics or artificial intelligence based on machine learning.
(3) Control: Not all included studies report a comparator; however, when a comparator is available, any accepted reference standard is eligible (e.g., diagnosis obtained by the neuroradiologist, clinical follow-up, or histopathology from the final report).
(4) Outcome: The performance of AI models in predicting diagnoses related to thymic cancers. The measure of effect used is the AUC.

Inclusion criteria for articles were:
(1) Human patients with any form of cancer originating in the thymus; datasets covering this population were also included.
(2) Studies must use ML, or one of its subtypes, for model development. Because a significant number of studies combine ML methods with radiomics models and nomograms, these two were also included.
(3) The models must be trained on some form of imaging exam.
(4) The models' predictions must be related to the diagnosis of the disease (e.g., staging, risk stratification, differentiation from other diseases, and malignant/benign differentiation).
(5) Studies must report the models' predictive performance as AUC.

The exclusion criteria were:
(1) Non-journal articles.
(2) Reviews.
(3) Articles whose full text was inaccessible.

All stages of this systematic review were carried out by two independent reviewers (S.D. and C.I.), with a third independent reviewer (P.A.) resolving conflicts between the first two.
Zotero was used to detect duplicates. Risk of bias was assessed with the APPRAISE-AI tool, which provides a scored, objective measure of study quality across six key domains: clinical relevance, data quality, methodological conduct, robustness of results, reporting quality, and reproducibility. It consists of 24 items, producing a total score out of 100, which can be interpreted on a five-tier scale ranging from very low quality (0–19) to very high quality (80–100) (7).

In the meta-analysis, articles were subgrouped according to outcome, and models of different types within the same study were treated as independent. Subgroups with more than 3 articles were eligible for meta-analysis. The meta-analysis was conducted in RStudio using the funnel function of the metafor package. The analyses were primarily performed with variables reported alongside their 95% Confidence Intervals (CI). For variables lacking 95% CIs, additional analyses were conducted by inferring estimates from the data extracted from the included articles. Furthermore, Welch's t-test was used to assess the difference between the performance of AI and that of radiologists.

Additionally, AI models were subdivided into Hybrid Nomogram (HN) and ML models, according to whether ML techniques were used in feature reduction or in model construction. Models that used ML techniques only for feature selection were classified as HN, and models that used ML techniques in model construction were classified as ML models. Three main outcomes were analysed in the meta-analysis of this systematic review: (1) risk stratification; (2) differential diagnosis, in other words, the differentiation of cancer in the thymus from other types of disease; and (3) pathology.

Finally, large language models such as ChatGPT were used for text revision and R script adaptation. AI-assisted tools were not used for content generation.
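The pooling of AUCs reported with 95% CIs can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the review: each AUC is taken to have a symmetric 95% CI on the raw scale, so that its standard error is approximately (upper - lower) / (2 × 1.96), and the DerSimonian-Laird random-effects estimator is used as a stand-in, since the estimator fitted with metafor is not reported. The code is in Python rather than R, the function names are hypothetical, and the study values are illustrative placeholders, not extracted data.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def se_from_ci(lower, upper, z=Z_95):
    """Approximate a standard error from a symmetric confidence interval."""
    return (upper - lower) / (2.0 * z)


def pool_random_effects(estimates, std_errors):
    """DerSimonian-Laird random-effects pooling (assumed estimator).

    Returns (pooled, ci_low, ci_high, tau2, i2_percent).
    """
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two estimates with matching standard errors")

    # Fixed-effect (inverse-variance) weights and Cochran's Q
    w = [1.0 / se ** 2 for se in std_errors]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1

    # Between-study variance, truncated at zero
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau2 to each within-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sw_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sw_re
    se_pooled = math.sqrt(1.0 / sw_re)

    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled - Z_95 * se_pooled, pooled + Z_95 * se_pooled, tau2, i2


# Hypothetical (AUC, CI lower, CI upper) triples, for illustration only
studies = [(0.85, 0.78, 0.92), (0.90, 0.86, 0.94), (0.80, 0.70, 0.90), (0.88, 0.83, 0.93)]
aucs = [s[0] for s in studies]
ses = [se_from_ci(s[1], s[2]) for s in studies]
pooled, low, high, tau2, i2 = pool_random_effects(aucs, ses)
print(f"pooled AUC = {pooled:.3f} ({low:.3f}-{high:.3f}), tau2 = {tau2:.5f}, I2 = {i2:.1f}%")
```

AUCs close to 1 often have asymmetric intervals, in which case pooling on a transformed scale (for example, the logit) may be preferable; that refinement is omitted here for brevity.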
3 Results

The search retrieved records from seven major databases: PubMed (100 articles), Web of Science (69 articles), Embase (166 articles), Scopus (55 articles), BioRxiv (48 articles), IEEE Xplore (5 articles), and the ACM Digital Library (4 articles), resulting in a total of 438 articles retrieved. Following the removal of 159 duplicates using Zotero, 281 unique records remained. These were then subjected to title and abstract screening, reducing the pool to 75 potentially relevant articles. All 75 underwent full-text screening, from which 25 studies were ultimately included in the final review, as represented in Fig. 1. The full-text screening decisions can be found in Supplementary_Material_2.

The methodological quality assessment rated the majority of studies (17 studies; 68.0%) (8–24) as high quality. A smaller number of studies were rated as moderate (7 studies; 28.0%) (25–28). Only one study (4.0%) (29) was classified as very high quality. The full list of scores can be found in Supplementary_Material_3.

Within the differential diagnosis subgroup, heterogeneity is moderate to high (I² = 59.1% for ML and 80.9% for HN). Nevertheless, the overall pooled analysis across all studies yields an AUC of 0.89 (0.87–0.91), and the test for subgroup differences shows no significant divergence (p = 0.6072). Within the pathology subgroup, heterogeneity is also moderate to high (I² = 52.8% for ML and 95.7% for HN). The overall pooled analysis across all studies yields an AUC of 0.82 (0.78–0.86), and the test for subgroup differences again shows no significant divergence (p = 0.3304). Within the risk stratification subgroup, heterogeneity is low for ML (I² = 16.7%) and high for HN (I² = 92.3%). The overall pooled analysis across all studies yields an AUC of 0.82 (0.79–0.84), and the test for subgroup differences shows no significant divergence (p = 0.9952).
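The heterogeneity and subgroup statistics quoted above follow standard definitions, which the sketch below illustrates under stated assumptions: I² is computed from Cochran's Q as max(0, (Q - df) / Q), and the subgroup comparison is the two-subgroup between-groups Q statistic referred to a chi-square distribution with one degree of freedom. The function names and the input values are hypothetical; they are not the quantities behind the reported I² values or p-values.

```python
import math


def i_squared(q, k):
    """Higgins' I^2 in percent, from Cochran's Q computed over k estimates."""
    df = k - 1
    if q <= 0 or df <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def subgroup_difference(est_a, se_a, est_b, se_b):
    """Between-subgroup Q test for exactly two pooled subgroup estimates.

    With two subgroups, Q_between reduces to the squared difference divided
    by the sum of the variances and has one degree of freedom, so the
    chi-square tail probability equals erfc(sqrt(Q / 2)).
    """
    q_between = (est_a - est_b) ** 2 / (se_a ** 2 + se_b ** 2)
    p_value = math.erfc(math.sqrt(q_between / 2.0))
    return q_between, p_value


# Hypothetical inputs, for illustration only
print(f"I2 = {i_squared(q=12.0, k=6):.1f}%")
q_b, p = subgroup_difference(0.86, 0.015, 0.88, 0.020)
print(f"Q_between = {q_b:.3f}, p = {p:.3f}")
```

A large p-value from this test indicates only that the subgroup estimates are compatible with one another; it says nothing about the heterogeneity inside each subgroup, which is why a high I² and a non-significant subgroup test can coexist.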
Heterogeneity was significant (p < 0.05) in all cases, both within subgroups and across groups. However, as the review’s primary purpose was to synthesize available evidence rather than establish a single pooled effect, heterogeneity was expected.

Within the included studies, the imaging modalities on which the models based their predictions were examined for pathology identification, differential diagnosis, and risk stratification or staging. For pathology detection, MRI was used in 1/7 studies (14.3%) (30), while CT was applied in 6/7 studies (85.7%) (31–36). In differential diagnosis, PET and MRI were each used in 1/8 studies (12.5% each) (37, 38), whereas CT was employed in 7/8 studies (87.5%) (37, 39–44); one study used both PET and CT (37). For risk stratification, staging, and distinguishing between benign and malignant findings, CT was used exclusively. Overall, CT emerged as the predominant imaging modality across all outcome categories, with MRI and PET contributing minimally.

Table 2 summarizes the meta-analysis of model performance across three main clinical objectives, pathology (P), differential diagnosis (DD), and risk stratification (RS), comparing traditional machine learning models (ML) with hybrid nomograms (HN). Reported metrics are AUC, sensitivity, specificity, and accuracy, each with its 95% CI.

Objective  AUC                Sensitivity        Specificity        Accuracy
P.ML       0.84 (0.80–0.88)   0.67 (0.58–0.75)   0.83 (0.78–0.88)   0.76 (0.74–0.79)
P.HN*      0.81 (0.75–0.86)   0.64 (0.58–0.71)   0.76 (0.69–0.83)   0.71 (0.64–0.77)
DD.ML*     0.88 (0.86–0.91)   0.84 (0.81–0.87)   0.84 (0.82–0.87)   0.84 (0.82–0.86)
DD.HN*     0.90 (0.86–0.93)   0.86 (0.76–0.96)   0.85 (0.72–0.98)   0.80 (0.69–0.90)
RS.ML      0.82 (0.79–0.85)   0.67 (0.61–0.74)   0.78 (0.72–0.84)   0.73 (0.71–0.76)
RS.HN      0.82 (0.80–0.84)   0.71 (0.59–0.84)   0.84 (0.75–0.93)   0.74 (0.66–0.83)
*External validation data.

Table 2 - AUC, Sensitivity, Specificity and Accuracy meta-analysis results by subgroups.
AUC - Area Under the Curve; P - Pathology; DD - Differential Diagnosis; RS - Risk Stratification; ML - Machine Learning Model; HN - Hybrid Nomogram.

The performance evaluation of the ML models showed promising results across different applications. For pathology prediction, the models achieved an NPV of 0.705 (95% CI: 0.628–0.782) and a PPV of 0.711 (95% CI: 0.656–0.765), indicating moderate predictive value. In differential diagnosis, however, the models demonstrated stronger performance, with a mean PPV of 0.904 (95% CI: 0.885–0.923), reflecting excellent predictive value, and a mean NPV of 0.76 (95% CI: 0.724–0.796), showing good overall reliability.

Comparing the performance of radiologists with that of HN revealed no statistically significant differences in sensitivity (p = 0.731), specificity (p = 0.175), or accuracy (p = 0.356). Similarly, comparisons with machine learning models showed no significant differences in sensitivity (p = 0.548), specificity (p = 0.228), or accuracy (p = 0.212).

In the pathology model performance graph (Fig. 3A), deep learning (33, 36) clearly outperformed all other machine learning techniques, achieving the highest mean AUC of 0.898. The next best model was random forest (34) with an AUC of 0.791, followed by KNN (0.767), GLM (0.760), and SVM (0.760) (35), all of which showed similar but noticeably lower performance than deep learning. For differential diagnosis model performance (Fig. 3B), CNN (43) achieved the top performance with a mean AUC of 0.896, closely followed by Adaboost (0.8675) and MLP (0.8645). SVM (0.863) and LightGBM (0.863) (42) also performed well, while logistic regression (45, 46) and KNN achieved moderately lower results. Decision trees (38) were the weakest performers in this category, with an AUC of just 0.678, a substantial gap relative to the leading models. In the risk stratification model performance graph (Fig. 3C), KNN (46, 47) achieved the highest mean AUC of 0.887, with MLP (0.867) and LDA (0.856) following closely (46). Several other models, including SVM, SGD, and Naive Bayes variants (46, 47), also performed relatively well, ranging between 0.831 and 0.844. In contrast, decision trees (46, 47) again showed the lowest performance, with a mean AUC of 0.555, reinforcing their consistent underperformance across all three tasks.

For pathology (P.ML and P.HN), performance is moderate, with AUCs of 0.84 for ML and 0.81 for HN. Both show relatively high specificity (around 0.76–0.83), but sensitivity is lower (0.64–0.67). Accuracy remains modest (0.71–0.76), indicating that pathology prediction is still a challenging task with moderate diagnostic reliability. In differential diagnosis (DD.ML and DD.HN), results are notably stronger. Machine learning models (38, 43–45) achieve an AUC of 0.88, with high sensitivity (0.84) and balanced specificity (0.84), leading to a high accuracy of 0.84. HN perform even better, with an AUC of 0.90, sensitivity of 0.86, and specificity of 0.85, though accuracy (0.80) shows slightly wider variability. For risk stratification (RS.ML and RS.HN), performance is weaker than in differential diagnosis but still clinically useful. Both ML and HN models (46–52) show AUCs around 0.82, with relatively low sensitivity (0.67 for ML and 0.71 for HN), moderate specificity (0.78–0.84), and accuracy in the 0.73–0.74 range.

The results across the three graphs in Fig. 3 illustrate how different modeling strategies perform depending on the clinical objective. In pathology (Fig. 3A), deep learning-based models (33, 36) clearly outperform traditional approaches. Conventional models such as Random Forest, SVM, GLM, and KNN (34, 35) still achieve moderate predictive power but remain well below the performance of deep learning signatures, underscoring the added value of advanced feature representation in this domain.
For differential diagnosis (Fig. 3B), performance is more evenly distributed across models, with several traditional algorithms, including KNN, MLP, LDA, and SVM (44, 45), achieving strong results in the 0.85–0.89 range. HN (37, 39–41), boosting methods such as XGBoost and GDBT, and Bernoulli NB (44, 45) also provide competitive performance, while simpler methods such as Logistic Regression, Gaussian NB, and Decision Tree (38, 44, 45) show weaker performance, with DT in particular lagging far behind. In risk stratification (Fig. 3C), results are consistently high across all models. The combined nomogram (48, 51) achieves the best performance at 0.945, closely followed by Adaboost and LightGBM (46, 52). Even the lowest-performing models, such as KNN and MLP (46, 47), maintain strong results above 0.84.

4 Discussion

4.1 Main Findings

The meta-analysis shows that model performance varies by outcome. Differential diagnosis achieves the best results, especially with HN (AUC up to 0.90, high sensitivity and specificity) (37, 39–41), making it the most reliable task. Pathology models (30–36) perform moderately (AUC ~0.81–0.84) but are limited by low sensitivity. Risk stratification models (46–52) show consistent but weaker performance (AUC ~0.82, accuracy ~0.73–0.74), with specificity stronger than sensitivity. Overall, HN enhance performance, particularly in differential diagnosis, while pathology and risk stratification remain more challenging tasks.

In pathology, machine learning models (P.ML) (33–36) achieve a slightly higher AUC (0.84) than HN (P.HN: 0.81) (30–33, 36), but both approaches show low sensitivity (~0.64–0.67). This means ML alone can capture patterns reasonably well, but neither technique excels at detecting true positives. In differential diagnosis, both ML (38, 43–45) and HN (37, 39–41) perform very well, with ML reaching an AUC of 0.88 and HN improving further to 0.90.
Here, ML already provides strong diagnostic accuracy (0.84), but HN approaches add robustness, especially in sensitivity and specificity. In risk stratification, ML (46, 47, 52) and HN (48–51) perform similarly (AUC ~0.82), though ML shows slightly lower accuracy (0.73) than HN (0.74). Both approaches are limited by relatively low sensitivity, but HN again provide a marginal improvement.

4.2 Comparison with similar research

A meta-analysis of 83 studies validating generative AI against the diagnostic performance of physicians found a diagnostic accuracy of 52.1%, which is relatively low considering general clinical outcomes. AI models and physicians showed no statistically significant difference in performance (p = 0.10), even when the comparison was restricted to non-specialist physicians (p = 0.93). However, AI models performed worse than specialist physicians (p = 0.007), which highlights their limitations in the face of advanced clinical knowledge (53).

A meta-analysis of 19 articles compared the diagnostic ability of AI and clinicians with different levels of expertise in dermatological cancers. It showed that AI had a sensitivity of 0.87 and a specificity of 0.771, surpassing general practitioners (SEN 0.7978, SPE 0.736) and equaling specialist dermatologists (SEN 0.842, SPE 0.744) (54).

A systematic review examined the diagnostic accuracy of AI in digital pathology images and found an average sensitivity across all studies of 0.963 (95% CI 0.941–0.977) and an average specificity of 0.933 (95% CI 0.905–0.954). Gastrointestinal pathology, with the largest sample of 14 AI models, demonstrates strong performance, with a mean sensitivity of 0.93 and specificity of 0.94. Uropathology also excels, with 8 models achieving high mean sensitivity (0.95) and specificity (0.96). Breast pathology, represented by 8 models, shows slightly lower metrics (0.83 sensitivity, 0.88 specificity).
Notably, neuropathology stands out with perfect sensitivity (1.00) and high specificity (0.95), though it is based on only one model. Conversely, cardiothoracic and head & neck pathology exhibit high sensitivities (0.98 each) but lower specificities (0.76 and 0.72, respectively) (55).

Furthermore, in staging, an RF model (doi: 10.1371/journal.pone.0261401) (34) achieved an AUC of 0.838 (0.669–0.934), with 0.749 (0.406–0.936) for sensitivity, 0.751 (0.608–0.857) for specificity, and 0.775 (0.635–0.854) for accuracy, plus a further reported value of 0.75 (0.608–0.834).

A study evaluating radiomics signatures of computed tomography for predicting risk categorization and clinical stage of thymomas (56) compared the performance of models based on non-contrast-enhanced CT (NECT), contrast-enhanced CT (CECT), and radiologist assessment. The NECT-based model achieved an AUC of 0.829 (95% CI: 0.757–0.900), with sensitivity, specificity, and accuracy of 0.712, 0.806, and 0.819, respectively. The CECT-based model performed slightly better, with an AUC of 0.860 (95% CI: 0.803–0.917), sensitivity of 0.699, specificity of 0.889, and accuracy of 0.869. In comparison, radiologist assessment yielded a lower accuracy of 0.779, with sensitivity and specificity not reported.

Taken together, these findings show that pathology prediction benefits most from deep learning, that disease differentiation requires careful model selection, with HN or neural methods offering advantages, and that risk stratification appears to be a well-defined task in which most models can achieve robust and clinically useful performance.

4.3 Implications and actions needed

Further research using imaging modalities beyond computed tomography is needed to enhance the generalizability of current findings. Additionally, more attention should be given to augmentation strategies in which artificial intelligence works alongside radiologists to optimize diagnostic accuracy and clinical decision-making.
Abbreviations

AI - Artificial Intelligence; ML - Machine Learning; LR - Logistic Regression; KNN - k-Nearest Neighbors; SVM - Support Vector Machine; RF - Random Forest; DT - Decision Tree; MLP - Multi-Layer Perceptron; LDA - Linear Discriminant Analysis; SGD - Stochastic Gradient Descent Classifier; GDBT - Gradient Boosted Decision Trees; XGBoost - eXtreme Gradient Boosting; LightGBM - Light Gradient Boosting Machine; Adaboost - Adaptive Boosting; NB (Bernoulli/Gaussian) - Naïve Bayes (Bernoulli distribution / Gaussian distribution); Linear SVC - Linear Support Vector Classifier; DL - Deep Learning; DLR_Sig - Deep Learning Radiomics Signature (using Logistic Regression); DLRN - Deep Learning-based Radiomic Nomogram (Combined features); GLM - Generalized Linear Model; UECT - Unenhanced Computed Tomography; CECT - Contrast-Enhanced Computed Tomography; MRI - Magnetic Resonance Imaging; PET - Positron Emission Tomography; CT - Computed Tomography.

Declarations

5 Conflict of Interest

The authors declare no conflicts of interest.

6 Author Contributions

Daniel Rodrigo Serbena - Conceptualization, Methodology, Data curation, Formal analysis, Writing – original draft; Isabela Luiza Fraron Cieslack - Data curation, Writing – original draft; Augusto Philippus Lack - Data curation; Gabriela Vitoria Fraron Cieslack - Data curation; Weber Claudio Francisco Nunes da Silva - Writing – review & editing; Roberta Fabbri - Writing – review & editing; Juliana Sartori Bonini - Study supervision.

7 Funding

None

References

1. Alqaidy D (2023) Thymoma: An Overview. Diagnostics 13(18):2982
2. Robinson SP, Akhondi H. Thymoma. In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2025 [cited 2025 Sep 29].
Available from: http://www.ncbi.nlm.nih.gov/books/NBK559291/
3. Roden AC, Ahmad U, Cardillo G, Girard N, Jain D, Marom EM et al (2022) Thymic Carcinomas—A Concise Multidisciplinary Update on Recent Developments From the Thymic Carcinoma Working Group of the International Thymic Malignancy Interest Group. J Thorac Oncol 17(5):637–650
4. Falkson CB, Bezjak A, Darling G, Gregg R, Malthaner R, Maziak DE et al (2009) The Management of Thymoma: A Systematic Review and Practice Guideline. J Thorac Oncol 4(7):911–919
5. Bajwa J, Munir U, Nori A, Williams B (2021) Artificial intelligence in healthcare: transforming the practice of medicine. Future Healthc J 8(2):e188–e194
6. Pang J, Xiu W, Ma X (2023) Application of Artificial Intelligence in the Diagnosis, Treatment, and Prognostic Evaluation of Mediastinal Malignant Tumors. JCM 12(8):2818
7. Kwong JCC, Khondker A, Lajkosz K, McDermott MBA, Frigola XB, McCradden MD et al (2023) APPRAISE-AI Tool for Quantitative Evaluation of AI Studies for Clinical Decision Support. JAMA Netw Open 6(9):e2335377
8. Wang X, Sun W, Liang H, Mao X, Lu Z (2019) Radiomics Signatures of Computed Tomography Imaging for Predicting Risk Categorization and Clinical Stage of Thymomas. Biomed Res Int 2019:1–10
9. Blüthgen C, Patella M, Euler A, Baessler B, Martini K, Von Spiczak J et al (2021) Computed tomography radiomics for the prediction of thymic epithelial tumor histology, TNM stage and myasthenia gravis. Al-Kadi OS, editor. PLoS ONE 16(12):e0261401
10. Liu W, Wang W, Guo M, Zhang H (2024) Tumor habitat and peritumoral region evolution–based imaging features to assess risk categorization of thymomas. Clin Radiol 79(9):e1117–e1125
11. Gao C, Yang L, Xu Y, Wang T, Ding H, Gao X et al (2024) Differentiating low-risk thymomas from high-risk thymomas: preoperative radiomics nomogram based on contrast enhanced CT to minimize unnecessary invasive thoracotomy. BMC Med Imaging 24(1):197
12. Liang Z, Li J, Tang Y, Zhang Y, Chen C, Li S et al (2024) Predicting the risk category of thymoma with machine learning-based computed tomography radiomics signatures and their between-imaging phase differences. Sci Rep 14(1):19215
13. Zhou H, Bai HX, Jiao Z, Cui B, Wu J, Zheng H et al (2023) Deep learning-based radiomic nomogram to predict risk categorization of thymic epithelial tumors: A multicenter study. Eur J Radiol 168:111136
14. Chen X, Feng B, Xu K, Chen Y, Duan X, Jin Z et al (2023) Development and validation of a deep learning radiomics nomogram for preoperatively differentiating thymic epithelial tumor histologic subtypes. Eur Radiol 33(10):6804–6816
15. Liu W, Wang W, Zhang H, Guo M, Xu Y, Liu X (2023) Development and Validation of Multi-Omics Thymoma Risk Classification Model Based on Transfer Learning. J Digit Imaging 36(5):2015–2024
16. Shen Q, Shan Y, Xu W, Hu G, Chen W, Feng Z et al (2021) Risk stratification of thymic epithelial tumors by using a nomogram combined with radiomic features and TNM staging. Eur Radiol 31(1):423–435
17. Ai J, Wang Z, Ai S, Li H, Gao H, Shi G et al (2025) Development and Validation of a CT-Radiomics Nomogram for the Diagnosis of Small Prevascular Mediastinal Nodules: Reducing Nontherapeutic Surgeries. Acad Radiol 32(1):506–517
18. Huang X, Wang X, Liu Y, Wang Z, Li S, Kuang P (2024) Contrast-enhanced CT-based radiomics differentiate anterior mediastinum lymphoma from thymoma without myasthenia gravis and calcification. Clin Radiol 79(4):e500–e510
19. Yang Y, Cheng J, Peng Z, Yi L, Lin Z, He A et al (2024) Development and Validation of Contrast-Enhanced CT-Based Deep Transfer Learning and Combined Clinical-Radiomics Model to Discriminate Thymomas and Thymic Cysts: A Multicenter Study. Acad Radiol 31(4):1615–1628
20. Li J, Cui N, Jiang Z, Li W, Liu W, Wang S et al (2023) Differentiating thymic epithelial tumors from mediastinal lymphomas: preoperative nomograms based on PET/CT radiomic features to minimize unnecessary anterior mediastinal surgery. J Cancer Res Clin Oncol 149(15):14101–14112
21. He W, Xia C, Chen X, Yu J, Liu J, Pu H et al (2022) Computed Tomography-Based Radiomics for Differentiation of Thymic Epithelial Tumors and Lymphomas in Anterior Mediastinum. Front Oncol 12:869982
22. Zhang C, Yang Q, Lin F, Ma H, Zhang H, Zhang R et al (2021) CT-Based Radiomics Nomogram for Differentiation of Anterior Mediastinal Thymic Cyst From Thymic Epithelial Tumor. Front Oncol 11:744021
23. Liu L, Lu F, Pang P, Shao G (2020) Can computed tomography-based radiomics potentially discriminate between anterior mediastinal cysts and type B1 and B2 thymomas? BioMed Eng OnLine 19(1):89
24. Xiao G, Hu YC, Ren JL, Qin P, Han JC, Qu XY et al (2021) MR imaging of thymomas: a combined radiomics nomogram to predict histologic subtypes. Eur Radiol 31(1):447–457
25. Mahmoudi S, Gruenewald LD, Eichler K, Martin SS, Booz C, Bernatz S et al (2023) Advanced biomedical imaging for accurate discrimination and prognostication of mediastinal masses. Eur J Clin Invest 53(12):e14075
26. Lin CY, Yen YT, Huang LT, Chen TY, Liu YS, Tang SY et al (2022) An MRI-Based Clinical-Perfusion Model Predicts Pathological Subtypes of Prevascular Mediastinal Tumors. Diagnostics 12(4):889
27. Ohira R, Yanagawa M, Suzuki Y, Hata A, Miyata T, Kikuchi N et al (2022) CT-based radiomics analysis for differentiation between thymoma and thymic carcinoma. J Thorac Dis 14(5):1342–1352
28. Tian D, Yan HJ, Shiiya H, Sato M, Shinozaki-Ushiku A, Nakajima J (2023) Machine learning-based radiomic computed tomography phenotyping of thymic epithelial tumors: Predicting pathological and survival outcomes. J Thorac Cardiovasc Surg 165(2):502–516.e9
29. Yang Y, Cheng J, Chen L, Cui C, Liu S, Zuo M (2024) Application of machine learning for the differentiation of thymomas and thymic cysts using deep transfer learning: A multi-center comparison of diagnostic performance based on different dimensional models. Thorac Cancer 15(31):2235–2247
30. Xiao G, Hu YC, Ren JL, Qin P, Han JC, Qu XY et al (2021) MR imaging of thymomas: a combined radiomics nomogram to predict histologic subtypes. Eur Radiol 31(1):447–457
31. Tian D, Yan HJ, Shiiya H, Sato M, Shinozaki-Ushiku A, Nakajima J (2023) Machine learning-based radiomic computed tomography phenotyping of thymic epithelial tumors: Predicting pathological and survival outcomes. J Thorac Cardiovasc Surg 165(2):502–516.e9
32. Liu L, Lu F, Pang P, Shao G (2020) Can computed tomography-based radiomics potentially discriminate between anterior mediastinal cysts and type B1 and B2 thymomas? BioMed Eng OnLine 19(1):89
33. Chen X, Feng B, Xu K, Chen Y, Duan X, Jin Z et al (2023) Development and validation of a deep learning radiomics nomogram for preoperatively differentiating thymic epithelial tumor histologic subtypes. Eur Radiol 33(10):6804–6816
34. Blüthgen C, Patella M, Euler A, Baessler B, Martini K, Von Spiczak J et al (2021) Computed tomography radiomics for the prediction of thymic epithelial tumor histology, TNM stage and myasthenia gravis. Al-Kadi OS, editor. PLoS ONE 16(12):e0261401
35. Hu J, Zhao Y, Li M, Liu Y, Wang F, Weng Q et al (2020) Machine-learning-based computed tomography radiomic analysis for histologic subtype classification of thymic epithelial tumours. Eur J Radiol 126:108929
36. Zhou H, Bai HX, Jiao Z, Cui B, Wu J, Zheng H et al (2023) Deep learning-based radiomic nomogram to predict risk categorization of thymic epithelial tumors: A multicenter study. Eur J Radiol 168:111136
37. Li J, Cui N, Jiang Z, Li W, Liu W, Wang S et al (2023) Differentiating thymic epithelial tumors from mediastinal lymphomas: preoperative nomograms based on PET/CT radiomic features to minimize unnecessary anterior mediastinal surgery. J Cancer Res Clin Oncol 149(15):14101–14112
38. Lin CY, Yen YT, Huang LT, Chen TY, Liu YS, Tang SY et al (2022) An MRI-Based Clinical-Perfusion Model Predicts Pathological Subtypes of Prevascular Mediastinal Tumors. Diagnostics 12(4):889
39. He W, Xia C, Chen X, Yu J, Liu J, Pu H et al (2022) Computed Tomography-Based Radiomics for Differentiation of Thymic Epithelial Tumors and Lymphomas in Anterior Mediastinum. Front Oncol 12:869982
40. Zhang C, Yang Q, Lin F, Ma H, Zhang H, Zhang R et al (2021) CT-Based Radiomics Nomogram for Differentiation of Anterior Mediastinal Thymic Cyst From Thymic Epithelial Tumor. Front Oncol 11:744021
41. Ai J, Wang Z, Ai S, Li H, Gao H, Shi G et al (2025) Development and Validation of a CT-Radiomics Nomogram for the Diagnosis of Small Prevascular Mediastinal Nodules: Reducing Nontherapeutic Surgeries. Acad Radiol 32(1):506–517
42. Yang Y, Cheng J, Peng Z, Yi L, Lin Z, He A et al (2024) Development and Validation of Contrast-Enhanced CT-Based Deep Transfer Learning and Combined Clinical-Radiomics Model to Discriminate Thymomas and Thymic Cysts: A Multicenter Study. Acad Radiol 31(4):1615–1628
43. Yang Y, Cheng J, Chen L, Cui C, Liu S, Zuo M (2024) Application of machine learning for the differentiation of thymomas and thymic cysts using deep transfer learning: A multi-center comparison of diagnostic performance based on different dimensional models. Thorac Cancer 15(31):2235–2247
44. Huang X, Wang X, Liu Y, Wang Z, Li S, Kuang P (2024) Contrast-enhanced CT-based radiomics differentiate anterior mediastinum lymphoma from thymoma without myasthenia gravis and calcification. Clin Radiol 79(4):e500–e510
45. Yang Y, Cheng J, Peng Z, Yi L, Lin Z, He A et al (2024) Development and Validation of Contrast-Enhanced CT-Based Deep Transfer Learning and Combined Clinical-Radiomics Model to Discriminate Thymomas and Thymic Cysts: A Multicenter Study. Acad Radiol 31(4):1615–1628
46. Feng XL, Wang SZ, Chen HH, Huang YX, Xin YK, Zhang T et al (2022) Optimizing the radiomics-machine-learning model based on non-contrast enhanced CT for the simplified risk categorization of thymic epithelial tumors: A large cohort retrospective study. Lung Cancer 166:150–160
47. Kayi Cangir A, Orhan K, Kahya Y, Özakıncı H, Kazak BB, Konuk Balcı BM et al (2021) CT imaging-based machine learning model: a potential modality for predicting low-risk and high-risk groups of thymoma: Impact of surgical modality choice. World J Surg Onc 19(1):147
48. Liang Z, Li J, Tang Y, Zhang Y, Chen C, Li S et al (2024) Predicting the risk category of thymoma with machine learning-based computed tomography radiomics signatures and their between-imaging phase differences. Sci Rep 14(1):19215
49. Shen Q, Shan Y, Xu W, Hu G, Chen W, Feng Z et al (2021) Risk stratification of thymic epithelial tumors by using a nomogram combined with radiomic features and TNM staging. Eur Radiol 31(1):423–435
50. Gao C, Yang L, Xu Y, Wang T, Ding H, Gao X et al (2024) Differentiating low-risk thymomas from high-risk thymomas: preoperative radiomics nomogram based on contrast enhanced CT to minimize unnecessary invasive thoracotomy. BMC Med Imaging 24(1):197
51. Liu W, Wang W, Zhang H, Guo M, Xu Y, Liu X (2023) Development and Validation of Multi-Omics Thymoma Risk Classification Model Based on Transfer Learning. J Digit Imaging 36(5):2015–2024
52. Liu W, Wang W, Guo M, Zhang H (2024) Tumor habitat and peritumoral region evolution–based imaging features to assess risk categorization of thymomas.
Clin Radiol 79(9):e1117–e1125 Takita H, Kabata D, Walston SL, Tatekawa H, Saito K, Tsujimoto Y et al (2025) A systematic review and meta-analysis of diagnostic performance comparison between generative AI and physicians. npj Digit Med 8(1):175 Salinas MP, Sepúlveda J, Hidalgo L, Peirano D, Morel M, Uribe P et al (2024) Author Correction: A systematic review and meta-analysis of artificial intelligence versus clinicians for skin cancer diagnosis. npj Digit Med 7(1):141 McGenity C, Clarke EL, Jennings C, Matthews G, Cartlidge C, Freduah-Agyemang H et al (2024) Artificial intelligence in digital pathology: a systematic review and meta-analysis of diagnostic test accuracy. npj Digit Med 7(1):114 Wang X, Sun W, Liang H, Mao X, Lu Z (2019) Radiomics Signatures of Computed Tomography Imaging for Predicting Risk Categorization and Clinical Stage of Thymomas. Biomed Res Int 2019:1–10 Additional Declarations The authors declare no competing interests. Cite Share Download PDF Status: Posted Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. 
© Research Square 2026 | ISSN 2693-5015 (online)
Under the Curve; P - Pathology; DD - Differential Diagnosis; RS - Risk Stratification; ML - Machine Learning Model; HN - Hybrid Nomogram.\u003c/p\u003e\u003cp\u003eThe performance evaluation of the ML model showed promising results across different applications. For pathology prediction, the models achieved an NPV of 0.705 (95% CI: 0.628\u0026ndash;0.782) and a PPV of 0.711 (95% CI: 0.656\u0026ndash;0.765), indicating moderate discriminative ability. In the context of differential diagnosis, however, the model demonstrated stronger performance with a mean PPV of 0.904 (95% CI: 0.885\u0026ndash;0.923), reflecting excellent predictive value, and a mean NPV of 0.76 (95% CI: 0.724\u0026ndash;0.796), showing good overall reliability.\u003c/p\u003e\u003cp\u003eComparing the performance of radiologists with HN revealed no statistically significant differences in sensitivity (p\u0026thinsp;=\u0026thinsp;0.731), specificity (p\u0026thinsp;=\u0026thinsp;0.175), or accuracy (p\u0026thinsp;=\u0026thinsp;0.356). Similarly, comparisons with machine learning models showed no significant differences in sensitivity (p\u0026thinsp;=\u0026thinsp;0.548), specificity (p\u0026thinsp;=\u0026thinsp;0.228), or accuracy (p\u0026thinsp;=\u0026thinsp;0.212).\u003c/p\u003e\u003cp\u003eIn the pathology model performance graph (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eA), deep learning (\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e, \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e) clearly outperformed all other machine learning techniques, achieving the highest mean AUC of 0.898. 
The next best model was random forest (\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e) with an AUC of 0.791, followed by KNN (0.767), GLM (0.760), and SVM (0.760) (\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e), all of which showed similar but noticeably lower performance compared to deep learning.\u003c/p\u003e\u003cp\u003eFor differential diagnosis model performance (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eB), CNN (\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e) achieved the top performance with a mean AUC of 0.896, closely followed by Adaboost (0.8675) and MLP (0.8645). SVM (0.863) and LightGBM (0.863) (\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e) also performed well, while logistic regression (\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e, \u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e) and KNN achieved moderately lower results. Decision trees (\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e) were the weakest performers in this category, with an AUC of just 0.678, showing a significant gap compared to the leading models.\u003c/p\u003e\u003cp\u003eIn the risk stratification model performance graph (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eC), KNN (\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e, \u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e) achieved the highest mean AUC of 0.887, with MLP (0.867) and LDA (0.856) following closely (\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e). 
Several other models, including SVM, SGD, and Naive Bayes variants (\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e, \u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e), also performed relatively well, with mean AUCs between 0.831 and 0.844. In contrast, decision trees (\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e, \u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e) again showed the lowest performance, with a mean AUC of 0.555, reinforcing their consistent underperformance across all three tasks.\u003c/p\u003e\u003cp\u003eFor pathology (P.ML and P.HN), performance is moderate, with AUCs of 0.84 for ML and 0.81 for HN. Both show relatively high specificity (around 0.76\u0026ndash;0.83), but sensitivity is lower (0.64\u0026ndash;0.67). Accuracy remains modest (0.71\u0026ndash;0.76), indicating that pathology prediction is still a challenging task with moderate diagnostic reliability.\u003c/p\u003e\u003cp\u003eIn differential diagnosis (DD.ML and DD.HN), results are notably stronger. Machine learning models (\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e, \u003cspan additionalcitationids=\"CR44\" citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e) achieve an AUC of 0.88, with high sensitivity (0.84) and balanced specificity (0.84), leading to a high accuracy of 0.84. HN perform even better, with an AUC of 0.90, sensitivity of 0.86, and specificity of 0.85, though accuracy is lower (0.80) and its confidence interval is wider (0.69\u0026ndash;0.90).\u003c/p\u003e\u003cp\u003eFor risk stratification (RS.ML and RS.HN), performance is weaker than in differential diagnosis but still clinically useful. 
Both ML and HN models (\u003cspan additionalcitationids=\"CR47 CR48 CR49 CR50 CR51\" citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e) show AUCs around 0.82, with relatively low sensitivity (0.67 for ML and 0.71 for HN), moderate specificity (0.78\u0026ndash;0.84), and accuracy in the 0.73\u0026ndash;0.74 range.\u003c/p\u003e\u003cp\u003eThe results across the three graphs in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e illustrate how different modeling strategies perform depending on the clinical objective. In pathology (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eA), deep learning\u0026ndash;based models (\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e, \u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e) clearly outperform traditional approaches. Conventional models such as Random Forest, SVM, GLM, and KNN (\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e, \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e) still achieve moderate predictive power but remain well below the performance of deep learning signatures, underscoring the added value of advanced feature representation in this domain.\u003c/p\u003e\u003cp\u003eFor differential diagnosis (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eB), performance is more evenly distributed across models, with several traditional algorithms, including KNN, MLP, LDA, and SVM (\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e, \u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e), achieving strong results in the 0.85\u0026ndash;0.89 range. 
HN (\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e, \u003cspan additionalcitationids=\"CR40\" citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e), boosting methods such as XGBoost and GDBT, and Bernoulli NB (\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e, \u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e) also provide competitive accuracy, while simpler methods such as Logistic Regression, Gaussian NB, and Decision Tree (\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e, \u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e, \u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e) show weaker performance, with DT in particular lagging well behind.\u003c/p\u003e\u003cp\u003eIn risk stratification (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eC), results are consistently high across all models. The combined nomogram (\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e, \u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e) achieves the best performance, with an AUC of 0.945, closely followed by Adaboost and LightGBM (\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e, \u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e). 
Even the lowest-performing models, such as KNN and MLP (\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e, \u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e), maintain strong results above 0.84.\u003c/p\u003e"},{"header":"4 Discussion","content":"\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e\u003ch2\u003e4.1 Main Findings\u003c/h2\u003e\u003cp\u003eThe meta-analysis shows that model performance varies by outcome. Differential diagnosis achieves the best results, especially with HN (AUC up to 0.90, high sensitivity and specificity) (\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e, \u003cspan additionalcitationids=\"CR40\" citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e), making it the most reliable task. Pathology models (\u003cspan additionalcitationids=\"CR31 CR32 CR33 CR34 CR35\" citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e) perform moderately (AUC\u0026thinsp;~\u0026thinsp;0.81\u0026ndash;0.84) but are limited by low sensitivity. Risk stratification models (\u003cspan additionalcitationids=\"CR47 CR48 CR49 CR50 CR51\" citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e) show consistent but weaker performance (AUC\u0026thinsp;~\u0026thinsp;0.82, accuracy\u0026thinsp;~\u0026thinsp;0.73\u0026ndash;0.74), with specificity stronger than sensitivity. 
Overall, HN enhance performance, particularly in differential diagnosis, while pathology and risk stratification remain more challenging tasks.\u003c/p\u003e\u003cp\u003eIn pathology, machine learning models (P.ML) (\u003cspan additionalcitationids=\"CR34 CR35\" citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e) achieve a slightly higher AUC (0.84) than HN (P.HN: 0.81) (\u003cspan additionalcitationids=\"CR31 CR32\" citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e, \u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e), but both approaches show low sensitivity (~\u0026thinsp;0.64\u0026ndash;0.67). This means ML alone can capture patterns reasonably well, but neither technique excels at detecting true positives.\u003c/p\u003e\u003cp\u003eIn differential diagnosis, both ML (\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e, \u003cspan additionalcitationids=\"CR44\" citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e) and HN (\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e, \u003cspan additionalcitationids=\"CR40\" citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e) perform very well, with ML reaching an AUC of 0.88 and HN improving further to 0.90. 
Here, ML already provides strong diagnostic accuracy (0.84), but HN approaches add robustness, especially in sensitivity and specificity.\u003c/p\u003e\u003cp\u003eIn risk stratification, ML (\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e, \u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e, \u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e) and HN (\u003cspan additionalcitationids=\"CR49 CR50\" citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e) perform similarly (AUC\u0026thinsp;~\u0026thinsp;0.82), though ML shows slightly lower accuracy (0.73) compared to HN (0.74). Both approaches are limited by relatively low sensitivity, but HN again provide a marginal improvement.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec6\" class=\"Section2\"\u003e\u003ch2\u003e4.2 Comparison with similar research\u003c/h2\u003e\u003cp\u003eA meta-analysis of 83 studies on the validation of generative AI compared to the diagnostic performance of physicians revealed a diagnostic accuracy of 52.1%, which is relatively low considering general clinical outcomes. AI models and physicians showed no statistically significant difference in performance overall (p\u0026thinsp;=\u0026thinsp;0.10) or when AI was compared with non-specialist physicians (p\u0026thinsp;=\u0026thinsp;0.93). However, AI models performed worse than specialist physicians (p\u0026thinsp;=\u0026thinsp;0.007), which highlights limitations in the face of advanced clinical knowledge (\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eA meta-analysis of 19 articles compared the diagnostic ability of AI and clinicians with different levels of expertise in dermatological cancers. 
It showed that AI had a sensitivity of 0.87 and specificity of 0.771, surpassing general practitioners (SEN 0.7978, SPE 0.736) and performing comparably to specialist dermatologists (SEN 0.842, SPE 0.744) (\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eA systematic review examined the diagnostic accuracy of AI in digital pathology images and found an average sensitivity across all studies of 0.963 (95% CI 0.941\u0026ndash;0.977) and an average specificity of 0.933 (95% CI 0.905\u0026ndash;0.954). Gastrointestinal pathology, with the largest sample size of 14 AI models, demonstrates strong performance with a mean sensitivity of 0.93 and specificity of 0.94. Uropathology also excels, with 8 models achieving high mean sensitivity (0.95) and specificity (0.96). Breast pathology, represented by 8 models, shows slightly lower metrics (0.83 sensitivity, 0.88 specificity). Notably, neuropathology stands out with perfect sensitivity (\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e) and high specificity (0.95), though it is based on only one model. 
Conversely, cardiothoracic and head \u0026amp; neck pathology exhibit high sensitivities (0.98 each) but lower specificities (0.76 and 0.72, respectively) (\u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eFurthermore, in staging, an RF model (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1371/journal.pone.02614010\u003c/span\u003e\u003cspan address=\"10.1371/journal.pone.02614010\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) (\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e) achieved an AUC of 0.838 (0.669\u0026ndash;0.934), 0.749 (0.406\u0026ndash;0.936) for sensitivity, 0.751 (0.608\u0026ndash;0.857) for specificity, 0.775 (0.635\u0026ndash;0.854) for accuracy, 0.75 (0.608\u0026ndash;0.834).\u003c/p\u003e\u003cp\u003eA study evaluating radiomics signatures of computed tomography for predicting risk categorization and clinical stage of thymomas (\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e) compared the performance of models based on non-contrast-enhanced CT (NECT), contrast-enhanced CT (CECT), and radiologist assessment. The NECT-based model achieved an AUC of 0.829 (95% CI: 0.757\u0026ndash;0.900), with sensitivity, specificity, and accuracy of 0.712, 0.806, and 0.819, respectively. The CECT-based model performed slightly better, with an AUC of 0.860 (95% CI: 0.803\u0026ndash;0.917), sensitivity of 0.699, specificity of 0.889, and accuracy of 0.869. 
In comparison, radiologist assessment yielded a lower accuracy of 0.779, with sensitivity and specificity not reported.\u003c/p\u003e\u003cp\u003eTaken together, these findings show that while pathology prediction is most enhanced by deep learning, disease differentiation requires careful model selection with HN or neural methods offering advantages, and risk stratification appears to be a well-defined task where most models can achieve robust and clinically useful performance.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e\u003ch2\u003e4.3 Implications and actions needed\u003c/h2\u003e\u003cp\u003eFurther research utilizing imaging modalities beyond computed tomography is needed to enhance the generalizability of current findings. Additionally, more attention should be given to augmentation strategies in which artificial intelligence works alongside radiologists to optimize diagnostic accuracy and clinical decision-making.\u003c/p\u003e\u003c/div\u003e"},{"header":"Abbreviations","content":"\u003cp\u003eAI — Artificial Intelligence; ML — Machine Learning; LR — Logistic Regression. 
KNN — k-Nearest Neighbors; SVM — Support Vector Machine; RF — Random Forest; DT — Decision Tree; MLP — Multi-Layer Perceptron; LDA — Linear Discriminant Analysis; SGD — Stochastic Gradient Descent Classifier; GDBT — Gradient Boosted Decision Trees; XGBoost — eXtreme Gradient Boosting; LightGBM — Light Gradient Boosting Machine; Adaboost — Adaptive Boosting; NB (Bernoulli/Gaussian) — Naïve Bayes (Bernoulli distribution / Gaussian distribution); Linear SVC — Linear Support Vector Classifier; DL — Deep Learning; DLR_Sig — Deep Learning Radiomics Signature (using Logistic Regression); DLRN — Deep Learning–based Radiomic Nomogram (Combined features); GLM — Generalized Linear Model; UECT — Unenhanced Computed Tomography; CECT — Contrast-Enhanced Computed Tomography; MRI — Magnetic Resonance Imaging; PET — Positron Emission Tomography; CT — Computed Tomography.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003e5 Conflict of Interest\u003c/h2\u003e\n\u003cp\u003eThe authors declare no conflicts of interest.\u003c/p\u003e\n\u003ch2\u003e6 Author Contributions\u003c/h2\u003e\n\u003cp\u003eDaniel Rodrigo Serbena - Conceptualization, Methodology, Data curation, Formal analysis, Writing \u0026ndash; original draft;\u003c/p\u003e\n\u003cp\u003eIsabela Luiza Fraron Cieslack - Data curation, Writing \u0026ndash; original draft;\u003c/p\u003e\n\u003cp\u003eAugusto Philippus Lack - Data curation;\u003c/p\u003e\n\u003cp\u003eGabriela Vitoria Fraron Cieslack - Data curation;\u003c/p\u003e\n\u003cp\u003eWeber Claudio Francisco Nunes da Silva - Writing \u0026ndash; review \u0026amp; editing;\u003c/p\u003e\n\u003cp\u003eRoberta Fabbri - Writing \u0026ndash; review \u0026amp; editing;\u003c/p\u003e\n\u003cp\u003eJuliana Sartori Bonini - Study supervision.\u003c/p\u003e\n\u003ch2\u003e7 
Funding\u003c/h2\u003e\n\u003cp\u003eNone\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eAlqaidy D (2023) Thymoma: An Overview. Diagnostics 13(18):2982\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRobinson SP, Akhondi H Thymoma. In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2025 [cited 2025 Sep 29]. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://www.ncbi.nlm.nih.gov/books/NBK559291/\u003c/span\u003e\u003cspan address=\"http://www.ncbi.nlm.nih.gov/books/NBK559291/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRoden AC, Ahmad U, Cardillo G, Girard N, Jain D, Marom EM et al (2022) Thymic Carcinomas\u0026mdash;A Concise Multidisciplinary Update on Recent Developments From the Thymic Carcinoma Working Group of the International Thymic Malignancy Interest Group. J Thorac Oncol 17(5):637\u0026ndash;650\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eFalkson CB, Bezjak A, Darling G, Gregg R, Malthaner R, Maziak DE et al (2009) The Management of Thymoma: A Systematic Review and Practice Guideline. J Thorac Oncol 4(7):911\u0026ndash;919\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBajwa J, Munir U, Nori A, Williams B (2021) Artificial intelligence in healthcare: transforming the practice of medicine. Future Healthc J 8(2):e188\u0026ndash;e194\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePang J, Xiu W, Ma X (2023) Application of Artificial Intelligence in the Diagnosis, Treatment, and Prognostic Evaluation of Mediastinal Malignant Tumors. JCM 12(8):2818\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKwong JCC, Khondker A, Lajkosz K, McDermott MBA, Frigola XB, McCradden MD et al (2023) APPRAISE-AI Tool for Quantitative Evaluation of AI Studies for Clinical Decision Support. 
JAMA Netw Open 6(9):e2335377\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWang X, Sun W, Liang H, Mao X, Lu Z (2019) Radiomics Signatures of Computed Tomography Imaging for Predicting Risk Categorization and Clinical Stage of Thymomas. Biomed Res Int 2019:1\u0026ndash;10\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBl\u0026uuml;thgen C, Patella M, Euler A, Baessler B, Martini K, Von Spiczak J et al (2021) Computed tomography radiomics for the prediction of thymic epithelial tumor histology, TNM stage and myasthenia gravis. PLoS ONE 16(12):e0261401\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLiu W, Wang W, Guo M, Zhang H (2024) Tumor habitat and peritumoral region evolution\u0026ndash;based imaging features to assess risk categorization of thymomas. Clin Radiol 79(9):e1117\u0026ndash;e1125\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGao C, Yang L, Xu Y, Wang T, Ding H, Gao X et al (2024) Differentiating low-risk thymomas from high-risk thymomas: preoperative radiomics nomogram based on contrast enhanced CT to minimize unnecessary invasive thoracotomy. BMC Med Imaging 24(1):197\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLiang Z, Li J, Tang Y, Zhang Y, Chen C, Li S et al (2024) Predicting the risk category of thymoma with machine learning-based computed tomography radiomics signatures and their between-imaging phase differences. Sci Rep 14(1):19215\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZhou H, Bai HX, Jiao Z, Cui B, Wu J, Zheng H et al (2023) Deep learning-based radiomic nomogram to predict risk categorization of thymic epithelial tumors: A multicenter study. Eur J Radiol 168:111136\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChen X, Feng B, Xu K, Chen Y, Duan X, Jin Z et al (2023) Development and validation of a deep learning radiomics nomogram for preoperatively differentiating thymic epithelial tumor histologic subtypes. 
Eur Radiol 33(10):6804\u0026ndash;6816\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLiu W, Wang W, Zhang H, Guo M, Xu Y, Liu X (2023) Development and Validation of Multi-Omics Thymoma Risk Classification Model Based on Transfer Learning. J Digit Imaging 36(5):2015\u0026ndash;2024\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eShen Q, Shan Y, Xu W, Hu G, Chen W, Feng Z et al (2021) Risk stratification of thymic epithelial tumors by using a nomogram combined with radiomic features and TNM staging. Eur Radiol 31(1):423\u0026ndash;435\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAi J, Wang Z, Ai S, Li H, Gao H, Shi G et al (2025) Development and Validation of a CT-Radiomics Nomogram for the Diagnosis of Small Prevascular Mediastinal Nodules: Reducing Nontherapeutic Surgeries. Acad Radiol 32(1):506\u0026ndash;517\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHuang X, Wang X, Liu Y, Wang Z, Li S, Kuang P (2024) Contrast-enhanced CT-based radiomics differentiate anterior mediastinum lymphoma from thymoma without myasthenia gravis and calcification. Clin Radiol 79(4):e500\u0026ndash;e510\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eYang Y, Cheng J, Peng Z, Yi L, Lin Z, He A et al (2024) Development and Validation of Contrast-Enhanced CT-Based Deep Transfer Learning and Combined Clinical-Radiomics Model to Discriminate Thymomas and Thymic Cysts: A Multicenter Study. Acad Radiol 31(4):1615\u0026ndash;1628\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLi J, Cui N, Jiang Z, Li W, Liu W, Wang S et al (2023) Differentiating thymic epithelial tumors from mediastinal lymphomas: preoperative nomograms based on PET/CT radiomic features to minimize unnecessary anterior mediastinal surgery. 
J Cancer Res Clin Oncol 149(15):14101\u0026ndash;14112\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHe W, Xia C, Chen X, Yu J, Liu J, Pu H et al (2022) Computed Tomography-Based Radiomics for Differentiation of Thymic Epithelial Tumors and Lymphomas in Anterior Mediastinum. Front Oncol 12:869982\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZhang C, Yang Q, Lin F, Ma H, Zhang H, Zhang R et al (2021) CT-Based Radiomics Nomogram for Differentiation of Anterior Mediastinal Thymic Cyst From Thymic Epithelial Tumor. Front Oncol 11:744021\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLiu L, Lu F, Pang P, Shao G (2020) Can computed tomography-based radiomics potentially discriminate between anterior mediastinal cysts and type B1 and B2 thymomas? BioMed. Eng OnLine 19(1):89\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eXiao G, Hu YC, Ren JL, Qin P, Han JC, Qu XY et al (2021) MR imaging of thymomas: a combined radiomics nomogram to predict histologic subtypes. Eur Radiol 31(1):447\u0026ndash;457\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMahmoudi S, Gruenewald LD, Eichler K, Martin SS, Booz C, Bernatz S et al (2023) Advanced biomedical imaging for accurate discrimination and prognostication of mediastinal masses. Eur J Clin Invest 53(12):e14075\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLin CY, Yen YT, Huang LT, Chen TY, Liu YS, Tang SY et al (2022) An MRI-Based Clinical-Perfusion Model Predicts Pathological Subtypes of Prevascular Mediastinal Tumors. Diagnostics 12(4):889\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eOhira R, Yanagawa M, Suzuki Y, Hata A, Miyata T, Kikuchi N et al (2022) CT-based radiomics analysis for differentiation between thymoma and thymic carcinoma. 
J Thorac Dis 14(5):1342\u0026ndash;1352\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTian D, Yan HJ, Shiiya H, Sato M, Shinozaki-Ushiku A, Nakajima J (2023) Machine learning-based radiomic computed tomography phenotyping of thymic epithelial tumors: Predicting pathological and survival outcomes. J Thorac Cardiovasc Surg 165(2):502\u0026ndash;516.e9\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eYang Y, Cheng J, Chen L, Cui C, Liu S, Zuo M (2024) Application of machine learning for the differentiation of thymomas and thymic cysts using deep transfer learning: A multi-center comparison of diagnostic performance based on different dimensional models. Thorac Cancer 15(31):2235\u0026ndash;2247\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eXiao G, Hu YC, Ren JL, Qin P, Han JC, Qu XY et al (2021) MR imaging of thymomas: a combined radiomics nomogram to predict histologic subtypes. Eur Radiol 31(1):447\u0026ndash;457\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTian D, Yan HJ, Shiiya H, Sato M, Shinozaki-Ushiku A, Nakajima J (2023) Machine learning-based radiomic computed tomography phenotyping of thymic epithelial tumors: Predicting pathological and survival outcomes. J Thorac Cardiovasc Surg 165(2):502\u0026ndash;516.e9\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLiu L, Lu F, Pang P, Shao G (2020) Can computed tomography-based radiomics potentially discriminate between anterior mediastinal cysts and type B1 and B2 thymomas? BioMed. Eng OnLine 19(1):89\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChen X, Feng B, Xu K, Chen Y, Duan X, Jin Z et al (2023) Development and validation of a deep learning radiomics nomogram for preoperatively differentiating thymic epithelial tumor histologic subtypes. 
Eur Radiol 33(10):6804\u0026ndash;6816\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBl\u0026uuml;thgen C, Patella M, Euler A, Baessler B, Martini K, Von Spiczak J et al (2021) Computed tomography radiomics for the prediction of thymic epithelial tumor histology, TNM stage and myasthenia gravis. PLoS ONE 16(12):e0261401\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHu J, Zhao Y, Li M, Liu Y, Wang F, Weng Q et al (2020) Machine-learning-based computed tomography radiomic analysis for histologic subtype classification of thymic epithelial tumours. Eur J Radiol 126:108929\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZhou H, Bai HX, Jiao Z, Cui B, Wu J, Zheng H et al (2023) Deep learning-based radiomic nomogram to predict risk categorization of thymic epithelial tumors: A multicenter study. Eur J Radiol 168:111136\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLi J, Cui N, Jiang Z, Li W, Liu W, Wang S et al (2023) Differentiating thymic epithelial tumors from mediastinal lymphomas: preoperative nomograms based on PET/CT radiomic features to minimize unnecessary anterior mediastinal surgery. J Cancer Res Clin Oncol 149(15):14101\u0026ndash;14112\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLin CY, Yen YT, Huang LT, Chen TY, Liu YS, Tang SY et al (2022) An MRI-Based Clinical-Perfusion Model Predicts Pathological Subtypes of Prevascular Mediastinal Tumors. Diagnostics 12(4):889\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHe W, Xia C, Chen X, Yu J, Liu J, Pu H et al (2022) Computed Tomography-Based Radiomics for Differentiation of Thymic Epithelial Tumors and Lymphomas in Anterior Mediastinum. Front Oncol 12:869982\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZhang C, Yang Q, Lin F, Ma H, Zhang H, Zhang R et al (2021) CT-Based Radiomics Nomogram for Differentiation of Anterior Mediastinal Thymic Cyst From Thymic Epithelial Tumor. 
Front Oncol 11:744021\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAi J, Wang Z, Ai S, Li H, Gao H, Shi G et al (2025) Development and Validation of a CT-Radiomics Nomogram for the Diagnosis of Small Prevascular Mediastinal Nodules: Reducing Nontherapeutic Surgeries. Acad Radiol 32(1):506\u0026ndash;517\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eYang Y, Cheng J, Peng Z, Yi L, Lin Z, He A et al (2024) Development and Validation of Contrast-Enhanced CT-Based Deep Transfer Learning and Combined Clinical-Radiomics Model to Discriminate Thymomas and Thymic Cysts: A Multicenter Study. Acad Radiol 31(4):1615\u0026ndash;1628\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eYang Y, Cheng J, Chen L, Cui C, Liu S, Zuo M (2024) Application of machine learning for the differentiation of thymomas and thymic cysts using deep transfer learning: A multi-center comparison of diagnostic performance based on different dimensional models. Thorac Cancer 15(31):2235\u0026ndash;2247\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHuang X, Wang X, Liu Y, Wang Z, Li S, Kuang P (2024) Contrast-enhanced CT-based radiomics differentiate anterior mediastinum lymphoma from thymoma without myasthenia gravis and calcification. Clin Radiol 79(4):e500\u0026ndash;e510\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eYang Y, Cheng J, Peng Z, Yi L, Lin Z, He A et al (2024) Development and Validation of Contrast-Enhanced CT-Based Deep Transfer Learning and Combined Clinical-Radiomics Model to Discriminate Thymomas and Thymic Cysts: A Multicenter Study. Acad Radiol 31(4):1615\u0026ndash;1628\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eFeng XL, Wang SZ, Chen HH, Huang YX, Xin YK, Zhang T et al (2022) Optimizing the radiomics-machine-learning model based on non-contrast enhanced CT for the simplified risk categorization of thymic epithelial tumors: A large cohort retrospective study. 
Lung Cancer 166:150\u0026ndash;160\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKayi Cangir A, Orhan K, Kahya Y, \u0026Ouml;zakıncı H, Kazak BB, Konuk Balcı BM et al (2021) CT imaging-based machine learning model: a potential modality for predicting low-risk and high-risk groups of thymoma: Impact of surgical modality choice. World J Surg Onc 19(1):147\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLiang Z, Li J, Tang Y, Zhang Y, Chen C, Li S et al (2024) Predicting the risk category of thymoma with machine learning-based computed tomography radiomics signatures and their between-imaging phase differences. Sci Rep 14(1):19215\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eShen Q, Shan Y, Xu W, Hu G, Chen W, Feng Z et al (2021) Risk stratification of thymic epithelial tumors by using a nomogram combined with radiomic features and TNM staging. Eur Radiol 31(1):423\u0026ndash;435\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGao C, Yang L, Xu Y, Wang T, Ding H, Gao X et al (2024) Differentiating low-risk thymomas from high-risk thymomas: preoperative radiomics nomogram based on contrast enhanced CT to minimize unnecessary invasive thoracotomy. BMC Med Imaging 24(1):197\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLiu W, Wang W, Zhang H, Guo M, Xu Y, Liu X (2023) Development and Validation of Multi-Omics Thymoma Risk Classification Model Based on Transfer Learning. J Digit Imaging 36(5):2015\u0026ndash;2024\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLiu W, Wang W, Guo M, Zhang H (2024) Tumor habitat and peritumoral region evolution\u0026ndash;based imaging features to assess risk categorization of thymomas. 
Clin Radiol 79(9):e1117\u0026ndash;e1125\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTakita H, Kabata D, Walston SL, Tatekawa H, Saito K, Tsujimoto Y et al (2025) A systematic review and meta-analysis of diagnostic performance comparison between generative AI and physicians. npj Digit Med 8(1):175\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSalinas MP, Sep\u0026uacute;lveda J, Hidalgo L, Peirano D, Morel M, Uribe P et al (2024) Author Correction: A systematic review and meta-analysis of artificial intelligence versus clinicians for skin cancer diagnosis. npj Digit Med 7(1):141\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMcGenity C, Clarke EL, Jennings C, Matthews G, Cartlidge C, Freduah-Agyemang H et al (2024) Artificial intelligence in digital pathology: a systematic review and meta-analysis of diagnostic test accuracy. npj Digit Med 7(1):114\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWang X, Sun W, Liang H, Mao X, Lu Z (2019) Radiomics Signatures of Computed Tomography Imaging for Predicting Risk Categorization and Clinical Stage of Thymomas. Biomed Res Int 2019:1\u0026ndash;10\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"Universidade Estadual do Centro Oeste","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Machine Learning, Deep Learning, Thymoma, Diagnosis","lastPublishedDoi":"10.21203/rs.3.rs-7768386/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7768386/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003e\u003cb\u003eBackground\u003c/b\u003e\u003c/p\u003e\u003cp\u003eThymomas are rare mediastinal tumors with broad clinical spectrum, making accurate diagnosis pivotal for treatment planning and prognostication. Conventional imaging and histopathological evaluation persist as the gold standard, however, recent advances in artificial intelligence (AI) have introduced innovative methodologies to enhance diagnostic precision, reproducibility, and efficiency. This systematic review aimed to evaluate the current evidence on the application of AI-based methods in the diagnosis of thymoma.\u003c/p\u003e\u003cp\u003e\u003cb\u003eMethods\u003c/b\u003e\u003c/p\u003e\u003cp\u003eA systematic search was conducted across Pubmed, Web of science, Embase, Scopus, BioRxiv, IEEE Xplore, Digital Library ACM. Eligible studies included original research that investigated AI techniques\u0026mdash;such as machine learning, deep learning, or radiomics\u0026mdash;for diagnosing or classifying thymoma, based on imaging, pathology, or multimodal data. Data were extracted on AI methodology, diagnostic performance metrics, number of participants, and country of origin. 
Methodological quality was assessed using APPRAISE-AI.\u003c/p\u003e\u003cp\u003e\u003cb\u003eResults\u003c/b\u003e\u003c/p\u003e\u003cp\u003e26 studies met inclusion criteria. AI models outperformed radiologists and pathologists in all comparisons, although in some metric models were significantly better than medical professionals. For all outcomes, the top-performing models achieved an Area Under the Curve (AUC) close to 0.95, while mean performance values were comparatively lower.\u003c/p\u003e\u003cp\u003e\u003cb\u003eConclusion\u003c/b\u003e\u003c/p\u003e\u003cp\u003eAI models typically exhibit diagnostic performance equivalent to radiologists, showing incremental advantages in selected applications. The most favorable outcomes have been observed in differential diagnosis, followed by pathology and risk stratification, with deep learning demonstrating particular effectiveness in pathology. Nevertheless, further investigations incorporating diverse imaging modalities, deep learning approaches, and strategies aimed at augmenting medical professionals\u0026rsquo; performance are still required.\u003c/p\u003e","manuscriptTitle":"The use of Artificial Intelligence in Diagnosis of Thymic Cancer - Systematic review and Meta Analysis","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-10-06 09:24:08","doi":"10.21203/rs.3.rs-7768386/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"9a4be788-56a6-4dab-80f7-4848678f45b9","owner":[],"postedDate":"October 6th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2025-10-06T09:24:08+00:00","versionOfRecord":[],"versionCreatedAt":"2025-10-06 09:24:08","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-7768386","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7768386","identity":"rs-7768386","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}