Development of a CT-Based comprehensive model with deep learning for differentiating pathological types of pulmonary ground-glass nodules | Research Square

Research Article

Zhang Jian, Boheng Liu, Ji Li, Yang Liu, Jipeng Jiang

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9268097/v1
This work is licensed under a CC BY 4.0 License
Status: Under Review
Version 1 posted 6

Abstract

Background: The lack of reliable clinical features for differentiating benign from malignant pulmonary pure ground-glass nodules (pGGNs) leads to potential misdiagnosis and unnecessary invasive examinations. Although radiomics and deep learning approaches have shown potential in nodule characterization, the diagnostic performance of integrated models combining clinical features, radiomics, and deep learning remains insufficiently defined.
This study aimed to develop and validate an integrated model to distinguish benign from malignant pGGNs and to further differentiate pathological subtypes.

Materials and Methods: This retrospective study included 1,067 patients with pulmonary pGGNs from Shandong First Medical University Cancer Hospital. Clinical and imaging data were collected, and radiomics features and deep learning (DL) derived features were extracted using Python (version 3.7). Patients were randomly divided into training and validation cohorts. Multiple machine-learning classifiers were constructed, and diagnostic performance was assessed using receiver operating characteristic (ROC) curve analysis.

Results: For distinguishing benign from malignant pGGNs (Model 1), age, nodule multiplicity, CEA level, and amylase were identified as clinically relevant features. Thirty-eight features were selected for model development. Among individual classifiers, the support vector machine (SVM) achieved the highest validation area under the receiver operating characteristic curve (AUC) of 0.840, followed by random forest (0.836), LightGBM (0.827), XGBoost (0.821), stochastic gradient descent (0.816), and k-nearest neighbors (0.792). The integrated model combining clinical features, radiomics, and deep learning achieved a validation set AUC of 0.871. For pathological subtype classification of pGGNs (Model 2), gender, Pro-Gastrin-Releasing Peptide (ProGRP), AST/ALT ratio (De Ritis ratio), creatine kinase-MB (CKMB), and globulin were identified as informative clinical variables. Twelve features were selected. The SVM classifier again showed the best individual performance (validation AUC = 0.831), while the integrated model achieved a higher AUC of 0.853.
Conclusion: An integrated model incorporating clinical characteristics, radiomics, and deep learning demonstrates robust performance in distinguishing benign from malignant pulmonary pGGNs and in identifying pathological subtypes, suggesting potential clinical utility for noninvasive decision support.

Keywords: machine learning, radiomics, deep learning, ground-glass opacities, pulmonary pathology

Introduction

Cancer remains a leading cause of death worldwide [1, 2]. The Cancer Atlas reports that approximately 23.6 million (95% UI: 22.2–24.9 million) new cancer cases and 10 million (95% UI: 9.36–10.6 million) cancer-related deaths occurred globally in 2019 [3]. Lung cancer represents the most prevalent tumor and the leading cause of cancer-related mortality worldwide, largely attributable to smoking and environmental pollution [4]. In 2022, an estimated 4.824 million new cancer cases and 3.21 million cancer-related deaths occurred in China, compared to 2.37 million new cases and approximately 640,000 deaths in the United States. Although the age-standardized incidence rate (ASIR) in the United States (303.6 per 100,000) was higher than that in China (201.61 per 100,000), the age-standardized mortality rate (ASMR) in China (96.47 per 100,000) exceeded that of the United States (81.8 per 100,000), reflecting disparities in cancer profiles, healthcare infrastructure, and early detection strategies between the two countries [5]. Early diagnosis and timely intervention are particularly critical for lung cancers presenting as ground-glass nodules (GGNs), as prognosis is closely related to disease stage at detection [6, 7]. Data from the International Early Lung Cancer Action Program indicate that early detection and surgical resection of lung cancer presenting as GGNs or part-solid nodules can achieve near-perfect lung cancer-specific survival [6, 7].
The widespread adoption of low-dose computed tomography (LDCT) has significantly increased the detection rate of pulmonary nodules [8]. The National Lung Screening Trial (NLST) reported a 20% reduction in lung cancer mortality with LDCT screening compared with chest radiography [8]. However, distinguishing benign from malignant GGNs remains challenging in clinical practice. Pure ground-glass nodules (pGGNs) exhibit limited specificity on conventional CT imaging, and their morphological features often overlap between benign and malignant entities [9, 10]. pGGNs may indicate benign conditions such as inflammation, hemorrhage, edema, and focal fibrosis, or they may suggest malignant tumors. Studies indicate that 63%–92.6% of persistent pGGNs represent precancerous lesions or early adenocarcinoma [9, 10]. Despite advances in imaging technology, pGGNs may still be misdiagnosed due to factors including small size, indistinct margins, inconspicuous appearance, or proximity to vascular structures [11]. Consequently, a considerable proportion of patients undergo unnecessary invasive diagnostic procedures or surgical interventions, underscoring the need for accurate, noninvasive methods to stratify malignant risk in pGGNs.

Radiomics (RAD) enables the high-throughput extraction of quantitative features from medical images using advanced mathematical algorithms and machine-learning techniques, providing objective descriptors of lesion heterogeneity beyond visual assessment [12]. As radiomics research has matured, it has been applied not only to describing the overall presentation of lesions but also to disease diagnosis, staging, prognosis, and evaluation of treatment efficacy [12]. The advantage of radiomics lies in its ability to predict clinical outcomes and guide clinical decisions through noninvasive means, offering new pathways for disease diagnosis and treatment.
Deep learning (DL), with its hierarchical neural network architecture, further enhances feature learning capacity and scalability in large imaging datasets [13]. The integration of DL with radiomics has shown promise across multiple medical domains [14, 15]. However, evidence remains limited regarding the application of integrated clinical, radiomics, and DL models for differentiating pathological subtypes of pulmonary pure ground-glass nodules.

Accordingly, this study aimed to develop and validate multimodal machine learning models for the noninvasive characterization of pulmonary pGGNs. First, we constructed an integrated model combining clinical variables, radiomics, and deep learning features to distinguish malignant from benign pGGNs and to estimate malignant risk. Second, we developed a subtype classification model to differentiate pGGNs with malignant potential (including atypical adenomatous hyperplasia and adenocarcinoma in situ) from pGGNs without malignant potential, with the goal of supporting individualized treatment decision-making.

Materials and Methods

Patient population

We retrospectively collected patients with pulmonary pure ground-glass nodules (pGGNs) who underwent surgical treatment at Shandong First Medical University Affiliated Cancer Hospital from August 2023 to August 2025.

Inclusion Criteria
1. Complete clinical, pathological, and imaging data were available.
2. The nodule had no solid component.
3. Surgical treatment was performed, with definitive pathological findings obtained postoperatively.

Exclusion Criteria
1. pGGNs following radiotherapy or chemotherapy.
2. pGGNs after anti-inflammatory treatment.
3. Invasive procedures before surgery, such as nodule biopsy or radiofrequency ablation.
4. History of other concurrent malignancies.
5. Poor image quality due to respiratory movement or other artifacts.

This study was approved by the Ethics Committee of Shandong First Medical University Affiliated Cancer Hospital (Approval No. SDTHEC202509033).
Clinical features assessment

The clinical variables collected included age, gender, BMI, hypertension, diabetes, smoking history, tumor history, family history, anti-inflammatory therapy, routine laboratory test results, and CT scan findings. Laboratory tests encompassed routine blood examinations and common lung cancer tumor markers, such as Pro-Gastrin-Releasing Peptide (ProGRP), Carcinoembryonic Antigen (CEA), Cytokeratin 19 Fragment (CYFRA21-1), Neuron-Specific Enolase (NSE), and Squamous Cell Carcinoma Antigen (SCC). Conventional semantic CT features were independently evaluated by two radiologists with more than 5 years of experience, blinded to pathological results. Assessed features included lesion location, margin characteristics (smooth or blurred), maximum diameter, proximity to the pleura (yes/no), anatomical location (central or peripheral), and nodule multiplicity (single or multiple). Additional clinical characteristics were retrospectively extracted from electronic medical records at Shandong First Medical University Cancer Hospital (Appendix I and II).

Pathological diagnosis

All pathological diagnoses were based on postoperative surgical specimens. To ensure the reliability of the pathology findings, all patient diagnoses underwent retrospective review by a pathologist with more than 10 years of experience at two hospitals. All pathology results were confirmed to be reliable.

Image Acquisition and Preprocessing

Chest CT examinations were acquired on SIEMENS SOMATOM Definition AS and SIEMENS SOMATOM Definition Flash dual-source CT scanners and stored in DICOM format. SimpleITK was used to batch convert all DICOM data into NIfTI format (.nii.gz). The conversion preserved the spatial position information and grayscale distribution of the images while simplifying subsequent processing.
All images were converted and normalized using an open-source Python 3.7 package. The preprocessed images were then loaded into 3D Slicer (version 5.2.2) and adjusted to uniform grayscale parameters. Manual segmentation was performed on the 1.5 mm thin-slice images from the preoperative plain CT scans. Using the “Segment Editor” module, two radiologists manually delineated regions of interest (ROIs), with cases presented in randomized order. After each slice was delineated, all segmented slices were fused to generate a volumetric mask.

Feature Extraction and Screening in Radiomics

Patients were randomly assigned to the training and validation sets at a ratio of 7:3. Radiomics features were extracted from the ROIs using an open-source Python-based radiomics library. Shape features, first-order statistical features, and texture features were obtained. The texture features were further subdivided into Gray-Level Co-occurrence Matrix (GLCM), Gray-Level Run Length Matrix (GLRLM), Gray-Level Size Zone Matrix (GLSZM), Neighboring Gray Tone Difference Matrix (NGTDM), and Gray-Level Dependence Matrix (GLDM) features. Together, these features describe the geometric morphology, gray-level distribution, and textural heterogeneity of the ROIs and serve as quantitative input for subsequent machine learning modeling.

Feature Stability Assessment and Radiomics Feature Construction

The intraclass correlation coefficient (ICC) was used to assess the consistency of features extracted from the ROIs manually delineated by the two radiologists. Features with an ICC greater than 0.75 were retained. Features in the training set underwent normality tests.
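The ICC screening step can be illustrated with a minimal sketch. The text does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-rater form ICC(2,1) is assumed here; the feature names and values are hypothetical and only show the mechanics of keeping features with ICC > 0.75.

```python
from statistics import mean


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` holds one row per case and one column per reader.
    The choice of ICC(2,1) is an assumption; the source only says "ICC".
    """
    n = len(ratings)        # number of cases
    k = len(ratings[0])     # number of readers
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    return (ms_rows - ms_error) / denom if denom else 0.0


def stable_features(feature_ratings, threshold=0.75):
    """Return the names of features whose ICC exceeds the threshold."""
    return [name for name, ratings in feature_ratings.items()
            if icc_2_1(ratings) > threshold]


# Hypothetical data: each row is one nodule, columns are reader 1 and reader 2.
example = {
    "shape_volume": [[1.0, 1.1], [2.0, 2.1], [3.0, 2.9], [4.0, 4.2], [5.0, 5.1]],
    "glcm_contrast": [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0], [5.0, 3.0]],
}
print(stable_features(example))
```

In this toy input the two readers nearly agree on the first feature and disagree on the second, so only the first passes the 0.75 cutoff.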
Features meeting normality criteria underwent t-tests, while non-normally distributed features underwent rank-sum tests for initial screening, with a significance threshold of P < 0.05. Highly correlated features (|r| > 0.8, Pearson's or Spearman's correlation coefficient) were then removed. Subsequent refinement employed Lasso regression. The LassoCV method generated 200 candidate lambda values uniformly distributed between 0 and 0.5, and the optimal lambda was selected via 10-fold cross-validation to minimize the mean squared error (MSE). To show the impact of lambda on model performance and coefficients, we plotted the MSE curve against log(lambda) (lambda is named alpha in LassoCV), including 95% confidence interval error bars, together with the path of the Lasso regression coefficients versus lambda. Finally, features with non-zero coefficients at the optimal lambda were selected, and their names and coefficients were saved. This workflow covers data preprocessing, LASSO model training, visualization, and result output, and provides the feature set used for machine learning model development.

Model Development

Six commonly used classifiers were employed for model construction: Support Vector Machine (SVM), Random Forest, Stochastic Gradient Descent Classifier (SGD), K-Nearest Neighbors (KNN), XGBoost, and LightGBM. Bayesian optimization was used for hyperparameter search. Model validation results were visualized using ROC curves, decision curve analysis (DCA) curves, and calibration curves. To further explore the relationship between radiomic features and model predictions, SHAP (SHapley Additive exPlanations) values were calculated for each feature. The impact of individual features on model decisions was assessed by analyzing both per-prediction SHAP values and overall feature importance. The primary performance evaluation metrics were accuracy, sensitivity, specificity, F1-score, and AUC.
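For readers less familiar with these metrics, the following minimal sketch shows how they relate to the confusion matrix and to the ranking of predicted scores. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the four-case example are illustrative assumptions.

```python
def _ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, specificity and F1 at a fixed threshold.

    Label 1 is the positive class (e.g. malignant). The threshold of 0.5
    is an assumption made for illustration.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    sensitivity = _ratio(tp, tp + fn)   # also called recall
    specificity = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)     # positive predictive value
    return {
        "accuracy": _ratio(tp + tn, len(y_true)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def roc_auc(y_true, y_score):
    """AUC as the probability that a positive case outranks a negative one."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


# Hypothetical labels and predicted probabilities for four cases.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(classification_metrics(labels, scores))
print(roc_auc(labels, scores))
```

Note that sensitivity and recall are the same quantity, and that AUC, unlike the other four metrics, does not depend on the chosen threshold.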
The DeLong test was applied to assess the statistical significance of AUC differences between feature groupings within the same machine learning model.

Statistical Analysis

Statistical analysis was performed using SPSS version 23.0 and Python version 3.7. Normally distributed measurement data are presented as mean ± standard deviation (x̄ ± s), while non-normally distributed data are expressed as median and interquartile range [M (Q1, Q3)]. Categorical data are reported as counts and percentages. The training and validation sets were compared using Student's t-tests or Mann-Whitney U tests for continuous variables and χ² or Fisher's exact tests for categorical variables. All statistical tests were two-tailed with a significance threshold of p < 0.05. Model performance was assessed using receiver operating characteristic (ROC) curves, with sensitivity, specificity, accuracy (ACC), negative predictive value (NPV), positive predictive value (PPV), and area under the ROC curve (AUC) as evaluation metrics. Calibration curves were used to assess the agreement between predicted probabilities and observed outcomes. Decision curve analysis (DCA) was employed to determine the clinical utility of the models.

Results

Baseline characteristics

A total of 1,067 patients meeting the inclusion criteria were enrolled in this study, including 333 men and 734 women, with a mean age of 55.30 ± 11.34 years (range: 16–81 years). The dataset was divided into training and validation cohorts at an approximate 7:3 ratio. For the benign versus malignant discrimination model (Model 1, with the malignant group defined as positive), the training cohort included 284 malignant and 462 benign cases, and the validation cohort contained 124 malignant and 197 benign cases.
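As a quick consistency check on the Model 1 split, the χ² test on these four counts can be reproduced in a few lines. This is only a sketch: the use of the Yates continuity correction is an assumption (the text does not say whether it was applied), and the function name is illustrative. With the correction, the result comes out close to the p value of 0.917 listed for the Group row of Table 1.

```python
from math import erfc, sqrt


def chi2_2x2(a, b, c, d, yates=True):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom. The Yates
    continuity correction is applied by default, which is an assumption
    about how the reported p values were obtained.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]

    stat = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e)
        if yates:
            diff = max(diff - 0.5, 0.0)
        stat += diff ** 2 / e

    # Survival function of the chi-square distribution with 1 df.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Rows: training, validation. Columns: benign, malignant (Model 1 counts).
stat, p = chi2_2x2(462, 284, 197, 124)
print(round(stat, 4), round(p, 3))
```

The same function also shows that 746 training and 321 validation cases out of 1,067 correspond to a 69.9% / 30.1% split, in line with the stated 7:3 ratio.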
For the model distinguishing benign nodules from atypical adenomatous hyperplasia and adenocarcinoma in situ (Model 2, with atypical adenomatous hyperplasia and adenocarcinoma in situ defined as positive), the training cohort comprised 370 positive cases and 91 negative cases, while the validation set contained 163 positive cases and 35 negative cases. Baseline characteristics were comparable between the training and validation cohorts for both models; the only statistically significant difference was in nodule location for Model 1 (p = 0.012), and all other variables had p > 0.05, as summarized in Table 1.

Table 1. Clinical Baseline Comparison of Training and Validation Sets for Model 1 and Model 2

Variable | Model 1 Training | Model 1 Validation | p value | Model 2 Training | Model 2 Validation | p value
Age | 55.00 (49.06, 63.00) | 57.00 (48.00, 63.00) | 0.725 (M-WU) | 53.00 (47.00, 61.00) | 54.00 (49.00, 61.00) | 0.651 (M-WU)
BMI | 23.96 (22.57, 26.67) | 24.31 (22.22, 26.45) | 0.424 (M-WU) | 23.96 (22.44, 26.23) | 24.03 (21.94, 26.58) | 0.883 (M-WU)
Size (mm) | 11.79 (9.00, 16.00) | 12.00 (8.00, 16.00) | 0.697 (M-WU) | 12.00 (9.00, 16.00) | 11.79 (9.00, 16.00) | 0.582 (M-WU)
Gender | | | 0.179 (χ²) | | | 0.999 (χ²)
  Female | 523 (70.1%) | 211 (65.7%) | | 323 (70.1%) | 138 (69.7%) |
  Male | 223 (29.9%) | 110 (34.3%) | | 138 (29.9%) | 60 (30.3%) |
Smoking history | | | 0.933 (χ²) | | | 1.000 (χ²)
  No | 663 (88.9%) | 284 (88.5%) | | 413 (89.6%) | 178 (89.9%) |
  Yes | 83 (11.1%) | 37 (11.5%) | | 48 (10.4%) | 20 (10.1%) |
Tumor history | | | 0.867 (χ²) | | | 0.141 (χ²)
  No | 658 (88.2%) | 285 (88.8%) | | 404 (87.6%) | 182 (91.9%) |
  Yes | 88 (11.8%) | 36 (11.2%) | | 57 (12.4%) | 16 (8.1%) |
Family history | | | 0.660 (χ²) | | | 0.462 (χ²)
  No | 712 (95.4%) | 309 (96.3%) | | 447 (97.0%) | 189 (95.5%) |
  Yes | 34 (4.6%) | 12 (3.7%) | | 14 (3.0%) | 9 (4.5%) |
Location | | | 0.012 (χ²) | | | 0.828 (χ²)
  RUL | 240 (32.2%) | 102 (31.8%) | | 157 (34.1%) | 70 (35.4%) |
  LUL | 185 (24.8%) | 92 (28.7%) | | 109 (23.6%) | 53 (26.8%) |
  LLL | 172 (23.1%) | 51 (15.9%) | | 99 (21.5%) | 36 (18.2%) |
  RLL | 117 (15.7%) | 50 (15.6%) | | 72 (15.6%) | 30 (15.2%) |
  RML | 32 (4.3%) | 26 (8.1%) | | 24 (5.2%) | 9 (4.5%) |
Group | | | 0.917 (χ²) | | | 0.611 (χ²)
  Control | 462 (61.9%) | 197 (61.4%) | | 91 (19.7%) | 35 (17.7%) |
  Experimental | 284 (38.1%) | 124 (38.6%) | | 370 (80.3%) | 163 (82.3%) |

RUL: Right upper lobe; LUL: Left upper lobe; LLL: Left lower lobe; RLL: Right lower lobe; RML: Right middle lobe; M-WU: Mann-Whitney U test. Continuous variables are shown as median (Q1, Q3) and categorical variables as n (%). p values compare the training cohort with the validation cohort; categorical variables were analyzed by Pearson χ² test or Fisher exact test, and continuous variables were compared by Student t-test or Mann-Whitney U test.

Radiomics and clinical signature

In Model 1, multivariable analysis identified age, CEA level, presence of multiple nodules, and amylase as independent predictors for distinguishing malignant from benign pulmonary pure ground-glass nodules (all p < 0.05). In Model 2, sex, ProGRP, the aspartate aminotransferase to alanine aminotransferase (AST/ALT) ratio (De Ritis ratio), CKMB, and globulin were statistically significant (p < 0.05). The multivariate analysis results for Models 1 and 2 are presented in Tables 2 and 3, along with Figures 1 and 2. Univariate correlation tables are provided in Appendix I and II.

Table 2. Multivariate Analysis Results for Model 1

Variable | OR | 95% CI lower | 95% CI upper | p value
Age | 1.033 | 1.016 | 1.049 | <0.001
CEA | 1.254 | 1.101 | 1.427 | 0.001
Multiple Lesions | 0.669 | 0.491 | 0.913 | 0.011
Amylase | 0.992 | 0.985 | 1.000 | 0.049
Cytokeratin | 1.025 | 0.860 | 1.222 | 0.780

CEA: Carcinoembryonic Antigen

Table 3. Multivariate Analysis Results for Model 2

Variable | OR | 95% CI lower | 95% CI upper | p value
ProGRP | 0.969 | 0.951 | 0.987 | 0.001
Gender | 0.455 | 0.253 | 0.819 | 0.009
Globulin | 0.930 | 0.875 | 0.989 | 0.020
AST/ALT | 0.503 | 0.272 | 0.929 | 0.028
Creatine Kinase MB | 0.843 | 0.717 | 0.992 | 0.040
Albumin | 1.062 | 0.992 | 1.135 | 0.082
Eosinophil Count | 0.353 | 0.036 | 3.449 | 0.370
Smoking history | 0.972 | 0.426 | 2.219 | 0.946

ProGRP: Pro-Gastrin-Releasing Peptide; AST/ALT: Aspartate aminotransferase to alanine aminotransferase ratio

Feature Selection Results

In the training cohorts for Models 1 and 2, features with an intraclass correlation coefficient (ICC) > 0.75 were selected for further analysis.
Subsequently, normality tests were conducted on these features, followed by preliminary significance analysis using t-tests or rank-sum tests, as appropriate. Highly correlated features (|r| > 0.8) were then eliminated using Pearson's or Spearman's correlation coefficients. Finally, LASSO regression was applied, resulting in the selection of 38 features for Model 1 and 12 features for Model 2. The feature correlation heatmap and LASSO regression workflow are presented in Figure 3. The retained features along with their respective weights in the LASSO regression are shown in Figure 4.

Model Performance

Six machine learning classifiers were employed to establish and validate the two radiomics models. Model performance was primarily assessed using the validation set area under the ROC curve (AUC). Among the classifiers, SVM demonstrated the best performance, with Model 1 achieving a validation AUC of 0.840. Detailed results for each classifier in Model 1 are presented in Table 4. Model 2 attained a validation AUC of 0.831, with specific classifier outcomes shown in Table 5. The ROC curves, DCA curves, and calibration curves for the optimal classifier (SVM) in both Model 1 and Model 2 are depicted in Figures 5, 6, and 7, respectively.

Table 4. Results for the six machine learning classifiers in Model 1.

Metric | SVM | RF | SGD | KNN | XGBoost | LightGBM
Training set accuracy | 0.788 | 0.787 | 0.786 | 0.745 | 0.776 | 0.791
Validation set accuracy | 0.776 | 0.782 | 0.748 | 0.751 | 0.785 | 0.779
Training set recall | 0.613 | 0.651 | 0.651 | 0.563 | 0.669 | 0.669
Validation set recall | 0.581 | 0.621 | 0.500 | 0.589 | 0.637 | 0.629
Training set AUC | 0.844 | 0.829 | 0.828 | 0.814 | 0.798 | 0.818
Validation set AUC | 0.840 | 0.836 | 0.816 | 0.792 | 0.821 | 0.827
Training set sensitivity | 0.613 | 0.651 | 0.651 | 0.563 | 0.669 | 0.669
Validation set sensitivity | 0.896 | 0.870 | 0.868 | 0.857 | 0.842 | 0.866
Training set specificity | 0.581 | 0.621 | 0.500 | 0.589 | 0.637 | 0.629
Validation set specificity | 0.898 | 0.883 | 0.904 | 0.853 | 0.878 | 0.873
Training set F1 score | 0.688 | 0.699 | 0.698 | 0.627 | 0.695 | 0.709
Validation set F1 score | 0.667 | 0.687 | 0.605 | 0.646 | 0.696 | 0.687

Table 5. Results for the six machine learning classifiers in Model 2.

Metric | SVM | RF | SGD | KNN | XGBoost | LightGBM
Training set accuracy | 0.850 | 0.857 | 0.848 | 0.868 | 0.850 | 0.846
Validation set accuracy | 0.864 | 0.818 | 0.828 | 0.869 | 0.793 | 0.763
Training set recall | 0.959 | 0.959 | 0.984 | 0.981 | 0.943 | 0.943
Validation set recall | 0.951 | 0.883 | 0.945 | 0.975 | 0.834 | 0.804
Training set AUC | 0.844 | 0.848 | 0.831 | 0.792 | 0.834 | 0.832
Validation set AUC | 0.831 | 0.782 | 0.762 | 0.787 | 0.793 | 0.778
Training set sensitivity | 0.959 | 0.959 | 0.984 | 0.981 | 0.943 | 0.943
Validation set sensitivity | 0.407 | 0.440 | 0.297 | 0.407 | 0.473 | 0.451
Training set specificity | 0.951 | 0.883 | 0.945 | 0.975 | 0.834 | 0.804
Validation set specificity | 0.457 | 0.514 | 0.286 | 0.371 | 0.600 | 0.571
Training set F1 score | 0.911 | 0.915 | 0.912 | 0.922 | 0.910 | 0.908
Validation set F1 score | 0.920 | 0.889 | 0.901 | 0.924 | 0.869 | 0.848

Fusion Model

Feature-level fusion was performed by combining clinical risk factors with the radiomic features selected via LASSO regression. An SVM classifier was then used to build the fusion models and to test whether adding clinical features improved diagnostic performance. The fusion model achieved a validation set AUC of 0.871 for Model 1 and 0.853 for Model 2.
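Feature-level fusion of this kind amounts to building one input vector per patient from both feature groups before the classifier is trained. The sketch below illustrates the idea only; the z-score scaling step, the function names, and the example values are assumptions and are not taken from the study.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column to zero mean and unit variance.

    Constant columns are mapped to 0.0. In a real pipeline the mean and
    standard deviation would be estimated on the training set only and
    then reused for the validation set.
    """
    scaled_columns = []
    for column in zip(*rows):
        mu, sigma = mean(column), pstdev(column)
        scaled_columns.append([(v - mu) / sigma if sigma else 0.0
                               for v in column])
    return [list(row) for row in zip(*scaled_columns)]


def fuse_features(clinical, radiomic):
    """Concatenate scaled clinical and radiomic features per patient."""
    if len(clinical) != len(radiomic):
        raise ValueError("both feature tables need one row per patient")
    return [c + r for c, r in zip(zscore_columns(clinical),
                                  zscore_columns(radiomic))]


# Hypothetical inputs: three patients, two clinical and one radiomic feature.
clinical = [[50.0, 1.0], [60.0, 0.0], [70.0, 1.0]]   # e.g. age, multiplicity
radiomic = [[0.2], [0.4], [0.6]]                      # e.g. one texture feature
fused = fuse_features(clinical, radiomic)
print(fused)
```

The fused rows can then be passed to any classifier; scaling matters for SVMs in particular because the kernel is sensitive to feature magnitude.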
DeLong's test revealed statistically significant differences between models (p < 0.05). The ROC curves, DCA curves, and calibration curves for the models are presented in Figure 8. The SHAP weight plots for both models after integrating clinical features are shown in Figure 9.

Conclusion

In this study, we developed two machine learning models, one to distinguish benign from malignant pulmonary nodules and another to differentiate benign pulmonary nodules from those with atypical adenomatous hyperplasia or adenocarcinoma in situ. Both models, which integrate clinical risk factors with CT radiomic features, demonstrated strong discriminatory power and offer potential for enhancing noninvasive diagnostic accuracy in clinical practice.

Discussion

The 2021 WHO Classification of Lung Tumors (5th ed.) introduced important revisions, including the reclassification of adenocarcinoma in situ (AIS) and atypical adenomatous hyperplasia (AAH) as “glandular precursor lesions” rather than traditional malignancies [16]. Although these lesions are generally slow-growing, rarely metastasize, and have an excellent prognosis after surgical resection, they retain malignant potential that cannot be ignored [17, 18]. We therefore developed a second machine learning model (Model 2) to differentiate benign pulmonary nodules from glandular precursor lesions, which achieved a validation AUC of 0.853 after fusion with clinical features.

pGGNs lack specific imaging features [19], as both benign and malignant lesions can present as ground-glass opacities on CT imaging [19–21]. Clinicians typically rely on qualitative CT features, such as lesion size, margin characteristics, and solid component ratio, to assess the nature of GGNs. However, these methods have limited predictive accuracy. In addition, patients with pGGNs do not exhibit specific clinical manifestations; nearly all ground-glass nodules are detected via CT rather than because of clinical symptoms [22].
Furthermore, for pGGNs smaller than 10 mm, the diagnostic accuracy of biopsy is less than 50%. This is primarily due to poor puncture accuracy, low nodule density, and a small number of tumor cells, resulting in insufficient tissue acquisition to support a pathological diagnosis [23]. Similarly, pGGNs do not exhibit any serological specificity [24]. Blood tumor markers such as CEA and CYFRA21-1, although of some indicative value, exhibit low sensitivity and specificity in the early stages of lung cancer [25]. Thus, misdiagnosis remains a significant issue in clinical practice. Accurate diagnosis is essential for optimizing prognosis, guiding treatment selection, and avoiding unnecessary invasive procedures.

Our study analyzed clinical characteristics (age, BMI, size, gender, smoking history, tumor history, family history, and location) and test results (blood cell analysis, blood biochemistry, and blood tumor markers) from cases with different pathological types to develop two machine learning models. Model 1 distinguished benign from malignant pGGNs, while Model 2 further identified nodules with malignant potential. The models demonstrated strong discriminatory power, suggesting that radiomics and deep learning can enhance clinical decision-making by improving diagnostic accuracy.

Advantages

This study offers several advantages over previous radiomics research. First, unlike prior radiomics studies that typically employed a single machine learning method, we used six distinct classifiers (SVM, Random Forest, SGD, KNN, XGBoost, and LightGBM) to construct models. Diagnostic performance in the validation set varied considerably among the algorithms. Relying on a single algorithm may compromise model accuracy because of the inherent limitations of the chosen method. Comparing multiple algorithms therefore improves the reliability of the final model choice.
Second, the widespread use of chest CT in routine medical practice has led to an increased detection rate of pGGNs. These cases present diagnostic challenges. Our models help address this challenge by incorporating clinical and radiomic data, reducing the risk of unnecessary invasive procedures while ensuring accurate diagnoses. Furthermore, we also considered the special group of glandular precursor lesions with malignant potential. Different pathological types necessitate individualized treatment planning, thereby avoiding both over-testing and missed diagnoses. Given the scarcity of studies focusing on the differential diagnosis of specific pathological types, the innovation of this research is evident. Third, our comprehensive analysis combined clinical and imaging characteristics with radiomic features. By conducting univariate and multivariate logistic regression analyses, we identified several significant features that enhanced diagnostic performance, offering valuable support for clinical interpretation. Finally, integrating deep learning tools into our model further improved its performance. This technological enhancement allows for more sophisticated and objective analysis, contributing to higher model accuracy and reliability. Limitations Despite its strengths, this study has several limitations. The retrospective nature may introduce bias in the collection of clinical characteristics, and prospective studies would allow for more comprehensive model validation. Additionally, the single-center design of this study may limit its generalizability. Finally, the sample size of certain pathological subtypes was limited, which may have affected the robustness of the model for specific subtypes. To address these limitations, we plan to extend the study duration and further promote multicenter collaboration, while conducting more detailed and in-depth analyses of pathological subtypes. 
Declarations

Conflicting interests: The author(s) declared no potential conflicts of interest concerning the research, authorship, and/or publication of this article. The author(s) did not use artificial intelligence (AI) or AI-assisted technology in the writing process.

Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (22027810, 82303293), Youth Foundation of PLA General Hospital (22QNFC078), the Natural Science Foundation of Shandong Province (ZR2023QH378), and the Taishan Scholars Program (tsqn202312364).

Ethical approval: This study was approved by the Ethics Committee of Shandong First Medical University Affiliated Cancer Hospital (Approval No. SDTHEC202509033).

Consent statement: This study is a retrospective cohort study. Our ethics committee waived informed consent due to the retrospective nature of our research, which is unlikely to have adverse effects on the health and rights of patients.

Contributorship: ZJ: Formal analysis, Visualization, Writing - original draft; LBH: Investigation, Methodology, Software; LJ: Conceptualization, Data curation, Funding acquisition; LY: Resources, Funding acquisition, Supervision; JJP: Funding acquisition, Project administration, Supervision, Validation, Writing - review & editing. All authors participated in the revision and finalization of the manuscript and have read and agreed to the published version.

Research registration unique identifying number (UIN): Name of the registry: not applicable. Unique identifying number or registration ID: not applicable. Hyperlink to your specific registration: not applicable.

Guarantor: Jian Zhang, Yang Liu, and Jipeng Jiang.

Associated Data
Data Availability Statement: The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Acknowledgment: We are grateful to those who have assisted with this research project.

References

1. F. Bray, J. Ferlay, I. Soerjomataram, R.L. Siegel, L.A. Torre, A. Jemal, Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries, CA: a cancer journal for clinicians 68(6) (2018) 394-424.
2. A. Jemal, K.D. Miller, J. Ma, R.L. Siegel, S.A. Fedewa, F. Islami, S.S. Devesa, M.J. Thun, Higher Lung Cancer Incidence in Young Women Than Young Men in the United States, The New England journal of medicine 378(21) (2018) 1999-2009.
3. J.M. Kocarnik, K. Compton, F.E. Dean, et al., Cancer Incidence, Mortality, Years of Life Lost, Years Lived With Disability, and Disability-Adjusted Life Years for 29 Cancer Groups From 2010 to 2019: A Systematic Analysis for the Global Burden of Disease Study 2019, JAMA Oncology 8(3) (2022) 420-444.
4. C. Xia, X. Dong, H. Li, M. Cao, D. Sun, S. He, F. Yang, X. Yan, S. Zhang, N. Li, W. Chen, Cancer statistics in China and the United States, 2022: profiles, trends, and determinants, Chinese medical journal 135(5) (2022) 584-590.
5. R.L. Siegel, A.N. Giaquinto, A. Jemal, Cancer statistics, 2024, CA: a cancer journal for clinicians 74(1) (2024) 12-49.
6. C.I. Henschke, R. Yip, J.P. Smith, A.S. Wolf, R.M. Flores, M. Liang, M.M. Salvatore, Y. Liu, D.M. Xu, D.F. Yankelevitz, CT Screening for Lung Cancer: Part-Solid Nodules in Baseline and Annual Repeat Rounds, AJR. American journal of roentgenology 207(6) (2016) 1176-1184.
7. D.F. Yankelevitz, R. Yip, J.P. Smith, M. Liang, Y. Liu, D.M. Xu, M.M. Salvatore, A.S. Wolf, R.M. Flores, C.I. Henschke, CT Screening for Lung Cancer: Nonsolid Nodules in Baseline and Annual Repeat Rounds, Radiology 277(2) (2015) 555-564.
8. D.R. Aberle, A.M. Adams, C.D. Berg, W.C. Black, J.D. Clapp, R.M. Fagerstrom, I.F. Gareen, C. Gatsonis, P.M. Marcus, J.D. Sicks, Reduced lung-cancer mortality with low-dose computed tomographic screening, The New England journal of medicine 365(5) (2011) 395-409.
9. M. Migliore, M. Fornito, M. Palazzolo, A. Criscione, M. Gangemi, F. Borrata, P. Vigneri, M. Nardini, J. Dunning, Ground glass opacities management in the lung cancer screening era, Annals of translational medicine 6(5) (2018) 90.
10. T. Ye, L. Deng, J. Xiang, Y. Zhang, H. Hu, Y. Sun, Y. Li, L. Shen, S. Wang, L. Xie, H. Chen, Predictors of Pathologic Tumor Invasion and Prognosis for Ground Glass Opacity Featured Lung Adenocarcinoma, The Annals of thoracic surgery 106(6) (2018) 1682-1690.
11. A. Del Ciello, P. Franchi, A. Contegiacomo, G. Cicchetti, L. Bonomo, A.R. Larici, Missed lung cancer: when, where, and why?, Diagnostic and interventional radiology (Ankara, Turkey) 23(2) (2017) 118-126.
12. P. Lambin, R.T.H. Leijenaar, T.M. Deist, J. Peerlings, E.E.C. de Jong, J. van Timmeren, S. Sanduleanu, R. Larue, A.J.G. Even, A. Jochems, Y. van Wijk, H. Woodruff, J. van Soest, T. Lustberg, E. Roelofs, W. van Elmpt, A. Dekker, F.M. Mottaghy, J.E. Wildberger, S. Walsh, Radiomics: the bridge between medical imaging and personalized medicine, Nature Reviews. Clinical oncology 14(12) (2017) 749-762.
13. R.Y. Choi, A.S. Coyner, J. Kalpathy-Cramer, M.F. Chiang, J.P. Campbell, Introduction to Machine Learning, Neural Networks, and Deep Learning, Translational vision science & technology 9(2) (2020) 14.
14. X. Xie, L. Yang, F. Zhao, D. Wang, H. Zhang, X. He, X. Cao, H. Yi, X. He, Y. Hou, A deep learning model combining multimodal radiomics, clinical and imaging features for differentiating ocular adnexal lymphoma from idiopathic orbital inflammation, European radiology 32(10) (2022) 6922-6932.
15. X. Liang, K. Tang, X. Ke, J. Jiang, S. Li, C. Xue, J. Deng, X. Liu, C. Yan, M. Gao, J. Zhou, L. Zhao, Development of an MRI-Based Comprehensive Model Fusing Clinical, Radiomics and Deep Learning Models for Preoperative Histological Stratification in Intracranial Solitary Fibrous Tumor, Journal of magnetic resonance imaging: JMRI 60(2) (2024) 523-533.
16. World Health Organization, WHO Classification of Thoracic Tumours, 5th ed., International Agency for Research on Cancer, Lyon, France, 2021.
17. W.H. Westra, Early glandular neoplasia of the lung, Respir Res 1(3) (2000) 163-169.
18. Y. He, X. Liu, H. Wang, L. Wu, M. Jiang, H. Guo, J. Zhu, S. Wu, H. Sun, S. Chen, Y. Zhu, C. Zhou, Y. Yang, Mechanisms of Progression and Heterogeneity in Multiple Nodules of Lung Adenocarcinoma, Small Methods 5(6) (2021) e2100082.
19. H. Li, Z. Sun, R. Xiao, Q. Qi, X. Li, H. Huang, X. Wang, J. Zhou, Z. Wang, K. Liu, P. Yin, F. Yang, J. Wang, Stepwise evolutionary genomics of early-stage lung adenocarcinoma manifesting as pure, heterogeneous and part-solid ground-glass nodules, Br J Cancer 127(4) (2022) 747-756.
20. H.-H. Yang, Y.-L. Lv, X.-H. Fan, Z.-Y. Ai, X.-C. Xu, B. Ye, D.-Z. Hu, Factors distinguishing invasive from pre-invasive adenocarcinoma presenting as pure ground glass pulmonary nodules, Radiat Oncol 15(1) (2020) 186.
21. Z.-R. Liang, F.-J. Lv, B.-J. Fu, R.-Y. Lin, W.-J. Li, Z.-G. Chu, Reticulation Sign on Thin-Section CT: Utility for Predicting Invasiveness of Pure Ground-Glass Nodules, AJR. American journal of roentgenology 221(1) (2023) 69-78.
22. J. Zhang, J. Sha, W. Liu, Y. Zhou, H. Liu, Z. Zuo, Quantification of Intratumoral Heterogeneity: Distinguishing Histological Subtypes in Clinical T1 Stage Lung Adenocarcinoma Presenting as Pure Ground-Glass Nodules on Computed Tomography, Academic Radiology 31(10) (2024) 4244-4255.
23. W.-C. Hsu, P.-C. Huang, K.-T. Pan, W.-Y. Chuang, C.-Y. Wu, H.-F. Wong, C.-T. Yang, Y.-L. Wan, Predictors of Invasive Adenocarcinomas among Pure Ground-Glass Nodules Less Than 2 cm in Diameter, Cancers (Basel) 13(16) (2021).
24. M.-C. Chen, H.-S. Yang, Z. Dong, L.-J. Li, X.-M. Li, H.-H. Luo, Q. Li, Y. Zhu, Immunogenomic features of radiologically distinctive nodules in multiple primary lung cancer, Cancer Immunol Immunother 73(11) (2024) 217.
25. F. Hu, H. Huang, Y. Jiang, M. Feng, H. Wang, M. Tang, Y. Zhou, X. Tan, Y. Liu, C. Xu, N. Ding, C. Bai, J. Hu, D. Yang, Y. Zhang, Discriminating invasive adenocarcinoma among lung pure ground-glass nodules: a multi-parameter prediction model, J Thorac Dis 13(9) (2021) 5383-5394.

Additional Declarations: No competing interests reported.

Supplementary Files: Appendixtable.docx

Status: Under Review. Version 1 posted.
Reviewers agreed at journal: 11 May, 2026
Reviewers invited by journal: 29 Apr, 2026
Editor assigned by journal: 26 Apr, 2026
Editor invited by journal: 06 Apr, 2026
Submission checks completed at journal: 04 Apr, 2026
First submitted to journal: 04 Apr, 2026
© Research Square 2026 | ISSN 2693-5015 (online)
{"props":{"pageProps":{"initialData":{"identity":"rs-9268097","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":636001861,"identity":"a0f7df5a-296e-4f9a-aefe-0727fcfbcd2c","order_by":0,"name":"Zhang Jian","email":"","orcid":"","institution":"Nankai University","correspondingAuthor":false,"prefix":"","firstName":"Zhang","middleName":"","lastName":"Jian","suffix":""},{"id":636001862,"identity":"280dbd0f-82ce-4719-be38-fc67660c92c8","order_by":1,"name":"Boheng Liu","email":"","orcid":"","institution":"First Affiliated Hospital of Chinese PLA General Hospital","correspondingAuthor":false,"prefix":"","firstName":"Boheng","middleName":"","lastName":"Liu","suffix":""},{"id":636001863,"identity":"37b26955-6996-445f-a807-28ad39257794","order_by":2,"name":"Ji Li","email":"","orcid":"","institution":"Shandong Tumor Hospital","correspondingAuthor":false,"prefix":"","firstName":"Ji","middleName":"","lastName":"Li","suffix":""},{"id":636001864,"identity":"974264ad-1085-440b-a0c5-4631fbf53ab8","order_by":3,"name":"Yang Liu","email":"","orcid":"","institution":"Nankai University","correspondingAuthor":false,"prefix":"","firstName":"Yang","middleName":"","lastName":"Liu","suffix":""},{"id":636001865,"identity":"f2184f04-e2cc-4dc0-918d-9c573f7faebf","order_by":4,"name":"Jipeng
Jiang","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAyUlEQVRIiWNgGAWjYNACAxs5fvbGxgcfiFTP2MBQkGYs2XO42XAG8Vo+HE7ccCO9TZqDGPW67b3HH/MYHE5suPmwQZqBwU5Ot4GAFrMz5xKbeQzSjRtnJzYYFzAkG5sdIKTlRo4hUIu1bLN0YkPyDIYDidsIarn/BqSFmbFN8mDDYR6itNzgAWlxVuyRYGxsJk7LmRzDmXMM0owleBKbGWcYEOOX42cMPrz5YyNnf/z48x8fKuzkCGpBAwakKR8Fo2AUjIJRgAMAAJiwRkAbxuZAAAAAAElFTkSuQmCC","orcid":"","institution":"First Affiliated Hospital of Chinese PLA General Hospital","correspondingAuthor":true,"prefix":"","firstName":"Jipeng","middleName":"","lastName":"Jiang","suffix":""}],"badges":[],"createdAt":"2026-03-30 14:08:58","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-9268097/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-9268097/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":108957510,"identity":"c23076f1-cab0-4de4-951a-b5b31b102011","added_by":"auto","created_at":"2026-05-11 08:19:16","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":85227,"visible":true,"origin":"","legend":"\u003cp\u003eMultifactor Analysis Forest Plot for Model 1.\u003c/p\u003e","description":"","filename":"image1.png","url":"https://assets-eu.researchsquare.com/files/rs-9268097/v1/39485f5dfdae072c8e2da033.png"},{"id":108978035,"identity":"949e604b-24c4-4abb-a174-5175391d562c","added_by":"auto","created_at":"2026-05-11 11:33:48","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":169352,"visible":true,"origin":"","legend":"\u003cp\u003eMultifactor Analysis Forest Plot for Model 2.\u003c/p\u003e","description":"","filename":"image2.png","url":"https://assets-eu.researchsquare.com/files/rs-9268097/v1/0d78f9a7d0b976df0e9b8f28.png"},{"id":108957521,"identity":"32e2a25a-a212-4985-a725-1275db66c267","added_by":"auto","created_at":"2026-05-11 
08:19:16","extension":"jpeg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":131972,"visible":true,"origin":"","legend":"\u003cp\u003eCorrelation analysis heatmap and LASSO regression plot for Model 1 (A) and Model 2 (B).\u003c/p\u003e","description":"","filename":"image3.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-9268097/v1/d856f6856863aaf33f8f2e51.jpeg"},{"id":108957520,"identity":"3fc1dcba-155f-4d7b-a992-252fb110f0d8","added_by":"auto","created_at":"2026-05-11 08:19:16","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":279344,"visible":true,"origin":"","legend":"\u003cp\u003eBar charts of retained features and coefficients following LASSO regression screening for Model 1 (left) and Model 2 (right).\u003c/p\u003e","description":"","filename":"image4.png","url":"https://assets-eu.researchsquare.com/files/rs-9268097/v1/2168be56059028e1bb3af65c.png"},{"id":108957516,"identity":"7b3806a5-0426-4b2c-bad2-1f8718920fca","added_by":"auto","created_at":"2026-05-11 08:19:16","extension":"jpeg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":49193,"visible":true,"origin":"","legend":"\u003cp\u003eROC curves for the training and validation sets under the SVM machine learning classifier. 
(A) Model 1 training set,(B) Model 1 validation set, (C) Model 2 training set, (D) Model 2 validation set.\u003c/p\u003e","description":"","filename":"image6.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-9268097/v1/047324a3c9f5aff739a21fe3.jpeg"},{"id":108957522,"identity":"6333bf08-be0b-47f3-b5d3-28f85ea6a015","added_by":"auto","created_at":"2026-05-11 08:19:16","extension":"jpeg","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":47293,"visible":true,"origin":"","legend":"\u003cp\u003eDecision curve analysis (DCA) curves for the training set (A) and validation set (B) under the SVM machine learning classifier. (A) Model 1 training set,(B) Model 1 validation set,(C) Model 2 training set,(D) Model 2 validation set.\u003c/p\u003e","description":"","filename":"image7.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-9268097/v1/2db82caafd934bca33525aac.jpeg"},{"id":108957517,"identity":"f616c990-1f67-4855-ad13-faf117cf5b4c","added_by":"auto","created_at":"2026-05-11 08:19:16","extension":"jpeg","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":43995,"visible":true,"origin":"","legend":"\u003cp\u003eCalibration curves for the training set (A) and validation set (B) under the SVM machine learning classifier. 
(A) Model 1 training set, (B) Model 1 validation set, (C) Model 2 training set, (D) Model 2 validation set.\u003c/p\u003e","description":"","filename":"image8.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-9268097/v1/5d877cbf16881203b69f7baf.jpeg"},{"id":108957518,"identity":"e51787a4-9113-49a7-a24e-238d23701bcb","added_by":"auto","created_at":"2026-05-11 08:19:16","extension":"jpeg","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":72984,"visible":true,"origin":"","legend":"\u003cp\u003eROC curves (A/D), DCA curves (B/E), and calibration curves (C/F) for the fusion model of the training set (1) and validation set (2).\u003c/p\u003e","description":"","filename":"image9.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-9268097/v1/28b4a97eb00166789759608e.jpeg"},{"id":108957519,"identity":"33a6a23c-f9d2-4957-90ee-40371da2abf2","added_by":"auto","created_at":"2026-05-11 08:19:16","extension":"jpeg","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":92589,"visible":true,"origin":"","legend":"\u003cp\u003eFeature weight SHAP plots for Model 1 (A) and Model 2 (B).\u003c/p\u003e","description":"","filename":"image10.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-9268097/v1/73d74a81228490a92e595348.jpeg"},{"id":108979750,"identity":"c323b45b-3a69-4ef1-913c-4f11ab8eccb9","added_by":"auto","created_at":"2026-05-11 12:01:10","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1466025,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-9268097/v1/395e84c1-8835-4f0e-99d9-fb9ba12ba5e7.pdf"},{"id":108977792,"identity":"80946dea-a2e2-4af1-95cc-b8ade21153e4","added_by":"auto","created_at":"2026-05-11 
11:32:55","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":37534,"visible":true,"origin":"","legend":"","description":"","filename":"Appendixtable.docx","url":"https://assets-eu.researchsquare.com/files/rs-9268097/v1/a84c4883bacecd46a11dcffb.docx"}],"financialInterests":"No competing interests reported.","formattedTitle":"Development of a CT-Based comprehensive model with deep learning for differentiating pathological types of pulmonary ground-glass nodules","fulltext":[{"header":"Introduction","content":"\u003cp\u003eCancer remains a leading cause of death worldwide [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. The Cancer Atlas reveals approximately 23.6\u0026nbsp;million (95% UI: 22.2\u0026ndash;24.9\u0026nbsp;million) new cancer cases and 10\u0026nbsp;million (95% UI: 9.36\u0026ndash;10.6\u0026nbsp;million) cancer-related deaths occurred globally in 2019 [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. Lung cancer represents the most prevalent tumor and the leading cause of cancer-related mortality worldwide, largely attributable to smoking and environmental pollution[\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. In 2022, an estimated 4.824\u0026nbsp;million new cases and 3.21\u0026nbsp;million cancer-related deaths occurred in China, compared to 2.37\u0026nbsp;million new cases and approximately 640,000 deaths in the United States. 
Although the age-standardized incidence rate (ASIR) in the United States (303.6 per 100,000) was significantly higher than that in China (201.61 per 100,000), the age-standardized mortality rate (ASMR) in China (96.47 per 100,000) exceeded that of the United States (81.8 per 100,000), reflecting disparities in cancer profiles, healthcare infrastructure, and early detection strategies between the two countries [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eEarly diagnosis and timely intervention are particularly critical for lung cancers presenting as ground-glass nodules (GGNs), as prognosis is closely related to disease stage at detection [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. Data from the International Early Lung Cancer Action Plan reported that early detection and surgical resection of lung cancer presenting as GGNs or partially solid nodules could achieve near-perfect lung cancer-specific survival [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThe widespread adoption of low-dose computed tomography (LDCT) has significantly increased the detection rate of pulmonary nodules [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. The National Lung Screening Trial (NLST) reported a 20% reduction in lung cancer mortality with LDCT screening compared with chest radiography[\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. However, distinguishing benign from malignant GGNs remains challenging in clinical practice. 
Pure ground-glass nodules (pGGNs) exhibit limited specificity on conventional CT imaging, and their morphological features often overlap between benign and malignant entities [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. pGGNs may indicate benign conditions such as inflammation, hemorrhage, edema, and focal fibrosis, or they may suggest malignant tumors. Studies indicate that 63%\u0026ndash;92.6% of persistent pGGNs represent precancerous lesions or early adenocarcinoma [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eDespite advances in imaging technology, pGGNs may still be misdiagnosed due to factors including small size, indistinct margins, inconspicuous appearance, or proximity to vascular structures[\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. Consequently, a considerable proportion of patients undergo unnecessary invasive diagnostic procedures or surgical interventions, underscoring the need for accurate, noninvasive methods to stratify malignant risk in pGGNs.\u003c/p\u003e \u003cp\u003eRadiomics (RAD) enables the high-throughput extraction of quantitative features from medical images using advanced mathematical algorithms and machine-learning techniques, providing objective descriptors of lesion heterogeneity beyond visual assessment[\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]. As radiomics research deepens, it is applied not only to the overall presentation of lesions but also to disease diagnosis, staging, prognosis, and evaluation of treatment efficacy [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]. 
The advantage of radiomics lies in its ability to reveal clinical outcomes and guide clinical decisions through non-invasive means, offering new pathways for disease diagnosis and treatment. Deep learning (DL), with its hierarchical neural network architecture, further enhances feature learning capacity and scalability in large imaging datasets [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. The integration of DL with radiomics has shown promise across multiple medical domains [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. However, evidence remains limited regarding the application of integrated clinical, radiomics, and DL models for differentiating pathological subtypes of pulmonary pure ground-glass nodules.\u003c/p\u003e \u003cp\u003eAccordingly, this study aimed to develop and validate multimodal machine learning models for the noninvasive characterization of pulmonary pGGNs. First, we constructed an integrated model combining clinical variables, radiomics, and deep learning features to distinguish malignant from benign pGGNs and to estimate malignant risk. 
Furthermore, we developed a subtype classification model to differentiate pGGNs with malignant potential (including atypical adenomatous hyperplasia and adenocarcinoma in situ) from pGGNs without malignant potential, with the goal of supporting individualized treatment decision-making.\u003c/p\u003e"},{"header":"Materials and Methods","content":"\u003cp\u003ePatient population\u003c/p\u003e\n\u003cp\u003eRetrospective collection of patients with pulmonary pure ground-glass nodules (pGGNs) who underwent surgical treatment at Shandong First Medical University Affiliated Cancer Hospital from August 2023 to August 2025.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eInclusion Criteria\u003c/p\u003e\n\u003col\u003e\n \u003cli\u003eComplete clinical, pathological, and imaging data are available.\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eThe nodule without any solid components.\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eSurgical treatment was performed, with definitive pathological findings obtained postoperatively.\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eExclusion Criteria\u003c/p\u003e\n\u003col\u003e\n \u003cli\u003epGGNs following radiotherapy or chemotherapy.\u0026nbsp;\u003c/li\u003e\n \u003cli\u003epGGNs after anti-inflammatory treatment.\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eInvasive procedures before surgery, such as nodule biopsy or radiofrequency ablation.\u003c/li\u003e\n \u003cli\u003eHistory of other concurrent malignancies.\u0026nbsp;\u003c/li\u003e\n \u003cli\u003ePoor image quality due to respiratory movement or other artifacts.\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eThis study was approved by the Ethics Committee of Shandong First Medical University Affiliated Cancer Hospital (Approval No. 
SDTHEC202509033).\u003c/p\u003e\n\u003cp\u003eClinical features assessment\u003c/p\u003e\n\u003cp\u003eThe clinical variables collected included age, gender, BMI, hypertension, diabetes, smoking history, tumor history, family history, anti-inflammatory therapy, routine laboratory test results and CT scan findings. Laboratory tests encompassed routine blood examinations and common lung cancer tumor markers, such as Pro-Gastrin-Releasing Peptide (ProGRP), Carcinoembryonic Antigen (CEA), Cytokeratin 19 Fragment (CYFRA21-1), Neuron-Specific Enolase (NSE), and Squamous Cell Carcinoma Antigen (SCC).\u003c/p\u003e\n\u003cp\u003eConventional semantic CT features were independently evaluated by two radiologists with more than 5 years of experience, blinded to pathological results. Assessed features included lesion location, margin characteristics (smooth or blurred), maximum diameter, proximity to the pleura (yes/no), anatomical location (central or peripheral), and nodule multiplicity (single or multiple). Additional clinical characteristics were retrospectively extracted from electronic medical records at Shandong First Medical University Cancer Hospital (Appendix I and II).\u003c/p\u003e\n\u003cp\u003ePathological diagnosis\u003c/p\u003e\n\u003cp\u003eAll pathological diagnoses were based on postoperative surgical specimens. To ensure the reliability of the pathology findings, all patient diagnoses underwent retrospective review by a pathologist with o\u0026nbsp;more than 10 years of experience at two hospitals. All pathology results were confirmed to be reliable.\u003c/p\u003e\n\u003cp\u003eImage Acquisition and Preprocessing\u003c/p\u003e\n\u003cp\u003eThe original medical imaging data were acquired using SIEMENS SOMATOM Definition AS and SIEMENS SOMATOM Definition Flash dual-source CT scanners for chest CT examinations and stored in DICOM format. The SimpleITK tool was employed to batch convert all DICOM data into NIfTI format (.nii.gz). 
This ensured that the converted data fully preserved the spatial position information and grayscale distribution of the images, while facilitating subsequent processing workflows.\u003c/p\u003e\n\u003cp\u003eAll images were converted and normalized using an open-source Python 3.7 package. These preprocessed images were then imported into the 3D Slicer 5.2.2 software to adjust to uniform grayscale parameters.\u003c/p\u003e\n\u003cp\u003eFollowing this preprocessing step, manual segmentation was performed on the 1.5 mm thin-slice images from the preoperative plain CT scans. Normalized image data were loaded using 3D Slicer software (version 5.2.2). Using its \u0026ldquo;Segment Editor\u0026rdquo; module, two radiologists manually delineated regions of interest (ROIs) in a randomized order of cases. After delineation of each slice, all segmented slices were fused to generate a volumetric mask.\u003c/p\u003e\n\u003cp\u003eFeature Extraction and Screening in Radiomics\u003c/p\u003e\n\u003cp\u003ePatients were randomly assigned to the training and validation sets at a ratio of 7:3. Radiomics features were extracted from ROIs using an open-source Python-based radiomics library. Based on ROIs, imaging-based shape features, statistical features, and texture features were obtained. The texture feature set was further subdivided into multiple subcategories, including Gray-Level Co-occurrence Matrix (GLCM) features, Gray-Level Run Length Matrix (GLRLM) features, Gray-Level Size-Z-Matrix (GLSZM) features, Neighborhood Gray-Level Difference Matrix (NGTDM) features, and Gray-Level Dependency Matrix (GLDM) features. 
These multidimensional features comprehensively reflect the geometric morphology, gray-level distribution, and textural heterogeneity of regions of interest, providing rich, in-depth quantitative data support for subsequent machine learning modeling and clinical decision-making.\u003c/p\u003e\n\u003cp\u003eFeature Stability Assessment and Radiomics Feature Construction\u003c/p\u003e\n\u003cp\u003eThe intraclass correlation coefficient (ICC) was used to assess the consistency of features manually delineated by physicians. Features with an ICC greater than 0.75 were retained. Features in the training set underwent normality tests. Features meeting normality criteria underwent T-tests, while non-normally distributed features underwent rank-sum tests for initial screening, with a significance threshold of P\u0026lt;0.05. To avoid multicollinearity, Pearson and Spearman correlation coefficients were used to remove highly correlated features, with a screening threshold of |r|\u0026gt; 0.8. Subsequent refinement employed Lasso regression. The LassoCV method generated 200 candidate lambda values uniformly distributed between 0 and 0.5, with the optimal lambda value selected via 10-fold cross-validation to minimize the mean squared error (MSE). To visually demonstrate the impact of lambda on model performance and coefficients, the program plots the MSE curve against log(Alpha) (including 95% confidence interval error bars) and a path map of Lasso regression coefficients versus lambda, saving these images for future reference. Finally, features with non-zero coefficients corresponding to the optimal lambda are selected, and their names and coefficients are saved. 
This end-to-end workflow implements data preprocessing, LASSO regression model training, visualization, and result output, providing a foundational data layer for machine learning model development.\u003c/p\u003e\n\u003cp\u003eModel Development\u003c/p\u003e\n\u003cp\u003eThis study employed six commonly used classifiers for model construction, specifically including Support Vector Machines (SVM), Random Forest, Stochastic Gradient Descent Classifier (SGD), K-Nearest Neighbors (KNN), XGBoost, and LightGBM. To achieve optimal model performance, Bayesian Optimization was utilized for hyperparameter search.\u003c/p\u003e\n\u003cp\u003eModel validation results are visualized using ROC curves, DCA curves, and calibration curves. To further explore the relationship between radiomic features and model predictions, SHAP (Shapley Additive exPlanations) values were calculated for each feature. The global impact of individual features on model decisions was assessed by analyzing both individual prediction SHAP values and overall feature importance. The primary performance evaluation metrics include accuracy, sensitivity, specificity, F1-score, and AUC. The DeLong test was applied to assess the statistical significance of AUC differences across different groupings within the same machine learning model.\u003c/p\u003e\n\u003cp\u003eStatistical Analysis\u003c/p\u003e\n\u003cp\u003eStatistical analysis was performed using SPSS version 23.0 and Python version 3.7. Normally distributed measurement data are presented as mean \u0026plusmn; standard deviation (x̄ \u0026plusmn; s), while non-normally distributed data are expressed as median and interquartile range [M (Q1, Q3)]. 
Categorical data are reported as counts or percentages.\u003c/p\u003e\n\u003cp\u003eA comparative analysis was conducted between the training and validation sets, employing Student\u0026apos;s t-tests for semantically meaningful quantitative variables and \u0026chi;\u0026sup2; or Fisher\u0026apos;s exact tests for categorical variables. All statistical tests were two-tailed with a significance threshold of \u003cem\u003ep\u003c/em\u003e\u0026lt; 0.05.\u003c/p\u003e\n\u003cp\u003eModel performance was assessed using receiver operating characteristic (ROC) curves, incorporating sensitivity, specificity, accuracy (ACC), negative predictive value (NPV), positive predictive value (PPV), and area under the ROC curve (AUC) as evaluation metrics. Calibration curves were used to measure the consistency of observed outcomes. Decision curve analysis (DCA) was employed to determine the clinical utility of the models.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003e\u003cstrong\u003eBaseline characteristics\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eA total of 1,067 patients meeting the inclusion criteria were enrolled in this study, including 333 men and 734 women, with a mean age of 55.30 \u0026plusmn; 11.34 years (range: 16\u0026ndash;81 years). The dataset was divided into training and validation cohorts at an approximate 7:3 ratio. For the benign vs malignant discrimination model (Model 1, defining the malignant group as positive), the training cohort included 284 malignant and 462 benign cases. In comparison, the validation cohort contained 124 malignant and 197 benign cases. 
For the model distinguishing benign tumors from atypical and in situ carcinomas (Model 2, with atypical and in situ carcinoma defined as positive), the training cohort comprised 370 positive cases and 91 negative cases, while the validation cohort contained 163 positive cases and 35 negative cases.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eBaseline characteristics were broadly comparable between the training and validation cohorts for both models; the only clinical variable with a statistically significant difference was nodule location in Model 1 (\u003cem\u003ep\u003c/em\u003e = 0.012), as summarized in \u003cstrong\u003eTable 1\u003c/strong\u003e.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eTable 1. Clinical Baseline Comparison of Training and Validation Sets for Model 1 and Model 2\u003c/p\u003e\n\u003cdiv align=\"\"\u003e\n \u003ctable style=\"width: 5.0e+2pt;border: none;\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"3\"\u003e\n \u003cp\u003eModel 1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"3\"\u003e\n \u003cp\u003eModel 2\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eVariable\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eTraining\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eValidation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cem\u003ep\u003c/em\u003e value\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eTraining\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eValidation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cem\u003ep\u003c/em\u003e value\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd\u003e\n \u003cp\u003eAge\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e55.00 (49.06, 63.00)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e57.00 (48.00, 63.00)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.725(M-WU)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e53.00 (47.00, 61.00)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e54.00(49.00, 61.00)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.651(M-WU)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eBMI\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e23.96 (22.57, 26.67)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e24.31 (22.22, 26.45)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.424(M-WU)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e23.96 (22.44, 26.23)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e24.03(21.94, 26.58)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.883(M-WU)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eSize(mm)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e11.79 (9.00, 16.00)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e12.00 (8.00, 16.00)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.697(M-WU)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e12.00 (9.00, 16.00)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e11.79 (9.00, 16.00)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.582(M-WU)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eGender\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.179 (\u0026chi;\u0026sup2;)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.999 (\u0026chi;\u0026sup2;)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"2\"\u003e\n \u003cp\u003eFemale\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e523 (70.1%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e211 (65.7%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e323 (70.1%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e138(69.7%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eMale\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e223 (29.9%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e110 (34.3%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e138 (29.9%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e60 (30.3%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eSmoking history\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n 
\u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.933 (\u0026chi;\u0026sup2;)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e1.000 (\u0026chi;\u0026sup2;)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e663 (88.9%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e284 (88.5%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e413 (89.6%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e178(89.9%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eYes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e83 (11.1%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e37 (11.5%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e48 (10.4%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e20 (10.1%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eTumor history\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.867 (\u0026chi;\u0026sup2;)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.141 (\u0026chi;\u0026sup2;)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e658 (88.2%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e285 (88.8%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e404 (87.6%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e182(91.9%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eYes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e88 (11.8%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e36 (11.2%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e57 (12.4%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e16 (8.1%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eFamily history\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n 
\u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.660 (\u0026chi;\u0026sup2;)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.462 (\u0026chi;\u0026sup2;)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e712 (95.4%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e309 (96.3%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e447 (97.0%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e189(95.5%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eYes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e34 (4.6%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e12 (3.7%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e14 (3.0%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e9 (4.5%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eLocation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd\u003e\n \u003cp\u003e0.012 (\u0026chi;\u0026sup2;)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.828 (\u0026chi;\u0026sup2;)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eRUL\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e240 (32.2%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e102 (31.8%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e157 (34.1%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e70 (35.4%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eLUL\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e185 (24.8%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e92 (28.7%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e109 (23.6%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e53 (26.8%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eLLL\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e172 (23.1%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e51 (15.9%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e99 (21.5%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e36 (18.2%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eRLL\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e117 (15.7%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e50 (15.6%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e72 (15.6%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e30 (15.2%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eRML\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e32 (4.3%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e26 (8.1%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e24 (5.2%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e9 (4.5%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eGroup\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.917 (\u0026chi;\u0026sup2;)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.611 (\u0026chi;\u0026sup2;)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eControl\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e462 (61.9%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e197 (61.4%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e91 (19.7%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e35 (17.7%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eExperimental\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e284 (38.1%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e124 (38.6%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e370 (80.3%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e163 (82.3%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003eRUL: Right upper lobe; LUL: Left upper lobe; LLL: Left lower lobe; RLL: Right lower lobe; RML: Right middle lobe; M-WU: Mann-Whitney U test. \u003cem\u003ep\u003c/em\u003e values compare the training cohort with the validation cohort; categorical variables were analyzed by the Pearson \u0026chi;\u003csup\u003e2\u003c/sup\u003e test or Fisher exact test, and continuous variables were compared by the Student t-test or Mann-Whitney U 
test.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eRadiomics and clinical signature\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIn Model 1, multivariate analysis identified age, CEA level, presence of multiple nodules, and amylase as independent predictors for distinguishing malignant from benign pulmonary pure ground-glass nodules (all \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.05).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eIn Model 2, sex, ProGRP, the aspartate aminotransferase to alanine aminotransferase (AST/ALT) ratio (De Ritis ratio), CKMB, and globulin were statistically significant (\u003cem\u003ep\u003c/em\u003e \u0026lt; 0.05). The multivariate analysis results for Models 1 and 2 are presented in \u003cstrong\u003eTables 2 and 3\u003c/strong\u003e, along with \u003cstrong\u003eFigures 1 and 2\u003c/strong\u003e. Univariate correlation tables are provided in \u003cstrong\u003eAppendices I and II\u003c/strong\u003e.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eTable 2. 
Multivariate Analysis Results for Model 1\u003c/p\u003e\n\u003ctable style=\"width: 3.8e+2pt;border: none;\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003eVariable\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003eOR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003eCI95 lower\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003eCI95 upper\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e\u003cem\u003ep\u003c/em\u003e value\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003eAge\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e1.033\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e1.016\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e1.049\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.000\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003eCEA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e1.254\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e1.101\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e1.427\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.001\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003eMultiple Lesions\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.669\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.491\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.913\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.011\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd 
nowrap=\"\"\u003e\n \u003cp\u003eAmylase\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.992\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.985\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e1.000\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.049\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003eCytokeratin\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e1.025\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.860\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e1.222\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.780\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eCEA: Carcinoembryonic Antigen\u003c/p\u003e\n\u003cp\u003eTable 3. Multivariate Analysis Results for Model 2\u003c/p\u003e\n\u003ctable style=\"width: 3.7e+2pt;\"\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003eVariable\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003eOR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003eCI95 lower\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003eCI95 upper\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e\u003cem\u003ep\u0026nbsp;\u003c/em\u003evalue\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003eProGRP\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.969\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.951\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n 
\u003cp\u003e0.987\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.001\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003eGender\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.455\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.253\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.819\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.009\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003eGlobulin\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.930\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.875\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.989\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.020\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003eAST/ALT\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.503\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.272\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.929\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.028\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003eCreatine Kinase MB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.843\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.717\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n 
\u003cp\u003e0.992\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.040\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003eAlbumin\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e1.062\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.992\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e1.135\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.082\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003eEosinophil Count\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.353\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.036\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e3.449\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.370\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003eSmoking history\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.972\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.426\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e2.219\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd nowrap=\"\"\u003e\n \u003cp\u003e0.946\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eProGRP: Pro Gastrin Releasing Peptide, AST/ALT: Aspartate aminotransferase to alanine aminotransferase\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFeature Selection Results\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIn the training cohorts for Models 1 and 2, features with 
an intraclass correlation coefficient (ICC) \u0026gt; 0.75 were selected for further analysis. Subsequently, normality tests were conducted on these features, followed by preliminary significance analysis using t-tests or rank-sum tests, as appropriate. Highly correlated features (|r| \u0026gt; 0.8) were then eliminated based on Pearson\u0026apos;s or Spearman\u0026apos;s correlation coefficients. Finally, LASSO regression was applied, resulting in the selection of 38 features for Model 1 and 12 features for Model 2.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe feature correlation heatmap and LASSO regression workflow are presented in \u003cstrong\u003eFigure 3\u003c/strong\u003e. The retained features and their respective weights in the LASSO regression are shown in \u003cstrong\u003eFigure 4\u003c/strong\u003e.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eModel Performance\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eSix machine learning classifiers were employed to establish and validate the two radiomics models. Model performance was primarily assessed using the validation-set area under the ROC curve (AUC). Among the classifiers, SVM demonstrated the best performance, with Model 1 achieving a validation AUC of 0.840. Detailed results for each classifier in Model 1 are presented in \u003cstrong\u003eTable 4\u003c/strong\u003e. Model 2 attained a validation AUC of 0.831, with specific classifier outcomes shown in \u003cstrong\u003eTable 5\u003c/strong\u003e. The ROC curves, DCA curves, and calibration curves for the optimal classifier (SVM) in both Model 1 and Model 2 are depicted in \u003cstrong\u003eFigures 5, 6, and 7\u003c/strong\u003e, respectively.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eTable 4. 
Performance of the six machine learning classifiers in Model 1.\u003c/p\u003e\n\u003ctable\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eSVM\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eRF\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eSGD\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eKNN\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eXGBoost\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eLightGBM\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eTraining set accuracy\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.788\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.787\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.786\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.745\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.776\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.791\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eValidation set accuracy\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.776\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.782\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.748\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.751\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.785\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.779\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eTraining set recall\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n 
\u003cp\u003e0.613\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.651\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.651\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.563\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.669\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.669\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eValidation set recall\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.581\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.621\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.500\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.589\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.637\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.629\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eTraining set AUC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.844\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.829\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.828\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.814\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.798\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.818\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eValidation set AUC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.840\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.836\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n 
\u003cp\u003e0.816\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.792\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.821\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.827\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eTraining set sensitivity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.613\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.651\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.651\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.563\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.669\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.669\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eValidation set sensitivity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.896\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.870\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.868\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.857\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.842\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.866\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eTraining set specificity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.581\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.621\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.500\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.589\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n 
\u003cp\u003e0.637\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.629\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eValidation set specificity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.898\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.883\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.904\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.853\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.878\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.873\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eTraining set F1 score\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.688\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.699\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.698\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.627\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.695\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.709\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eValidation set F1 score\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.667\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.687\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.605\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.646\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.696\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.687\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n 
\u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eTable 5. Model results data table for the six machine learning classifiers in Model 2.\u003c/p\u003e\n\u003ctable\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eSVM\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eRF\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eSGD\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eKNN\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eXGBoost\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eLightGBM\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eTraining set accuracy\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.850\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.857\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.848\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.868\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.850\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.846\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eValidation set accuracy\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.864\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.818\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.828\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.869\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.793\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.763\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n 
\u003ctd\u003e\n \u003cp\u003eTraining Set Recall Index\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.959\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.959\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.984\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.981\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.943\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.943\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eValidation Set Recall Index\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.951\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.883\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.945\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.975\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.834\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.804\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eTraining set AUC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.844\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.848\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.831\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.792\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.834\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.832\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eValidation set AUC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.831\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd\u003e\n \u003cp\u003e0.782\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.762\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.787\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.793\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.778\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eTraining set sensitivity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.959\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.959\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.984\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.981\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.943\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.943\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eValidation set sensitivity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.407\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.440\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.297\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.407\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.473\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.451\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eTraining set specificity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.951\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.883\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.945\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd\u003e\n \u003cp\u003e0.975\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.834\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.804\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eValidation set specificity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.457\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.514\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.286\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.371\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.600\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.571\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eTraining set F1 score\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.911\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.915\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.912\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.922\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.910\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.908\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eValidation set F1 score\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.920\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.889\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.901\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.924\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.869\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd\u003e\n \u003cp\u003e0.848\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u003cstrong\u003eFusion Model\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFeature-level pre-fusion was performed by combining clinical risk factors with the radiomic features selected via LASSO regression. An SVM classifier was then used to build the fusion models and to test whether adding clinical features improved diagnostic performance.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe fusion model achieved a validation set AUC of 0.871 for Model 1 and 0.853 for Model 2. DeLong\u0026apos;s test revealed statistically significant differences between models (\u003cem\u003ep\u003c/em\u003e \u0026lt; 0.05). The ROC curves, decision curve analysis (DCA) curves, and calibration curves for the models are presented in \u003cstrong\u003eFigure 8\u003c/strong\u003e. The SHAP weight plots for both models after integrating clinical features are shown in \u003cstrong\u003eFigure 9.\u003c/strong\u003e\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eIn this study, we developed two machine learning models, one for distinguishing benign from malignant pulmonary nodules and another for differentiating benign pulmonary nodules from those with atypical/in situ carcinoma characteristics. Both models, which integrate clinical risk factors with CT radiomic features, demonstrated strong discriminatory power and offer potential for enhancing non-invasive diagnostic accuracy in clinical practice.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eThe 2021 WHO Histologic Classification of Lung Tumors (5th ed.) 
introduced important revisions, including the reclassification of adenocarcinoma in situ (AIS) and atypical adenomatous hyperplasia (AAH) as \u0026ldquo;glandular precursor lesions,\u0026rdquo; rather than traditional malignancies [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]. Although these lesions are generally slow-growing, rarely metastasize, and have an excellent prognosis after surgical resection, they retain malignant potential that cannot be ignored [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e, \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e]. To this end, we further developed a machine learning model (Model 2) to differentiate benign pulmonary nodules from glandular precursor lesions, which demonstrated excellent discriminatory performance.\u003c/p\u003e \u003cp\u003epGGNs lack specific imaging features [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e], as both benign and malignant lesions can present as ground-glass opacities on CT imaging [\u003cspan additionalcitationids=\"CR20\" citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. Clinicians typically rely on qualitative CT features, such as lesion size, margin characteristics, and solid component ratio, to assess the nature of GGNs. However, these methods have limited predictive accuracy. For example, patients with pGGNs do not exhibit specific clinical manifestations; nearly all ground-glass nodules are detected on CT rather than because of clinical symptoms [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e]. Furthermore, for pGGNs smaller than 10 mm, the diagnostic accuracy of puncture biopsy is less than 50%. 
This is primarily due to poor puncture accuracy, low nodule density, and a small number of tumor cells, resulting in insufficient tissue acquisition to support a pathological diagnosis [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e]. Similarly, pGGNs do not exhibit any serological specificity [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]. Blood tumor markers, such as CEA and CYFRA21-1, have some indicative value but exhibit low sensitivity and specificity in the early stages of lung cancer [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]. Thus, misdiagnosis remains a significant issue in clinical practice, and accurate diagnosis is essential for optimizing prognosis, guiding treatment selection, and avoiding unnecessary invasive procedures.\u003c/p\u003e \u003cp\u003eOur study analyzed clinical characteristics (age, BMI, size, gender, smoking history, tumor history, family history, and location) and test results (blood cell analysis, blood biochemistry, and blood tumor markers) from cases with different pathological types to develop two machine learning models. Model 1 distinguished benign from malignant pGGNs, while Model 2 further identified benign nodules with malignant potential. The models demonstrated strong discriminatory power, suggesting that radiomics and deep learning can enhance clinical decision-making by improving diagnostic accuracy.\u003c/p\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003eAdvantages\u003c/h2\u003e \u003cp\u003eThis study offers several advantages over previous radiomics research. First, unlike prior radiomics studies, which typically employed a single machine learning method, we used six distinct classifiers (SVM, Random Forest, SGD, KNN, XGBoost, and LightGBM) to construct models. The results revealed significant variations in diagnostic performance among the algorithms within the validation set. 
Relying on a single algorithm may compromise model accuracy due to the inherent limitations of the chosen method. Therefore, integrating multiple algorithms enhances the predictive accuracy and reliability of the model.\u003c/p\u003e \u003cp\u003eSecond, the widespread use of chest CT in routine medical practice has led to an increased detection rate of pGGNs. These cases present diagnostic challenges. Our models help address this challenge by incorporating clinical and radiomic data, reducing the risk of unnecessary invasive procedures while ensuring accurate diagnoses. Furthermore, we also considered the special group of glandular precursor lesions with malignant potential. Different pathological types necessitate individualized treatment planning, thereby avoiding both over-testing and missed diagnoses. Given the scarcity of studies focusing on the differential diagnosis of specific pathological types, the innovation of this research is evident.\u003c/p\u003e \u003cp\u003eThird, our comprehensive analysis combined clinical and imaging characteristics with radiomic features. By conducting univariate and multivariate logistic regression analyses, we identified several significant features that enhanced diagnostic performance, offering valuable support for clinical interpretation.\u003c/p\u003e \u003cp\u003eFinally, integrating deep learning tools into our model further improved its performance. This technological enhancement allows for more sophisticated and objective analysis, contributing to higher model accuracy and reliability.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec20\" class=\"Section2\"\u003e \u003ch2\u003eLimitations\u003c/h2\u003e \u003cp\u003eDespite its strengths, this study has several limitations. The retrospective nature may introduce bias in the collection of clinical characteristics, and prospective studies would allow for more comprehensive model validation. 
Additionally, the single-center design of this study may limit its generalizability. Finally, the sample size of certain pathological subtypes was limited, which may have affected the robustness of the model for specific subtypes. To address these limitations, we plan to extend the study duration and further promote multicenter collaboration, while conducting more detailed and in-depth analyses of pathological subtypes.\u003c/p\u003e "},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eConflicting interests:\u003c/strong\u003e The author(s) declared no potential conflicts of interest concerning the research, authorship, and/or publication of this article. The author(s) did not use artificial intelligence (AI) or AI-assisted technology in the writing process.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding:\u003c/strong\u003e The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (22027810, 82303293), the Youth Foundation of PLA General Hospital (22QNFC078), the Natural Science Foundation of Shandong Province (ZR2023QH378), and the Taishan Scholars Program (tsqn202312364).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthical approval: \u003c/strong\u003eThis study was approved by the Ethics Committee of Shandong First Medical University Affiliated Cancer Hospital (Approval No. SDTHEC202509033).\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eConsent statement:\u003c/strong\u003e This study is a retrospective cohort study. 
Our ethics committee waived informed consent due to the retrospective nature of our research, which is unlikely to have adverse effects on the health and rights of patients.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eContributorship:\u003c/strong\u003e ZJ: Formal analysis, Visualization, Writing - original draft; LBH: Investigation, Methodology, Software; LJ: Conceptualization, Data curation, Funding acquisition; LY: Resources, Funding acquisition, Supervision; JJP: Funding acquisition, Project administration, Supervision, Validation, Writing - review \u0026amp; editing. All authors participated in the revision and finalization of the manuscript and have read and agreed to the published version.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eResearch registration unique identifying number (UIN)\u003c/strong\u003e\u003cstrong\u003e: \u003c/strong\u003eName of the registry: not applicable. Unique identifying number or registration ID: not applicable. Hyperlink to your specific registration (must be publicly accessible and will be checked): not applicable.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eGuarantor\u003c/strong\u003e\u003cstrong\u003e:\u003c/strong\u003e Jian Zhang, Yang Liu, and Jipeng Liu.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eAssociated Data\u003c/strong\u003e\u003cstrong\u003e:\u003c/strong\u003e This section collects any data citations, data availability statements, or supplementary materials included in this article.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eData Availability Statement\u003c/strong\u003e\u003cstrong\u003e:\u003c/strong\u003e The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eAcknowledgment: \u003c/strong\u003eWe are grateful to those who have assisted with this Research Project.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eF. Bray, J. Ferlay, I. 
Soerjomataram, R.L. Siegel, L.A. Torre, A. Jemal, Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries, CA: a cancer journal for clinicians 68(6) (2018) 394-424.\u003c/li\u003e\n\u003cli\u003eA. Jemal, K.D. Miller, J. Ma, R.L. Siegel, S.A. Fedewa, F. Islami, S.S. Devesa, M.J. Thun, Higher Lung Cancer Incidence in Young Women Than Young Men in the United States, The New England journal of medicine 378(21) (2018) 1999-2009.\u003c/li\u003e\n\u003cli\u003eJ.M. Kocarnik, K. Compton, F.E. Dean, et al. Force, Cancer Incidence, Mortality, Years of Life Lost, Years Lived With Disability, and Disability-Adjusted Life Years for 29 Cancer Groups From 2010 to 2019: A Systematic Analysis for the Global Burden of Disease Study 2019, JAMA Oncology 8(3) (2022) 420-444.\u003c/li\u003e\n\u003cli\u003eC. Xia, X. Dong, H. Li, M. Cao, D. Sun, S. He, F. Yang, X. Yan, S. Zhang, N. Li, W. Chen, Cancer statistics in China and the United States, 2022: profiles, trends, and determinants, Chinese medical journal 135(5) (2022) 584-590.\u003c/li\u003e\n\u003cli\u003eR.L. Siegel, A.N. Giaquinto, A. Jemal, Cancer statistics, 2024, CA: a cancer journal for clinicians 74(1) (2024) 12-49.\u003c/li\u003e\n\u003cli\u003eC.I. Henschke, R. Yip, J.P. Smith, A.S. Wolf, R.M. Flores, M. Liang, M.M. Salvatore, Y. Liu, D.M. Xu, D.F. Yankelevitz, CT Screening for Lung Cancer: Part-Solid Nodules in Baseline and Annual Repeat Rounds, AJR. American journal of roentgenology 207(6) (2016) 1176-1184.\u003c/li\u003e\n\u003cli\u003eD.F. Yankelevitz, R. Yip, J.P. Smith, M. Liang, Y. Liu, D.M. Xu, M.M. Salvatore, A.S. Wolf, R.M. Flores, C.I. Henschke, CT Screening for Lung Cancer: Nonsolid Nodules in Baseline and Annual Repeat Rounds, Radiology 277(2) (2015) 555-64.\u003c/li\u003e\n\u003cli\u003eD.R. Aberle, A.M. Adams, C.D. Berg, W.C. Black, J.D. Clapp, R.M. Fagerstrom, I.F. Gareen, C. Gatsonis, P.M. Marcus, J.D. 
Sicks, Reduced lung-cancer mortality with low-dose computed tomographic screening, The New England journal of medicine 365(5) (2011) 395-409.\u003c/li\u003e\n\u003cli\u003eM. Migliore, M. Fornito, M. Palazzolo, A. Criscione, M. Gangemi, F. Borrata, P. Vigneri, M. Nardini, J. Dunning, Ground glass opacities management in the lung cancer screening era, Annals of translational medicine 6(5) (2018) 90.\u003c/li\u003e\n\u003cli\u003eT. Ye, L. Deng, J. Xiang, Y. Zhang, H. Hu, Y. Sun, Y. Li, L. Shen, S. Wang, L. Xie, H. Chen, Predictors of Pathologic Tumor Invasion and Prognosis for Ground Glass Opacity Featured Lung Adenocarcinoma, The Annals of thoracic surgery 106(6) (2018) 1682-1690.\u003c/li\u003e\n\u003cli\u003eA. Del Ciello, P. Franchi, A. Contegiacomo, G. Cicchetti, L. Bonomo, A.R. Larici, Missed lung cancer: when, where, and why?, Diagnostic and interventional radiology (Ankara, Turkey) 23(2) (2017) 118-126.\u003c/li\u003e\n\u003cli\u003eP. Lambin, R.T.H. Leijenaar, T.M. Deist, J. Peerlings, E.E.C. de Jong, J. van Timmeren, S. Sanduleanu, R. Larue, A.J.G. Even, A. Jochems, Y. van Wijk, H. Woodruff, J. van Soest, T. Lustberg, E. Roelofs, W. van Elmpt, A. Dekker, F.M. Mottaghy, J.E. Wildberger, S. Walsh, Radiomics: the bridge between medical imaging and personalized medicine, Nature Reviews. Clinical oncology 14(12) (2017) 749-762.\u003c/li\u003e\n\u003cli\u003eR.Y. Choi, A.S. Coyner, J. Kalpathy-Cramer, M.F. Chiang, J.P. Campbell, Introduction to Machine Learning, Neural Networks, and Deep Learning, Translational vision science \u0026amp; technology 9(2) (2020) 14.\u003c/li\u003e\n\u003cli\u003eX. Xie, L. Yang, F. Zhao, D. Wang, H. Zhang, X. He, X. Cao, H. Yi, X. He, Y. Hou, A deep learning model combining multimodal radiomics, clinical and imaging features for differentiating ocular adnexal lymphoma from idiopathic orbital inflammation, European radiology 32(10) (2022) 6922-6932.\u003c/li\u003e\n\u003cli\u003eX. Liang, K. Tang, X. Ke, J. Jiang, S. Li, C. Xue, J. 
Deng, X. Liu, C. Yan, M. Gao, J. Zhou, L. Zhao, Development of an MRI-Based Comprehensive Model Fusing Clinical, Radiomics and Deep Learning Models for Preoperative Histological Stratification in Intracranial Solitary Fibrous Tumor, Journal of magnetic resonance imaging: JMRI 60(2) (2024) 523-533.\u003c/li\u003e\n\u003cli\u003eWorld Health Organization. \u003cem\u003eWHO Classification of Thoracic Tumours\u003c/em\u003e. 5th ed. Lyon, France: International Agency for Research on Cancer, 2021.\u003c/li\u003e\n\u003cli\u003eW.H. Westra, Early glandular neoplasia of the lung, Respir Res 1(3) (2000) 163-169.\u003c/li\u003e\n\u003cli\u003eY. He, X. Liu, H. Wang, L. Wu, M. Jiang, H. Guo, J. Zhu, S. Wu, H. Sun, S. Chen, Y. Zhu, C. Zhou, Y. Yang, Mechanisms of Progression and Heterogeneity in Multiple Nodules of Lung Adenocarcinoma, Small Methods 5(6) (2021) e2100082.\u003c/li\u003e\n\u003cli\u003eH. Li, Z. Sun, R. Xiao, Q. Qi, X. Li, H. Huang, X. Wang, J. Zhou, Z. Wang, K. Liu, P. Yin, F. Yang, J. Wang, Stepwise evolutionary genomics of early-stage lung adenocarcinoma manifesting as pure, heterogeneous and part-solid ground-glass nodules, Br J Cancer 127(4) (2022) 747-756.\u003c/li\u003e\n\u003cli\u003eH.-H. Yang, Y.-L. Lv, X.-H. Fan, Z.-Y. Ai, X.-C. Xu, B. Ye, D.-Z. Hu, Factors distinguishing invasive from pre-invasive adenocarcinoma presenting as pure ground glass pulmonary nodules, Radiat Oncol 15(1) (2020) 186.\u003c/li\u003e\n\u003cli\u003eZ.-R. Liang, F.-J. Lv, B.-J. Fu, R.-Y. Lin, W.-J. Li, Z.-G. Chu, Reticulation Sign on Thin-Section CT: Utility for Predicting Invasiveness of Pure Ground-Glass Nodules, AJR. American journal of roentgenology 221(1) (2023) 69-78.\u003c/li\u003e\n\u003cli\u003eJ. Zhang, J. Sha, W. Liu, Y. Zhou, H. Liu, Z. 
Zuo, Quantification of Intratumoral Heterogeneity: Distinguishing Histological Subtypes in Clinical T1 Stage Lung Adenocarcinoma Presenting as Pure Ground-Glass Nodules on Computed Tomography, Academic Radiology 31(10) (2024) 4244-4255.\u003c/li\u003e\n\u003cli\u003eW.-C. Hsu, P.-C. Huang, K.-T. Pan, W.-Y. Chuang, C.-Y. Wu, H.-F. Wong, C.-T. Yang, Y.-L. Wan, Predictors of Invasive Adenocarcinomas among Pure Ground-Glass Nodules Less Than 2 cm in Diameter, Cancers (Basel) 13(16) (2021).\u003c/li\u003e\n\u003cli\u003eM.-C. Chen, H.-S. Yang, Z. Dong, L.-J. Li, X.-M. Li, H.-H. Luo, Q. Li, Y. Zhu, Immunogenomic features of radiologically distinctive nodules in multiple primary lung cancer, Cancer Immunol Immunother 73(11) (2024) 217.\u003c/li\u003e\n\u003cli\u003eF. Hu, H. Huang, Y. Jiang, M. Feng, H. Wang, M. Tang, Y. Zhou, X. Tan, Y. Liu, C. Xu, N. Ding, C. Bai, J. Hu, D. Yang, Y. Zhang, Discriminating invasive adenocarcinoma among lung pure ground-glass nodules: a multi-parameter prediction model, J Thorac Dis 13(9) (2021) 5383-5394.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"bmc-cancer","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"bcan","sideBox":"Learn more about [BMC Cancer](http://bmccancer.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/bcan/default.aspx","title":"BMC Cancer","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"machine learning, radiomics, deep learning, ground-glass opacities, pulmonary pathology","lastPublishedDoi":"10.21203/rs.3.rs-9268097/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9268097/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003e\u003cstrong\u003eBackground\u003c/strong\u003e:\u003c/p\u003e\n\u003cp\u003eThe lack of reliable clinical features for differentiating benign from malignant pulmonary pure ground-glass nodules (pGGNS) leads to potential misdiagnosis and unnecessary invasive examinations. Although radiomics and deep learning approaches have shown potential in nodule characterization, the diagnostic performance of integrated models combining clinical features, radiomics, and deep learning remains insufficiently defined.\u0026nbsp;This study aimed to develop and validate an integrated model to distinguish benign from malignant pGGNs and to further differentiate pathological subtypes.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMaterials and Methods:\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis retrospective study included 1,067 patients with pulmonary pGGNs from Shandong First Medical University Cancer Hospital. Clinical and imaging data were collected, and radiomics features and deep learning (DL) derived features were extracted using Python (version 3.7). Patients were randomly divided into training and validation cohorts. 
Multiple machine-learning classifiers were constructed, and diagnostic performance was assessed using receiver operating characteristic (ROC) curve analysis.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eResult\u003c/strong\u003e:\u003c/p\u003e\n\u003cp\u003eFor distinguishing benign from malignant pGGNs (Model I), clinical features such as age, nodule multiplicity, CEA level, and amylase were identified as relevant. Thirty-eight features were selected for model development. Among individual classifiers, the Support Vector Machine (SVM) achieved the highest performance, with a validation area under the receiver operating characteristic curve (AUC) of 0.840, followed by random forest (0.829), stochastic gradient descent (0.828), LightGBM (0.818), k-nearest neighbors (0.814), and XGBoost (0.798). The integrated model combining clinical features, radiomics, and deep learning achieved a validation AUC of 0.871.\u003c/p\u003e\n\u003cp\u003eFor pathological subtype classification of pGGNs (Model II), clinical features such as gender, Pro-Gastrin-Releasing-Peptide (ProGRP), AST/ALT ratio (De Ritis ratio), creatine kinase-MB (CKMB), and globulin were identified as informative clinical variables. 
Twelve features were selected. The SVM classifier again showed the best individual performance (validation AUC = 0.831), while the integrated model achieved a higher AUC of 0.853.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConclusion\u003c/strong\u003e:\u003c/p\u003e\n\u003cp\u003eAn integrated model incorporating clinical characteristics, radiomics, and deep learning demonstrates robust performance in distinguishing benign from malignant pulmonary pGGNs and in identifying pathological subtypes, suggesting potential clinical utility for noninvasive decision support.\u003c/p\u003e","manuscriptTitle":"Development of a CT-Based comprehensive model with deep learning for differentiating pathological types of pulmonary ground-glass nodules","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-05-11 08:18:59","doi":"10.21203/rs.3.rs-9268097/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"reviewerAgreed","content":"319137579491206691695808744597377493984","date":"2026-05-11T19:22:57+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2026-04-29T15:09:29+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2026-04-26T19:39:05+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2026-04-06T19:38:22+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2026-04-04T14:03:55+00:00","index":"","fulltext":""},{"type":"submitted","content":"BMC Cancer","date":"2026-04-04T13:58:40+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"bmc-cancer","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"bcan","sideBox":"Learn more about [BMC Cancer](http://bmccancer.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/bcan/default.aspx","title":"BMC Cancer","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"6590fbf9-3403-4586-b111-1dc8170e399f","owner":[],"postedDate":"May 11th, 2026","published":true,"recentEditorialEvents":[{"type":"reviewerAgreed","content":"319137579491206691695808744597377493984","date":"2026-05-11T19:22:57+00:00","index":67,"fulltext":""},{"type":"reviewersInvited","content":"30","date":"2026-04-29T15:09:29+00:00","index":"","fulltext":""}],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[],"tags":[],"updatedAt":"2026-05-11T08:18:59+00:00","versionOfRecord":[],"versionCreatedAt":"2026-05-11 08:18:59","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-9268097","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-9268097","identity":"rs-9268097","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}