Accurate Prediction of Disease-Free and Overall Survival in Non-Small Cell Lung Cancer Using Patient-Level Multimodal Weakly Supervised Learning

doi:10.21203/rs.3.rs-5353171/v1

2024 · Preprint · CC BY 4.0

Nanying Che, Yongmeng Li, Xiaodong Chai, Moxuan Yang, Jiahang Xiong, and 6 more

This is a preprint; it has not been peer reviewed by a journal.

Status: Published. Journal publication published 19 Jun, 2025; read the published version in npj Precision Oncology.

Abstract

With the rapid progress in artificial intelligence (AI) and digital pathology, prognosis prediction for non-small cell lung cancer (NSCLC) patients has become a critical component of personalized medicine.
In this study, we developed a multimodal AI model that integrates whole-slide images (WSIs) and dense clinical data to predict disease-free survival (DFS) and overall survival (OS) in NSCLC patients undergoing surgery. Using data from 618 patients at Beijing Chest Hospital, the model achieved areas under the curve (AUCs) of 0.8084 for predicting progression and 0.8021 for predicting death in the test set. The model also predicted progression and death within 5 years with accuracies of 0.7680 and 0.7760, respectively. When it categorized patients into high-risk and low-risk groups, survival outcomes differed significantly between the groups, with hazard ratios of 4.85 for progression and 4.57 for death (both P < 0.0001). Additionally, the model uncovered novel digital biomarkers associated with poor prognosis, offering further insights into NSCLC treatment. This model has the potential to revolutionize postoperative decision-making by giving clinicians a precise tool for predicting DFS and OS, thereby improving patient outcomes.

Subject areas: Biological sciences/Cancer/Lung cancer/Non small cell lung cancer; Health sciences/Biomarkers/Prognostic markers

Keywords: artificial intelligence; weakly supervised learning; non-small cell lung cancer; whole-slide image; prognosis

Introduction

Lung cancer is the leading cause of cancer-related death and the second most commonly diagnosed cancer, accounting for approximately one in five (18.0%) cancer deaths and one in ten (11.4%) cancer diagnoses 1. Non-small cell lung cancer (NSCLC) represents approximately 85% of all lung cancer cases 2. In clinical practice, NSCLC treatment is primarily guided by TNM staging 3.
Early-stage NSCLC patients (stage 0, stage ⅠA, and stage ⅠB or ⅡA without high-risk factors) generally do not require postoperative interventions, whereas patients with stage ⅠB or ⅡA disease with high-risk factors, stage ⅡB, or stage Ⅲ typically require postoperative treatment. However, patients within the same stage often experience different clinical outcomes, which makes it difficult to decide on postoperative interventions from TNM staging alone. Even in early-stage NSCLC, surgery does not entirely eliminate the risk of disease progression or cancer-related death: for stage ⅠA patients treated with surgery alone, the 5-year disease-free survival (DFS) rate is 84.5% and the 5-year overall survival (OS) rate is 96.8% 4. Conversely, some patients who require postoperative interventions but do not receive them remain free from disease progression or cancer-related death. For stage ⅡB/ⅢA patients treated with surgery alone, the 3-year progression-free survival rate is 36.1% 5, and the 5-year OS rate for stage ⅢA patients is 26% 6. Prognostic prediction is therefore crucial for determining whether postoperative interventions are necessary, and accurate tools that predict DFS and OS are essential for personalized treatment and improved disease management.

Artificial intelligence (AI)-based pathology has advanced substantially in NSCLC, particularly in pathological diagnosis 7, 8, molecular phenotype prediction 9, 10, gene mutation prediction 7, and prognostic prediction 11–25. Among these applications, prognostic prediction holds the greatest clinical importance for NSCLC patients. However, previous prognostic studies either excluded clinical data or incorporated only minimal clinical information. Although these studies successfully distinguished prognostic groups, their predicted survival did not correlate strongly with actual survival outcomes.
They also did not effectively predict DFS and OS, which are critical endpoints in NSCLC prognosis. Several digital biomarkers have emerged from these studies. For example, the density of tumor-infiltrating lymphocytes (TILs) has been identified as a prognostic biomarker 14, and the growth pattern of adenocarcinoma has also been linked to prognosis 15. Moreover, a recent study developed and validated four digital biomarkers based on tertiary lymphoid structures and necrosis 19. However, the digital biomarkers associated with prognosis have not been fully elucidated.

In this work, we developed a multimodal AI model, referred to as AIM-LCpro, for prognostic prediction in NSCLC patients undergoing surgery. The model uses a patient-level weakly supervised learning approach that integrates whole-slide images (WSIs) with dense clinical data (Fig. 1). It not only categorizes patients into high-risk and low-risk groups but also predicts DFS and OS for each individual patient. Through model visualizations, we identified several novel digital biomarkers associated with poor prognosis in NSCLC patients. This model has the potential to guide decisions on the need for postoperative interventions and to improve overall prognosis in NSCLC patients.

Results

3.1. Baseline characteristics of the study cohort

In the study cohort, 173 patients (27.99%) experienced disease progression within 5 years, and 121 patients (19.58%) died of NSCLC within the same period. Of the total cohort, 353 patients (57.12%) did not require postoperative interventions, while 265 patients (42.88%) did. The baseline characteristics of the study cohort are presented in Table 1.
Table 1. Characteristics of the study cohort

| Characteristic | Training set | Validation set | Test set | Total |
|---|---|---|---|---|
| Age, years (mean ± SD) | 60.21 ± 9.16 | 59.65 ± 8.33 | 58.70 ± 9.53 | 59.85 ± 9.16 |
| Gender, n (%) | | | | |
| Male | 246 (57.08%) | 34 (54.84%) | 76 (60.80%) | 356 (57.61%) |
| Female | 185 (42.92%) | 28 (45.16%) | 49 (39.20%) | 262 (42.39%) |
| Number of WSIs | 1847 | 246 | 536 | 2629 |
| Smoking history, n (%) | 196 (45.48%) | 27 (43.55%) | 59 (47.20%) | 282 (45.63%) |
| Patients who require postoperative interventions, n (%) | 191 (44.32%) | 21 (33.87%) | 53 (42.40%) | 265 (42.88%) |
| Patients who do not require postoperative interventions, n (%) | 240 (55.68%) | 41 (66.13%) | 72 (57.60%) | 353 (57.12%) |
| Pathology subtype, n (%) | | | | |
| Lung adenocarcinoma | 313 (72.62%) | 43 (69.35%) | 87 (69.60%) | 443 (71.68%) |
| Lung squamous cell carcinoma | 113 (26.22%) | 16 (25.81%) | 36 (28.80%) | 165 (26.70%) |
| Lung adenosquamous carcinoma | 4 (0.93%) | 3 (4.84%) | 2 (1.60%) | 9 (1.46%) |
| Unknown | 1 (0.23%) | 0 | 0 | 1 (0.16%) |
| Stage, n (%) | | | | |
| 0 | 2 (0.46%) | 0 | 0 | 2 (0.32%) |
| Ⅰ | 276 (64.04%) | 44 (70.97%) | 79 (63.20%) | 399 (64.56%) |
| Ⅱ | 73 (16.94%) | 6 (9.68%) | 20 (16.00%) | 99 (16.02%) |
| Ⅲ | 80 (18.56%) | 12 (19.35%) | 26 (20.80%) | 118 (19.09%) |
| Grade, n (%) | | | | |
| Well differentiated (G1) | 107 (24.83%) | 19 (30.65%) | 36 (28.80%) | 162 (26.21%) |
| Moderately differentiated (G2) | 195 (45.24%) | 26 (41.94%) | 52 (41.60%) | 273 (44.17%) |
| Poorly differentiated (G3) | 106 (24.59%) | 13 (20.97%) | 33 (26.40%) | 152 (24.60%) |
| Unknown | 23 (5.34%) | 4 (6.45%) | 4 (3.20%) | 31 (5.02%) |
| Progression within 5 years, yes/no/unknown, n (%) | 119/309/3 (27.61%/71.69%/0.70%) | 18/44/0 (29.03%/70.97%/0) | 36/89/0 (28.80%/71.20%/0) | 173/442/3 (27.99%/71.52%/0.49%) |
| Death within 5 years*, yes/no/unknown, n (%) | 82/344/5 (19.03%/79.81%/1.16%) | 13/49/0 (20.97%/79.03%/0) | 26/99/0 (20.80%/79.20%/0) | 121/492/5 (19.58%/79.61%/0.81%) |

*Number of patients who died of NSCLC. SD: standard deviation; WSI: whole-slide image.

3.2. Performance of AIM-LCpro in predicting prognosis of NSCLC patients

In the training, validation, and test sets, the areas under the receiver operating characteristic (ROC) curves (AUCs) for predicting progression within 5 years were 0.9925, 0.8801, and 0.8084, respectively (Fig. 2a–c).
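The AUC reported here has a concrete reading: it is the probability that a randomly chosen patient who progressed within 5 years receives a higher predicted probability than a randomly chosen patient who did not. Harrell's C-index, also reported in this section, extends the same pairwise idea to censored follow-up times. The minimal pure-Python sketch below illustrates both metrics. It is not the authors' code; they computed the C-index with the R package Hmisc. The toy inputs are hypothetical, and this simplified C-index skips pairs with tied follow-up times, which Hmisc may handle differently.

```python
from itertools import combinations


def roc_auc(scores, labels):
    """ROC AUC as P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def harrell_c_index(times, events, risk):
    """Simplified Harrell's C for right-censored data.

    A pair is comparable when the patient with the shorter follow-up had the
    event. It is concordant when that patient also has the higher risk score.
    Pairs with tied follow-up times are skipped (a simplification).
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[j] < times[i]:
            i, j = j, i  # make i the patient with the shorter follow-up
        if times[i] == times[j] or not events[i]:
            continue  # tied times, or the earlier patient was censored
        comparable += 1
        if risk[i] > risk[j]:
            concordant += 1.0
        elif risk[i] == risk[j]:
            concordant += 0.5
    return concordant / comparable


# Hypothetical toy data: predicted probabilities, 5-year outcomes, follow-up months.
probs = [0.91, 0.40, 0.75, 0.10, 0.30]
progressed = [1, 0, 1, 0, 1]
months = [14, 60, 30, 60, 48]
print(roc_auc(probs, progressed))                  # 5 of 6 pairs ordered correctly
print(harrell_c_index(months, progressed, probs))  # 8 of 9 comparable pairs concordant
```

A predicted probability of the event serves directly as a risk score. If a predicted survival time were used instead, its sign would need to be flipped, because a longer predicted time means lower risk.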
The corresponding AUCs for predicting death within 5 years were 0.9826, 0.8477, and 0.8021, respectively (Fig. 2d–f). These results suggest that the model can distinguish patients who will experience progression or death within 5 years from those who will not.

We applied the selected thresholds (see Methods) to predict outcomes in the training, validation, and test sets (Supplementary Tables 5–13). In the test set, the model predicted progression within 5 years and death within 5 years with accuracies of 0.7680 and 0.7760, respectively. Its sensitivity was 0.5556 for progression and 0.5385 for death, and its specificity was 0.8539 and 0.8384, respectively. We also evaluated AIM-LCpro with Harrell's C-index: in the test set, the C-index for predicting progression and death within 5 years was 0.7748 and 0.7775, respectively (Supplementary Table 14).

3.3. High-risk and low-risk groups

AIM-LCpro categorized patients into high-risk or low-risk groups based on two criteria: predicted progression within 5 years and predicted death within 5 years (Fig. 3). For instance, if the model predicted that a patient's disease would progress within 5 years, the patient was categorized as high-risk; otherwise, the patient was categorized as low-risk. In the test set, the high-risk and low-risk groups differed significantly across all patients (P < 0.0001 for both outcomes), with hazard ratios (HRs) of 4.85 for progression and 4.57 for death (Fig. 3a and 3d).
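A patient is placed in the high-risk group exactly when the thresholded prediction is positive, so the test-set sensitivity, specificity, and accuracy reported in Section 3.2 also summarize how these groups are composed. The short Python sketch below checks that the reported rates are mutually consistent. The confusion-matrix counts are our reconstruction, not published figures: we obtained them by multiplying the reported sensitivity and specificity by the test-set class sizes in Table 1 (36 of 125 patients progressed and 26 died) and rounding. The `youden_threshold` helper is a hypothetical illustration of choosing a cut-off by the Youden index. The authors combined the Youden index with sensitivity, specificity, and accuracy, and the paper does not give their exact selection procedure.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and Youden index J from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy, sensitivity + specificity - 1


def youden_threshold(scores, labels):
    """Return (threshold, J) maximising J = sensitivity + specificity - 1.

    A patient is called positive (high-risk) when score >= threshold, and the
    candidate thresholds are the observed scores. Requires both classes present.
    """
    best = (None, -1.0)
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = binary_metrics(tp, fn, tn, fp)[3]
        if j > best[1]:
            best = (t, j)
    return best


# Counts reconstructed from the reported test-set rates and Table 1 class sizes
# (an assumption: the paper reports rates, not these counts).
progression = binary_metrics(tp=20, fn=16, tn=76, fp=13)  # 36 progressed, 89 did not
death = binary_metrics(tp=14, fn=12, tn=83, fp=16)        # 26 died, 99 did not
print([round(v, 4) for v in progression[:3]])  # [0.5556, 0.8539, 0.768]
print([round(v, 4) for v in death[:3]])         # [0.5385, 0.8384, 0.776]

# Hypothetical scores and labels for the threshold search.
print(youden_threshold([0.1, 0.2, 0.35, 0.6, 0.8], [0, 0, 1, 0, 1]))
```

With these reconstructed counts, the computed sensitivity, specificity, and accuracy match the values reported for both endpoints.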
Among patients who did not require postoperative interventions, the difference between the high-risk and low-risk groups remained significant (progression: P = 0.0030, HR = 5.01; death: P = 0.0443, HR = 4.10; Fig. 3b and 3e). Among patients who required postoperative interventions, the high-risk group also differed significantly from the low-risk group (progression: P < 0.0001, HR = 4.34; death: P = 0.0036, HR = 3.51; Fig. 3c and 3f).

3.4. Consistency between predicted and actual Kaplan–Meier curves

The Kaplan–Meier (K-M) curves derived from AIM-LCpro's predictions of 5-year progression and death did not differ significantly from the actual survival curves (progression: P = 0.5029, HR = 0.85; death: P = 0.2321, HR = 1.10; Fig. 4a and 4d). The same held for patients who did not require postoperative interventions (progression: P = 0.4636, HR = 1.48; death: P = 0.3091, HR = 1.76; Fig. 4b and 4e) and for patients who required postoperative interventions (progression: P = 0.0580, HR = 0.56; death: P = 0.5253, HR = 0.81; Fig. 4c and 4f).

3.5. Investigation of prognostic digital biomarkers through AIM-LCpro

To visualize the pathological features associated with prognosis, we mapped the prognosis-related features extracted by AIM-LCpro onto the WSIs as heatmaps. As shown in Fig. 5a, the progression heatmaps in the test set contained more hotspots than the death heatmaps.
Moreover, the hotspots for patients who died within 5 years were largely contained within the progression hotspots. Because fewer patients died within 5 years than experienced progression, the model may have learned fewer features for death prediction. The consistent distribution of risk hotspots across the two prediction tasks supports the model's predictive capability. Analyzing these heatmaps helps explain the model's predictions, identifies the regions that contribute to them, and may uncover new digital biomarkers.

The test set included 84 patients with non-mucinous adenocarcinoma (NMA), 36 patients with squamous cell carcinoma (SCC), and 5 patients with other NSCLC types (Fig. 5b). In SCC, risk hotspots were predominantly concentrated in tumor regions, where tumor cells were disorderly arranged, with enlarged and bizarre nuclei and frequent mitotic figures (Fig. 5c). As in SCC, risk hotspots in NMA tended to localize within tumor areas (Fig. 5d–5j).

We further analyzed the regions covered by hotspots. Among the 84 patients with NMA, risk hotspots were distributed in micropapillary adenocarcinoma (MPA) and solid adenocarcinoma (SPA). As shown in Fig. 5d and 5e, MPA was present in 11 patients, of whom 5 and 3 had risk hotspots in the 5-year progression and 5-year death heatmaps, respectively. Notably, SPA was present in 30 patients, and all of them had risk hotspots in their SPA areas in both the 5-year progression and 5-year death heatmaps, although the instance-level hotspots did not cover all SPA regions (Fig. 5d and 5f). Both of these histological subtypes are classified as high-grade patterns in the 5th edition of the WHO classification of thoracic tumors. In contrast, the most common histological type, lepidic adenocarcinoma (LPA), was not identified as a risk hotspot at all (Fig. 5d and 5g), which is consistent with LPA being considered a low-grade pattern.

For the other two NMA histological subtypes, acinar adenocarcinoma (APA) and papillary adenocarcinoma (PPA), risk hotspots were unevenly distributed. In APA, we identified two types of glands that were more likely to be covered by risk hotspots. As shown in Fig. 5h, the first type consisted of small, irregular glands made up of pleomorphic cells and surrounded by desmoplastic stroma, which was often hypovascular and composed of collagen fibers interspersed with fibroblasts and lymphocytes. The second type consisted of large, irregular glands lined by multilayered cells with marked cellular and nuclear pleomorphism. These cells were crowded, and some protruded into the glandular lumen, forming papillary-like structures without a central axis (Fig. 5h). The stroma in these areas was loose and rich in neomicrovessels, consistent with the pure stromal regions identified as risk hotspots (Fig. 5i). Fewer PPA areas were identified as risk hotspots; the model appeared to recognize regions with crowded cell arrangements as high-risk (Fig. 5j). The prognosis-related pathological features identified by our model may serve as digital biomarkers and warrant further validation in future studies.

Discussion

We demonstrated that a multimodal model combining dense clinical data with WSIs can successfully predict the prognosis of NSCLC patients undergoing surgery. The model effectively screens for and uses prognostic information and achieves a high level of accuracy. To our knowledge, no prognostic prediction model for surgical NSCLC patients has yet entered clinical application. Our model's predictions for patients who do and do not require postoperative treatment align closely with clinical application scenarios. Previous studies relied heavily on manual annotation or predefined image features 11–19.
In contrast, our model does not require manual WSI annotation, which substantially reduces the labor involved. It also does not rely on predefined image features; instead, it uses CAMEL2 to automatically screen and extract regions associated with prognosis 26. Without predefined features, the model is free to search the entire WSI for prognostic regions. It can categorize patients into high-risk and low-risk groups while predicting 5-year DFS and OS. Moreover, the absence of a statistically significant difference between predicted and actual survival outcomes supports the validity of stratifying patients into high-risk and low-risk groups. In clinical practice, physicians need tools to predict patient outcomes. If the model predicts that an NSCLC patient is at risk of progression or death, the patient can be recommended for postoperative interventions. Conversely, if a patient is predicted to remain free from progression or death, postoperative chemotherapy may be avoided, supporting more personalized treatment plans.

NSCLC exhibits significant tumor heterogeneity 27, 28. This heterogeneity applies not only to tumor epithelial cells but also to the various microenvironments interacting with tumor cells 29. The digital biomarkers identified by our model from WSIs may reflect this heterogeneity and aid in personalizing treatment for NSCLC patients. Like traditional biomarkers, digital biomarkers serve as indicators for diagnosis, prognosis, and therapeutic response and should demonstrate clinical validity 30, 31. The clinical utility of new biomarkers can be evaluated through their association with existing biomarkers or by directly demonstrating their usefulness 32. Our model identified areas with a high mitotic index in SCC as risk hotspots, consistent with previous findings that associate a high mitotic index with poor prognosis 33.
Additionally, MPA and SPA were identified as risk areas, in line with the high-grade growth patterns defined by the latest WHO classification of thoracic tumors. Furthermore, LPA, known to have the best prognosis, was not recognized as a risk area. APA, associated with an intermediate prognosis, was widely distributed across the slides. Through heatmap analysis, we identified histological characteristics in APA that may indicate poor prognosis, although further evidence and additional data are needed to verify them.

Predicting the prognosis of NSCLC surgery patients raises ethical concerns. For example, learning of a poor prognosis in advance may affect patients' quality of life. Additionally, the question of who bears responsibility for harm caused by incorrect predictions remains unanswered.

Our study has several limitations. First, to avoid potential information leakage, which could have compromised the model's credibility, we did not include information about treatments administered after progression. Second, the clinical benefit of altering postoperative intervention strategies based on the model's predictions has not yet been validated; it remains to be seen how much patients would benefit from such an approach. Finally, our model is based on a relatively small cohort. Further studies with larger sample sizes are needed to enhance the model's ability to predict NSCLC prognosis.

Methods

2.1. Study population and inclusion/exclusion criteria

We enrolled 641 NSCLC patients who underwent lung surgery at Beijing Chest Hospital between January 2016 and November 2017. After 23 patients were excluded, 618 patients (the BCH study cohort) were included in the study. The inclusion criteria were: (I) NSCLC patients who underwent radical surgery; (II) NSCLC patients who did not undergo lymph node dissection and had no evidence of distant metastases before surgery; (III) NSCLC patients who agreed to follow-up.
The exclusion criteria were: (I) patients with other incurable malignant tumors; (II) patients who died from other diseases before progression within 5 years after surgery; (III) cases in which all primary tumor tissue was frozen before formalin fixation. The study was conducted in accordance with the principles of the Declaration of Helsinki and was approved by the Ethics Committee of Beijing Chest Hospital, Capital Medical University (YJS-2023-16).

2.2. Data collection protocols

2.2.1. WSI acquisition

All WSIs in this study were H&E-stained whole-slide images of formalin-fixed paraffin-embedded (FFPE) primary tumor tissue. Frozen slides and frozen paraffin slides were excluded. A total of 2,629 WSIs were acquired for the BCH study cohort, scanned with a KFBio KF-PRO-400 scanner, and saved at magnifications of 400×, 200×, 100×, and 50×.

2.2.2. Clinical data acquisition

Clinical variables were collected from inpatient medical records and included age, gender, smoking history, family history, TNM stage, lymph node dissection and metastasis, tumor size, CT data, pathological data, postoperative treatment, and risk factors. Risk factors included poorly differentiated tumors, vascular invasion, wedge resection, visceral pleural involvement, and unknown lymph node status. All patients were followed up by telephone and at outpatient visits, and every patient had a postoperative follow-up period of more than 5 years.

2.3. Preprocessing of WSIs and clinical data

2.3.1. Image segmentation and feature extraction

Glass regions were filtered out based on the variance of RGB channel pixel values, and the remaining tissue regions were cut into 2,048 × 2,048-pixel image patches at 20× magnification. For image features, we used the pre-trained CAMEL2 weakly supervised framework 26 to extract features from patches in the training set, taking intermediate CAMEL2 features as each patch's image feature representation.
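The glass-filtering and tiling step can be sketched as follows. This illustration rests on several assumptions and is not the authors' pipeline. We interpret "RGB channel pixel variance" as the per-pixel variance across the R, G, and B channels, on the reasoning that glass is close to grey while stained tissue is not. The values of `var_threshold` and `min_tissue_frac` are hypothetical, because the paper does not report them. A real WSI would also be read region by region at 20× rather than held in memory as a single array.

```python
import numpy as np


def tissue_mask(rgb, var_threshold=50.0):
    """True where a pixel is likely tissue rather than glass.

    Assumption: glass/background is near-grey (R, G and B roughly equal), so
    the variance across the three channels is small, while H&E-stained tissue
    is coloured. The threshold value is hypothetical.
    """
    return rgb.astype(np.float32).var(axis=2) > var_threshold


def tile_tissue(rgb, tile=2048, min_tissue_frac=0.5, var_threshold=50.0):
    """Yield (row, col, patch) for non-overlapping tiles that are mostly tissue.

    `min_tissue_frac` (fraction of tissue pixels required to keep a tile) is a
    hypothetical parameter; the paper does not state one.
    """
    mask = tissue_mask(rgb, var_threshold)
    height, width = mask.shape
    for y in range(0, height - tile + 1, tile):
        for x in range(0, width - tile + 1, tile):
            if mask[y:y + tile, x:x + tile].mean() >= min_tissue_frac:
                yield y, x, rgb[y:y + tile, x:x + tile]


# Toy 8x8 "slide": left half white glass, right half pink (stained) tissue.
slide = np.full((8, 8, 3), 255, dtype=np.uint8)
slide[:, 4:] = (200, 100, 150)
print([(y, x) for y, x, _ in tile_tissue(slide, tile=4)])  # [(0, 4), (4, 4)]
```

The toy call uses 4-pixel tiles so that the result is easy to check. The default `tile=2048` matches the patch size stated above.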
Each patch had two image feature representations: one for progression and one for death. To obtain a patient-level image feature representation, we sorted all of a patient's patch-level feature representations in descending order of the prediction probability output by CAMEL2 and averaged the top 10% of patch-level features.

2.3.2. Clinical data standardization and normalization

Clinical data contained discrete and categorical variables. Discrete variables were min-max normalized to the range 0 to 1. Categorical variables, such as gender and disease type, were one-hot encoded (positional coding), so that each category occupies its own position in a one-dimensional vector.

2.4. Development of the AI model

2.4.1. Architecture of the multimodal AI model

The workflow is shown in Supplementary Fig. 1. We employed a two-stage training strategy. The first stage was classification training for the prognostic endpoints (progression and death). The second stage, which started from the classification model weights, was regression training for time prediction (time to progression and time to death).

2.4.2. Training procedure and algorithm selection

Preprocessed clinical features from training-set patients were passed through a clinical feature network, consisting of linear, batch normalization, and ReLU layers, to obtain clinical feature representations. The patient-level image and clinical feature representations were concatenated and fed into a classification head network, which output the probability of progression or death. The classification head network also comprised linear, batch normalization, and ReLU layers, with two independent linear layers in the final stage, and was trained with cross-entropy loss. For regression training, the clinical feature network weights were frozen, and two separate time prediction head networks were trained to output the times to progression and death.
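The patient-level inputs described above can be sketched in NumPy. These inputs are top-10% pooling of CAMEL2 patch features, min-max scaling of numerical clinical variables, and one-hot coding of categorical variables. The sketch makes several assumptions. The rounding of 10%, the scaling range, and the category lists are hypothetical. CAMEL2 itself is not reproduced, so the patch features and probabilities are toy arrays. As the text describes, the pooling would run separately on the progression and death feature sets, and the clinical vector would pass through the clinical feature network before concatenation.

```python
import numpy as np


def patient_image_feature(patch_feats, patch_probs, top_frac=0.10):
    """Mean feature of the patches whose CAMEL2 probability ranks in the top `top_frac`.

    patch_feats: (n_patches, dim) array; patch_probs: (n_patches,) array.
    Rounding 10% up and keeping at least one patch are assumptions.
    """
    order = np.argsort(patch_probs)[::-1]  # patch indices, highest probability first
    k = max(1, int(np.ceil(top_frac * len(order))))
    return patch_feats[order[:k]].mean(axis=0)


def min_max(value, lo, hi):
    """Scale a numerical variable to [0, 1] given (assumed) training-set min and max."""
    return (value - lo) / (hi - lo)


def one_hot(value, categories):
    """Positional code: 1.0 at the index of `value` in `categories`, 0.0 elsewhere."""
    vec = np.zeros(len(categories))
    vec[categories.index(value)] = 1.0
    return vec


def model_inputs(patch_feats, patch_probs, age, gender):
    """Return (pooled image feature, clinical input vector) for one patient.

    The age range (20 to 90 years) and gender vocabulary are hypothetical. In
    the described model, the clinical vector first passes through the clinical
    feature network, and that output is concatenated with the image feature.
    """
    image = patient_image_feature(patch_feats, patch_probs)
    clinical = np.concatenate([[min_max(age, 20.0, 90.0)],
                               one_hot(gender, ["male", "female"])])
    return image, clinical


# Toy patient with 20 patches of 2-dimensional features (hypothetical values).
feats = np.arange(40, dtype=float).reshape(20, 2)  # row i is [2i, 2i + 1]
probs = np.linspace(0.0, 1.0, 20)                  # patch 19 is most confident
image, clinical = model_inputs(feats, probs, age=55, gender="female")
print(image, clinical)
```

With 20 patches, the top 10% is the two most confident patches (rows 18 and 19), so the pooled feature is their element-wise mean.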
The time prediction head consisted of linear, batch normalization, and ReLU layers followed by a final sigmoid layer; its output was multiplied by 60 to obtain the predicted time to progression or death in months. The time prediction heads were trained with the L1 loss. During inference, the network output classification results and time predictions simultaneously. For patients classified as negative, the predicted time was set to 60 months; otherwise, the network's predicted time was retained.

The model assigned each patient a probability of progression and a probability of death within 5 years. Thresholds were selected on the training and validation sets based on sensitivity, specificity, accuracy, and the Youden index. For patients who did not require postoperative treatment, thresholds of 0.1461 and 0.2092 were selected for progression and death within 5 years, respectively; for patients who required postoperative treatment, thresholds of 0.3123 and 0.3391 were used. These thresholds were then applied to the test set.

2.5. Model training and validation strategy

2.5.1. Dataset division for training, validation, and testing

For predicting 5-year progression, the BCH study cohort was divided into training (428 patients), validation (62 patients), and test (125 patients) sets (Supplementary Table 1). For predicting 5-year death, it was divided into training (426 patients), validation (62 patients), and test (125 patients) sets (Supplementary Table 2). The training, validation, and test sets were comparable (Supplementary Tables 3 and 4).

2.6. Statistical analysis

Categorical data were evaluated using Pearson's chi-squared test or Fisher's exact test. Continuous data were expressed as mean ± standard deviation and analyzed using the independent-samples t-test or analysis of variance. Survival curves were generated using the Kaplan–Meier method. When survival curves did not intersect, they were compared using the log-rank test.
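The authors ran the Kaplan–Meier and log-rank analyses described above in SPSS or GraphPad Prism. The pure-Python sketch below only illustrates what those tools compute and is not the authors' code. It includes a hypothetical helper, `predicted_time_event`, that mirrors the inference rule in Section 2.4.2, under which predicted negatives are assigned 60 months. Two details are our assumptions about how predicted curves such as those in Fig. 4 could be built: treating those patients as censored at 60 months, and using `>=` at the threshold. The sketch does not implement the Rényi test for crossing curves.

```python
import math
from collections import Counter


def predicted_time_event(prob, predicted_month, threshold):
    """Hypothetical helper mirroring the inference rule in Section 2.4.2.

    Predicted positives keep the regression output as an event time; predicted
    negatives are set to 60 months and (an assumption) treated as censored.
    """
    if prob >= threshold:
        return predicted_month, 1
    return 60.0, 0


def kaplan_meier(times, events):
    """Product-limit estimate: list of (event time, survival probability)."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    surv, curve = 1.0, []
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value) with 1 df."""
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    in_a = [True] * len(times_a) + [False] * len(times_b)
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)                      # at risk overall
        n_a = sum(1 for u, a in zip(times, in_a) if u >= t and a)
        d = sum(1 for u, e in zip(times, events) if e and u == t)
        d_a = sum(1 for u, e, a in zip(times, events, in_a) if e and u == t and a)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)


# Toy data (hypothetical): months of follow-up and event indicators.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))  # survival about 0.8, 0.6, 0.3
print(logrank_test([1, 2, 3, 4], [1, 1, 1, 1], [10, 11, 12, 13], [1, 1, 1, 1]))
print(predicted_time_event(0.20, 30.0, threshold=0.1461))
```

The P value uses the identity that, for one degree of freedom, P(χ² > x) = erfc(√(x/2)). This identity avoids any dependency beyond the standard library.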
When survival curves intersected, they were compared using the Rényi test. Harrell's C-index was computed in R using the Hmisc package. All tests were two-tailed, and P < 0.05 was considered statistically significant. Statistical analyses were performed using SPSS 26.0 or GraphPad Prism 10.

Declarations

Competing interests
None declared.

Author contributions
N.C. and S.W. conceived and designed the study. Y.L. collected clinical data, conducted the analyses, and wrote the manuscript. X.C. participated in the interpretation of digital biomarkers and wrote the manuscript. M.Y., J.X., J.Z., and Y.C. participated in the establishment of the model. G.X. and W.W. provided assistance in establishing the model. H.L. provided assistance in the interpretation of digital biomarkers. All authors reviewed the manuscript.

Acknowledgements
This work was supported by the Beijing AI + Health Cultivation Innovation Project (No. Z241100007724001), the Beijing Municipal Public Welfare Development and Reform Pilot Project for Medical Research Institutes (No. JYY2023-15), the Beijing Nova Program, and the 2023 Science and Technology Projects of Qinghai Province, China (Basic Research Program, No. 2023-ZJ-732).

Data availability
Data are available upon reasonable request.

Code availability
The code can be accessed online: https://github.com/ThoroughFuture

References
1. Sung, H., et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin 71, 209–249 (2021).
2. Reck, M. & Rabe, K.F. Precision Diagnosis and Treatment for Advanced Non-Small-Cell Lung Cancer. The New England Journal of Medicine 377, 849–861 (2017).
3. Ettinger, D.S., et al. NCCN Guidelines® Insights: Non-Small Cell Lung Cancer, Version 2.2023. Journal of the National Comprehensive Cancer Network 21, 340–350 (2023).
4. Jiang, Y., et al. The impact of adjuvant EGFR-TKIs and 14-gene molecular assay on stage I non-small cell lung cancer with sensitive EGFR mutations. EClinicalMedicine 64, 102205 (2023).
5. Scagliotti, G.V., et al. Randomized phase III study of surgery alone or surgery plus preoperative cisplatin and gemcitabine in stages IB to IIIA non-small-cell lung cancer. Journal of Clinical Oncology 30, 172–178 (2012).
6. Douillard, J.Y., et al. Adjuvant vinorelbine plus cisplatin versus observation in patients with completely resected stage IB-IIIA non-small-cell lung cancer (Adjuvant Navelbine International Trialist Association [ANITA]): a randomised controlled trial. The Lancet Oncology 7, 719–727 (2006).
7. Coudray, N., et al. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat Med 24, 1559–1567 (2018).
8. Chen, C.L., et al. An annotation-free whole-slide training approach to pathological classification of lung cancer types using deep learning. Nat Commun 12, 1193 (2021).
9. Diao, J.A., et al. Human-interpretable image features derived from densely mapped cancer pathology slides predict diverse molecular phenotypes. Nat Commun 12, 1613 (2021).
10. Wu, J., et al. Artificial intelligence-assisted system for precision diagnosis of PD-L1 expression in non-small cell lung cancer. Modern Pathology 35, 403–411 (2022).
11. Yu, K.H., et al. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nat Commun 7, 12474 (2016).
12. Luo, X., et al. Comprehensive Computational Pathological Image Analysis Predicts Lung Cancer Prognosis. Journal of Thoracic Oncology 12, 501–509 (2017).
13. Wang, Y., et al. Multi-scale pathology image texture signature is a prognostic factor for resectable lung adenocarcinoma: a multi-center, retrospective study. J Transl Med 20, 595 (2022).
14. Pan, X., et al. Computerized tumor-infiltrating lymphocytes density score predicts survival of patients with resectable lung adenocarcinoma. iScience 25, 105605 (2022).
15. Alsubaie, N., Raza, S.E.A., Snead, D. & Rajpoot, N.M. Growth Pattern Fingerprinting for Automatic Analysis of Lung Adenocarcinoma Overall Survival. IEEE Access 11, 23335–23346 (2023).
16. Wang, H., Xing, F., Su, H., Stromberg, A. & Yang, L. Novel image markers for non-small cell lung cancer classification and survival prediction. BMC Bioinformatics 15, 310 (2014).
17. Wang, X., et al. Prediction of recurrence in early stage non-small cell lung cancer using computer extracted nuclear features from digital H&E images. Scientific Reports 7, 13543 (2017).
18. Alsubaie, N.M., Snead, D. & Rajpoot, N.M. Tumour Nuclear Morphometrics Predict Survival in Lung Adenocarcinoma. IEEE Access 9, 12322–12331 (2021).
19. Kludt, C., et al. Next-generation lung cancer pathology: Development and validation of diagnostic and prognostic algorithms. Cell Reports Medicine 5, 101697 (2024).
20. Zhao, L., et al. CoADS: Cross attention based dual-space graph network for survival prediction of lung cancer using whole slide images. Comput Methods Programs Biomed 236, 107559 (2023).
21. Diao, S., et al. Automated Cellular-Level Dual Global Fusion of Whole-Slide Imaging for Lung Adenocarcinoma Prognosis. Cancers (Basel) 15 (2023).
22. Shim, W.S., et al. DeepRePath: Identifying the Prognostic Features of Early-Stage Lung Adenocarcinoma Using Multi-Scale Pathology Images and Deep Convolutional Neural Networks. Cancers (Basel) 13 (2021).
23. Zheng, Y., et al. Graph Attention-Based Fusion of Pathology Images and Gene Expression for Prediction of Cancer Survival. IEEE Trans Med Imaging 43, 3085–3097 (2024).
24. Hattori, H., Sakashita, S., Tsuboi, M., Ishii, G. & Tanaka, T. Tumor-identification method for predicting recurrence of early-stage lung adenocarcinoma using digital pathology images by machine learning. Journal of Pathology Informatics 14, 100175 (2023).
25. Kim, P.J., et al. A new model using deep learning to predict recurrence after surgical resection of lung adenocarcinoma. Scientific Reports 14, 6366 (2024).
26. Xu, G., et al. CAMEL2: Enhancing Weakly Supervised Learning for Histopathology Images by Incorporating the Significance Ratio. Adv. Intell. Syst. 6, 12 (2024).
27. Gridelli, C., et al. Non-small-cell lung cancer. Nature Reviews Disease Primers 1, 15009 (2015).
28. Chen, Z., Fillmore, C.M., Hammerman, P.S., Kim, C.F. & Wong, K.K. Non-small-cell lung cancers: a heterogeneous set of diseases. Nature Reviews Cancer 14, 535–546 (2014).
29. Quail, D.F. & Joyce, J.A. Microenvironmental regulation of tumor progression and metastasis. Nat Med 19, 1423–1437 (2013).
30. Arya, S.S., Dias, S.B., Jelinek, H.F., Hadjileontiadis, L.J. & Pappa, A.M. The convergence of traditional and digital biomarkers through AI-assisted biosensing: A new era in translational diagnostics? Biosensors & Bioelectronics 235, 115387 (2023).
31. Montag, C., Elhai, J.D. & Dagum, P. On Blurry Boundaries When Defining Digital Biomarkers: How Much Biology Needs to Be in a Digital Biomarker? Frontiers in Psychiatry 12, 740292 (2021).
32. Song, Y., Kang, K., Kim, I. & Kim, T.J. Pathological Digital Biomarkers: Validation and Application. Appl. Sci.-Basel 12, 13 (2022).
33. Gürel, D., et al. The prognostic value of morphologic findings for lung squamous cell carcinoma patients. Pathology, Research and Practice 212, 1–9 (2016).

Additional Declarations
No competing interests reported.
Supplementary Files: supplementarymaterial.docx

Editorial timeline (Version 1):
24 Dec, 2024 · Editorial decision: Revision requested
16 Dec, 2024 · Reviews received at journal
15 Dec, 2024 · Reviews received at journal
03 Dec, 2024 · Reviews received at journal
27 Nov, 2024 · Reviews received at journal
23 Nov, 2024 · Reviewers agreed at journal
22 Nov, 2024 · Reviewers agreed at journal
22 Nov, 2024 · Reviewers agreed at journal
22 Nov, 2024 · Reviewers agreed at journal
20 Nov, 2024 · Reviewers agreed at journal
08 Nov, 2024 · Reviewers agreed at journal
07 Nov, 2024 · Reviewers agreed at journal
07 Nov, 2024 · Reviewers invited by journal
07 Nov, 2024 · Editor assigned by journal
07 Nov, 2024 · Submission checks completed at journal
29 Oct, 2024 · First submitted to journal
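The legends of Figures 3 and 4 report log-rank P values alongside "HR: Hazard Ratio (log-rank)". The sketch below shows, under stated assumptions, how a two-group log-rank test and an observed/expected (O/E) hazard-ratio estimate are commonly computed from right-censored follow-up data. It is not the authors' implementation. The function name `logrank_test`, the input layout (follow-up times, 0/1 event indicators, 0/1 risk-group labels) and the use of the O/E ratio as the "log-rank" hazard ratio are all assumptions. The sketch does not cover the Rényi test reported for some panels.

```python
import math
from typing import Sequence, Tuple


def logrank_test(
    times: Sequence[float], events: Sequence[int], groups: Sequence[int]
) -> Tuple[float, float, float]:
    """Two-group log-rank test for right-censored survival data (illustrative sketch).

    times  : follow-up time per patient (any consistent unit, e.g. months)
    events : 1 if the event (e.g. progression or death) was observed, 0 if censored
    groups : 1 for the group of interest (e.g. predicted high risk), 0 otherwise

    Returns (chi2, p_value, hazard_ratio), where hazard_ratio is the
    observed/expected estimate (O1/E1) / (O0/E0).
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e == 1})

    observed1 = expected1 = variance = 0.0
    for t in event_times:
        # Risk set: patients whose follow-up time is >= t.
        at_risk = [g for tt, _, g in data if tt >= t]
        n = len(at_risk)
        n1 = sum(1 for g in at_risk if g == 1)
        # Events at exactly time t, overall (d) and in group 1 (d1).
        d = sum(1 for tt, e, _ in data if tt == t and e == 1)
        d1 = sum(1 for tt, e, g in data if tt == t and e == 1 and g == 1)

        observed1 += d1
        expected1 += d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 at this event time (handles ties).
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("log-rank variance is zero; the groups cannot be compared")

    total_events = sum(e for _, e, _ in data)
    observed0 = total_events - observed1
    expected0 = total_events - expected1

    chi2 = (observed1 - expected1) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    if observed0 == 0:
        hazard_ratio = math.inf  # no events in the reference group
    else:
        hazard_ratio = (observed1 / expected1) / (observed0 / expected0)
    return chi2, p_value, hazard_ratio


if __name__ == "__main__":
    # Hypothetical toy data for illustration only; NOT taken from the study cohort.
    times = [2, 3, 4, 5, 8, 10, 12, 15, 18, 20]
    events = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
    groups = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    chi2, p, hr = logrank_test(times, events, groups)
    print(f"chi2={chi2:.2f}, P={p:.4f}, HR={hr:.2f}")
```

In the toy data, the group labelled 1 accumulates most of the early events, so the sketch returns a hazard ratio above 1. The one-degree-of-freedom chi-square tail uses `math.erfc`, which keeps the example free of dependencies. For real analyses, an established survival package such as lifelines is the safer choice.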
© Research Square 2026 | ISSN 2693-5015 (online)
{"props":{"pageProps":{"initialData":{"identity":"rs-5353171","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":381709353,"identity":"c72cea2d-80f1-489a-8a3a-0faee68acfb7","order_by":0,"name":"Nanying Che","email":"","orcid":"","institution":"Beijing Tuberculosis and Thoracic Tumor Research Institute/ Beijing Chest Hospital, Capital Medical University","correspondingAuthor":true,"prefix":"","firstName":"Nanying","middleName":"","lastName":"Che","suffix":""},{"id":381709354,"identity":"b9866d91-17d9-4de5-9f38-ea4a48847743","order_by":1,"name":"Yongmeng Li","email":"","orcid":"","institution":"Beijing Tuberculosis and Thoracic Tumor Research Institute/ Beijing Chest Hospital, Capital Medical University","correspondingAuthor":false,"prefix":"","firstName":"Yongmeng","middleName":"","lastName":"Li","suffix":""},{"id":381709355,"identity":"1db6a733-c6dc-4043-bd02-ea05a44b67d9","order_by":2,"name":"Xiaodong Chai","email":"","orcid":"","institution":"Beijing Tuberculosis and Thoracic Tumor Research Institute/ Beijing Chest Hospital, Capital Medical 
University","correspondingAuthor":false,"prefix":"","firstName":"Xiaodong","middleName":"","lastName":"Chai","suffix":""},{"id":381709356,"identity":"357484e2-131d-4683-b31b-04234baed7d2","order_by":3,"name":"Moxuan Yang","email":"","orcid":"","institution":"Department of Physics, Capital Normal University, Beijing 100048, China","correspondingAuthor":false,"prefix":"","firstName":"Moxuan","middleName":"","lastName":"Yang","suffix":""},{"id":381709357,"identity":"79438f89-5c3f-4a47-90b6-dddb35691a25","order_by":4,"name":"Jiahang Xiong","email":"","orcid":"","institution":"Thorough Lab, Thorough Future, Beijing 100036, China","correspondingAuthor":false,"prefix":"","firstName":"Jiahang","middleName":"","lastName":"Xiong","suffix":""},{"id":381709358,"identity":"a531b761-783b-4eae-9686-c2af9c0f94fd","order_by":5,"name":"Junyang Zeng","email":"","orcid":"","institution":"College of Light Industry Science and Engineering, Tianjin University of Science and Technology, Tianjin 300222, China","correspondingAuthor":false,"prefix":"","firstName":"Junyang","middleName":"","lastName":"Zeng","suffix":""},{"id":381709359,"identity":"d0c4ed4e-5c38-44df-aecc-ef9c6e8bcda9","order_by":6,"name":"Yun Chen","email":"","orcid":"","institution":"School of Technology, Beijing Forestry University, Beijing 100083, China","correspondingAuthor":false,"prefix":"","firstName":"Yun","middleName":"","lastName":"Chen","suffix":""},{"id":381709360,"identity":"af551903-4803-40aa-8bca-f63d0c083811","order_by":7,"name":"Gang Xu","email":"","orcid":"","institution":"Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China","correspondingAuthor":false,"prefix":"","firstName":"Gang","middleName":"","lastName":"Xu","suffix":""},{"id":381709361,"identity":"3271cc01-cc8f-4ac6-b4ef-1e8292b9c075","order_by":8,"name":"Haifeng Lin","email":"","orcid":"","institution":"Beijing Tuberculosis and Thoracic Tumor Research Institute/ Beijing Chest Hospital, Capital Medical 
University","correspondingAuthor":false,"prefix":"","firstName":"Haifeng","middleName":"","lastName":"Lin","suffix":""},{"id":381709362,"identity":"7e2ee478-eb7b-45aa-972e-57f52c783912","order_by":9,"name":"Wei Wang","email":"","orcid":"","institution":"Thorough Lab, Thorough Future, Beijing 100036, China","correspondingAuthor":false,"prefix":"","firstName":"Wei","middleName":"","lastName":"Wang","suffix":""},{"id":381709363,"identity":"aee9d9a2-8b0b-4940-94ce-a8132a15c9ba","order_by":10,"name":"Shuhao Wang","email":"","orcid":"","institution":"Thorough Lab, Thorough Future, Beijing 100036, China","correspondingAuthor":false,"prefix":"","firstName":"Shuhao","middleName":"","lastName":"Wang","suffix":""}],"badges":[],"createdAt":"2024-10-29 09:53:18","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-5353171/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-5353171/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1038/s41698-025-00981-y","type":"published","date":"2025-06-19T15:57:19+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":70963181,"identity":"0932356f-2aa5-45aa-8417-dbfbc1cede24","added_by":"auto","created_at":"2024-12-09 15:46:31","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":1078596,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eAn overview of AIM-LCpro.\u003c/strong\u003e AIM-LCpro uses a patient-level weakly supervised learning approach to predict DFS and OS. 
WSIs and dense clinical data are integrated.\u003c/p\u003e","description":"","filename":"Onlinefigure1.png","url":"https://assets-eu.researchsquare.com/files/rs-5353171/v1/5dadc0eada09c6d7bfbe22ef.png"},{"id":70964439,"identity":"e8dfe8b6-5d68-46ee-8d5f-04676781030f","added_by":"auto","created_at":"2024-12-09 16:02:32","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":510179,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eROC curves for predicting prognosis of NSCLC patients using AIM-LCpro model. \u003c/strong\u003eThe ROC curves for predicting \u003cstrong\u003ea,\u003c/strong\u003e progression in 5 years in the training set. \u003cstrong\u003eb, \u003c/strong\u003eprogression in 5 years in the validation set. \u003cstrong\u003ec, \u003c/strong\u003eprogression in 5 years in the test set. \u003cstrong\u003ed, \u003c/strong\u003edeath in 5 years in the training set. \u003cstrong\u003ee, \u003c/strong\u003edeath in 5 years in the validation set. \u003cstrong\u003ef, \u003c/strong\u003edeath in 5 years in the test set.\u003c/p\u003e","description":"","filename":"Onlinefigure2.png","url":"https://assets-eu.researchsquare.com/files/rs-5353171/v1/3851b61c0937d4cffb1163a8.png"},{"id":70963180,"identity":"4d7fa771-17ad-42a7-b957-fcaedbdca112","added_by":"auto","created_at":"2024-12-09 15:46:31","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":556209,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eComparison between the K-M curves of high-risk groups and low-risk groups. 
\u003c/strong\u003eProgression in 5 years:\u003cstrong\u003e a, \u003c/strong\u003efor the test set (log-rank test P value\u0026lt;0.0001, HR=4.85); \u003cstrong\u003eb,\u003c/strong\u003e among patients who do not require postoperative interventions (log-rank test P value=0.0030, HR=5.01); \u003cstrong\u003ec, \u003c/strong\u003eamong patients who require postoperative interventions (log-rank test P value\u0026lt;0.0001, HR=4.34). Death in 5 years:\u003cstrong\u003e d, \u003c/strong\u003efor the test set (log-rank test P value\u0026lt;0.0001, HR=4.57); \u003cstrong\u003ee,\u003c/strong\u003e among patients who do not require postoperative interventions (log-rank test P value=0.0443, HR=4.10); \u003cstrong\u003ef,\u003c/strong\u003e among patients who require postoperative interventions (log-rank test P value=0.0036, HR=3.51). HR: Hazard Ratio (log-rank).\u003c/p\u003e","description":"","filename":"Onlinefigure3.png","url":"https://assets-eu.researchsquare.com/files/rs-5353171/v1/72b7a2aa4eba4b3c523cf6d5.png"},{"id":70964187,"identity":"1a768dbd-a644-4e21-8b4e-9f752bfa7c2d","added_by":"auto","created_at":"2024-12-09 15:54:31","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":616890,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eComparison between the predicted and actual K-M curves in the test set. \u003c/strong\u003eProgression in 5 years:\u003cstrong\u003e a, \u003c/strong\u003efor the test set (log-rank test P value=0.5029, HR=0.85); \u003cstrong\u003eb,\u003c/strong\u003e among patients who do not require postoperative interventions (Rényi test P value=0.4636, HR=1.48); \u003cstrong\u003ec, \u003c/strong\u003eamong patients who require postoperative interventions (log-rank test P value=0.0580, HR=0.56). 
Death in 5 years:\u003cstrong\u003e d, \u003c/strong\u003efor the test set (Rényi test P value=0.2321, HR=1.10); \u003cstrong\u003ee,\u003c/strong\u003e among patients who do not require postoperative interventions (Rényi test P value=0.3091, HR=1.76); \u003cstrong\u003ef, \u003c/strong\u003eamong patients who require postoperative interventions (log-rank test P value=0.5253, HR=0.81). HR: Hazard Ratio (log-rank).\u003c/p\u003e","description":"","filename":"Onlinefigure4.png","url":"https://assets-eu.researchsquare.com/files/rs-5353171/v1/0df36323054557ba73ec8a16.png"},{"id":70963182,"identity":"bf2651d1-4537-4b19-8b01-60c1dab5f88d","added_by":"auto","created_at":"2024-12-09 15:46:31","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":1467479,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eInterpreted pathological features.\u003c/strong\u003e \u003cstrong\u003ea, \u003c/strong\u003eThe comparison of the number of hotspots in 5-year progression and 5-year death patients. \u003cstrong\u003eb, \u003c/strong\u003eThe composition of histological types of NSCLC patients in the test set. \u003cstrong\u003ec, \u003c/strong\u003eThe distribution of hotspots on WSIs for patients with SCC, along with the enlargement of example regions. Red arrow: mitotic figures; black arrow: enlarged and bizarre nuclei. \u003cstrong\u003ed, \u003c/strong\u003eThe actual and predicted results of NMA subtypes. \u003cstrong\u003ee, f, g, \u003c/strong\u003eThe distribution of hotspots on WSIs for patients with MPA (\u003cstrong\u003ee\u003c/strong\u003e), SPA (\u003cstrong\u003ef\u003c/strong\u003e), LPA (\u003cstrong\u003eg\u003c/strong\u003e), along with the enlargement of example regions. \u003cstrong\u003eh, \u003c/strong\u003eThe distribution of hotspots on WSIs for patients with APA. The enlarged regions display the features of small glands (upper) and large glands (lower). 
\u003cstrong\u003ei, \u003c/strong\u003eThe stromal regions identified as risk hotspots. \u003cstrong\u003ej, \u003c/strong\u003eThe distribution of hotspots on WSIs for patients with PPA, along with the enlargement of example regions. SCC, squamous cell carcinomas; NMA, non-mucinous adenocarcinoma; SPA, solid adenocarcinoma; MPA, micropapillary adenocarcinoma; LPA, lepidic adenocarcinoma; APA, acinar adenocarcinoma; PPA, papillary adenocarcinoma.\u003c/p\u003e","description":"","filename":"Onlinefigure5.png","url":"https://assets-eu.researchsquare.com/files/rs-5353171/v1/f446061b8fbfe3e3bb0b9fb1.png"},{"id":85231330,"identity":"f799e507-985c-489a-8ab7-601de1228a29","added_by":"auto","created_at":"2025-06-23 16:06:10","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":7525453,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5353171/v1/ee73fa10-76e0-4275-942a-4934aaaa1ead.pdf"},{"id":70963185,"identity":"7232d5df-7107-47b5-aa02-cb77b9fc0e3a","added_by":"auto","created_at":"2024-12-09 15:46:31","extension":"docx","order_by":8,"title":"","display":"","copyAsset":false,"role":"supplement","size":32360289,"visible":true,"origin":"","legend":"","description":"","filename":"supplementarymaterial.docx","url":"https://assets-eu.researchsquare.com/files/rs-5353171/v1/f0088cd69cc2ed81c4165435.docx"}],"financialInterests":"No competing interests reported.","formattedTitle":"Accurate Prediction of Disease-Free and Overall Survival in Non-Small Cell Lung Cancer Using Patient-Level Multimodal Weakly Supervised Learning","fulltext":[{"header":"Introduction","content":"\u003cp\u003eLung cancer is the leading cause of cancer-related death and the second most commonly diagnosed cancer, accounting for approximately one in five (18.0%) cancer deaths and one in ten (11.4%) cancer diagnoses\u003csup\u003e\u003cspan citationid=\"CR1\" 
class=\"CitationRef\"\u003e1\u003c/span\u003e\u003c/sup\u003e. Non-small cell lung cancer (NSCLC) accounts for approximately 85% of all lung cancer cases\u003csup\u003e\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/sup\u003e. In clinical practice, NSCLC treatment is primarily guided by TNM staging\u003csup\u003e\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u003c/sup\u003e. Patients with early-stage NSCLC (stage 0, stage ⅠA, and stage ⅠB or ⅡA without high-risk factors) generally do not require postoperative interventions, whereas patients with stage ⅠB or ⅡA disease with high-risk factors, stage ⅡB, or stage Ⅲ typically require postoperative treatment. However, clinical outcomes often differ among patients at the same stage, which makes it difficult to decide on postoperative interventions based on TNM staging alone.\u003c/p\u003e \u003cp\u003eEven in early-stage NSCLC, surgery does not entirely eliminate the risk of disease progression or cancer-related death. For stage ⅠA NSCLC patients undergoing surgery alone, the 5-year disease-free survival (DFS) rate is 84.5%, and the 5-year overall survival (OS) rate is 96.8%\u003csup\u003e4\u003c/sup\u003e. Conversely, some patients who meet the criteria for postoperative interventions but do not receive them remain free from disease progression or cancer-related death. For stage ⅡB/ⅢA NSCLC patients who undergo surgery alone, the 3-year progression-free survival rate is 36.1%\u003csup\u003e5\u003c/sup\u003e, and the 5-year OS rate for stage ⅢA patients is 26%\u003csup\u003e6\u003c/sup\u003e. Prognostic prediction is therefore crucial for determining whether postoperative interventions are necessary. 
Accurate tools that predict DFS and OS for NSCLC patients are essential for personalized treatment and improved disease management.\u003c/p\u003e \u003cp\u003eArtificial intelligence (AI)-based pathology has advanced considerably in NSCLC, particularly in pathological diagnosis\u003csup\u003e\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e,\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e, molecular phenotype prediction\u003csup\u003e\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e,\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u003c/sup\u003e, gene mutation prediction\u003csup\u003e\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u003c/sup\u003e, and prognostic prediction\u003csup\u003e\u003cspan additionalcitationids=\"CR12 CR13 CR14 CR15 CR16 CR17 CR18 CR19 CR20 CR21 CR22 CR23 CR24\" citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u003c/sup\u003e. Among these applications, prognostic prediction holds the greatest clinical importance for NSCLC patients. However, previous prognostic studies either excluded clinical data or incorporated only minimal clinical information. Although these studies successfully distinguished different prognostic groups, they did not show strong agreement between predicted and actual survival outcomes. In addition, they did not effectively predict DFS and OS, the key endpoints in NSCLC prognosis.\u003c/p\u003e \u003cp\u003eSeveral digital biomarkers have emerged from these studies. 
For example, the density of tumor-infiltrating lymphocytes (TILs) has been identified as a prognostic biomarker\u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e, and the growth pattern of adenocarcinoma has also been linked to prognosis\u003csup\u003e\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u003c/sup\u003e. Moreover, a recent study developed and validated four digital biomarkers based on tertiary lymphoid structures and necrosis\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e. However, the digital biomarkers associated with prognosis have not yet been fully elucidated.\u003c/p\u003e \u003cp\u003eIn this work, we developed AIM-LCpro, a multimodal AI model for prognostic prediction in NSCLC patients undergoing surgery. The model uses a patient-level weakly supervised learning approach that integrates whole-slide images (WSIs) with dense clinical data (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). It not only stratifies patients into high-risk and low-risk groups but also predicts DFS and OS for each individual patient. Through model visualizations, we identified several novel digital biomarkers associated with poor prognosis in NSCLC patients. AIM-LCpro could help guide decisions on whether postoperative interventions are needed and thereby improve outcomes for NSCLC patients.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e3.1 Baseline characteristics of the study cohort\u003c/h2\u003e \u003cp\u003eIn the study cohort, 173 patients (27.99%) experienced disease progression within 5 years, and 121 patients (19.58%) died of NSCLC within the same period. 
Of the total cohort, 353 patients (57.12%) did not require postoperative interventions, while 265 patients (42.88%) did. The baseline characteristics of the study cohort are presented in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003e Characteristics of the study cohort\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCharacteristics\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTraining set\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eValidation set\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eTest set\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eTotal\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAge, years (mean\u0026thinsp;\u0026plusmn;\u0026thinsp;SD)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003e60.21\u0026thinsp;\u0026plusmn;\u0026thinsp;9.16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e59.65\u0026thinsp;\u0026plusmn;\u0026thinsp;8.33\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e58.70\u0026thinsp;\u0026plusmn;\u0026thinsp;9.53\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e59.85\u0026thinsp;\u0026plusmn;\u0026thinsp;9.16\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGender, n(%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMale\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e246(57.08%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e34(54.84%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e76(60.80%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e356(57.61%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFemale\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e185(42.92%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e28(45.16%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e49(39.20%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e262(42.39%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e 
\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNumber of WSIs\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1847\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e246\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e536\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e2629\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSmoking history, n(%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e196(45.48%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e27(43.55%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e59(47.20%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e282(45.63%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePatients who require postoperative interventions, n(%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e191(44.32%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e21(33.87%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e53(42.40%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e265(42.88%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePatients who do not require postoperative interventions, n(%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e240(55.68%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e41(66.13%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c4\"\u003e \u003cp\u003e72(57.60%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e353(57.12%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePathology subtypes, n(%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLung adenocarcinoma\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e313(72.62%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e43(69.35%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e87(69.60%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e443(71.68%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLung\u0026nbsp;squamous\u0026nbsp;cell\u0026nbsp;carcinoma\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e113(26.22%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e16(25.81%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e36(28.80%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e165(26.70%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLung\u0026nbsp;adenosquamous\u0026nbsp;carcinoma\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e4(0.93%)\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c3\"\u003e \u003cp\u003e3(4.84%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e2(1.60%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e9(1.46%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUnknown\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1(0.23%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1(0.16%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eStage, n(%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2(0.46%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e2(0.32%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eⅠ\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e276(64.04%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e44(70.97%)\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e79(63.20%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e399(64.56%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eⅡ\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e73(16.94%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e6(9.68%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e20(16.00%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e99(16.02%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eⅢ\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e80(18.56%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e12(19.35%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e26(20.80%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e118(19.09%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGrade, n(%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWell differentiated (G1)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e107(24.83%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e19(30.65%)\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e36(28.80%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e162(26.21%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModerately differentiated (G2)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e195(45.24%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e26(41.94%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e52(41.60%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e273(44.17%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePoorly differentiated (G3)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e106(24.59%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e13(20.97%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e33(26.40%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e152(24.60%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUnknown\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e23(5.34%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e4(6.45%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e4(3.20%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e31(5.02%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eProgression within 5 years\u003c/p\u003e \u003cp\u003e(YES/NO/UNKNOWN), 
n(%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e119/309/3\u003c/p\u003e \u003cp\u003e(27.61%)/(71.69%)/(0.70%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e18/44/0\u003c/p\u003e \u003cp\u003e(29.03%)/(70.97%)/(0)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e36/89/0\u003c/p\u003e \u003cp\u003e(28.80%)/(71.20%)/(0)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e173/442/3\u003c/p\u003e \u003cp\u003e(27.99%)/(71.52%)/(0.49%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDeath within 5 years*\u003c/p\u003e \u003cp\u003e(YES/NO/UNKNOWN), n(%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e82/344/5\u003c/p\u003e \u003cp\u003e(19.03%)/(79.81%)/(1.16%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e13/49/0\u003c/p\u003e \u003cp\u003e(20.97%)/(79.03%)/(0)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e26/99/0\u003c/p\u003e \u003cp\u003e(20.80%)/(79.20%)/(0)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e121/492/5\u003c/p\u003e \u003cp\u003e(19.58%)/(79.61%)/(0.81%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"5\" nameend=\"c5\" namest=\"c1\"\u003e \u003cp\u003e*Number of patients who died of NSCLC. SD: Standard Deviation\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003e3.2. 
Performance of AIM-LCpro in predicting prognosis of NSCLC patients\u003c/h3\u003e\n\u003cp\u003e \u003c/p\u003e \u003cp\u003eIn the training, validation, and test sets, the areas under the ROC curves (AUCs) for predicting progression within 5 years were 0.9925, 0.8801, and 0.8084, respectively (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ea-\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ec). Similarly, the AUCs for predicting death within 5 years were 0.9826, 0.8477, and 0.8021, respectively (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ed-\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ef). These results suggest that the model can distinguish patients who will experience progression, or death, within 5 years from those who will not.\u003c/p\u003e \u003cp\u003eWe applied the selected thresholds to predict outcomes in the training, validation, and test sets (Supplementary Tables\u0026nbsp;5\u0026ndash;13). In the test set, the model achieved an accuracy of 0.7680 for predicting progression within 5 years and 0.7760 for predicting death within 5 years. Its sensitivity was 0.5556 for progression and 0.5385 for death, and its specificity was 0.8539 and 0.8384, respectively.\u003c/p\u003e \u003cp\u003eHarrell's C-index was also used to evaluate the performance of AIM-LCpro. In the test set, the C-index for predicting progression and death within 5 years was 0.7748 and 0.7775, respectively (Supplementary Table\u0026nbsp;14).\u003c/p\u003e\n\u003ch3\u003e3.3. 
High-risk and low-risk groups\u003c/h3\u003e\n\u003cp\u003e \u003c/p\u003e \u003cp\u003eAIM-LCpro categorized patients into high-risk and low-risk groups according to two criteria: predicted progression within 5 years and predicted death within 5 years (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). For example, if the model predicted that a patient's disease would progress within 5 years, the patient was categorized as high-risk for progression; otherwise, the patient was categorized as low-risk. In the full test set, the high-risk and low-risk groups differed significantly, with P-values below 0.0001 and hazard ratios (HRs) of 4.85 for progression and 4.57 for death (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ea and \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ed).\u003c/p\u003e \u003cp\u003eAmong patients who did not require postoperative interventions, the difference between the high-risk and low-risk groups remained significant, with a P-value of 0.0030 and an HR of 5.01 for progression, and a P-value of 0.0443 and an HR of 4.10 for death (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eb and \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ee). Similarly, among patients who required postoperative interventions, the two groups differed significantly, with a P-value below 0.0001 and an HR of 4.34 for progression, and a P-value of 0.0036 and an HR of 3.51 for death (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ec and \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ef).\u003c/p\u003e\n\u003ch3\u003e3.4. 
Consistency between predicted and actual Kaplan\u0026ndash;Meier curves\u003c/h3\u003e\n\u003cp\u003e \u003c/p\u003e \u003cp\u003eThe survival curves predicted by AIM-LCpro for 5-year progression and death agreed with the actual survival curves, with no statistically significant differences detected (for progression: P\u0026thinsp;=\u0026thinsp;0.5029, HR\u0026thinsp;=\u0026thinsp;0.85; for death: P\u0026thinsp;=\u0026thinsp;0.2321, HR\u0026thinsp;=\u0026thinsp;1.10), as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ea and Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ed.\u003c/p\u003e \u003cp\u003eFor patients who did not require postoperative interventions, the predicted survival curves were also consistent with the actual outcomes, with no statistically significant differences (P\u0026thinsp;=\u0026thinsp;0.4636, HR\u0026thinsp;=\u0026thinsp;1.48 for progression; P\u0026thinsp;=\u0026thinsp;0.3091, HR\u0026thinsp;=\u0026thinsp;1.76 for death), as illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eb and Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ee.\u003c/p\u003e \u003cp\u003eSimilarly, for patients who required postoperative interventions, no statistically significant difference was detected between predicted and actual survival (P\u0026thinsp;=\u0026thinsp;0.0580, HR\u0026thinsp;=\u0026thinsp;0.56 for progression; P\u0026thinsp;=\u0026thinsp;0.5253, HR\u0026thinsp;=\u0026thinsp;0.81 for death), as depicted in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ec and Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ef.\u003c/p\u003e\n\u003ch3\u003e3.5. 
Investigation of prognostic digital biomarkers through AIM-LCpro\u003c/h3\u003e\n\u003cp\u003e \u003c/p\u003e \u003cp\u003eTo visualize the pathological features associated with prognosis, we mapped the prognosis-related features extracted by AIM-LCpro onto the WSIs as heatmaps. As shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ea, in the test set the progression heatmaps contained more hotspots than the death heatmaps, and the death hotspots were largely contained within the progression hotspots. Because fewer patients died within 5 years than experienced progression, the model may have learned fewer features for predicting death. The consistent distribution of risk hotspots across the two outcomes supports the model's predictive capability. Analyzing these heatmaps helps explain the model\u0026rsquo;s predictions, identifies the areas that contribute to them, and may uncover new digital biomarkers.\u003c/p\u003e \u003cp\u003eThe test set included 84 patients with non-mucinous adenocarcinoma (NMA), 36 patients with squamous cell carcinoma (SCC), and 5 patients with other NSCLC types (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eb). In SCC, risk hotspots were concentrated predominantly in tumor regions, where tumor cells were arranged in a disorderly manner, with enlarged, bizarre nuclei and frequent mitotic figures (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ec). 
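\u003c/p\u003e \u003cp\u003eThe observation above, that death hotspots lie largely within progression hotspots, could be quantified for each slide as a containment ratio. The following sketch is hypothetical and is not part of the published analysis. It assumes that each heatmap is a mapping from patch grid positions to risk scores, and it treats any patch scoring at or above an arbitrary cutoff as a hotspot, because the source does not state how hotspots were defined.\u003c/p\u003e

```python
# Hypothetical sketch: how much of one slide's death-hotspot area falls inside its
# progression hotspots. The grid representation and the cutoff are assumptions;
# the source does not state how hotspots were defined.
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]  # (row, column) of a patch on the slide grid


def hotspots(heatmap: Dict[Cell, float], cutoff: float) -> Set[Cell]:
    """Return the patches whose risk score reaches the (illustrative) cutoff."""
    return {cell for cell, score in heatmap.items() if score >= cutoff}


def containment(inner: Set[Cell], outer: Set[Cell]) -> float:
    """Fraction of `inner` hotspot patches that also belong to `outer`."""
    if not inner:
        raise ValueError("no hotspots in the inner map")
    return len(inner & outer) / len(inner)


if __name__ == "__main__":
    # Toy 2 x 2 slide; the scores are made up for illustration.
    progression = {(0, 0): 0.9, (0, 1): 0.8, (1, 0): 0.7, (1, 1): 0.1}
    death = {(0, 0): 0.85, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.75}
    prog_hot = hotspots(progression, cutoff=0.6)
    death_hot = hotspots(death, cutoff=0.6)
    print(len(prog_hot), len(death_hot))     # 3 2
    print(containment(death_hot, prog_hot))  # 0.5
```

\u003cp\u003eA ratio close to 1 across slides would correspond to the qualitative pattern seen in Fig.\u0026nbsp;5a.\u003c/p\u003e \u003cp\u003e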
As in SCC, risk hotspots in NMA tended to localize within tumor areas (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ed-\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ej). We further analyzed the regions covered by these hotspots.\u003c/p\u003e \u003cp\u003eIn the 84 patients with NMA, risk hotspots were distributed in regions of micropapillary adenocarcinoma (MPA) and solid adenocarcinoma (SPA). As shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ed and Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ee, MPA was present in 11 patients, of whom 5 (5/11) and 3 (3/11) had risk hotspots in the 5-year progression and 5-year death heatmaps, respectively. Notably, all 30 patients with SPA had risk hotspots in their SPA areas in both the 5-year progression and 5-year death heatmaps, although the instance-level hotspots did not cover all SPA regions (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ed and Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ef). Both of these subtypes are classified as high-grade patterns in the 5th edition of the WHO classification of thoracic tumors. In contrast, lepidic adenocarcinoma (LPA), the most common histological type, was never identified as a risk hotspot (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ed and Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eg). This is consistent with LPA being regarded as a low-grade pattern and supports the model's reliability.\u003c/p\u003e \u003cp\u003eFor the other two NMA histological subtypes, acinar adenocarcinoma (APA) and papillary adenocarcinoma (PPA), risk hotspots were distributed unevenly. 
For APA, we identified two types of glands more likely to be covered by risk hotspots. As shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eh, the first type consisted of small, irregular glands made up of pleomorphic cells, surrounded by desmoplastic stroma, which was often hypovascular and composed of collagen fibers interspersed with fibroblasts and lymphocytes. The second type consisted of large, irregular glands with multilayered cells, characterized by significant cellular and nuclear pleomorphism. These cells were crowded, and some protruded into the glandular lumen, forming structures similar to a \"papillary\" pattern without a central axis (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eh). The stroma in these areas was loose and rich in neomicrovessels, consistent with the pure stromal regions identified as risk hotspots, as demonstrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ei. Fewer areas of PPA were identified as risk hotspots, with the model appearing to recognize regions with crowded cell arrangements as high-risk (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ej). The pathological features related to prognosis identified by our model may serve as digital biomarkers and warrant further validation in future studies.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eWe demonstrated that a multimodal model combining dense clinical data with WSIs can successfully predict the prognosis of NSCLC patients undergoing surgery. The model effectively screens for and utilizes prognostic information, achieving a high level of accuracy. To our knowledge, no other prognostic prediction models for surgical NSCLC patients have yet entered clinical application. 
Our model's ability to predict which patients do or do not require postoperative treatment aligns closely with clinical application scenarios.\u003c/p\u003e \u003cp\u003ePrevious studies relied heavily on manual annotation or predefined image features\u003csup\u003e\u003cspan additionalcitationids=\"CR12 CR13 CR14 CR15 CR16 CR17 CR18\" citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e. In contrast, our model does not require manual WSI annotation, which substantially reduces the labor involved. Nor does it rely on predefined image features. Instead, it uses CAMEL2 to automatically screen and extract regions associated with prognosis\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e. Without predefined features, the model can search the entire WSI for prognostic regions. It can categorize patients into high-risk and low-risk groups while predicting 5-year DFS and OS. Moreover, no statistically significant difference was detected between the predicted and actual survival outcomes, which supports the validity of stratifying patients into high-risk and low-risk groups.\u003c/p\u003e \u003cp\u003eIn clinical practice, physicians need tools to predict patient outcomes. If the model predicts that an NSCLC patient is at risk of progression or death, postoperative intervention can be recommended. Conversely, if a patient is predicted to remain free from progression and death, postoperative chemotherapy may be avoided, which supports more personalized treatment planning.\u003c/p\u003e \u003cp\u003eNSCLC exhibits substantial tumor heterogeneity\u003csup\u003e\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e,\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e. 
This heterogeneity applies not only to tumor epithelial cells but also to the various microenvironments interacting with tumor cells\u003csup\u003e\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e\u003c/sup\u003e. The digital biomarkers identified by our model from WSIs may reflect this heterogeneity and aid in personalizing treatment for NSCLC patients.\u003c/p\u003e \u003cp\u003eSimilar to traditional biomarkers, digital biomarkers serve as indicators for diagnosis, prognosis, and therapeutic responses and should demonstrate clinical validity\u003csup\u003e\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e,\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e\u003c/sup\u003e. The clinical utility of new biomarkers can be evaluated by their association with existing biomarkers or by directly proving their usefulness\u003csup\u003e\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e\u003c/sup\u003e. Our model identified areas with a high mitotic index in SCC as risk hotspots, consistent with previous findings that associate a high mitotic index with poor prognosis\u003csup\u003e\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e\u003c/sup\u003e. Additionally, MPA and SPA were identified as risk areas, in line with high-grade growth patterns defined by the latest WHO classification of thoracic tumors. Furthermore, LPA, known for having the best prognosis, was not recognized as a risk area. APA, associated with intermediate prognosis, was widely distributed across the slides. Through heatmap analysis, we identified histological characteristics in APA that may indicate poor prognosis, though further evidence and additional data are needed to verify this.\u003c/p\u003e \u003cp\u003ePredicting the prognosis of NSCLC surgery patients raises ethical concerns. For example, knowing a poor prognosis in advance may affect patients' quality of life. 
Additionally, the question of who bears responsibility for harm caused by incorrect predictions remains unanswered.\u003c/p\u003e \u003cp\u003eOur study has several limitations. First, we did not include information about treatments administered after progression, because doing so could have introduced information leakage and compromised the model's credibility. Second, the clinical benefit of altering postoperative intervention strategies based on the model's predictions has not yet been validated, and it remains to be seen how much patients would benefit from such an approach. Finally, our model was developed on a relatively small cohort from a single center. Further studies with larger sample sizes are needed to improve the model's ability to predict NSCLC prognosis.\u003c/p\u003e"},{"header":"Methods","content":"\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003e2.1. Study population and inclusion/exclusion criteria\u003c/h2\u003e \u003cp\u003eWe enrolled 641 NSCLC patients who underwent lung surgery at Beijing Chest Hospital between January 2016 and November 2017. After 23 patients were excluded, 618 patients (the BCH study cohort) were included in the study.\u003c/p\u003e \u003cp\u003eThe inclusion criteria were: (I) NSCLC patients who underwent radical surgery; (II) NSCLC patients who did not undergo lymph node dissection and had no evidence of distant metastases before surgery; (III) NSCLC patients who agreed to follow-up. 
The exclusion criteria were: (I) patients with other incurable malignant tumors; (II) patients who died from other diseases before progression within 5 years after surgery; (III) cases where all primary tumor tissues were frozen prior to being fixed in formalin.\u003c/p\u003e \u003cp\u003e The study was conducted in accordance with the principles of the Declaration of Helsinki and approved by the Ethics Committee of Beijing Chest Hospital, Capital Medical University (YJS-2023-16).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003e2.2. Data collection protocols\u003c/h2\u003e \u003cdiv id=\"Sec12\" class=\"Section3\"\u003e \u003ch2\u003e2.2.1. WSI acquisition\u003c/h2\u003e \u003cp\u003eAll WSIs in this study were formalin-fixed paraffin-embedded (FFPE) whole-slide H\u0026amp;E-stained images of primary tumor tissues. Frozen slides and frozen paraffin slides were excluded. A total of 2,629 WSIs were acquired for the BCH study cohort, scanned using the KFBio KF-PRO-400 scanner, and saved at magnifications of 400\u0026times;, 200\u0026times;, 100\u0026times;, and 50\u0026times;.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003e2.2.2. Clinical data acquisition\u003c/h2\u003e \u003cp\u003eClinical variables were collected from inpatient medical records and included age, gender, smoking history, family history, TNM stages, lymph node dissection and metastasis, tumor size, CT data, pathological data, postoperative treatment, and risk factors. Risk factors included poorly differentiated tumors, vascular invasion, wedge resection, visceral pleural involvement, and unknown lymph node status. All patients were followed up through telephone and outpatient services, with a postoperative follow-up period of over 5 years for all patients.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003e2.3. 
Preprocessing of WSIs and clinical data\u003c/h2\u003e \u003cdiv id=\"Sec15\" class=\"Section3\"\u003e \u003ch2\u003e2.3.1. Image segmentation and feature extraction\u003c/h2\u003e \u003cp\u003eGlass (background) regions were filtered out based on RGB channel pixel variance. The remaining tissue regions were cut into 2,048 \u0026times; 2,048-pixel image patches at 20\u0026times; magnification. Image features were extracted with the pre-trained CAMEL2\u003csup\u003e26\u003c/sup\u003e weakly supervised framework: for each patch in the training set, intermediate CAMEL2 features served as the patch's image feature representation. Each patch had two image feature representations, one for progression and one for death. To obtain a patient-level image feature representation, we sorted all patch-level representations in descending order of the prediction probability output by CAMEL2 and averaged the top 10% of them.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003e2.3.2. Clinical data standardization and normalization\u003c/h2\u003e \u003cp\u003eClinical data contained numerical and categorical variables. Numerical variables were rescaled to the range 0 to 1. Categorical variables, such as gender and disease type, were one-hot encoded, so that each category occupied its own position in a binary vector.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003e2.4. Development of the AI model\u003c/h2\u003e \u003cdiv id=\"Sec18\" class=\"Section3\"\u003e \u003ch2\u003e2.4.1. Architecture of the multimodal AI model\u003c/h2\u003e \u003cp\u003eThe workflow is shown in Supplementary Fig.\u0026nbsp;1. 
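\u003c/p\u003e \u003cp\u003eFor each patient and outcome, the model described in this section receives a pooled image vector (Section 2.3.1) and an encoded clinical vector (Section 2.3.2). The following plain-Python sketch illustrates those two preprocessing steps under stated assumptions. It is not the authors' code, which is linked under Code availability. The sketch assumes that CAMEL2 patch features and patch probabilities are already available as lists. The function names, the rounding of the 10% cutoff (floor, but at least one patch), and the example values are all illustrative.\u003c/p\u003e

```python
# Hypothetical sketch of the input preparation described in Sections 2.3.1-2.3.2.
# Not the authors' implementation; names and the rounding rule are illustrative.
from typing import List, Sequence


def pool_top_fraction(features: Sequence[Sequence[float]],
                      probs: Sequence[float],
                      fraction: float = 0.10) -> List[float]:
    """Average the feature vectors of the top `fraction` of patches, ranked by
    their patch-level probability (assumed to come from CAMEL2)."""
    if not features or len(features) != len(probs):
        raise ValueError("need one probability per patch feature vector")
    # Sort patch indices by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Floor of fraction * n (with a tolerance for float error), keeping at least one patch.
    k = max(1, int(fraction * len(order) + 1e-9))
    top = [features[i] for i in order[:k]]
    dim = len(top[0])
    # Element-wise mean over the selected patches.
    return [sum(vec[d] for vec in top) / k for d in range(dim)]


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale numerical values to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def one_hot(value: str, categories: Sequence[str]) -> List[float]:
    """Binary vector with a 1 at the position of `value` in `categories`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


if __name__ == "__main__":
    # Ten patches with 2-d features; patch 1 has the highest probability.
    feats = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]] + [[0.0, 0.0]] * 7
    probs = [0.2, 0.9, 0.1] + [0.05] * 7
    image_vec = pool_top_fraction(feats, probs)  # top 10% of 10 patches = 1 patch
    ages = min_max_scale([45.0, 60.0, 75.0])     # one value per patient
    clinical_vec = [ages[1]] + one_hot("female", ["male", "female"])
    print(image_vec + clinical_vec)              # [3.0, 2.0, 0.5, 0.0, 1.0]
```

\u003cp\u003eIn this sketch, the minimum and maximum are taken from the values passed in. In practice they would be fitted on the training set and reused for the validation and test sets, so that no information leaks from held-out patients. Concatenating the two lists, as in the example at the end, mirrors the concatenation of image and clinical representations described in Section 2.4.2.\u003c/p\u003e \u003cp\u003e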
We employed a two-stage training strategy. The first stage was classification training for the prognostic endpoints (progression and death). The second stage, which started from the classification model weights, was regression training for time prediction (time to progression and time to death).\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003e2.4.2. Training procedure and algorithm selection\u003c/h2\u003e \u003cp\u003ePreprocessed clinical features of the training set patients were passed through a clinical feature network, consisting of linear, Batch Normalization, and ReLU layers, to obtain clinical feature representations. The patient-level image and clinical feature representations were concatenated and fed into the classification head network, which output the probability of progression or death. The classification head comprised linear, Batch Normalization, and ReLU layers, with two independent linear layers in the final stage. The network was trained with cross-entropy loss.\u003c/p\u003e \u003cp\u003eFor regression training, the clinical feature network weights were frozen, and two separate time prediction head networks were trained to output the times to progression and to death. Each time prediction head consisted of linear, Batch Normalization, and ReLU layers followed by a final Sigmoid layer. Because the Sigmoid output lies between 0 and 1, it was multiplied by 60 to give the predicted month of progression or death within the 5-year horizon. The time prediction heads were trained with the L1 loss. During inference, the network output classification results and time predictions simultaneously. For patients classified as negative, the predicted time was set to 60 months; otherwise, the network's predicted time was retained.\u003c/p\u003e \u003cp\u003eThe model assigned a probability of progression and death within 5 years to each patient. 
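\u003c/p\u003e \u003cp\u003eThe output mapping and inference rule for the time heads can be illustrated with a short, framework-free sketch. This is a hypothetical illustration, not the authors' implementation. The network layers and training loop are omitted, and the raw time-head output is assumed to be the value entering the final Sigmoid. Classification is assumed to use the subgroup-specific thresholds reported in this section, and treating a probability exactly equal to a threshold as positive is also an assumption. The dictionary keys are made-up labels.\u003c/p\u003e

```python
# Hypothetical sketch of the time-head output mapping and inference rule in
# Section 2.4.2; layers and training are omitted. Not the authors' code.
import math
from typing import Dict, Sequence, Tuple

HORIZON_MONTHS = 60.0  # 5-year horizon: Sigmoid output x 60 gives months

# Threshold values reported in Section 2.4.2; the key names are illustrative.
THRESHOLDS: Dict[Tuple[str, str], float] = {
    ("no_postop", "progression"): 0.1461,
    ("no_postop", "death"): 0.2092,
    ("postop", "progression"): 0.3123,
    ("postop", "death"): 0.3391,
}


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def time_head_to_months(raw_output: float) -> float:
    """Map the time head's pre-Sigmoid output to a month between 0 and 60."""
    return HORIZON_MONTHS * sigmoid(raw_output)


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between predicted and observed months."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def predict(prob: float, raw_time: float,
            subgroup: str, outcome: str) -> Tuple[bool, float]:
    """Negatives are assigned 60 months; positives keep the time-head prediction.
    Treating prob == threshold as positive is an assumption."""
    positive = prob >= THRESHOLDS[(subgroup, outcome)]
    months = time_head_to_months(raw_time) if positive else HORIZON_MONTHS
    return positive, months


if __name__ == "__main__":
    print(predict(0.10, 0.0, "no_postop", "progression"))  # (False, 60.0)
    print(predict(0.50, 0.0, "postop", "death"))           # (True, 30.0)
    print(l1_loss([30.0, 60.0], [24.0, 60.0]))             # 3.0
```

\u003cp\u003eBecause negatives are pinned to 60 months, the time output carries information only for patients who are predicted to have an event within the 5-year horizon.\u003c/p\u003e \u003cp\u003e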
Thresholds were selected on the training and validation sets based on sensitivity, specificity, accuracy, and the Youden index. For patients who did not require postoperative treatment, thresholds of 0.1461 and 0.2092 were selected for progression and death within 5 years, respectively. For patients who required postoperative treatment, thresholds of 0.3123 and 0.3391 were used. These thresholds were then applied to the test set.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec20\" class=\"Section2\"\u003e \u003ch2\u003e2.5. Model training and validation strategy\u003c/h2\u003e \u003cdiv id=\"Sec21\" class=\"Section3\"\u003e \u003ch2\u003e2.5.1. Dataset division for training, validation, and testing\u003c/h2\u003e \u003cp\u003eThe BCH study cohort was divided into training (428 patients), validation (62 patients), and test (125 patients) sets for predicting 5-year progression (Supplementary Table\u0026nbsp;1), and into training (426 patients), validation (62 patients), and test (125 patients) sets for predicting 5-year death (Supplementary Table\u0026nbsp;2). The training, validation, and test sets were comparable (Supplementary Tables\u0026nbsp;3 and 4).\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec22\" class=\"Section2\"\u003e \u003ch2\u003e2.6. Statistical analysis\u003c/h2\u003e \u003cp\u003eCategorical data were compared using Pearson's chi-squared test or Fisher's exact test. Continuous data were expressed as mean\u0026thinsp;\u0026plusmn;\u0026thinsp;standard deviation and analyzed using the independent-samples t-test or analysis of variance. Survival curves were generated using the Kaplan\u0026ndash;Meier method. Survival curves that did not intersect were compared using the log-rank test, and intersecting curves were compared using the R\u0026eacute;nyi test. Harrell's C-index was computed in R using the Hmisc package. 
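\u003c/p\u003e \u003cp\u003eThe Youden-based part of the threshold selection in Section 2.4.2, and the Harrell's C-index used to evaluate the model, can be sketched in plain Python. This is an illustrative reimplementation under stated assumptions, not the authors' code. The authors computed the C-index with the Hmisc package in R, whose handling of tied times may differ from the simple rule used here. Their threshold choice also weighed sensitivity, specificity, and accuracy, whereas this sketch maximizes the Youden index alone. The function names and toy data are hypothetical.\u003c/p\u003e

```python
# Hypothetical sketch of the evaluation steps in Sections 2.4.2 and 2.6.
# Not the authors' code: they computed Harrell's C-index with R's Hmisc package.
from typing import Optional, Sequence, Tuple


def rates(labels: Sequence[int], scores: Sequence[float],
          threshold: float) -> Tuple[float, float, float]:
    """Sensitivity, specificity and accuracy, treating score >= threshold as positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(labels)


def youden_threshold(labels: Sequence[int],
                     scores: Sequence[float]) -> Tuple[float, float]:
    """Observed score that maximizes J = sensitivity + specificity - 1."""
    best_t: Optional[float] = None
    best_j = -1.0
    for t in sorted(set(scores)):
        sens, spec, _ = rates(labels, scores, t)
        j = sens + spec - 1.0
        if j > best_j:  # strict '>' keeps the lowest threshold among ties
            best_t, best_j = t, j
    if best_t is None:
        raise ValueError("no scores given")
    return best_t, best_j


def harrell_c(times: Sequence[float], events: Sequence[int],
              risks: Sequence[float]) -> float:
    """Share of comparable pairs in which the patient with the earlier observed
    event has the higher predicted risk (risk ties count 0.5). Pairs with tied
    times are skipped, which may differ from the Hmisc implementation."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot be the earlier event of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    print(youden_threshold([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.2]))  # (0.8, 1.0)

    # 36 events / 89 non-events; 20 events and 13 non-events score above 0.5.
    labels = [1] * 36 + [0] * 89
    scores = [0.9] * 20 + [0.1] * 16 + [0.9] * 13 + [0.1] * 76
    sens, spec, acc = rates(labels, scores, 0.5)
    print(round(sens, 4), round(spec, 4), round(acc, 4))  # 0.5556 0.8539 0.768

    print(harrell_c([1, 2, 3], [1, 1, 0], [0.9, 0.5, 0.1]))  # 1.0
```

\u003cp\u003eThe second example uses counts reconstructed from two sources: the reported test-set progression metrics and the 36/89 split in the baseline table (20/36 \u0026asymp; 0.5556, 76/89 \u0026asymp; 0.8539, 96/125 = 0.7680). The source does not state these counts directly; they are only implied by the reported rates.\u003c/p\u003e \u003cp\u003e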
All tests were two-tailed, and a P-value less than 0.05 was considered statistically significant. Statistical analyses were performed using SPSS 26.0 or GraphPad Prism 10.\u003c/p\u003e "},{"header":"Declarations","content":"\u003cp\u003e \u003ch2\u003eCompeting interests\u003c/h2\u003e \u003cp\u003eNone declared.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eAuthor Contributions\u003c/h2\u003e\u003cp\u003eN.C. and S.W. conceived and designed the study. Y.L. collected clinical data, conducted the analyses, and wrote the manuscript. X.C. participated in the interpretation of digital biomarkers and wrote the manuscript. M.Y., J.X., J.Z., and Y.C. participated in the establishment of the model. G.X. and W.W. provided assistance in establishing the model. H.L. provided assistance in the interpretation of digital biomarkers. All authors reviewed the manuscript.\u003c/p\u003e\u003ch2\u003eAcknowledgements\u003c/h2\u003e \u003cp\u003eThis work was supported by the Beijing AI\u0026thinsp;+\u0026thinsp;Health Cultivation Innovation Project (No. Z241100007724001), the Beijing Municipal Public Welfare Development and Reform Pilot Project for Medical Research Institutes (No. JYY2023-15), the Beijing Nova Program, and the 2023 Science and Technology Projects of Qinghai Province, China (Basic Research Program, No. 
2023-ZJ-732).\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eData are available upon reasonable request.\u003c/p\u003e\u003cdiv id=\"Sec24\" class=\"Section2\"\u003e \u003ch2\u003eCode availability\u003c/h2\u003e \u003cp\u003eThe code can be accessed online: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/ThoroughFuture\u003c/span\u003e\u003cspan address=\"https://github.com/ThoroughFuture\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e \u003c/div\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eSung, H., \u003cem\u003eet al.\u003c/em\u003e Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin 71, 209\u0026ndash;249 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eReck, M. \u0026amp; Rabe, K.F. Precision Diagnosis and Treatment for Advanced Non-Small-Cell Lung Cancer. N Engl J Med 377, 849\u0026ndash;861 (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEttinger, D.S., \u003cem\u003eet al.\u003c/em\u003e NCCN Guidelines\u0026reg; Insights: Non-Small Cell Lung Cancer, Version 2.2023. J Natl Compr Canc Netw 21, 340\u0026ndash;350 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJiang, Y., \u003cem\u003eet al.\u003c/em\u003e The impact of adjuvant EGFR-TKIs and 14-gene molecular assay on stage I non-small cell lung cancer with sensitive EGFR mutations. EClinicalMedicine 64, 102205 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eScagliotti, G.V., \u003cem\u003eet al.\u003c/em\u003e Randomized phase III study of surgery alone or surgery plus preoperative cisplatin and gemcitabine in stages IB to IIIA non-small-cell lung cancer. 
J Clin Oncol 30, 172\u0026ndash;178 (2012).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDouillard, J.Y., \u003cem\u003eet al.\u003c/em\u003e Adjuvant vinorelbine plus cisplatin versus observation in patients with completely resected stage IB-IIIA non-small-cell lung cancer (Adjuvant Navelbine International Trialist Association [ANITA]): a randomised controlled trial. Lancet Oncol 7, 719\u0026ndash;727 (2006).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCoudray, N., \u003cem\u003eet al.\u003c/em\u003e Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat Med 24, 1559\u0026ndash;1567 (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen, C.L., \u003cem\u003eet al.\u003c/em\u003e An annotation-free whole-slide training approach to pathological classification of lung cancer types using deep learning. Nat Commun 12, 1193 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDiao, J.A., \u003cem\u003eet al.\u003c/em\u003e Human-interpretable image features derived from densely mapped cancer pathology slides predict diverse molecular phenotypes. Nat Commun 12, 1613 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWu, J., \u003cem\u003eet al.\u003c/em\u003e Artificial intelligence-assisted system for precision diagnosis of PD-L1 expression in non-small cell lung cancer. Mod Pathol 35, 403\u0026ndash;411 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYu, K.H., \u003cem\u003eet al.\u003c/em\u003e Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. 
Nat Commun 7, 12474 (2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLuo, X., \u003cem\u003eet al.\u003c/em\u003e Comprehensive Computational Pathological Image Analysis Predicts Lung Cancer Prognosis. J Thorac Oncol 12, 501\u0026ndash;509 (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang, Y., \u003cem\u003eet al.\u003c/em\u003e Multi-scale pathology image texture signature is a prognostic factor for resectable lung adenocarcinoma: a multi-center, retrospective study. J Transl Med 20, 595 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePan, X., \u003cem\u003eet al.\u003c/em\u003e Computerized tumor-infiltrating lymphocytes density score predicts survival of patients with resectable lung adenocarcinoma. iScience 25, 105605 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAlsubaie, N., Raza, S.E.A., Snead, D. \u0026amp; Rajpoot, N.M. Growth Pattern Fingerprinting for Automatic Analysis of Lung Adenocarcinoma Overall Survival. IEEE Access 11, 23335\u0026ndash;23346 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang, H., Xing, F., Su, H., Stromberg, A. \u0026amp; Yang, L. Novel image markers for non-small cell lung cancer classification and survival prediction. BMC Bioinformatics 15, 310 (2014).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang, X., \u003cem\u003eet al.\u003c/em\u003e Prediction of recurrence in early stage non-small cell lung cancer using computer extracted nuclear features from digital H\u0026amp;E images. Sci Rep 7, 13543 (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAlsubaie, N.M., Snead, D. \u0026amp; Rajpoot, N.M. Tumour Nuclear Morphometrics Predict Survival in Lung Adenocarcinoma. 
IEEE Access 9, 12322\u0026ndash;12331 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKludt, C., \u003cem\u003eet al.\u003c/em\u003e Next-generation lung cancer pathology: Development and validation of diagnostic and prognostic algorithms. Cell Rep Med 5, 101697 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhao, L., \u003cem\u003eet al.\u003c/em\u003e CoADS: Cross attention based dual-space graph network for survival prediction of lung cancer using whole slide images. Comput Methods Programs Biomed 236, 107559 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDiao, S., \u003cem\u003eet al.\u003c/em\u003e Automated Cellular-Level Dual Global Fusion of Whole-Slide Imaging for Lung Adenocarcinoma Prognosis. Cancers (Basel) 15 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShim, W.S., \u003cem\u003eet al.\u003c/em\u003e DeepRePath: Identifying the Prognostic Features of Early-Stage Lung Adenocarcinoma Using Multi-Scale Pathology Images and Deep Convolutional Neural Networks. Cancers (Basel) 13 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZheng, Y., \u003cem\u003eet al.\u003c/em\u003e Graph Attention-Based Fusion of Pathology Images and Gene Expression for Prediction of Cancer Survival. IEEE Trans Med Imaging 43, 3085\u0026ndash;3097 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHattori, H., Sakashita, S., Tsuboi, M., Ishii, G. \u0026amp; Tanaka, T. Tumor-identification method for predicting recurrence of early-stage lung adenocarcinoma using digital pathology images by machine learning. J Pathol Inform 14, 100175 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKim, P.J., \u003cem\u003eet al.\u003c/em\u003e A new model using deep learning to predict recurrence after surgical resection of lung adenocarcinoma. 
Scientific reports 14, 6366 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXu, G., \u003cem\u003eet al.\u003c/em\u003e CAMEL2: Enhancing Weakly Supervised Learning for Histopathology Images by Incorporating the Significance Ratio. Adv. Intell. Syst. 6, 12 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGridelli, C., \u003cem\u003eet al.\u003c/em\u003e Non-small-cell lung cancer. Nature reviews. Disease primers 1, 15009 (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen, Z., Fillmore, C.M., Hammerman, P.S., Kim, C.F. \u0026amp; Wong, K.K. Non-small-cell lung cancers: a heterogeneous set of diseases. Nature reviews. Cancer 14, 535\u0026ndash;546 (2014).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eQuail, D.F. \u0026amp; Joyce, J.A. Microenvironmental regulation of tumor progression and metastasis. Nat Med 19, 1423\u0026ndash;1437 (2013).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eArya, S.S., Dias, S.B., Jelinek, H.F., Hadjileontiadis, L.J. \u0026amp; Pappa, A.M. The convergence of traditional and digital biomarkers through AI-assisted biosensing: A new era in translational diagnostics? Biosensors \u0026amp; bioelectronics 235, 115387 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMontag, C., Elhai, J.D. \u0026amp; Dagum, P. On Blurry Boundaries When Defining Digital Biomarkers: How Much Biology Needs to Be in a Digital Biomarker? Frontiers in psychiatry 12, 740292 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSong, Y., Kang, K., Kim, I. \u0026amp; Kim, T.J. Pathological Digital Biomarkers: Validation and Application. Appl. Sci.-Basel 12, 13 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eG\u0026uuml;rel, D., \u003cem\u003eet al.\u003c/em\u003e The prognostic value of morphologic findings for lung squamous cell carcinoma patients. 
Pathology, research and practice 212, 1\u0026ndash;9 (2016).\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"npj-precision-oncology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"npjprecisiononcology","sideBox":"Learn more about [npj Precision Oncology](http://www.nature.com/npjprecisiononcology/)","snPcode":"41698","submissionUrl":"https://submission.springernature.com/new-submission/41698/3","title":"npj Precision Oncology","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"NPJ","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"artificial intelligence, weakly supervised learning, non-small cell lung cancer, whole-slide image, prognosis","lastPublishedDoi":"10.21203/rs.3.rs-5353171/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-5353171/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eWith the rapid progress in artificial intelligence (AI) and digital pathology, prognosis prediction for non-small cell lung cancer (NSCLC) patients has become a critical component of personalized medicine. In this study, we developed a multimodal AI model that integrates whole-slide images and dense clinical data to predict disease-free survival (DFS) and overall survival (OS) with high accuracy for NSCLC patients undergoing surgery. 
Utilizing data from 618 patients at Beijing Chest Hospital, the model achieved outstanding performance, with areas under the curve of 0.8084 for predicting progression and 0.8021 for predicting death in the test set. Importantly, the model demonstrated accurate prediction of 5-year DFS and OS, achieving accuracies of 0.7680 for DFS and 0.7760 for OS. By categorizing patients into high-risk and low-risk groups, the model identified significant differences in survival outcomes, with hazard ratios of 4.85 for progression and 4.57 for death, both with p-values below 0.0001. Additionally, it uncovered novel digital biomarkers associated with poor prognosis, offering further insights into NSCLC treatment. This model has the potential to revolutionize postoperative decision-making by providing clinicians with a precise tool for predicting DFS and OS, thereby improving patient outcomes.\u003c/p\u003e","manuscriptTitle":"Accurate Prediction of Disease-Free and Overall Survival in Non-Small Cell Lung Cancer Using Patient-Level Multimodal Weakly Supervised Learning","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-12-09 15:46:26","doi":"10.21203/rs.3.rs-5353171/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision 
requested","date":"2024-12-24T16:45:44+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2024-12-16T09:56:25+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2024-12-15T05:58:26+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2024-12-03T13:41:19+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2024-11-27T18:28:24+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"204445113962313051613236311546410816885","date":"2024-11-23T05:28:28+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"103159685403062208590204483106353272538","date":"2024-11-22T17:26:48+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"43418296694163021549032331035312225588","date":"2024-11-22T17:17:42+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"176650484640058592072662861545550512116","date":"2024-11-22T06:30:08+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"17359239918922467520805360573360164980","date":"2024-11-20T08:24:22+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"192378051302505813079783359779716687273","date":"2024-11-09T03:14:19+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"238480871515527906903209790086635598296","date":"2024-11-07T20:59:06+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2024-11-07T20:47:33+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2024-11-07T17:57:54+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2024-11-07T08:46:50+00:00","index":"","fulltext":""},{"type":"submitted","content":"npj Precision Oncology","date":"2024-10-29T09:46:32+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email 
protected]","identity":"npj-precision-oncology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"npjprecisiononcology","sideBox":"Learn more about [npj Precision Oncology](http://www.nature.com/npjprecisiononcology/)","snPcode":"41698","submissionUrl":"https://submission.springernature.com/new-submission/41698/3","title":"npj Precision Oncology","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"NPJ","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"e4916f18-5561-4994-a1f7-a5e11934bc68","owner":[],"postedDate":"December 9th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[{"id":40658304,"name":"Biological sciences/Cancer/Lung cancer/Non small cell lung cancer"},{"id":40658305,"name":"Health sciences/Biomarkers/Prognostic markers"}],"tags":[],"updatedAt":"2025-06-23T16:00:13+00:00","versionOfRecord":{"articleIdentity":"rs-5353171","link":"https://doi.org/10.1038/s41698-025-00981-y","journal":{"identity":"npj-precision-oncology","isVorOnly":false,"title":"npj Precision Oncology"},"publishedOn":"2025-06-19 15:57:19","publishedOnDateReadable":"June 19th, 2025"},"versionCreatedAt":"2024-12-09 15:46:26","video":"","vorDoi":"10.1038/s41698-025-00981-y","vorDoiUrl":"https://doi.org/10.1038/s41698-025-00981-y","workflowStages":[]},"version":"v1","identity":"rs-5353171","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-5353171","identity":"rs-5353171","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
