The impact of lymph node resection and survival prediction by machine learning in esophageal squamous cell carcinoma patients over 60 years old: a clinical trial based on the SEER database and Chinese population

Research Square · Research Article

Bin Hou, Qifan Zhao, Yanfei Cao, Wei Tian, Tian Ma, Xin Ding, and 3 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6557728/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review. Version 1 posted 7

Abstract

Background: This study aims to investigate the relationship between long-term survival in esophageal squamous cell carcinoma (ESCC) patients and various clinical factors, including age, sex, examined lymph nodes (ELN), tumor size, T stage, N stage, grade, and surgical procedure.
These findings aim to provide surgeons with precise information to avoid overtreatment.

Materials and Methods: Random forest and Cox proportional hazards models were developed and validated using data from the National Cancer Institute Surveillance, Epidemiology, and End Results (SEER) database (2013–2018) and Shaanxi Provincial Hospital. A web-based recommendation system was constructed to help select an optimal number of lymph nodes by visualizing survival curves and risk score trends under different conditions. This study has been registered in the Chinese Clinical Trial Registry (No. ChiCTR2400081083).

Results: For two randomly selected ESCC patients, the optimal number of ELN was determined to be 33. In the N0 group of the SEER cohort, patients with 15–33 ELN had a median survival of 55.0 months, significantly longer than those with < 15 ELN (30.0 months, P = 0.02) or > 33 ELN (26.5 months, P = 0.03). The same pattern was observed in N0 patients from our independent cohort (15–33 ELN vs. > 33 ELN: 36.0 months vs. 13.0 months, P < 0.001). No significant difference was found in N+ patients, suggesting that the number of retrieved lymph nodes has minimal impact on prognosis in this subgroup.

Conclusions: Our findings indicate that examining fewer than 15 or more than 33 lymph nodes is associated with increased prognostic risk in ESCC patients over 60 years old.

Keywords: Esophageal squamous cell carcinoma (ESCC); Random survival forest (RSF); Lymph node resection; Survival analysis

INTRODUCTION

Esophageal cancer is the sixth most commonly diagnosed malignancy and the fifth leading cause of cancer-related mortality in China. In 2016, over 250,000 new cases were reported, with esophageal squamous cell carcinoma (ESCC) accounting for approximately 90% of all esophageal cancer cases [1].
As global life expectancy increases, the incidence of ESCC in patients aged 60 years and older is expected to rise progressively [2, 3]. In Japan, ESCC predominantly affects individuals in their 60s and 70s [4]. Although recent studies suggest that esophagectomy is feasible and effective in elderly patients, those aged 60 years and older often exhibit poorer physical and psychological conditions than younger patients. Postoperative mortality (POM) rates ranging from 5.6% to 18.0% have been reported in this age group [5, 6].

Currently, the treatment of ESCC patients is guided by the TNM staging system [7], which considers only three indicators to inform clinical decisions. Important characteristics such as age, gender, surgical method, tumor size, and examined lymph nodes are overlooked. Moreover, the TNM staging system cannot predict the prognosis of an individual patient. There is therefore a pressing need for a more accurate and personalized system to guide clinical treatment and predict the prognosis of elderly (> 60 years old) ESCC patients prior to surgery.

Lymphadenectomy is a critical component of radical esophagectomy. While extensive lymph node dissection can reduce recurrence rates and improve long-term survival, it may also increase surgical trauma and postoperative complications, including anastomotic leakage, recurrent laryngeal nerve injury, and pneumonia [6, 8]. Optimizing surgical procedures is thus particularly important for elderly patients. According to the latest NCCN guidelines [9], dissection of more than 15 lymph nodes is recommended. However, few studies have focused on the long-term prognosis of elderly patients (> 60 years old) undergoing extensive lymph node dissection for esophageal cancer. Two key controversies remain unresolved: whether age itself is an independent risk factor for complications, and whether elderly patients benefit from extensive lymph node dissection.
To address these questions, we propose integrating pathological data with artificial intelligence (AI) to provide a novel perspective on this issue. Although computational systems have been widely developed to assist in clinical decision-making [10–12], they have not yet been adopted in ESCC clinical practice. In this study, machine learning models were trained on data from the National Cancer Institute Surveillance, Epidemiology, and End Results (SEER) database. By combining clinicopathological characteristics with machine learning, we developed a predictive model for ESCC patients. This model provides recommendations and incorporates a module to explain the risk score and 5-year survival curve. Many studies treat random forest models as "black boxes" that lack transparency and trustworthiness. To address this limitation, we applied explainable AI technology to establish a communication bridge between clinicians and the model [11–13]. This enables clinicians to understand the rationale behind the recommendations provided by the random forest model, thereby facilitating informed decisions regarding the optimal number of examined regional lymph nodes.

METHOD

Eligibility Criteria and Patient Information

Based on the November 2020 submission, we selected 647 medical cases as the training cohort from the SEER Research Plus Data database: Incidence - SEER Research Plus Data, 18 Registries, Nov 2020 Sub (2000–2018) - Linked to County Attributes - Total U.S., 1969–2019 Counties, National Cancer Institute, DCCPS, Surveillance Research Program, released April 2021. Cases were included if they met the following criteria: 1) patients aged > 60 years diagnosed pathologically with esophageal squamous cell carcinoma (ESCC) between January 2013 and December 2018; 2) patients who underwent esophagectomy; 3) expected survival of more than 30 days; 4) more than one regional lymph node examined.
Conversely, cases were excluded if they met any of the following criteria: 1) unidentified or absent counts of examined and positive regional lymph nodes; 2) uncertain or missing tumor size or overall survival (OS) data; 3) patients who received neoadjuvant radio-chemotherapy or immunotherapy prior to surgery; 4) patients with other malignancies. We identified crucial characteristics of ESCC patients for determining OS, including demographic information (age and sex), ESCC-related attributes (TNM stage, histology type, tumor size, number of regional nodes examined, positive lymph nodes, grade), and treatment details (surgery of the primary site). The primary endpoint was patient survival time.

For the external validation cohort, the same inclusion and exclusion criteria as for the training group were applied, resulting in the collection of data from 102 patients treated between January 2013 and December 2018. Pathologists evaluated resected specimens based on the eighth edition of the Union for International Cancer Control (UICC) and the American Joint Committee on Cancer (AJCC) TNM classification system, and all pathological information was collected post-esophagectomy [7, 9].

Variables

The training and test datasets consist of categorical and numerical covariates. The dataset includes three numerical variables: the number of positive regional nodes, the number of regional nodes examined, and tumor size. It also includes eight categorical variables. To avoid introducing any ordering or hierarchy not present in the original data, we transformed the eight categorical features using one-hot encoding, which represents each categorical value in binary form. For example, prior to transformation, the "Grade" field could take only four values: Grade I to Grade IV. After transformation, this single field is replaced by four binary fields, each taking a value of 0 or 1, which together indicate the patient's grade.
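The "Grade" example above can be illustrated with a minimal sketch. The field names (`grade_I` and so on) and the helper `one_hot_grade` are hypothetical and chosen only for illustration; the preprint does not state which library or column names were used.

```python
from typing import Dict

# The four grade categories named in the text.
GRADES = ["Grade I", "Grade II", "Grade III", "Grade IV"]


def one_hot_grade(grade: str) -> Dict[str, int]:
    """Replace the single "Grade" field with four binary indicator fields.

    Exactly one indicator is 1; the others are 0, so no ordering
    between grades is implied by the encoding.
    """
    if grade not in GRADES:
        raise ValueError(f"unknown grade: {grade!r}")
    # Hypothetical field names: grade_I, grade_II, grade_III, grade_IV.
    return {f"grade_{g.split()[1]}": int(g == grade) for g in GRADES}


encoded = one_hot_grade("Grade II")
print(encoded)
```

Because each category gets its own 0/1 column, a model cannot read "Grade III" as being three times "Grade I", which is the reason the text gives for avoiding an integer code.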
Finally, normalization was performed to accelerate the training process.

Explainable Machine Learning Survival Model Design

In this section, we trained and evaluated the performance of two machine learning models for survival prediction. The random survival forest predicts risk scores based on patient covariates. The predicted risk score equals the sum of the estimated ensemble cumulative hazard function over the distinct event times. Mathematically,

$$ \mathrm{risk}(x) = \sum_{j=1}^{n} \hat{H}(t_j \mid x) $$

where $\hat{H}(t \mid x)$ is the estimated ensemble cumulative hazard function for a patient with covariates $x$, $t_1, \dots, t_n$ are the distinct event times, and $n$ denotes the total number of distinct event times in the training data.

First, we developed a random survival forest to predict risk scores. This model fits multiple survival trees on various subsets of our training set and averages them to enhance predictive accuracy and limit overfitting. To tune hyperparameters, we implemented a random search strategy within the following ranges: the number of trees in [2, 300], the maximum number of features for the best node split in [10, 40], and the maximum depth of the tree in [10, 100]. Next, we trained a Cox proportional hazards model to formulate the relationship between the log hazard and patient covariates, again tuning the hyperparameters by random search. Specifically, the penalizer was tuned in [0.0001, 1], and the learning rate in [0.001, 1].

For the explainable module, Shapley values were calculated for each clinical feature to quantify its individual contribution to the predictions of the random forest. For individual predictions, we calculated each feature's Shapley value and generated a waterfall plot to visualize how the model arrived at its prediction from each clinical feature value of an ESCC patient. To calculate global feature importance, we averaged the absolute Shapley values per feature across the training dataset. Features were then sorted by decreasing importance and plotted.
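To make the two Shapley-based summaries concrete, the sketch below computes exact Shapley values by brute-force enumeration of feature coalitions, then averages their absolute values for a global ranking. This is an illustrative stand-in, not the authors' implementation: the preprint does not say which explainer was used, and the risk function `toy_risk`, its coefficients, and the baseline patient are all hypothetical. Features outside a coalition are set to a baseline value, which is one common convention among several.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    Features not in a coalition are replaced by their baseline value.
    Cost grows exponentially with the number of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def global_importance(predict, rows, baseline):
    """Mean absolute Shapley value per feature, sorted by decreasing importance."""
    totals = [0.0] * len(baseline)
    for x in rows:
        for i, value in enumerate(shapley_values(predict, x, baseline)):
            totals[i] += abs(value)
    means = [t / len(rows) for t in totals]
    return sorted(enumerate(means), key=lambda pair: pair[1], reverse=True)


def toy_risk(features):
    """Hypothetical linear risk score; NOT the fitted random survival forest."""
    examined_nodes, positive_nodes, tumor_size_mm = features
    return 40.0 - 0.5 * examined_nodes + 5.0 * positive_nodes + 0.1 * tumor_size_mm


baseline = [20, 0, 30]                 # hypothetical reference patient
patients = [[16, 6, 60], [37, 6, 60]]  # hypothetical feature rows

phi = shapley_values(toy_risk, patients[0], baseline)
ranking = global_importance(toy_risk, patients, baseline)
print(phi, ranking)
```

The per-patient values in `phi` are what a waterfall plot stacks on top of the baseline prediction, and they sum to the difference between the patient's prediction and the baseline prediction. The `ranking` list corresponds to the global importance plot described above.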
Finally, a personalized system for recommending the optimal number of examined regional nodes was developed to guide surgeons in lymph node dissection, based on the risk scores and 5-year survival probabilities predicted under varying numbers of examined regional nodes. The flowchart of this study is presented in Fig. 1.

Model Training and Evaluation

The concordance index (C-index) was used to evaluate model performance. This metric quantifies the proportion of correctly ordered patient pairs relative to all comparable pairs. The C-index ranges from 0 to 1: a value of 0.5 corresponds to random ordering and 1 to perfect ordering, so higher values indicate better model performance. In this study, the 647 SEER records were divided into two subsets: a training set comprising 517 records (80%) and a validation set consisting of 130 records (20%). Five-fold cross-validation was employed to optimize the hyperparameters of each model and select the best-performing model for survival prediction. Subsequently, the two models underwent external validation to assess their generalizability. The entire workflow is illustrated in Fig. 1. This work adheres to the STROCSS criteria [15].

Statistical Analysis

Clinical outcomes were estimated using the Kaplan-Meier method, and differences were examined using the log-rank test. A p-value < 0.05 was considered statistically significant. All statistical analyses were performed using SPSS software (version 24.0; IBM, Chicago, IL, USA) and GraphPad Prism software (version 6.01; GraphPad Software, Inc., La Jolla, CA, USA).

RESULTS

Patient Baseline Characteristics

In this study, we included 749 ESCC patients aged over 60 years who underwent esophagectomy and met the inclusion criteria: 647 patients were allocated to the training dataset (SEER), and 102 patients were assigned to the testing dataset. In the training cohort, squamous cell carcinoma accounted for the majority of histological types (87.01%), with keratinizing squamous cell carcinoma being the second most common type (9.58%).
Regarding N stages, 67.54% of patients were classified as N0, 24.11% as N1, 6.64% as N2, and 1.70% as N3. For T stages, 22.10% of patients were categorized as T1, 19.31% as T2, 51.93% as T3, and 6.64% as T4. The majority of patients (52.70%) had moderately differentiated tumors (Grade II), while 40.03% had poorly differentiated tumors (Grade III). Approximately 41.26% of patients underwent esophagectomy combined with partial gastrectomy, and 16.84% underwent esophagectomy combined with gastrectomy. These results are summarized in Table 1.

In the test cohort (Table 1), nearly all patients had squamous cell carcinoma as their histological type. Most patients had moderately differentiated tumors (50.98%), while 24.5% had poorly differentiated tumors. Among this population, 11.76% were classified as T1 stage, 16.66% as T2 stage, 51.96% as T3 stage, and 19.60% as T4 stage. Regarding N stages, the majority of ESCC patients were classified as N0 (52.94%), followed by N1 (19.6%), N2 (18.62%), and N3 (8.82%). All patients in this cohort underwent esophagectomy combined with partial gastrectomy.
Table 1. Main Baseline Clinical Characteristics of Patients

Characteristics | Training, n (%) | Testing, n (%)
Age
85+ years | 9 (1.39) | 1 (0.98)
80–84 years | 37 (5.71) | 3 (2.94)
75–79 years | 89 (13.75) | 9 (8.82)
70–74 years | 144 (22.25) | 18 (17.64)
65–69 years | 183 (28.28) | 39 (38.23)
60–64 years | 185 (28.59) | 32 (31.37)
Sex
Male | 398 (61.51) | 77 (75.49)
Female | 249 (38.48) | 25 (24.51)
Histologic type
Squamous cell carcinoma | 563 (87.01) | 143 (98.62)
Squamous cell carcinoma, keratinizing | 62 (9.58) | 1 (0.68)
Squamous cell carcinoma, large cell, nonkeratinizing | 18 (2.78) | 0
Squamous cell carcinoma, adenoid | 2 (0.30) | 0
Squamous cell carcinoma, spindle cell | 1 (0.15) | 1 (0.68)
Squamous cell carcinoma, small cell, nonkeratinizing | 1 (0.15) | 0
T Stage
T1 | 143 (22.10) | 12 (11.76)
T2 | 125 (19.31) | 17 (16.66)
T3 | 336 (51.93) | 53 (51.96)
T4 | 43 (6.64) | 20 (19.60)
N Stage
N0 | 437 (67.54) | 54 (52.94)
N1 | 156 (24.11) | 20 (19.60)
N2 | 43 (6.64) | 19 (18.62)
N3 | 11 (1.70) | 9 (8.82)
M Stage
M0 | 611 (94.43) | 99 (97.05)
M1a | 23 (3.55) | 3 (2.95)
M1b | 12 (1.85) | 0
M1NOS | 1 (0.15) | 0
Surgery Type
Esophagectomy with Partial gastrectomy | 267 (41.26) | 102 (100)
Esophagectomy with gastrectomy | 109 (16.84) | 0
Esophagectomy with Partial esophagectomy | 97 (14.99) | 0
Esophagectomy with Total esophagectomy | 82 (12.67) | 0
Esophagectomy with laryngectomy and/or gastrectomy | 57 (8.80) | 0
Esophagectomy | 17 (2.62) | 0
Esophagectomy with Total gastrectomy | 13 (2.00) | 0
Esophagectomy with laryngectomy | 4 (0.61) | 0
Grade
Well differentiated; Grade I | 43 (6.64) | 25 (24.50)
Moderately differentiated; Grade II | 341 (52.70) | 52 (50.98)
Poorly differentiated; Grade III | 259 (40.03) | 25 (24.50)
Undifferentiated; anaplastic; Grade IV | 4 (0.63) | 0

Survival Analysis

Feature Importance and Overall Survival

The global feature importance figure was derived from the model and patient data (Fig. 2). First, we identified three key features influencing overall survival, ranked in descending order: the number of examined regional nodes, N stage, and tumor size.
Among these, the number of examined regional nodes emerged as the most critical factor affecting the overall survival of ESCC patients. Second, we visualized the trend of each feature in relation to overall survival. For the number of examined regional nodes, an increase in the count corresponded to a decrease in the risk score, indicating a survival benefit for patients. Regarding the N stage, patients with N0 stage exhibited significantly better overall survival than those with other N stages. In the distribution of points for the number of examined regional nodes, a dense cluster of higher counts with small but positive SHAP values suggested that a low number of examined regional nodes has a more pronounced negative impact on overall survival than the positive influence of a high number.

Number of ELNs and Overall Survival

We randomly selected two patients from the Chinese database and used our web-based recommendation system to obtain their individual predicted risk scores and survival probability curves as the number of examined lymph nodes (ELNs) varied (Fig. 3). These two patients serve as examples to illustrate the functionality of our machine learning model. Based on our prior analysis, a higher number of ELNs is generally associated with better prognosis. For both patients, the risk score follows a similar pattern: it rises as the number of ELNs increases from 0 to approximately 16, where it peaks at around 90. It then gradually decreases until the number of ELNs reaches 33, after which the risk score plateaus, indicating that further increases in the number of ELNs do not provide additional survival benefit. This trend is consistent with our previous conclusion that 33 is the optimal number of examined regional lymph nodes for lymph node dissection. A higher risk score corresponds to poorer prognosis.
Consequently, in the survival probability curve, the patient achieves the best 5-year overall survival when the number of ELNs is 33. Notably, the log-rank test comparing the best and worst 5-year survival curves yielded statistically significant results (Patient 1: P = 0.003; Patient 2: P = 0.007), indicating a significant difference in survival between the two curves. A count of 33 ELNs is therefore statistically optimal for these patients.

Individual Feature Importance Analysis

For individual patient feature importance (Fig. 4), we used two randomly selected patients from our testing dataset to demonstrate how the model starts from the average risk score (38.229) and arrives at each patient's predicted risk score. The selected patients were Patient 1 (75–79 years old, female, 16 examined lymph nodes [ELNs]) and Patient 2 (75–79 years old, male, 37 ELNs).

For Patient 1, all other, less influential features collectively reduced the risk score by 1.83. Her moderately differentiated tumor grade and 16 examined regional nodes contributed increases of 0.087 and 1.96, respectively. Her surgery type (coded as 1) and tumor size (60 mm, a relatively large tumor) further increased the risk score by 3.08 and 3.43, respectively. Her age range (75–79 years) added a further 2.53, likely due to potential comorbidities that may interact with the esophageal tumor and influence treatment outcomes. Additionally, she had 6 positive regional nodes, corresponding to N2 stage, which is a critical factor in the N staging system. This increased the risk score by 3.48 for N2, 10.3 for non-N0 stage, and 30.76 for the 6 positive regional nodes. Ultimately, the predicted risk score for Patient 1 reached 85.442.

For Patient 2, the analysis followed a similar process. His 6 positive regional nodes and N2 stage similarly increased the risk score.
However, he underwent a more extensive lymph node dissection, with 37 regional nodes examined, which reduced the risk score by 12.24. Consequently, the final predicted risk score for Patient 2 was 58.253.

Subgroup Analysis

With respect to pN stages, we stratified the SEER patients and our own cohort into three groups based on the number of retrieved lymph nodes: Group 1 (< 15 LNs), Group 2 (15–33 LNs), and Group 3 (> 33 LNs). The median survival for Group 2 patients (15–33 LNs) was 36 months, which was significantly longer than that of Group 1 patients (< 15 LNs) and Group 3 patients (> 33 LNs; 31 months). These findings from the SEER database were corroborated by our own dataset. Specifically, overall survival (OS) was markedly better for patients with 15–33 retrieved lymph nodes than for those with > 33 retrieved lymph nodes (P = 0.02). Because few patients in our database had < 15 retrieved lymph nodes (n = 7), results for this subgroup are not presented.

In the subgroup analysis, patients with 15–33 retrieved lymph nodes exhibited the best clinical prognosis in the N0 stage, as observed in both the SEER database and our own dataset. In the SEER database, the median survival times for Group 1, Group 2, and Group 3 patients were 30 months, 55 months, and 26.5 months, respectively (Group 1 vs. Group 2: P = 0.02; Group 2 vs. Group 3: P = 0.03). Additionally, our data demonstrated that N0 patients with 15–33 examined lymph nodes had significantly better OS than those with > 33 examined lymph nodes (median survival: 36 months vs. 13 months, P = 0.0004) (Fig. 5). Interestingly, no significant differences were observed among Groups 1, 2, and 3 for N+ patients, either in the SEER dataset or in our own dataset (Fig. 5).

Training and Validation

Following the random search process, we finalized the hyperparameters for the random survival forest. The forest comprises 12 decision trees, with a maximum depth of 16 for each tree.
For the hyperparameters related to node splitting and tree leaves, we set the maximum number of features to 38 when searching for the best split and required a minimum of 2 samples at each leaf node. In the penalized Cox proportional hazards model, we set the penalizer to 0.006 and the learning rate to 0.001. Subsequently, we conducted 5-fold cross-validation to select the optimal model for survival prediction. The mean concordance index (C-index) of the random forest was 0.746, higher than that of the Cox proportional hazards model (0.614) (Table 2). Based on these results, the random forest was selected for the web-based individual optimal examined lymph node (ELN) recommender system. It also outperformed the Cox model in external validation (C-index: 0.739 vs. 0.646).

Table 2. Performance comparison of the random forest and Cox proportional hazards models

Model | Cross Validation, Mean Concordance Index | External Validation, Concordance Index
Random Forest | 0.746 | 0.739
Cox Proportional | 0.614 | 0.646

Web-based Optimal ELN Recommender System

Given the superior performance of the random forest model compared to the Cox proportional hazards model, we developed a web-based optimal examined lymph node (ELN) recommender system to assist oncologists in determining the optimal number of ELNs for lymph node dissection. This system is accessible via a browser at https://medservice.top/escc. Users can input the patient's current clinical status, including demographic information, extent of disease, morphological characteristics, therapeutic details, and TNM staging information, then click the "Predict" button to generate the output page (Supplementary Fig. 1). For patients undergoing esophagectomy, preoperative assessments such as laboratory tests, ultrasound esophagoscopy, contrast-enhanced CT scans, supraclavicular lymph node ultrasounds, or PET/CT scans are conducted to provide comprehensive information on morphology and clinical TNM stages.
At the bottom of the output page, users can adjust the number of ELNs to visualize the predicted 5-year survival probability dynamically. The system also allows survival curves and risk score trends to be compared across the best-case, worst-case, and guideline-recommended cutoff values (Version 5.2022) of the ELN count.

DISCUSSION

Esophageal squamous cell carcinoma (ESCC) is one of the most prevalent cancers globally and imposes a significant burden on human health, particularly in China. Lymph node resection constitutes a critical component of esophagectomy; however, extensive lymphadenectomy may increase the risks of postoperative complications and mortality because of the poor physical and psychological condition of elderly patients [3, 16, 17]. Adeline et al. reported that the 30- and 90-day mortality rates among elderly patients were 8.2% and 11.5%, respectively [18]. This study therefore aimed to elucidate the relationship between the number of examined regional lymph nodes (ELNs) and prognosis while identifying the optimal ELN count for individual ESCC patients.

This research not only trained an accurate random forest model to determine the optimal lymph node resection based on each patient's status but also incorporated an explainable module to clarify how the model predicts outcomes according to individual feature importance. To our knowledge, this represents the first explainable recommender system designed to provide optimal regional lymph node surgery plans for ESCC patients. Our model can be likened to a diligent student striving to excel in final examinations (clinical cases: information derived from preoperative assessments). To achieve this, the student must complete numerous daily exercises (training data) and pass mock examinations (testing data). In contrast to prior studies, such as the work by Feng Z. et al. [19], our research included exact tumor size as part of the training data for our model.
By incorporating this information, we established a relationship between precise tumor size and hazard rate, offering more specific prognostic insights than reliance on TNM staging alone. Regarding model performance, Adeoye J. et al. trained a random forest model with 716 patients to predict the probability of malignant transformation of oral leukoplakia and lichenoid lesions [20]. Compared with the Cox proportional hazards model, their model achieved a higher C-index (0.95 vs. 0.83), demonstrating a marked improvement in hazard prediction accuracy with machine learning models. In another study, Zhe J. et al. developed a Try-Wise system to visually explain random forest outputs via local force plots [21]. In our research, we used waterfall plots to interpret the random forest results for both patients and clinicians. Both types of plots are valuable for understanding model predictions and identifying key features. However, their capabilities differ: the waterfall plot (Fig. 4) is two-dimensional, enabling clearer visualization of multiple feature importances in individual predictions.

To determine the most critical factors influencing long-term survival in ESCC patients, our study collected 647 patients from the SEER database and 102 patients from the thoracic department of a hospital as training and testing sets, respectively. Eight characteristics (age, sex, TNM stage, tumor size and location, surgical procedure, grade, and the number of positive and examined regional lymph nodes) were analyzed using a machine learning model. Surprisingly, the most important factor was the number of examined regional lymph nodes, which encourages thoracic surgeons and pathologists to maximize ELN retrieval efforts. This finding aligns with Xu Guan et al.'s research on rectal cancer, which suggests that 15 ELNs represent the optimal cutoff value for stratifying rectal cancer patient prognosis [22].
We attribute this observation to two reasons: 1) ELNs are closely associated with the staging system and tumor metastasis; 2) a higher number of retrieved lymph nodes indicates a more radical resection. Moreover, our machine learning model can predict long-term survival for individual ESCC patients based on their characteristics. These findings indicate that safety and efficiency should be prioritized during lymph node resection, as examining fewer than 15 or more than 33 lymph nodes was associated with increased prognostic risk for ESCC patients. Tianbao Yang et al. [23] also confirmed through external validation that examining more than 10 ELNs aids in evaluating ESCC patient survival.

Other features, such as N stage, tumor size, number of positive lymph nodes, and T stage, are also critical prognostic factors. Xiao Gong et al. [24] demonstrated using an XGBoost model (SHAP value) that features such as no cancer-directed surgery (+0.27), Surg Prim Site (+0.25), age (+0.22), and AJCC stage system (+0.22) significantly impact long-term ESCC patient prognosis. Consistent with our findings, the TNM staging system is not the sole standard for evaluating clinical outcomes. Our machine learning model thus provides more precise and explicit information, including survival rates and hazard ratios, to support personalized strategies for each patient before surgery. Surgeons can also present survival curves and risk plots to patients for varying extents of lymph node dissection. In summary, our machine learning model offers prognostic information and risk factors prior to treatment, helping to avoid excessive lymph node dissection and surgical trauma.

Subgroup analysis was conducted according to different ELN counts. Interestingly, statistically significant differences were observed among N0 patients in both the SEER database and our own dataset. Patients with 15–33 ELNs exhibited better clinical outcomes than those with < 15 or > 33 ELNs.
Conversely, no significant differences were noted among N+ patients across different ELN groups. In other words, retrieving additional lymph nodes in patients with positive lymph nodes appears to offer little additional prognostic benefit. Data from Sun Yat-sen University Cancer Center [25] corroborated these findings. In the LNR0 group (positive lymph node ratio 0%), the 5-year cancer-specific survival (CSS) rate differed around the 15-LN cutoff (42.4% vs. 64%). However, no significant differences were observed in the LNR1 (positive lymph node ratio 1–25%) and LNR2 (positive lymph node ratio 26–100%) groups, with 5-year CSS rates of 24.8% vs. 30.9% (P = 0.291) and 6.8% vs. 4.9% (P = 0.121), respectively. These results suggest that the number of retrieved lymph nodes significantly impacts N0 ESCC patients, whereas in N+ patients lymph node metastasis plays a more pivotal role in determining prognosis than the number of retrieved lymph nodes.

Finally, some limitations of our study warrant acknowledgment. First, the sample size of ESCC patients remains limited, and elderly patients over 60 years are underrepresented in randomized controlled trials. We therefore conducted a retrospective study. Given potential differences in esophageal disease spectra between Western and Eastern countries, we collected only the 647 squamous cell carcinoma cases from the SEER database. Second, whether N stages should reflect the number or the region of positive lymph nodes remains controversial [23, 25, 26]. Our research focused on the number of lymph nodes for two main reasons: 1) guidelines developed by UICC/AJCC and CSCO [27, 28] determine N stages by the number of metastatic lymph nodes; 2) regional information on positive lymph nodes is unavailable in the SEER database. Uncovering the relationship between the number and region of positive lymph nodes in ESCC prognosis would be meaningful.
To enhance the practical applicability of this method in real-world medical settings, incorporating causal inference into the training and explanation processes is imperative. For instance, we integrated reweighting techniques into our new machine learning model to predict which HR+/HER2- T1-2 N1M0 breast cancer patients benefit from postmastectomy radiotherapy [29].

CONCLUSION
In this retrospective observational study, our data demonstrated that the number of retrieved regional lymph nodes is the most critical factor influencing clinical outcomes in ESCC patients aged over 60 years. Both safety and efficiency should be carefully considered during lymph node resection. Examining fewer than 15 or more than 33 lymph nodes may increase prognostic risk for ESCC patients.

Declarations

Consent to Publish declarations
This manuscript has not been published or presented elsewhere in part or in entirety, and is not under consideration by another journal. All the authors have approved the manuscript and agree with its submission to the journal.

Conflicts of interest disclosure
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Ethics approval and consent to participate
This study was deemed exempt by the Ethics Committee of Shaanxi Provincial People's Hospital (No. 2024-R066). The need for consent to participate was waived by the Ethics Committee of Shaanxi Provincial People's Hospital because all data came from public and retrospective sources. This study adhered to the Declaration of Helsinki. Moreover, our research has been registered in the Chinese Clinical Trial Registry (ChiCTR) under registration number ChiCTR2400081083 (https://www.chictr.org.cn/).

Acknowledgements
BH, QZ, WT, YF and JY designed the research. TM, BH, LH and XD collected the training and testing datasets. QZ trained the models and developed the explainable module and the web application.
BH, QZ, YF and JY wrote the manuscript. BH, WT, TM, XD, LH and LJ edited and critically revised the manuscript for important intellectual content. All authors read and approved the manuscript.

Sources of funding
This research was supported by grants from the Science and Technology Foundation of Shaanxi Province (2022JQ-934) and Shaanxi Provincial People's Hospital (2021JY-07, 2023-YJY-13).

Data statement
The deep learning model, the training and testing datasets, the web app of the recommender system, and the code for reproducing the concordance index are openly available at https://github.com/snowflake-Zhao/escc_node/. Requests for the data from this study can be addressed to Bin Hou ([email protected]) or Qifan Zhao ([email protected]).

References
1. Zheng R.S., Zhang S.W., Sun K.X., et al. [Cancer statistics in China, 2016]. Zhonghua Zhong Liu Za Zhi 2023;45:212-220. DOI: 10.3760/cma.j.cn112152-20220922-00647.
2. Hvid-Jensen F., Pedersen L., Drewes A.M., Sørensen H.T., Funch-Jensen P. Incidence of adenocarcinoma among patients with Barrett's esophagus. N Engl J Med 2011;365:1375-1383. DOI: 10.1056/NEJMoa1103042.
3. LoCicero J. 3rd, Shaw J.P. Thoracic surgery in the elderly: areas of future research and studies. Thorac Surg Clin 2009;19:409-413, vii. DOI: 10.1016/j.thorsurg.2009.07.003.
4. Watanabe M., Toh Y., Ishihara R., et al. Comprehensive registry of esophageal cancer in Japan, 2015. Esophagus 2023;20:1-28. DOI: 10.1007/s10388-022-00950-5.
5. Arnold M., Soerjomataram I., Ferlay J., Forman D. Global incidence of oesophageal cancer by histological subtype in 2012. Gut 2015;64:381-387. DOI: 10.1136/gutjnl-2014-308124.
6. Harris J.P., Kashyap M., Humphreys J.N., Pollom E.L., Chang D.T. The clinical and financial cost of mental disorders among elderly patients with gastrointestinal malignancies. Cancer Med 2020;9:8912-8922. DOI: 10.1002/cam4.3509.
7. Amin M.B., Greene F.L., Edge S.B., et al. The Eighth Edition AJCC Cancer Staging Manual: Continuing to build a bridge from a population-based to a more "personalized" approach to cancer staging. CA Cancer J Clin 2017;67:93-99. DOI: 10.3322/caac.21388.
8. Tapias L.F., Muniappan A., Wright C.D., et al. Short and long-term outcomes after esophagectomy for cancer in elderly patients. Ann Thorac Surg 2013;95:1741-1748. DOI: 10.1016/j.athoracsur.2013.01.084.
9. Ajani J.A., D'Amico T.A., Bentrem D.J., et al. Esophageal and Esophagogastric Junction Cancers, Version 2.2023, NCCN Clinical Practice Guidelines in Oncology. J Natl Compr Canc Netw 2023;21:393-422. DOI: 10.6004/jnccn.2023.0019.
10. Collet C., Onuma Y., Andreini D., et al. Coronary computed tomography angiography for heart team decision-making in multivessel coronary artery disease. Eur Heart J 2018;39:3689-3698. DOI: 10.1093/eurheartj/ehy581.
11. Le Berre C., Sandborn W.J., Aridhi S., et al. Application of Artificial Intelligence to Gastroenterology and Hepatology. Gastroenterology 2020;158:76-94.e2. DOI: 10.1053/j.gastro.2019.08.058.
12. Uche-Anya E., Anyane-Yeboa A., Berzin T.M., Ghassemi M., May F.P. Artificial intelligence in gastroenterology and hepatology: how to advance clinical practice while ensuring health equity. Gut 2022;71:1909-1915. DOI: 10.1136/gutjnl-2021-326271.
13. Lamens A., Bajorath J. Explaining Accurate Predictions of Multitarget Compounds with Machine Learning Models Derived for Individual Targets. Molecules 2023;28:825. DOI: 10.3390/molecules28020825.
14. Rigatti S.J. Random Forest. J Insur Med 2017;47:31-39. DOI: 10.17849/insm-47-01-31-39.1.
15. Rashid R., Sohrabi C., Kerwan A., et al. The STROCSS 2024 guideline: strengthening the reporting of cohort, cross-sectional and case-control studies in surgery. Int J Surg 2024. DOI: 10.1097/JS9.0000000000001268.
16. Paulus E., Ripat C., Koshenkov V., et al. Esophagectomy for cancer in octogenarians: should we do it. Langenbecks Arch Surg 2017;402:539-545. DOI: 10.1007/s00423-017-1573-x.
17. Won E., Ilson D.H. Management of localized esophageal cancer in the older patient. Oncologist 2014;19:367-374. DOI: 10.1634/theoncologist.2013-0178.
18. Laurent A., Marechal R., Farinella E., et al. Esophageal cancer: Outcome and potential benefit of esophagectomy in elderly patients. Thorac Cancer 2022;13:2699-2710. DOI: 10.1111/1759-7714.14596.
19. Zhu F., Zhong R., Li F., et al. Development and validation of a deep transfer learning-based multivariable survival model to predict overall survival in lung cancer. Transl Lung Cancer Res 2023;12:471-482. DOI: 10.21037/tlcr-23-84.
20. Adeoye J., Koohi-Moghadam M., Lo A., et al. Deep Learning Predicts the Malignant-Transformation-Free Survival of Oral Potentially Malignant Disorders. Cancers (Basel) 2021;13:6054. DOI: 10.3390/cancers13236054.
21. Jin Z., Pei S., Ouyang L., et al. Thy-Wise: An interpretable machine learning model for the evaluation of thyroid nodules. Int J Cancer 2022;151:2229-2243. DOI: 10.1002/ijc.34248.
22. Guan X., Jiao S., Wen R., et al. Optimal examined lymph node number for accurate staging and long-term survival in rectal cancer: a population-based study. Int J Surg 2023;109:2241-2248. DOI: 10.1097/JS9.0000000000000320.
23. Yang T., Huang S., Chen B., et al. A modified survival model for patients with esophageal squamous cell carcinoma based on lymph nodes: A study based on SEER database and external validation. Front Surg 2022;9:989408. DOI: 10.3389/fsurg.2022.989408.
24. Gong X., Zheng B., Xu G., et al. Application of machine learning approaches to predict the 5-year survival status of patients with esophageal cancer. J Thorac Dis 2021;13:6240-6251. DOI: 10.21037/jtd-21-1107.
25. Tan Z., Ma G., Yang H., et al. Can lymph node ratio replace pN categories in the tumor-node-metastasis classification system for esophageal cancer. J Thorac Oncol 2014;9:1214-1221. DOI: 10.1097/JTO.0000000000000216.
26. Hu Y., Hu C., Zhang H., et al. How does the number of resected lymph nodes influence TNM staging and prognosis for esophageal carcinoma. Ann Surg Oncol 2010;17:784-790. DOI: 10.1245/s10434-009-0818-5.
27. Chen Q., Yu L., Hao C., et al. Effectiveness evaluation of organized screening for esophageal cancer: a case-control study in Linzhou city, China. Sci Rep 2016;6:35707. DOI: 10.1038/srep35707.
28. Muro K., Van Cutsem E., Narita Y., et al. Pan-Asian adapted ESMO Clinical Practice Guidelines for the management of patients with metastatic gastric cancer: a JSMO-ESMO initiative endorsed by CSCO, KSMO, MOS, SSO and TOS. Ann Oncol 2019;30:19-33. DOI: 10.1093/annonc/mdy502.
29. Jin L., Zhao Q., Fu S., et al. Who can benefit from postmastectomy radiotherapy among HR+/HER2- T1-2 N1M0 breast cancer patients? An explainable machine learning mortality prediction based approach. Front Endocrinol 2024;15:1326009. DOI: 10.3389/fendo.2024.1326009.

Additional Declarations
No competing interests reported.

Supplementary Files
SupplementaryFigure1.tif
OurownOriginaltestingdata.csv
SEEROriginaltrainingdata.csv

Status: Under Review
Editorial decision: Revision requested, 08 Aug, 2025
Reviews received at journal: 13 Jun, 2025
Reviewers agreed at journal: 10 Jun, 2025
Reviewers invited by journal: 27 May, 2025
Editor assigned by journal: 26 May, 2025
Submission checks completed at journal: 22 May, 2025
First submitted to journal: 22 May, 2025
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-6557728","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":462781547,"identity":"38a3310c-3b4d-4311-9f79-bbb5ae2e4e7b","order_by":0,"name":"Bin Hou","email":"","orcid":"","institution":"Shaanxi Provincial People's Hospital","correspondingAuthor":false,"prefix":"","firstName":"Bin","middleName":"","lastName":"Hou","suffix":""},{"id":462781548,"identity":"ad441956-00ba-4d69-8f18-f3f5efa99859","order_by":1,"name":"Qifan Zhao","email":"","orcid":"","institution":"University of Hong Kong","correspondingAuthor":false,"prefix":"","firstName":"Qifan","middleName":"","lastName":"Zhao","suffix":""},{"id":462781549,"identity":"169dcdb2-1eab-4afd-9486-913e97c5606d","order_by":2,"name":"Yanfei Cao","email":"","orcid":"","institution":"Shaanxi Provincial People's Hospital","correspondingAuthor":false,"prefix":"","firstName":"Yanfei","middleName":"","lastName":"Cao","suffix":""},{"id":462781550,"identity":"a4271e70-1c43-421d-913f-ef660d2d8e41","order_by":3,"name":"Wei Tian","email":"","orcid":"","institution":"Shaanxi Provincial People's Hospital","correspondingAuthor":false,"prefix":"","firstName":"Wei","middleName":"","lastName":"Tian","suffix":""},{"id":462781551,"identity":"78e1d377-31a4-4846-81d6-81720a760120","order_by":4,"name":"Tian Ma","email":"","orcid":"","institution":"Shaanxi Provincial People's Hospital","correspondingAuthor":false,"prefix":"","firstName":"Tian","middleName":"","lastName":"Ma","suffix":""},{"id":462781552,"identity":"2892fd5b-2c6e-4a8c-8697-cffec089f90f","order_by":5,"name":"Xin 
Ding","email":"","orcid":"","institution":"Shaanxi Provincial People's Hospital","correspondingAuthor":false,"prefix":"","firstName":"Xin","middleName":"","lastName":"Ding","suffix":""},{"id":462781553,"identity":"97b8585e-24fe-4828-abe8-0e344bacd0ae","order_by":6,"name":"Long Jin","email":"","orcid":"","institution":"Shaanxi Provincial People's Hospital","correspondingAuthor":false,"prefix":"","firstName":"Long","middleName":"","lastName":"Jin","suffix":""},{"id":462781554,"identity":"5fc66d04-d677-4e97-b027-5846963c30a3","order_by":7,"name":"Lei Hou","email":"","orcid":"","institution":"Shaanxi Provincial People's Hospital","correspondingAuthor":false,"prefix":"","firstName":"Lei","middleName":"","lastName":"Hou","suffix":""},{"id":462781555,"identity":"a20669f5-75a2-43e9-803d-e3a8c1d1ef84","order_by":8,"name":"JinYan Yuan","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA5UlEQVRIie3PsWrDMBCA4QsCTaJaZdyq9A1kDFnzKhIGTwl09JBBkGAPafGavIW3pNtl0aTsGd03cLZMbR+gJXK3DPrm+7k7gCi6Q5S3A+rqW87a1bHX1TKcPAjM+sGTXAlXqN67cCJB59muJqaD+TT5XJMRhwGWKaPU7AGnlbEUeLPRtxNiXcoYyz9WtjybwyMIf+oCW451yoR4sojubDwFJRahpKApU2pi0dSvpiZjkpJkW61fOiwojEuEn/QDYp5sHRHaOxb85bl9AzRfKDlvL5drtZS8eb+d/ML+Nx5FURT96Qc7Z02F+zFPSAAAAABJRU5ErkJggg==","orcid":"","institution":"Shaanxi Provincial People's Hospital","correspondingAuthor":true,"prefix":"","firstName":"JinYan","middleName":"","lastName":"Yuan","suffix":""}],"badges":[],"createdAt":"2025-04-29 15:08:20","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6557728/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6557728/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":83646886,"identity":"5a452479-611c-4cbc-b571-41c082f1fe20","added_by":"auto","created_at":"2025-05-30 05:45:22","extension":"jpg","order_by":1,"title":"Figure 
1","display":"","copyAsset":false,"role":"figure","size":72962,"visible":true,"origin":"","legend":"\u003cp\u003eDiagram of the training and explainable-AI recommendation procedure.\u003c/p\u003e","description":"","filename":"1.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6557728/v1/9be30abd887a307ec4027883.jpg"},{"id":83646895,"identity":"8bcd7290-64df-4cc8-b1bb-3a7343945a9f","added_by":"auto","created_at":"2025-05-30 05:45:22","extension":"jpg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":60608,"visible":true,"origin":"","legend":"\u003cp\u003eThe actual relationships of the top 9 features from the random survival forest with predicted risk score. The x-axis represents the Shapley value of every instance.\u003c/p\u003e","description":"","filename":"2.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6557728/v1/47b49feafc27762bc40c3ade.jpg"},{"id":83648067,"identity":"71914e4c-f668-461f-b5d5-8f0e1464160d","added_by":"auto","created_at":"2025-05-30 06:09:22","extension":"jpg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":82388,"visible":true,"origin":"","legend":"\u003cp\u003ePredicted Risk score and 5-year survival probability for esophageal patients under different number of ELNs\u003c/p\u003e","description":"","filename":"3.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6557728/v1/919b87d7005769fc3f92a984.jpg"},{"id":83648118,"identity":"809a8e63-8482-4403-b00f-e74518c7ef54","added_by":"auto","created_at":"2025-05-30 06:17:22","extension":"jpg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":60783,"visible":true,"origin":"","legend":"\u003cp\u003ePredicted Risk score and 5-year survival probability for esophageal patients under different number of ELNs; Regional nodes positive: n=numbers; N: 0 represents no, 1 represents yes; Tumor size: mm; Surgery primary site means surgery type: 0 represents no, 1 represents 
yes, 40/50 represents surgery codes from SEER Program Coding and Staging Manual; Age recode: 0 represents no, 1 represents yes; Regional nodes examined: n=numbers; Sex: 0 represents no, 1 represents yes; Grade: 0 represents no, 1 represents yes;\u003c/p\u003e","description":"","filename":"4.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6557728/v1/78f789e0aecd1fc00852fc2a.jpg"},{"id":83648070,"identity":"c0ccec74-2432-417d-92a8-95b490ee16c8","added_by":"auto","created_at":"2025-05-30 06:09:23","extension":"jpg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":97873,"visible":true,"origin":"","legend":"\u003cp\u003eSubgroup survival analysis of ESCC patients from SEER and Hospital according to different N stages.\u003c/p\u003e","description":"","filename":"5.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6557728/v1/27f83265833ff7d91771ec5b.jpg"},{"id":83648120,"identity":"87cc5d89-6d69-4b25-8cc0-24849bd60ccf","added_by":"auto","created_at":"2025-05-30 06:17:27","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1211335,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6557728/v1/73ecb0a4-4244-46e1-840b-d9094a0ee40e.pdf"},{"id":83648073,"identity":"8aa5b4f2-8b2a-4653-bda4-ea02f78a9448","added_by":"auto","created_at":"2025-05-30 06:09:24","extension":"tif","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":135752,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryFigure1.tif","url":"https://assets-eu.researchsquare.com/files/rs-6557728/v1/d44547db7388f002ebb7f465.tif"},{"id":83646894,"identity":"44efe975-3c50-4f81-90fd-70ee4f0a28d3","added_by":"auto","created_at":"2025-05-30 
05:45:22","extension":"csv","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":11622,"visible":true,"origin":"","legend":"","description":"","filename":"OurownOriginaltestingdata.csv","url":"https://assets-eu.researchsquare.com/files/rs-6557728/v1/5eed75f6ba90d52b2a8773e8.csv"},{"id":83648074,"identity":"cb2ce4b0-8117-4182-a641-5cc71148427a","added_by":"auto","created_at":"2025-05-30 06:09:24","extension":"csv","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":79522,"visible":true,"origin":"","legend":"","description":"","filename":"SEEROriginaltrainingdata.csv","url":"https://assets-eu.researchsquare.com/files/rs-6557728/v1/af9dd00f5a9695835034e679.csv"}],"financialInterests":"No competing interests reported.","formattedTitle":"\u003cp\u003eThe impact of lymph node resection and survival prediction by machine learning in esophageal squamous cell carcinoma patients over 60 years old: a clinical trial based on the SEER database and Chinese population\u003c/p\u003e","fulltext":[{"header":"INTRODUCTION","content":"\u003cp\u003eEsophageal cancer is the sixth most commonly diagnosed malignancy and the fifth leading cause of cancer-related mortality in China. In 2016, over 250,000 new cases were reported, with esophageal squamous cell carcinoma (ESCC) accounting for approximately 90% of all esophageal cancer cases \u003csup\u003e[\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]\u003c/sup\u003e. As global life expectancy increases, the incidence of ESCC in patients aged 60 years and older is expected to rise progressively \u003csup\u003e[\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]\u003c/sup\u003e. In Japan, ESCC predominantly affects individuals in their 60s and 70s \u003csup\u003e[\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]\u003c/sup\u003e. 
Although recent studies suggest that esophagectomy is feasible and effective in elderly patients, those aged 60 years and older often exhibit poorer physical and psychological conditions compared to younger patients. Postoperative mortality (POM) rates ranging from 5.6\u0026ndash;18.0% have been reported in this age group \u003csup\u003e[\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]\u003c/sup\u003e. Currently, the treatment of ESCC patients is guided by the TNM staging system \u003csup\u003e[\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]\u003c/sup\u003e, which considers only three indicators to inform clinical decisions. Important characteristics such as age, gender, surgical method, tumor size, and examined lymph nodes are overlooked. Moreover, the TNM staging system lacks the ability to predict individual patient prognosis. Therefore, there is a pressing need to design a more accurate and personalized system to guide clinical treatment and predict the prognosis of elderly (\u0026gt;\u0026thinsp;60 years old) ESCC patients prior to surgery.\u003c/p\u003e \u003cp\u003eLymphadenectomy is a critical component of radical esophagectomy. While extensive lymph node dissection can reduce recurrence rates and improve long-term survival, it may also increase surgical trauma and postoperative complications, including anastomotic leakage, recurrent laryngeal nerve injury, and pneumonia \u003csup\u003e[\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]\u003c/sup\u003e. Thus, optimizing surgical procedures is particularly important for elderly patients. 
According to the latest NCCN guidelines \u003csup\u003e[\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e],\u003c/sup\u003e dissection of more than 15 lymph nodes is recommended. However, few studies have focused on the long-term prognosis of elderly patients (\u0026gt;\u0026thinsp;60 years old) undergoing extensive lymph node dissection for esophageal cancer. Two key controversies remain unresolved: whether age itself is an independent risk factor for complications and whether elderly patients benefit from extensive lymph node dissection. To address these questions, we propose integrating pathological data with artificial intelligence (AI) to provide a novel perspective on this issue.\u003c/p\u003e \u003cp\u003eAlthough computational systems have been widely developed to assist in clinical decision-making \u003csup\u003e[\u003cspan additionalcitationids=\"CR11\" citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]\u003c/sup\u003e, they have not yet been adopted in ESCC clinical practice. In this study, AI was utilized for deep learning through the National Cancer Institute Surveillance, Epidemiology, and End Results (SEER) database. By combining clinicopathological characteristics with machine learning, we developed a predictive model for ESCC patients. This model provides recommendations and incorporates a module to explain the risk score and 5-year survival curve. Additionally, many studies treat random forest models as \"black boxes,\" lacking transparency and trustworthiness. 
To address this limitation, we applied explainable AI technology to establish a communication bridge between clinicians and the model \u003csup\u003e[\u003cspan additionalcitationids=\"CR12\" citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]\u003c/sup\u003e. This enables clinicians to understand the rationale behind the recommendations provided by the random forest model, thereby facilitating informed decisions regarding the optimal number of examined regional lymph nodes.\u003c/p\u003e"},{"header":"METHOD","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eEligibility Criteria and Patient Information\u003c/h2\u003e \u003cp\u003eBased on the November 2020 submission, we selected 647 medical cases as the training cohort from the SEER Research Plus Data database: Incidence - SEER Research Plus Data, 18 Registries, Nov 2020 Sub (2000\u0026ndash;2018) - Linked to County Attributes - Total U.S., 1969\u0026ndash;2019 Counties, National Cancer Institute, DCCPS, Surveillance Research Program, released April 2021. Cases were included if they met the following criteria: 1) patients aged\u0026thinsp;\u0026gt;\u0026thinsp;60 years diagnosed pathologically with esophageal squamous cell carcinoma (ESCC) between January 2013 and December 2018; 2) patients who underwent esophagectomy; 3) expected survival of more than 30 days; 4) regional examined lymph nodes greater than one. Conversely, cases were excluded if they met any of the following criteria: 1) unidentified or absent examined and positive regional lymph nodes; 2) uncertain or missing tumor size data, overall survival (OS); 3) patients who received neoadjuvant radio-chemotherapy or immunotherapy prior to surgery; 4) patients with other malignancies. 
We identified crucial characteristics of ESCC patients for determining OS, including demographic information (age and sex), ESCC-related attributes (TNM stage, histology type, tumor size, number of regional nodes examined, positive lymph nodes, grade), and treatment details (surgery of the primary site). The primary endpoint was patient survival time. For the external validation cohort, the same inclusion and exclusion criteria as the training group were applied, resulting in the collection of data from 102 patients between January 2013 and December 2018. Pathologists evaluated resected specimens based on the eighth edition of the Union for International Cancer Control (UICC) and the American Joint Committee on Cancer (AJCC) TNM classification system, and all pathological information was collected post-esophagectomy \u003csup\u003e[\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eVariables\u003c/h3\u003e\n\u003cp\u003eThe training and test datasets consist of categorical and numerical covariates. The dataset includes three numerical variables: the number of positive regional nodes, the number of regional nodes examined, and tumor size. Additionally, it features eight categorical variables. To avoid introducing any ordering or hierarchy not present in the original data, we transformed the eight categorical features using one-hot encoding to represent multiple categorical values in binary form. For example, prior to transformation, the \"Grade\" field could only take four values: Grade I to Grade IV. After transformation, this single field is replaced by four binary fields, each capable of taking a value of 0 or 1, thereby indicating the patient's grade information. 
Finally, normalization was performed to accelerate the training process.\u003c/p\u003e\n\u003ch3\u003eExplainable Machine Learning Survival Model Design\u003c/h3\u003e\n\u003cp\u003eIn this section, we trained and evaluated the performance of two machine learning models for survival prediction. The random survival forest predicts risk scores based on patient covariates. the predicted risk score equals the sum of the estimated ensemble cumulative hazard function. Mathematically,\u003c/p\u003e\u003cp\u003e\u003cimg src=\"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAMMAAABGCAYAAABxAiDBAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAAFiUAABYlAUlSJPAAAAnISURBVHhe7d0/SBvvHwfwd35zosM5GDPFok62niUdpRgXi6WhiJRCYqGIUAclqWAnpUMdmtR2sKXYgn9ALXUQktRCOSc3aeLaSNMuSRyaQe72+y25kHtyl9xd/ur384Ib+txj/tzzfO6e+zzPpTZZlmUQQvA/toCQ/yoKBkIKKBgIKaBgIKSAgoGQAgoGQgooGAgpoGCog1wuhxs3bmB/fx9jY2Ow2WyIRCJsNdLmbDTpVhtJkuDz+SAIAjiOw48fP5DJZDA/P4+TkxM4nU72T0iboitDjex2Ow4PD+H1erG5uQme5+FyucBxHFuVtDkKBkIKKBgIKaBgIEQhk5oFAgEZgMxxnByPx+Xe3l4ZgOz1emVRFNnqpE1RNomQAhomEVJAwUBIAQUDIQUUDIQUUDBUEYvFYLPZGrJNT0+zb0daiIKhiomJCYTDYVWZ1+uFKIqQZdn0lkgk4PF4AACZTAaSJKlem7ROU4Nhf38foVCILTZkbW0N3759Y4ubIhQKIRAIFP8tCAJ8Pp+ljszzPI6Pj+H1etldbaeW9jIrFothbGzM0jGtxFS/YScelAmkSlt/f788MzMjp1Ip9s81iaIoB4NBeWpqyvDfsFKplOzxeORgMNiSiSxRFGWv16s6DoFAgK1mWCKRkKempjS/i14bcBwnJxKJYr1oNFpWR9nC4bDqNc1g2yuRSMgcx5W9h97Gfk4jotFoQyYpzfSbsmCQCwdjZWVFhsYsaiqVkmdmZgwfdKUT1dJxFPV8LSuy2Wxxdtno99cjiqL88uVL3QYSRVE1s12pc+3t7RU7q9XPo9A6xolEQu7v71d9BiVAent75Ww2WyyPx+Myx3FyNBotlhnRqGCQdb6TFs1gkEvOOnof0GhDhcNh3dewQumQtTa6ValUqiwgzDa84tOnTxWPS7U2UCjHpFpbGKHVXtFotOw76gWDrFO/mkYGg2yw3+jeM1Rbk7+wsACO45DP5/H27Vt2NwAgmUxidXUVCwsLsNvt7G5LnE4nnj17htXVVSSTSXZ3w/X19eHg4EB1bJ48eWLpszx9+rRux6Ue9Nqro6MDLpdLVbcSl8uFjo4OtriljPQb3WCopru7G52dnWxxkSRJWFxcxNDQEO7evcvursno6CgAYHFxse43XEbwPI/Nzc3iv/P5PCYnJ5HL5VT1rpJK7TUyMgKe51VllfA8j5GREba45ar1G8vB8PPnT6TTaQDAzZs32d04Pz/H2dkZxsfHy85+09PTZTl35ZlhSZKKzxHbbDbNDENfXx+GhoZwdnaG8/Nz1b5mYVOu6XQaDx48uLIBUam92kWj+42lYJAkqTg06u3tx
ePHj9kqOD4+Rj6fx8DAALsLW1tbiMfjxaFGIBAopvDsdjt2dnbQ29uLlZUVHB4eljWO3W6Hy+VCPp/H8fGxah8ARCKRsoNWbbPyAD+bcj09PYXf7y9rhKugUnu1i0b3G1PBIEkS9vf3cfv2bQiCAI/Hg+/fv5c99C5JEo6OjsBxnO5Y8969e8WhxsnJieqMuru7C5/Ph+Xl5bIvpFCuRkdHR2WdLxQKlU12Vdus5tPX19dVcwaCIGBubk5Vp90Zaa920ch+o5tNqpRbfvjwYcX5AuXOXSvTwFKyUkraKxqN6ubfSymZFiPv0WhK6q70GFXKWhhVaR5Ba7OaTTLTXopK2SQrzGaTGtFvql4ZlKUHoigWz4BnZ2e6kQcAFxcXuLy8ZIs1KWfW7e1tPHr0CDs7O/j8+XPF1y91eXmJi4sLtripSi/RiufPnyMWi6nqWVVt+Uc2m1W9t1lm2qtdNKLfVA0GRWmDp9NpLC0tsVUsKX3dL1++wO/3G/5Cepp1z1DK6XSWpVxJ4zSi3xgOBhQa/N27dwCA7e3tmjtQqf7+fnAcZzlnX6qZ9wyleJ7HixcvAADhcBgTExNslZZIJpPo6uqC7RqulK1nvzEVDGBSinoTGMochNaliCVJEhYWFvDq1Stsbm6aztl3dnaiu7ubLW6JXC6H9+/fq7IcrZbL5fDhwwf8/fsXoigik8mUDd/MtFe7aES/MR0MADA7Owuv14t8Pq85geFwOOB2u1VlepaXl+H3+8HzfDHQ0ul01RTlr1+/AAButxsOh4Pd3XSSJMHv98PtdmN9fZ3d3TK7u7vY2NiAw+GAw+GAIAj4+vWrqo6Z9jJL+R1aNgBr1Yh+YykY7HY7Xr9+DY7jNJcz2+12jI+PI5/PI5PJqP62VCQSQU9Pj2o4oeTuBUHAx48fVfW1tMsk0dzcHP78+YOdnZ22+DylwuGwaki4tbWl2m+0vaxwOp34/ft3XYeMDes3qtxSgViyarVSuq409efxeOR4PF5MbSmpN72VguFwWHd148bGRvF19/b22N3FVGalz9ZM4XBYRg0L9rSIJletKseLTemGw2FVG2SzWXllZUVVRzbQXizlO3Mcp9lGZhlNrTay35QFg95aer3G1qofjUaLb6z1BZUDqfW67D5orNpU8uJseSsoJwS2E9ZC65hCIygqzUMon0fp5Mox3tjY0OwIldqrlN78E/vZFNlsVvZ4PJr7ShkJBrZv1LvflAVDPbENUS/K2aHaAW40s2fTVikNmkpBW+/2qnYmLmUkGGpVrd80NBhknfXxtTCyLr0ZjJ5Jr5pGtFe9rgy1MNJvLN1AmxEKheByucpusq2QJAlLS0twu92YnZ1ldzdVvW6Yk8lkzcelnurZXu3CaL9peDCgMHV+69YtjI6Oai6dNeL8/Bw+nw9dXV2aKxKbKRKJIB6P4+DgoGyRohnKHEC7qUd7tQtT/Ya9VDRSPB6X37x5wxYbEgwGNTMEzaaMv+sxro5Go219v1FLeylaPUwy02+aGgxXnXKDWWncaZQyhm3nYKgHo8HQDpoyTLoOcrkcJicnMTQ0VHHcWc3FxQXW1tYwODiIdDqt+ZQgaQ0KBgOUpRbpdBqCIMDhcJStejW6OZ1OBINB5PN5AGjrJ8vqRWsdUDuiYDBgbm4OgiCwxTW7Ck+WWaWslO3p6cH8/HxNiYZmof+5p4pYLIb79++zxXWh/L/RZn55gjQOBQMhBTRMIqSAgoGQAgoGQgooGCyw8vRWMpnEwMCA5mOypD1QMFhg9umtWCyG4eHh4twCaU8UDE0wMTFR828bkcajYLAgl8vhzp07NOS5ZigYTCpdmkGuFwoGk7R+SlLrp9KV7br9aNd1RsFQB1tbW2W/0Kf3syykfVEw1AFdGa4HCoY6oCvD9UDB0ATJZBKDg4M4PT3F8PCwqck60jwUDE3A8zz+/ftXvFoYnawjzUXBYNFVeXqLGEfBYMJVfHqLGEcP9xBSQFcGQgooGAgp+
D/HtB5Du4tfoAAAAABJRU5ErkJggg==\" width=\"195\" height=\"70\"\u003e\u003c/p\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003c/span\u003e \u003c/p\u003e \u003cp\u003ewhere \u003cem\u003en\u003c/em\u003e denotes the total number of distinct event times in the training data.\u003c/p\u003e \u003cp\u003eFirst, we developed a random survival forest to predict risk scores. This model fits multiple survival trees on various subsets of our training set and uses averaging to enhance predictive accuracy and prevent overfitting. To tune hyperparameters, we implemented a random search strategy within the following ranges: the number of trees in [2, 300], the maximum number of features for the best node split in [10, 40], and the maximum depth of the tree in [10, 100]. Next, we trained a Cox Proportional Hazards model to formulate the relationship between the log hazard and patient covariates, tuning the hyperparameters using the Random Search Method. Specifically, the penalizer was tuned in [0.0001, 1], and the learning rate in [0.001, 1]. Regarding the explainable module, Shapley values were calculated for each clinical feature to understand their individual contributions to predictions derived from the random forest. Initially, for individual predictions, we calculated each feature's Shapley value and generated a waterfall plot to visualize how the model arrived at its predictions based on each clinical feature value of ESCC patients. To calculate global feature importance, we averaged the absolute Shapley values per feature across the large training dataset. Features were then sorted by decreasing importance and plotted. Finally, a personalized optimal examined regional node number system was developed to guide surgeons in lymph node dissection based on different risk scores and 5-year survival probabilities under varying examined regional nodes. 
The flowchart of this study is presented in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\n\u003ch3\u003eModel Training and Evaluation\u003c/h3\u003e\n\u003cp\u003eThe concordance index (C-index) was used to evaluate model performance. This metric quantifies the proportion of correctly ordered patient pairs relative to all comparable pairs. The C-index ranges from 0 to 1, where 0.5 corresponds to random ordering and higher values indicate superior model performance. In this study, the 647 SEER data records were divided into two subsets: a training set comprising 517 records (80%) and a validation set consisting of 130 records (20%). Five-fold cross-validation was employed to optimize the hyperparameters of each model and select the best-performing model for survival prediction. Subsequently, the two models underwent external validation to assess their generalizability. The entire workflow is illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. This work adheres to the STROCSS criteria. \u003csup\u003e[\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]\u003c/sup\u003e\u003c/p\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003eStatistical Analysis\u003c/h2\u003e \u003cp\u003eClinical outcomes were estimated using the Kaplan‒Meier method, and differences between survival curves were examined using the log-rank test. A p-value\u0026thinsp;\u0026lt;\u0026thinsp;0.05 was considered statistically significant. 
All statistical analyses were performed using SPSS software (version 24.0; IBM, Chicago, IL, USA) and GraphPad Prism software (version 6.01; GraphPad Software, Inc., La Jolla, CA, USA).\u003c/p\u003e \u003c/div\u003e"},{"header":"RESULTS","content":"\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003ePatient Baseline Characteristics\u003c/h2\u003e \u003cp\u003e In this study, we included 749 ESCC patients aged over 60 years who underwent esophagectomy and met the inclusion criteria: 647 patients were allocated to the training dataset (SEER), and 102 patients were assigned to the testing dataset. In the training cohort, squamous cell carcinoma accounted for the majority of histological types (87.01%), with keratinizing squamous cell carcinoma being the second most common type (9.58%). Regarding N stages, 67.54% of patients were classified as N0, 24.11% as N1, 6.64% as N2, and 1.70% as N3. For T stages, 22.10% of patients were categorized as T1, 19.31% as T2, 51.93% as T3, and 6.64% as T4. The majority of tumors (52.70%) were moderately differentiated (Grade II), while 40.03% were poorly differentiated (Grade III). In total, 41.26% of patients underwent esophagectomy combined with partial gastrectomy, and 16.84% underwent esophagectomy combined with gastrectomy. These results are summarized in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eIn the test cohort (as presented in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e), nearly all patients had squamous cell carcinoma as their histological type. Most tumors were moderately differentiated (50.98%), while 24.50% were poorly differentiated. Among this population, 11.76% were classified as T1 stage, 16.66% as T2 stage, 51.96% as T3 stage, and 19.60% as T4 stage. 
Regarding N stages, the majority of ESCC patients were classified as N0 (52.94%), followed by N1 (19.6%), N2 (18.62%), and N3 (8.82%). All patients in this cohort underwent esophagectomy combined with partial gastrectomy.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eMain Baseline Clinical Characteristics of Patients\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCharacteristics\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTraining(n, %)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eTesting (n, %)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAge\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e85\u0026thinsp;+\u0026thinsp;years\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e9(1.39)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1(0.98)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003e80\u0026ndash;84 years\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e37(5.71)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e3(2.94)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e75\u0026ndash;79 years\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e89(13.75)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e9(8.82)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e70\u0026ndash;74 years\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e144(22.25)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e18(17.64)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e65\u0026ndash;69 years\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e183(28.28)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e39(38.23)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e60\u0026ndash;64 years\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e185(28.59)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e32(31.37)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSex\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003eMale\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e398(61.51)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e77(75.49)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFemale\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e249(38.48)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e25(24.51)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHistologic type\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSquamous cell carcinoma\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e563(87.01)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e143(98.62)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSquamous cell carcinoma, keratinizing\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e62(9.58)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1(0.68)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSquamous cell carcinoma, large cell, nonkeratinizing,\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e18(2.78)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e 
\u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSquamous cell carcinoma, adenoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2(0.30)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSquamous cell carcinoma, spindle cell\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e1(0.15)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1(0.68)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSquamous cell carcinoma, small cell, nonkeratinizing\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e1(0.15)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eT Stage\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eT1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e143(22.10)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e12(11.76)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eT2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e125(19.31)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e17(16.66)\u003c/p\u003e \u003c/td\u003e 
\u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eT3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e336(51.93)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e53(51.96)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eT4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e43(6.64)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e20(19.60)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eN Stage\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eN0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e437(67.54)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e54(52.94)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eN1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e156(24.11)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e20(19.60)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eN2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e43(6.64)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e19(18.62)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003eN3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e11(1.70)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e9(8.82)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eM Stage\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eM0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e611(94.43)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e99(97.05)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eM1a\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e23(3.55)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e3(2.95)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eM1b\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e12(1.85)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eM1NOS\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e1(0.15)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSurgery Type\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEsophagectomy with Partial gastrectomy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e267(41.26)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e102(100)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEsophagectomy with gastrectomy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e109(16.84)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEsophagectomy with Partial esophagectomy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e97(14.99)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEsophagectomy with Total esophagectomy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e82(12.67)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEsophagectomy with laryngectomy and/or gastrectomy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e57(8.80)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003eEsophagectomy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e17(2.62)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEsophagectomy with Total gastrectomy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e13(2.00)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEsophagectomy with laryngectomy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e4(0.61)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGrade\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWell differentiated; Grade I\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e43(6.64)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e25(24.50)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModerately differentiated; Grade II\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e341(52.70)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e52(50.98)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003ePoorly differentiated; Grade III\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e259(40.03)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e25(24.50)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUndifferentiated; anaplastic; Grade IV\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e4(0.63)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eSurvival Analysis\u003c/h3\u003e\n\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eFeature Importance and Overall Survival\u003c/h2\u003e \u003cp\u003eThe global feature importance figure was derived from the model and patient data (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). First, we identified three key features influencing overall survival, ranked in descending order: the number of examined regional nodes, N stage, and tumor size. Among these, the number of examined regional nodes emerged as the most critical factor affecting the overall survival of ESCC patients. Second, we visualized the trends of each feature in relation to overall survival. For the number of examined regional nodes, an increase in this count corresponded to a decrease in the risk score, indicating a survival benefit for patients. Regarding the N stage, patients with N0 stage exhibited significantly better overall survival outcomes compared to those with other N stages. 
In terms of the distribution of points for the number of examined regional nodes, a dense cluster of higher numbers of examined regional nodes with small but positive SHAP values suggested that a lower number of examined regional nodes has a more pronounced negative impact on overall survival than the positive influence of a higher number of examined regional nodes.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eNumber of ELNs and Overall Survival\u003c/h2\u003e \u003cp\u003eTo illustrate how our machine learning model works, we randomly selected two patients from the Chinese database and obtained their individual predicted risk scores and survival probability curves under varying numbers of examined lymph nodes (ELNs) using our web-based recommendation system (as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). Based on our prior analysis, a higher number of ELNs is generally associated with better prognosis. For both patients, the risk score follows a similar pattern: it rises as the number of ELNs increases from 0 to approximately 16, where it peaks at around 90. It then gradually decreases until the number of ELNs reaches 33, after which the risk score plateaus, indicating that further increases in the number of ELNs do not provide additional survival benefits. This trend is consistent with our previous conclusion that 33 is the optimal number of examined regional lymph nodes for lymph node dissection. A higher risk score corresponds to poorer prognosis. 
Consequently, in the survival probability curve, the patient achieves the best 5-year overall survival when the number of ELNs is 33.\u003c/p\u003e \u003cp\u003eNotably, the log-rank test comparing the best and worst 5-year survival curves yielded statistically significant results (Patient 1: \u003cem\u003eP\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.003; Patient 2: \u003cem\u003eP\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.007), indicating a significant difference in survival between the two curves. Therefore, a count of 33 ELNs is statistically optimal for these patients.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eIndividual Feature Importance Analysis\u003c/h2\u003e \u003cp\u003eRegarding individual patient feature importance (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e), we used two randomly selected patients from our testing dataset to demonstrate how the model starts from the average risk score (38.229) and arrives at each patient's predicted risk score. The selected patients were as follows: Patient 1 (75\u0026ndash;79 years old, female, 16 examined lymph nodes [ELNs]) and Patient 2 (75\u0026ndash;79 years old, male, 37 ELNs). For Patient 1, the remaining less influential features collectively reduced the risk score by 1.83. Her moderately differentiated tumor grade and 16 examined regional nodes then increased the risk score by 0.087 and 1.96, respectively. Furthermore, her surgery type (coded as 1) and her relatively large tumor size (60 mm) increased the risk score by a further 3.08 and 3.43, respectively. Her age range (75\u0026ndash;79 years) added a further 2.53 to the risk score, likely due to potential comorbidities that may interact with the esophageal tumor and influence treatment outcomes. 
Additionally, she had 6 positive regional nodes, indicating an N2 stage, which is a critical factor in the N staging system. This increased the risk score by 3.48 for N2, 10.3 for non-N0 stages, and 30.76 for 6 positive regional nodes. Ultimately, the predicted risk score for Patient 1 reached 85.442. For Patient 2, the analysis followed a similar process. His 6 positive regional nodes and N2 stage similarly increased the risk score. However, he underwent a more extensive lymph node dissection, with 37 regional nodes examined, which reduced the risk score by 12.24. Consequently, the final predicted risk score for Patient 2 was 58.253.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003eSubgroup Analysis\u003c/h2\u003e \u003cp\u003eWith respect to pN stages, we stratified both the SEER patients and the patients in our own dataset into three groups based on the number of retrieved lymph nodes: Group 1 (\u0026lt;\u0026thinsp;15 LNs), Group 2 (15\u0026ndash;33 LNs), and Group 3 (\u0026gt;\u0026thinsp;33 LNs). In the SEER cohort, the median survival for Group 2 patients (15\u0026ndash;33 LNs) was 36 months, which was significantly longer than that of Group 1 patients (\u0026lt;\u0026thinsp;15 LNs; 24 months) and Group 3 patients (\u0026gt;\u0026thinsp;33 LNs; 31 months). These findings from the SEER database were corroborated by our own dataset. Specifically, the overall survival (OS) was markedly better for patients with 15\u0026ndash;33 retrieved lymph nodes compared to those with \u0026gt;\u0026thinsp;33 retrieved lymph nodes (\u003cem\u003eP\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.02). 
Due to the limited number of patients with \u0026lt;\u0026thinsp;15 retrieved lymph nodes in our database (n\u0026thinsp;=\u0026thinsp;7), results for this subgroup were not presented in this analysis.\u003c/p\u003e \u003cp\u003eFor subgroup analysis, patients with 15\u0026ndash;33 retrieved lymph nodes exhibited the best clinical prognosis in the N0 stage, as observed in both the SEER database and our own dataset. The median survival times for Group 1, Group 2, and Group 3 patients were 30 months, 55 months, and 26.5 months, respectively (Group 1 vs. Group 2: \u003cem\u003eP\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.02; Group 2 vs. Group 3: \u003cem\u003eP\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.03). Additionally, our data demonstrated that patients with 15\u0026ndash;33 examined lymph nodes had significantly better OS compared to those with \u0026gt;\u0026thinsp;33 examined lymph nodes in the N0 stage (Median survival: 36 months vs. 13 months, \u003cem\u003eP\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.0004) (as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e). Interestingly, no significant differences were observed among Groups 1, 2, and 3 for N\u0026thinsp;+\u0026thinsp;patients, either in the SEER dataset or in our own dataset (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eTraining and Validation\u003c/h2\u003e \u003cp\u003eFollowing the random search process, we finalized the hyperparameters for the random survival forest. The forest comprises 12 decision trees, with a maximum depth of 16 for each tree. For the hyperparameters related to the tree leaves, we set the maximum number of features to 38 when searching for the best split and required a minimum of 2 samples to be present at each leaf node. 
In the penalized Cox proportional hazards model, we set the penalizer to 0.006 and the learning rate to 0.001.\u003c/p\u003e \u003cp\u003eSubsequently, we conducted 5-fold cross-validation to select the optimal model for survival prediction. The mean concordance index (C-index) of the random forest was 0.746, which is markedly higher than that of the Cox proportional hazards model (0.614) (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). Based on the cross-validation results, the random forest was selected for the web-based individual optimal examined lymph node (ELN) recommender system; its superior performance was maintained in external validation (C-index: 0.739 vs. 0.646).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003ePerformance comparison of the random forest and Cox proportional hazards models\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCross Validation\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eExternal Validation\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eConcordance Index Mean\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eConcordance Index\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRandom Forest\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.746\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.739\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCox Proportional\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.614\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.646\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003eWeb-based Optimal ELN Recommender System\u003c/h2\u003e \u003cp\u003eGiven the superior performance of the random forest model compared to the Cox proportional hazards model, we developed a web-based optimal examined lymph node (ELN) recommender system to assist oncologists in determining the optimal number of ELNs for lymph node dissection. This system is accessible via a browser at [\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://medservice.top/escc\u003c/span\u003e\u003cspan address=\"https://medservice.top/escc\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e]. Users can input the patient's current clinical status, including demographic information, extent of disease, morphological characteristics, therapeutic details, and TNM staging information, then click the \"Predict\" button to generate the output page (as shown in Supplementary Fig.\u0026nbsp;1). 
For patients undergoing esophagectomy, preoperative assessments such as laboratory tests, ultrasound esophagoscopy, contrast-enhanced CT scans, supraclavicular lymph node ultrasounds, or PET/CT scans are conducted to provide comprehensive information on morphology and clinical TNM stages. At the bottom of the output page, users can adjust the number of ELNs to visualize the predicted 5-year survival probability dynamically. Additionally, the system allows for comparisons between the best-case, worst-case, and guideline-recommended cutoff values (Version 5.2022) for survival curves and risk score trends under varying ELN counts.\u003c/p\u003e \u003c/div\u003e"},{"header":"DISCUSSION","content":"\u003cp\u003eEsophageal squamous cell carcinoma (ESCC) is one of the most prevalent cancers globally and imposes a significant burden on human health, particularly in China. Lymph node resection constitutes a critical component of esophagectomy; however, extensive lymphadenectomy may increase the risks of postoperative complications and mortality owing to the often poor physical and psychological condition of elderly patients \u003csup\u003e[\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e, \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e, \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]\u003c/sup\u003e. Laurent et al. reported that the 30- and 90-day mortality rates among elderly patients were 8.2% and 11.5%, respectively \u003csup\u003e[\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e]\u003c/sup\u003e. Therefore, this study aimed to elucidate the relationship between the number of examined regional lymph nodes (ELNs) and prognosis while identifying the optimal ELN count for individual ESCC patients. 
This research not only trained an accurate random forest model to determine the optimal lymph node resection based on each patient's status but also incorporated an explainable module to clarify how the model predicts outcomes according to individual feature importance. To our knowledge, this represents the first explainable recommender system designed to provide optimal regional lymph node surgery plans for ESCC patients. Our model can be likened to a diligent student striving to excel in final examinations (clinical cases: information derived from preoperative assessments). To achieve this, the student must complete numerous daily exercises (training data) and pass mock examinations (testing data).\u003c/p\u003e \u003cp\u003eIn contrast to prior studies, such as the work by Zhu F. et al. \u003csup\u003e[\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]\u003c/sup\u003e, our research included exact tumor size as part of the training data for our model. By incorporating this information, we established a relationship between precise tumor size and hazard rate, offering more specific prognostic insights compared to relying solely on TNM staging. When evaluating model performance, it is noteworthy that Adeoye J. et al. trained a random forest model with 716 patients to predict the probability of malignant transformation of oral leukoplakia and lichenoid lesions \u003csup\u003e[\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]\u003c/sup\u003e. Comparing their results to those obtained using the Cox proportional hazards model (C-index\u0026thinsp;=\u0026thinsp;0.95 vs. 0.83), they demonstrated a significant improvement in hazard prediction accuracy using deep learning models. In another study, Jin Z. et al. developed the Thy-Wise system to visually explain random forest outputs via local force plots \u003csup\u003e[\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]\u003c/sup\u003e. 
In our research, we utilized waterfall plots to interpret random forest results for both patients and clinicians. Both types of plots are valuable for understanding model predictions and identifying key features. However, differences exist in their capabilities: the waterfall plot (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e) is two-dimensional, enabling clearer visualization of multiple feature importances in individual predictions.\u003c/p\u003e \u003cp\u003eTo determine the most critical factors influencing long-term survival in ESCC patients, our study included 647 patients from the SEER database and 102 patients from the thoracic department of a hospital as the training and testing sets, respectively. Eight characteristics\u0026mdash;age, sex, TNM stage, tumor size and location, surgical procedure, grade, and the number of positive and examined regional lymph nodes\u0026mdash;were analyzed using a machine learning model. Surprisingly, the most important factor was the number of examined regional lymph nodes, encouraging thoracic surgeons and pathologists to maximize ELN retrieval efforts. This finding aligns with Xu Guan et al.'s research on rectal cancer, which suggests that 15 ELNs represent the optimal cutoff value for stratifying rectal cancer patient prognosis \u003csup\u003e[\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e]\u003c/sup\u003e. We attribute this observation to two essential reasons: 1) ELNs are closely associated with the staging system and tumor metastasis; 2) a higher number of retrieved lymph nodes indicates more radical resection. Moreover, our machine learning model can predict long-term survival for individual ESCC patients based on their characteristics. These findings indicate that safety and efficiency should be prioritized during lymph node resection, as examining fewer than 15 or more than 33 lymph nodes increases prognostic risks for ESCC patients. Tianbao Yang et al. 
\u003csup\u003e[\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e]\u003c/sup\u003e also confirmed that examining more than 10 ELNs aids in evaluating ESCC patient survival through external validation. Other features, such as N stage, tumor size, number of positive lymph nodes, and T stage, are also critical prognostic factors. Xiao Gong et al. \u003csup\u003e[\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]\u003c/sup\u003e demonstrated, using an XGBoost model with SHAP values, that factors such as no cancer-directed surgery (+\u0026thinsp;0.27), Surg Prim Site (+\u0026thinsp;0.25), age (+\u0026thinsp;0.22), and AJCC stage system (+\u0026thinsp;0.22) significantly impact long-term ESCC patient prognosis. Consistent with our findings, the TNM staging system is not the sole standard for evaluating clinical outcomes. Thus, our machine learning model provides more precise and explicit information, including survival rates and hazard ratios, to develop personalized strategies for each patient before surgery. Additionally, surgeons can present survival curves and risk score trends to patients based on varying lymph node dissection ranges. In summary, our machine learning model offers prognostic information and risk factors prior to treatment, helping avoid excessive lymph node dissection and surgical trauma.\u003c/p\u003e \u003cp\u003eSubgroup analysis was conducted according to different ELN counts. Interestingly, statistically significant differences were observed among N0 patients in both the SEER database and our own dataset. Patients with 15\u0026ndash;33 ELNs exhibited better clinical outcomes compared to those with \u0026lt;\u0026thinsp;15 or \u0026gt;\u0026thinsp;33 ELNs. Conversely, no significant differences were noted among N\u0026thinsp;+\u0026thinsp;patients across different ELN groups. In other words, in our data, retrieving additional lymph nodes in patients with positive lymph nodes had little measurable effect on prognosis. 
Data from Sun Yat-sen University Cancer Center \u003csup\u003e[\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]\u003c/sup\u003e corroborated these findings. The 5-year cancer-specific survival (CSS) rate for patients with \u0026lt;\u0026thinsp;15 LNs was significantly lower than for those with \u0026gt;\u0026thinsp;15 LNs (42.4% vs. 64%) in the LNR0 group (positive lymph node ratio 0%). However, no significant differences were observed within the LNR1 (positive lymph node ratio 1\u0026ndash;25%) or LNR2 (positive lymph node ratio 26\u0026ndash;100%) groups, with 5-year CSS rates of 24.8% vs. 30.9% (\u003cem\u003eP\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.291) and 6.8% vs. 4.9% (\u003cem\u003eP\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.121), respectively. These results suggest that the number of retrieved lymph nodes significantly impacts N0 ESCC patients, whereas lymph node metastasis (N+) plays a more pivotal role in determining ESCC patient prognosis than the number of retrieved lymph nodes.\u003c/p\u003e \u003cp\u003eFinally, some limitations of our study warrant acknowledgment. First, the sample size of ESCC patients remains limited, and elderly patients over 60 years are underrepresented in randomized controlled trials. Consequently, we conducted a retrospective study, considering potential differences in esophageal disease spectra between Western and Eastern countries. Thus, we only collected 647 squamous cell carcinoma cases from the SEER database. Second, the number and region of positive lymph nodes for N stages remain controversial \u003csup\u003e[\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e, \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e]\u003c/sup\u003e. 
The primary reasons for focusing on the number of lymph nodes in our research include: 1) guidelines developed by UICC/AJCC and CSCO \u003csup\u003e[\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e, \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e]\u003c/sup\u003e suggest that N stages are determined by the number of metastatic lymph nodes; 2) regional information on positive lymph nodes in the SEER database is unavailable. Uncovering the relationship between the number and region of positive lymph nodes in ESCC prognosis would be meaningful. To enhance the practical applicability of this method in real-world medical settings, incorporating causal inference into the training and explanation processes is imperative. For instance, we integrated reweighting techniques into our new machine learning model to predict which HR+/HER2- T1-2 N1M0 breast cancer patients benefit from postmastectomy radiotherapy \u003csup\u003e[\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e"},{"header":"CONCLUSION","content":"\u003cp\u003eIn this retrospective observational study, our data demonstrated that the number of regionally retrieved lymph nodes is the most critical factor influencing clinical outcomes in ESCC patients aged over 60 years. Both safety and efficiency should be carefully considered during lymph node resection. Examining fewer than 15 or more than 33 lymph nodes may increase prognostic risks for ESCC patients.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eConsent to Publish declarations\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis manuscript has not been published or presented elsewhere in part or in entirety, and is not under consideration by another journal. 
All the authors have approved the manuscript and agree with submission to your esteemed journal.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConflicts of interest disclosure\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe\u0026nbsp;authors\u0026nbsp;declare\u0026nbsp;that\u0026nbsp;the\u0026nbsp;research\u0026nbsp;was\u0026nbsp;conducted\u0026nbsp;in\u0026nbsp;the\u0026nbsp;absence\u0026nbsp;of\u0026nbsp;any\u0026nbsp;commercial\u0026nbsp;or\u0026nbsp;financial relationships that could be construed as a potential conflict of\u0026nbsp;interest.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study was deemed exempt by the Ethics Committee of Shaanxi Provincial People\u0026apos;s Hospital (\u003cem\u003eNO. 2024-R066\u003c/em\u003e). The need for consent to participate was waived by the Ethics Committee of Shaanxi Provincial People\u0026apos;s Hospital as all data were from public and retrospective sources. This study adhered to the Declaration of Helsinki. Moreover, our research has been registered in the Chinese Clinical Trial Registry (ChiCTR); the registration number is ChiCTR2400081083 \u003cem\u003e(\u003c/em\u003e\u003cem\u003ehttps://www.chictr.org.cn/\u003c/em\u003e\u003cem\u003e).\u003c/em\u003e\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eBH, QZ, WT, YF and JY designed the research.\u0026nbsp;TM, BH, LH\u0026nbsp;and\u0026nbsp;XD collected the training and testing datasets. QZ trained the models and developed the explainable module and the web application. BH, QZ, YF and JY wrote the manuscript. BH, WT, TM, XD, LH\u0026nbsp;and LJ edited and critically revised the manuscript in regard to important intellectual content. 
All authors read and approved the manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSources of funding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis research was supported by a grant from the Science and Technology Foundation of Shaanxi Province (2022JQ-934) and the Shaanxi Provincial People\u0026apos;s Hospital (2021JY-07, 2023-YJY-13).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe machine learning model, the training and testing datasets, the web app of the recommender system, and the code for reproducing the concordance index are openly available at the following link:\u0026nbsp;\u003cem\u003ehttps://github.com/snowflake-Zhao/escc_node/.\u003c/em\u003e To request the data from this study, please contact Bin Hou\u0026nbsp;\u003cem\u003e([email protected]\u003c/em\u003e\u003cem\u003e)\u003c/em\u003e or Qifan Zhao \u003cem\u003e\u003cu\u003e([email protected]).\u003c/u\u003e\u003c/em\u003e\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eZheng R.S. Zhang S.W. Sun K.X., et al. [Cancer statistics in China, 2016]. Zhonghua Zhong Liu Za Zhi 2023;45:212-220. DOI: 10.3760/cma.j.cn112152-20220922-00647.\u003c/li\u003e\n\u003cli\u003eHvid-Jensen F. Pedersen L. Drewes A.M. S\u0026oslash;rensen H.T. Funch-Jensen P. Incidence of adenocarcinoma among patients with Barrett\u0026apos;s esophagus. N Engl J Med 2011;365:1375-1383. DOI: 10.1056/NEJMoa1103042.\u003c/li\u003e\n\u003cli\u003eLoCicero J. 3rd. Shaw J.P. Thoracic surgery in the elderly: areas of future research and studies. Thorac Surg Clin 2009;19:409-413, vii. DOI: 10.1016/j.thorsurg.2009.07.003.\u003c/li\u003e\n\u003cli\u003eWatanabe M. Toh Y. Ishihara R., et al. Comprehensive registry of esophageal cancer in Japan, 2015. Esophagus 2023;20:1-28. DOI: 10.1007/s10388-022-00950-5.\u003c/li\u003e\n\u003cli\u003eArnold M. Soerjomataram I. Ferlay J. Forman D. 
Global incidence of oesophageal cancer by histological subtype in 2012. Gut 2015;64:381-387. DOI: 10.1136/gutjnl-2014-308124.\u003c/li\u003e\n\u003cli\u003eHarris J.P. Kashyap M. Humphreys J.N. Pollom E.L. Chang D.T. The clinical and financial cost of mental disorders among elderly patients with gastrointestinal malignancies. Cancer Med 2020;9:8912-8922. DOI: 10.1002/cam4.3509.\u003c/li\u003e\n\u003cli\u003eAmin M.B. Greene F.L. Edge S.B., et al. The Eighth Edition AJCC Cancer Staging Manual: Continuing to build a bridge from a population-based to a more \u0026quot;personalized\u0026quot; approach to cancer staging. CA Cancer J Clin 2017;67:93-99. DOI: 10.3322/caac.21388.\u003c/li\u003e\n\u003cli\u003eTapias L.F. Muniappan A. Wright C.D., et al. Short and long-term outcomes after esophagectomy for cancer in elderly patients. Ann Thorac Surg 2013;95:1741-1748. DOI: 10.1016/j.athoracsur.2013.01.084.\u003c/li\u003e\n\u003cli\u003eAjani J.A. D\u0026apos;Amico T.A. Bentrem D.J., et al. Esophageal and Esophagogastric Junction Cancers, Version 2.2023, NCCN Clinical Practice Guidelines in Oncology. J Natl Compr Canc Netw 2023;21:393-422. DOI: 10.6004/jnccn.2023.0019.\u003c/li\u003e\n\u003cli\u003eCollet C. Onuma Y. Andreini D., et al. Coronary computed tomography angiography for heart team decision-making in multivessel coronary artery disease. Eur Heart J 2018;39:3689-3698. DOI: 10.1093/eurheartj/ehy581.\u003c/li\u003e\n\u003cli\u003eLe Berre C. Sandborn W.J. Aridhi S., et al. Application of Artificial Intelligence to Gastroenterology and Hepatology. Gastroenterology 2020;158:76-94.e2. DOI: 10.1053/j.gastro.2019.08.058.\u003c/li\u003e\n\u003cli\u003eUche-Anya E. Anyane-Yeboa A. Berzin T.M. Ghassemi M. May F.P. Artificial intelligence in gastroenterology and hepatology: how to advance clinical practice while ensuring health equity. Gut 2022;71:1909-1915. DOI: 10.1136/gutjnl-2021-326271.\u003c/li\u003e\n\u003cli\u003eLamens A. Bajorath J. 
Explaining Accurate Predictions of Multitarget Compounds with Machine Learning Models Derived for Individual Targets. Molecules 2023;28:825. DOI: 10.3390/molecules28020825.\u003c/li\u003e\n\u003cli\u003eRigatti S.J. Random Forest. J Insur Med 2017;47:31-39. DOI: 10.17849/insm-47-01-31-39.1.\u003c/li\u003e\n\u003cli\u003eRashid R. Sohrabi C. Kerwan A., et al. The STROCSS 2024 guideline: strengthening the reporting of cohort, cross-sectional and case-control studies in surgery. Int J Surg 2024; DOI: 10.1097/JS9.0000000000001268.\u003c/li\u003e\n\u003cli\u003ePaulus E. Ripat C. Koshenkov V., et al. Esophagectomy for cancer in octogenarians: should we do it. Langenbecks Arch Surg 2017;402:539-545. DOI: 10.1007/s00423-017-1573-x.\u003c/li\u003e\n\u003cli\u003eWon E. Ilson D.H. Management of localized esophageal cancer in the older patient. Oncologist 2014;19:367-374. DOI: 10.1634/theoncologist.2013-0178.\u003c/li\u003e\n\u003cli\u003eLaurent A. Marechal R. Farinella E., et al. Esophageal cancer: Outcome and potential benefit of esophagectomy in elderly patients. Thorac Cancer 2022;13:2699-2710. DOI: 10.1111/1759-7714.14596.\u003c/li\u003e\n\u003cli\u003eZhu F. Zhong R. Li F., et al. Development and validation of a deep transfer learning-based multivariable survival model to predict overall survival in lung cancer. Transl Lung Cancer Res 2023;12:471-482. DOI: 10.21037/tlcr-23-84.\u003c/li\u003e\n\u003cli\u003eAdeoye J. Koohi-Moghadam M. Lo A., et al. Deep Learning Predicts the Malignant-Transformation-Free Survival of Oral Potentially Malignant Disorders. Cancers (Basel) 2021;13:6054. DOI: 10.3390/cancers13236054.\u003c/li\u003e\n\u003cli\u003eJin Z. Pei S. Ouyang L., et al. Thy-Wise: An interpretable machine learning model for the evaluation of thyroid nodules. Int J Cancer 2022;151:2229-2243. DOI: 10.1002/ijc.34248.\u003c/li\u003e\n\u003cli\u003eGuan X. Jiao S. Wen R., et al. 
Optimal examined lymph node number for accurate staging and long-term survival in rectal cancer: a population-based study. Int J Surg 2023;109:2241-2248. DOI: 10.1097/JS9.0000000000000320.\u003c/li\u003e\n\u003cli\u003eYang T. Huang S. Chen B., et al. A modified survival model for patients with esophageal squamous cell carcinoma based on lymph nodes: A study based on SEER database and external validation. Front Surg 2022;9:989408. DOI: 10.3389/fsurg.2022.989408.\u003c/li\u003e\n\u003cli\u003eGong X. Zheng B. Xu G., et al. Application of machine learning approaches to predict the 5-year survival status of patients with esophageal cancer. J Thorac Dis 2021;13:6240-6251. DOI: 10.21037/jtd-21-1107.\u003c/li\u003e\n\u003cli\u003eTan Z. Ma G. Yang H., et al. Can lymph node ratio replace pN categories in the tumor-node-metastasis classification system for esophageal cancer. J Thorac Oncol 2014;9:1214-1221. DOI: 10.1097/JTO.0000000000000216.\u003c/li\u003e\n\u003cli\u003eHu Y. Hu C. Zhang H., et al. How does the number of resected lymph nodes influence TNM staging and prognosis for esophageal carcinoma. Ann Surg Oncol 2010;17:784-790. DOI: 10.1245/s10434-009-0818-5.\u003c/li\u003e\n\u003cli\u003eChen Q. Yu L. Hao C., et al. Effectiveness evaluation of organized screening for esophageal cancer: a case-control study in Linzhou city, China. Sci Rep 2016;6:35707. DOI: 10.1038/srep35707.\u003c/li\u003e\n\u003cli\u003eMuro K. Van Cutsem E. Narita Y., et al. Pan-Asian adapted ESMO Clinical Practice Guidelines for the management of patients with metastatic gastric cancer: a JSMO-ESMO initiative endorsed by CSCO, KSMO, MOS, SSO and TOS. Ann Oncol 2019;30:19-33. DOI: 10.1093/annonc/mdy502.\u003c/li\u003e\n\u003cli\u003eJin L. Zhao Q. Fu S., et al. Who can benefit from postmastectomy radiotherapy among HR +/HER2- T1-2 N1M0 breast cancer patients? An explainable machine learning mortality prediction based approach. Front Endocrinol 2024;15:1326009. 
DOI: 10.3389/fendo.2024.1326009.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"bmc-medical-informatics-and-decision-making","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"midm","sideBox":"Learn more about [BMC Medical Informatics and Decision Making](http://bmcmedinformdecismak.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/midm/default.aspx","title":"BMC Medical Informatics and Decision Making","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Esophageal squamous cell carcinoma (ESCC), Random survival forest (RSF), Lymph node resection, Survival analysis","lastPublishedDoi":"10.21203/rs.3.rs-6557728/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6557728/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003e\u003cem\u003e\u003cstrong\u003eBackground: \u003c/strong\u003e\u003c/em\u003eThis study aims to investigate the relationship between long-term survival in esophageal squamous cell carcinoma (ESCC) patients and various clinical factors, including age, sex, examined lymph nodes (ELN), tumor size, T stage, N stage, grade, and surgical procedure. 
These findings aim to provide surgeons with precise information to avoid overtreatment.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e\u003cstrong\u003eMaterials and Methods:\u003c/strong\u003e\u003c/em\u003e Random forest and Cox proportional hazard models were developed and validated using data from the National Cancer Institute Surveillance, Epidemiology, and End Results (SEER) database (2013–2018) and Shaanxi Provincial Hospital. A web-based recommendation system was constructed to facilitate the selection of an optimal number of lymph nodes for visualizing survival curves and score trends under different conditions. This study has been registered in the Chinese Clinical Trial Registry (No. ChiCTR2400081083).\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e\u003cstrong\u003eResults:\u003c/strong\u003e\u003c/em\u003e The optimal number of ELN for two randomly selected ESCC patients was determined to be 33. In the N0 group, patients with 15–33 ELN had a median survival of 55.0 months, significantly longer than those with \u0026lt;15 ELN (30.0 months) or \u0026gt;33 ELN (26.5 months). Statistically significant differences were observed between the 15–33 ELN group and both the \u0026lt;15 ELN (P = 0.02) and \u0026gt;33 ELN groups (P = 0.03) in N0 patients from both the SEER database and our independent cohort (15–33 ELN vs. \u0026gt;33 ELN: 36.0 months vs. 13.0 months, P \u0026lt; 0.001). 
No significant difference was found in N+ patients, suggesting that the number of retrieved lymph nodes has minimal impact on prognosis in this subgroup.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e\u003cstrong\u003eConclusions:\u003c/strong\u003e\u003c/em\u003eOur findings indicate that examining fewer than 15 or more than 33 lymph nodes increases prognostic risks in ESCC patients over 60 years old.\u003c/p\u003e","manuscriptTitle":"The impact of lymph node resection and survival prediction by machine learning in esophageal squamous cell carcinoma patients over 60 years old: a clinical trial based on the SEER database and Chinese population","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-05-30 05:45:18","doi":"10.21203/rs.3.rs-6557728/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2025-08-08T12:05:46+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-06-13T08:05:25+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"305049513443778003447402148521543015983","date":"2025-06-10T06:25:02+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-05-28T01:36:06+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-05-26T07:45:03+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-05-23T01:17:42+00:00","index":"","fulltext":""},{"type":"submitted","content":"BMC Medical Informatics and Decision Making","date":"2025-05-23T01:17:35+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"bmc-medical-informatics-and-decision-making","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"midm","sideBox":"Learn more about [BMC Medical Informatics and Decision 
Making](http://bmcmedinformdecismak.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/midm/default.aspx","title":"BMC Medical Informatics and Decision Making","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"c1a5c616-8eb3-4628-906f-5b9db399bdbe","owner":[],"postedDate":"May 30th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[],"tags":[],"updatedAt":"2026-04-27T14:10:15+00:00","versionOfRecord":[],"versionCreatedAt":"2025-05-30 05:45:18","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6557728","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6557728","identity":"rs-6557728","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
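
The model comparison in the full text above rests on the concordance index (C-index), reported as 0.746 vs. 0.614 in cross-validation and 0.739 vs. 0.646 in external validation. As a reading aid, here is a minimal sketch of how Harrell's C-index can be computed for right-censored survival data. It is not the authors' code: the function name `concordance_index`, the toy inputs, and the simplification of skipping pairs with tied follow-up times are all assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times       -- observed follow-up times
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only when the shorter time is an observed event.
        # Simplification (assumption): pairs with tied times are skipped.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier death was ranked as higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration only (not from the study).
toy_times = [5, 10, 15, 20]
toy_events = [1, 0, 1, 1]
perfect = concordance_index(toy_times, toy_events, [0.9, 0.4, 0.6, 0.1])
reversed_ranking = concordance_index(toy_times, toy_events, [0.1, 0.6, 0.4, 0.9])
uninformative = concordance_index(toy_times, toy_events, [0.5, 0.5, 0.5, 0.5])
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to an uninformative ranking, and 0.0 to a fully reversed one. In a 5-fold setting like the one described, a function of this kind would be evaluated on each held-out fold and the fold values averaged.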


License: CC-BY-4.0