Prediction of Pathological Complete Response in Hepatocellular Carcinoma Using Machine Learning Models

doi:10.21203/rs.3.rs-6637416/v1

2025 · doi:10.21203/rs.3.rs-6637416/v1

preprint OA: closed

Research Square · Research Article

Zhou Ye, Menghui Zhang, Tao Zeng, Chuanhui Peng, Yibo Zhang, JunJun Jia, and Songfeng Yu

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6637416/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

Background
Pathological complete response (PCR) in hepatocellular carcinoma (HCC) following conversion therapy is associated with improved prognosis and influences treatment decisions. This study aims to develop and validate machine learning-based predictive models for assessing PCR in HCC patients.

Methods
This retrospective single-center study included 110 HCC patients after propensity score matching. Four machine learning models (LASSO, RF, XGBoost, and Decision Tree) were developed to predict PCR.
After model training, performance was assessed on the test set. Feature importance was analyzed, and a public visualization tool was developed.

Results
The RF model demonstrated the highest predictive accuracy (AUC: 0.962), followed by XGBoost (AUC: 0.929), Decision Tree (AUC: 0.874), and LASSO (AUC: 0.799). Key predictive factors included portal vein invasion, AFP level, and tumor diameter. The RF model effectively distinguished the PCR and NPCR groups, providing robust prediction capabilities.

Conclusion
Machine learning models, particularly RF, improve the accuracy of PCR prediction in HCC patients. This approach highlights the potential of integrating demographic, laboratory, and radiographic data for personalized treatment planning.

Keywords: Pathological complete response; Hepatocellular carcinoma; Machine learning models; Predictive factors; Random forest

Introduction

Hepatocellular carcinoma (HCC) is a common malignancy with high cancer-related mortality. Surgery remains the preferred treatment for achieving favorable survival outcomes in HCC patients. However, only a small proportion of patients are eligible for surgery, as the majority are diagnosed at an advanced stage [1]. For patients with unresectable HCC, conversion therapies, including locoregional and systemic treatments, are used to downstage the tumor [2]. Following successful conversion, surgical resection remains a recommended option with curative intent [3]. When postoperative pathological examination confirms complete necrosis of the tumor, this is defined as a pathological complete response (PCR).

PCR not only reflects the efficacy of conversion therapy but also indicates a better prognosis compared to non-PCR (NPCR) [4–6]. Previous studies have reported that, for patients achieving PCR, outcomes do not differ significantly whether or not they receive postoperative adjuvant therapy [5].
Furthermore, surgical resection may not be necessary in cases of complete tumor necrosis [7, 8]. The assessment of tumor response therefore has crucial implications for treatment decision-making both preoperatively and postoperatively, and it is essential to explore methods for predicting the pathological response of tumors in advance.

Algorithms based on machine learning have become increasingly prevalent in supporting clinical decision-making. Several predictive models for the pathological response of liver cancer have been reported, mostly based on univariate and multivariate regression analysis [9, 10]. However, these traditional linear models are limited in their ability to capture complex relationships or integrate multiple features. In contrast, advanced models offer improved performance. For example, the least absolute shrinkage and selection operator (LASSO) model is recognized for its robust feature selection capabilities and interpretability. Meanwhile, random forest (RF), extreme gradient boosting (XGBoost), and Decision Tree excel in handling complex nonlinear relationships and effectively managing missing values.

This study aims to develop more accurate predictive models using various advanced machine learning algorithms to assess the likelihood of PCR in HCC patients. By comparing the predictive performance of multiple models, we seek to identify the best-performing model and use it to develop a convenient visualization tool.

Materials and methods

Patient population

This single-center retrospective study analyzed clinical data collected between January 2019 and December 2021. The inclusion criteria were as follows: (1) patients with an initial diagnosis of unresectable liver cancer, (2) those who received preoperative conversion therapy, (3) patients who subsequently underwent liver resection with available postoperative pathological results, and (4) individuals aged 18–75 years with Child-Pugh A.
The pathological analysis was conducted on resected specimens by two independent pathologists. PCR was defined as the absence of viable tumor cells, while the presence of any viable tumor cells was categorized as NPCR. After propensity score matching (PSM) at a 1:1 ratio, a total of 110 patients were included in the analysis, comprising 55 patients in the PCR group and 55 in the NPCR group.

Data collection

We collected clinical data across three categories: 1) demographic data, such as age and gender; 2) radiographic data, including tumor diameter ( 10 cm), number of tumors (solitary or multiple), presence or absence of portal vein invasion, and clinical tumor grade (I–IV); and 3) laboratory data, such as HBV infection status, HBV DNA status, and AFP levels. The clinical tumor grade was assessed according to the China Liver Cancer Staging (CNLC) system. Tumor diameter was taken as the largest diameter on imaging, and the number of tumors was primarily identified through radiological review. Portal vein invasion was identified radiologically as discontinuity of the portal vein wall or the presence of tumor thrombus.

Machine learning models

Four machine learning algorithms, i.e., LASSO, RF, XGBoost, and Decision Tree, were selected to develop predictive models for PCR, based on their ability to handle complex clinical data, address classification problems (NPCR and PCR), capture nonlinear relationships, and provide feature importance analysis. The measure of feature importance differed across methods. In LASSO regression, feature importance was assessed based on the magnitude of the gradients. For the decision tree, it was evaluated using the cumulative goodness-of-split measures. In XGBoost and RF, feature importance was determined using the Gini importance metric.
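To make the Gini importance used for the tree-based models more concrete, the following minimal sketch computes the Gini impurity of a node and the impurity decrease produced by a single split. It is an illustration only: the study's analysis was carried out in R, the function names are invented for this example, and the eight labels and the split on portal vein invasion are hypothetical toy data, not study data. In a random forest, a feature's Gini importance is commonly obtained by summing such decreases, weighted by the share of samples reaching each node, over every node that splits on the feature and then averaging over trees.

```python
from collections import Counter
from typing import Sequence


def gini_impurity(labels: Sequence[str]) -> float:
    """Gini impurity 1 - sum(p_k^2) of the class labels in one node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(
    parent: Sequence[str], left: Sequence[str], right: Sequence[str]
) -> float:
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Hypothetical toy node (NOT study data): eight patients split on a
# binary feature standing in for portal vein invasion.
NO_INVASION = ["PCR", "PCR", "PCR", "NPCR"]
INVASION = ["NPCR", "NPCR", "NPCR", "PCR"]
PARENT = NO_INVASION + INVASION

DECREASE = impurity_decrease(PARENT, NO_INVASION, INVASION)

if __name__ == "__main__":
    print(f"parent impurity:   {gini_impurity(PARENT):.3f}")
    print(f"impurity decrease: {DECREASE:.3f}")
```

A split that separated the two classes perfectly would remove all of the parent impurity, so larger decreases correspond to more informative features.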
After model training, performance was evaluated on the test set using the confusion matrix, sensitivity, specificity, precision, F1 score, and the area under the receiver operating characteristic (ROC) curve (AUC). Additionally, a density plot of the prediction scores gives an intuitive view of model performance: clearly separated score distributions between the two groups indicate good discriminative capability. All analyses were carried out in R (v4.4.1).

Statistical analysis

Categorical variables were summarized as numbers (percentages) and compared using the χ² test or Fisher’s exact test. Continuous variables were expressed as mean ± SD for normally distributed data or median (IQR) for non-normally distributed data and compared using the t-test or Mann-Whitney U test, respectively. The R package ‘tableone’ (v0.13.2) was used for statistical analysis. A P value of less than 0.05 was considered statistically significant. Survival curves were estimated using Kaplan-Meier analysis and compared with the log-rank test, implemented via the R packages ‘survival’ (v3.7.0) and ‘survminer’ (v0.4.9).

Results

Patient characteristics

A total of 110 patients were included in this study after propensity score matching, with 55 patients in the PCR group and 55 in the NPCR group. Baseline characteristics, including age, sex, HBV infection status, and HBV DNA levels, were well matched between the two groups (Table S1). The 110 samples were then randomly split into a training set (76 samples) and a test set (34 samples). No significant differences in baseline data were observed between the training and test sets (Table 1).

Table 1. Baseline characteristics of the training and test sets.
Characteristic                 Overall (n = 110)      Training set (n = 76)   Test set (n = 34)        P value
Age (years)                    55.55 ± 11.46          55.43 ± 11.57           55.82 ± 11.37            0.87
Gender (%)                                                                                             1
  Male                         96 (87.3)              66 (86.8)               30 (88.2)
  Female                       14 (12.7)              10 (13.2)               4 (11.8)
Number of Tumors (%)                                                                                   1
  Solitary                     86 (78.2)              59 (77.6)               27 (79.4)
  Multiple                     24 (21.8)              17 (22.4)               7 (20.6)
Diameter (%) a)                                                                                        0.17
  10cm                         36 (32.7)              23 (30.3)               13 (38.2)
Portal Vein Invasion (%) b)                                                                            1
  Absent                       57 (51.8)              39 (51.3)               18 (52.9)
  Present                      53 (48.2)              37 (48.7)               16 (47.1)
Grade (%) c)                                                                                           0.487
  I                            65 (59.1)              45 (59.2)               20 (58.8)
  II                           19 (17.3)              14 (18.4)               5 (14.7)
  III                          25 (22.7)              17 (22.4)               8 (23.5)
  IV                           1 (0.9)                0 (0.0)                 1 (2.9)
HBV status (%)                                                                                         0.152
  Negative                     16 (14.5)              14 (18.4)               2 (5.9)
  Positive                     94 (85.5)              62 (81.6)               32 (94.1)
HBV-DNA status (%)                                                                                     0.536
  Negative                     55 (50.0)              40 (52.6)               15 (44.1)
  Positive                     55 (50.0)              36 (47.4)               19 (55.9)
AFP level (IQR) (ng/mL)        1,470.04 (8,534.17)    837.04 (4,099.31)       2,884.97 (14,120.54)     0.587

Note: The above clinical data were all collected at the patients' first visit. PCR, pathological complete response; NPCR, non-pathological complete response; AFP, alpha-fetoprotein; HBV, hepatitis B virus.
a) Diameter: recorded as the maximum diameter. In case of multiple tumors, the largest tumor's maximum diameter is recorded.
b) Portal Vein Invasion: identified by radiological review, presenting as the discontinuity of the portal vein wall or the presence of tumor thrombus.
c) Grade: classified by the CNLC staging system. I: early stage; II: locally advanced; III: regional invasion; IV: metastatic.

The average age of the patients was 55.55 years, and the majority were male (87.3%). Most patients had a solitary tumor (78.2%), and the distribution of tumor diameter across the three categories was relatively even (36.4% vs. 30.9% vs. 32.7%). Approximately half of the patients had portal vein invasion (48.2% with vs. 51.8% without). The clinical tumor grade was predominantly CNLC grade I (59.1%).
The vast majority of patients were positive for HBV infection (85.5%), and the mean AFP level was 1,470 ng/mL (Table 1).

Survival analysis

Figure 1 compares overall survival (OS) and progression-free survival (PFS) between the PCR and NPCR groups. In the NPCR group, the 3-year OS rate was 65.6% and the 3-year PFS rate was 44.3%. In the PCR group, the 3-year OS rate was 90.2% and the 3-year PFS rate was 79.8%.

Evaluating performance of PCR predictive models

The RF model demonstrated excellent performance in predicting pathological outcomes, achieving an AUC of 0.962 (95% CI: 0.907–1). Additionally, the colors in the ROC curve help verify the cutoff value, with the yellow color along the diagonal indicating that the optimal cutoff is near the midpoint. Given the balanced proportion of positive and negative samples in the dataset, the default cutoff value is 0.5 (Fig. 2B). By comparison, the AUC values for the LASSO, Decision Tree, and XGBoost models were 0.799, 0.874, and 0.929, respectively, highlighting strong predictive capabilities across all models. Among these, the RF model exhibited the highest predictive accuracy, followed by the XGBoost model (Table 2).

Table 2. Performance of PCR predictive models in the test set.

Score          LASSO                 Decision Tree        RF                  XGBoost
TP             13                    12                   14                  14
FP             2                     2                    1                   1
TN             15                    15                   16                  16
FN             4                     5                    3                   3
Sensitivity    0.76470588            0.70588235           0.82352941          0.82352941
Specificity    0.88235294            0.88235294           0.94117647          0.94117647
Precision      0.86666667            0.85714286           0.93333333          0.93333333
F1 score       0.8125                0.77419355           0.875               0.875
AUC (95% CI)   0.799 (0.637–0.962)   0.874 (0.758–0.99)   0.962 (0.907–1)     0.929 (0.848–1)

Note: TP, true-positive; FP, false-positive; TN, true-negative; FN, false-negative.

The confusion matrix of the RF model on the test set (34 samples) shows that, among the 17 actual NPCR patients, 16 were predicted as NPCR (true-negative) and 1 as PCR (false-positive).
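The derived rows of Table 2 follow directly from the four confusion-matrix counts, and the short sketch below recomputes them as a consistency check. It is not the authors' code (their analysis was carried out in R); the class and function names are hypothetical, and only the TP, FP, TN, and FN counts are taken from Table 2.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # PCR predicted as PCR
    fp: int  # NPCR predicted as PCR
    tn: int  # NPCR predicted as NPCR
    fn: int  # PCR predicted as NPCR


def summarize(c: ConfusionCounts) -> Dict[str, float]:
    """Standard binary-classification metrics from one confusion matrix."""
    sensitivity = c.tp / (c.tp + c.fn)   # recall on the PCR class
    specificity = c.tn / (c.tn + c.fp)   # recall on the NPCR class
    precision = c.tp / (c.tp + c.fp)     # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Counts copied from Table 2 (test set, 17 PCR and 17 NPCR patients).
TABLE2 = {
    "LASSO": ConfusionCounts(tp=13, fp=2, tn=15, fn=4),
    "Decision Tree": ConfusionCounts(tp=12, fp=2, tn=15, fn=5),
    "RF": ConfusionCounts(tp=14, fp=1, tn=16, fn=3),
    "XGBoost": ConfusionCounts(tp=14, fp=1, tn=16, fn=3),
}

METRICS = {name: summarize(counts) for name, counts in TABLE2.items()}

if __name__ == "__main__":
    for name, m in METRICS.items():
        row = ", ".join(f"{key}={value:.4f}" for key, value in m.items())
        print(f"{name}: {row}")
```

Because RF and XGBoost share identical counts at the 0.5 cutoff, these threshold-based metrics cannot separate them; the difference between the two models in Table 2 appears only in the AUC.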
Among the 17 actual PCR patients, 14 were correctly predicted as PCR (true-positive) and 3 were misclassified as NPCR (false-negative) (Fig. 2A). These results indicate strong agreement between the predicted and actual outcomes.

The score density distribution of the RF model provides a clear visualization of its predictive performance. Most predictive scores in the PCR group are greater than 0.5, while the majority of scores in the NPCR group are less than 0.5 (Fig. 2D). This demonstrates that the RF model effectively distinguishes between PCR and NPCR patients.

Feature importance in PCR predictive models

In the RF model, feature importance was assessed using Gini importance analysis. Nine parameters were ranked by their contribution to the model: portal vein invasion, AFP level, age, tumor diameter, grade, number of tumors, HBV DNA status, HBV infection status, and gender (Fig. 2C). Portal vein invasion was also the top-ranked feature in the LASSO, XGBoost, and Decision Tree models. In contrast, age, which ranked third in the RF model, was the least important feature in the other three models (Supplementary Fig. S1–S4).

Finally, we established a public visualization platform based on the best-performing RF model, which is available at: https://datalinkx.shinyapps.io/PCRpredict/.

Discussion

Achieving PCR through preoperative conversion therapy reflects a high tumor sensitivity to treatment. PCR is also associated with improved clinical outcomes, including enhanced survival and reduced recurrence rates [4–6, 11]. Similarly, our study showed significantly prolonged OS and PFS in the PCR group compared to the NPCR group. Given the clinical value of PCR, further research on the PCR subgroups is required. Li et al. [8] proposed that, for patients achieving radiological complete response (RCR), a watch-and-wait strategy yields OS and PFS outcomes comparable to surgical resection.
Similarly, Choi et al. [12] reported no significant difference in long-term survival outcomes between patients who were predicted to achieve PCR but did not undergo surgery and those who underwent liver resection with pathologically confirmed PCR. Together, these findings highlight the potential of a watch-and-wait approach for patients with complete tumor necrosis, offering an alternative to surgical intervention. Furthermore, our previous study demonstrated that, for patients who achieve PCR, outcomes do not differ significantly with or without postoperative adjuvant therapy [5]. Therefore, accurate prediction of the pathological response in clinical practice would influence treatment decisions both preoperatively and postoperatively.

Several articles have explored predictive factors for pathological response in liver cancer. Yang et al. [4] performed a regression analysis and identified AFP < 100 ng/mL and a single tumor as significant predictors of PCR. Lin et al. [9] conducted univariable and multivariable logistic regression analyses, revealing that HBV DNA load, AFP levels, maximum tumor diameter, preoperative transarterial chemoembolization (TACE) sessions, and a complete response according to the modified Response Evaluation Criteria in Solid Tumors (mRECIST) were significant predictors of PCR. Based on these five factors, they developed a nomogram with a concordance index of 0.80. Huang et al. [13] reported that using radiographic response alone to predict PCR yielded an AUC of 0.727, while a combination of radiomics and AFP response yielded an AUC of 0.926. Consistent with most previous reports, our models also recognized AFP as an important predictive factor. Nonetheless, portal vein invasion was identified as the most significant factor by all four of our models, while previous reports on PCR prediction did not mention portal vein invasion as a factor.
Among our four predictive models, RF performed the best, with an AUC of 0.962. The other three models, ranked by performance from highest to lowest, were XGBoost (AUC 0.929), Decision Tree (AUC 0.874), and LASSO (AUC 0.799). Since both RF and XGBoost can handle nonlinear relationships and high-dimensional data, whereas LASSO is designed to address linear relationships, the superior predictive performance of the former two algorithms is reasonable. To our knowledge, this is the first study to use four different machine learning algorithms to develop pathological response prediction models for HCC, and the predictive accuracy of the best model is the highest among previously reported studies.

In clinical practice, treatment response in HCC is primarily evaluated through radiological criteria, such as the mRECIST criteria and the WHO criteria. Wen et al. [14] utilized ten imaging parameters to construct a SMOTE model for predicting PCR, achieving an AUC of 0.843. Nonetheless, a tumor response on radiology does not correlate with pathological response [15]. Additionally, no significant differences in survival outcomes were observed between patients with and without RCR [16]. Therefore, RCR alone is insufficient as a direct prognostic reference. In our study, the predictive factors derived from imaging were mainly portal vein invasion, number of tumors, diameter, and CNLC staging. We combined demographic, laboratory, and radiographic data to establish the predictive models.

Our study has several limitations. First, it is a retrospective study, which carries an inherent risk of bias in data collection and analysis. Second, the relatively small sample size and the single-center design may introduce selection bias. Third, molecular markers were not incorporated into the predictive models.
Additionally, some studies use more detailed classifications of pathological response, such as complete response, major response (pathological response ≥ 50%), and minor response (pathological response < 50%) [17]. Our predictive procedures did not adopt such detailed classifications. Further research is required to optimize the model.

Conclusions

We developed four models to predict PCR using different machine learning algorithms, with the RF model performing the best, achieving an AUC of 0.962. The model shows that the primary factor affecting pathological results is portal vein invasion, followed by AFP level. This finding suggests that machine learning can potentially improve the accuracy of prediction, thereby influencing treatment strategies.

Declarations

Acknowledgements
Not applicable.

Authors’ contributions
ZY and MZ were involved in conceptualization and writing (original draft); TZ was involved in methodology and formal analysis; CP was involved in data curation; YZ and JJ were involved in validation and visualization; SY was involved in conceptualization and writing (review and editing).

Funding
Not applicable.

Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Ethics approval and consent to participate
The requirement for ethical approval and informed consent of patients has been waived by the review board of the First Affiliated Hospital of Zhejiang University School of Medicine.

Consent for publication
Not applicable.

Competing interests
The authors declare no competing financial interests.

References

1. Wang MD, Xu XJ, Wang KC, Diao YK, Xu JH, Gu LH, et al. Conversion therapy for advanced hepatocellular carcinoma in the era of precision medicine: Current status, challenges and opportunities. Cancer Sci. 2024;115:2159–69.
2. Zhou H, Song T. Conversion therapy and maintenance therapy for primary hepatocellular carcinoma. Biosci Trends. 2021;15:155–60.
3. Sun HC, Zhou J, Wang Z, Liu X, Xie Q, Jia W, et al. Alliance of Liver Cancer Conversion Therapy. Chinese expert consensus on conversion therapy for hepatocellular carcinoma (2021 edition). Hepatobiliary Surg Nutr. 2022;11:227–52.
4. Yang K, Sung PS, You YK, Kim DG, Oh JS, Chun HJ, et al. Pathologic complete response to chemoembolization improves survival outcomes after curative surgery for hepatocellular carcinoma: predictive factors of response. HPB (Oxford). 2019;21:1718–26.
5. Jia J, Ding C, Mao M, Gao F, Shao Z, Zhang M, et al. Pathological complete response after conversion therapy in unresectable hepatocellular carcinoma: a retrospective study. BMC Gastroenterol. 2024;24:242.
6. Zeng ZX, Wu JY, Wu JY, Zhang ZB, Wang K, Zhuang SW, et al. Prognostic Value of Pathological Response for Patients with Unresectable Hepatocellular Carcinoma Undergoing Conversion Surgery. Liver Cancer. 2024;13:498–508.
7. Wu JY, Wu JY, Fu YK, Ou XY, Li SQ, Zhang ZB, et al. Outcomes of Salvage Surgery Versus Non-Salvage Surgery for Initially Unresectable Hepatocellular Carcinoma After Conversion Therapy with Transcatheter Arterial Chemoembolization Combined with Lenvatinib Plus Anti-PD-1 Antibody: A Multicenter Retrospective Study. Ann Surg Oncol. 2024;31:3073–83.
8. Li B, Wang C, He W, Qiu J, Zheng Y, Zou R, et al. Watch-and-wait strategy vs. resection in patients with radiologic complete response after conversion therapy for initially unresectable hepatocellular carcinoma: a propensity score-matching comparative study. Int J Surg. 2024;110:2545–55.
9. Lin J, Li X, Shi X, Zhang L, Liu H, Liu J, et al. Nomogram for predicting pathologic complete response after transarterial chemoembolization in patients with hepatocellular carcinoma. Ann Transl Med. 2021;9:1130.
10. Chen Q, Deng Y, Zhao C, Huang Z, Zhang W, Yang Y, et al. Nomogram for tumour response based on prospective cohorts of hepatocellular carcinoma patients receiving immunotherapy combined with targeted therapy: development and validation. Ann Transl Med. 2023;11:199.
11. Agopian VG, Morshedi MM, McWilliams J, Harlander-Locke MP, Markovic D, Zarrinpar A, et al. Complete pathologic response to pretransplant locoregional therapy for hepatocellular carcinoma defines cancer cure after liver transplantation: analysis of 501 consecutively treated patients. Ann Surg. 2015;262:536–45; discussion 543–5.
12. Liu L, Wang X, Feng J, Cheng S. Comment on "Is liver resection still required for patients who have predictive factors for complete pathologic necrosis after downstaging treatments of locally advanced hepatocellular carcinoma?" Eur J Surg Oncol. 2025;51(8):110032.
13. Huang C, Zhu XD, Shen YH, Xu B, Wu D, Ji Y, et al. Radiographic and alpha-fetoprotein response predict pathologic complete response to immunotherapy plus a TKI in hepatocellular carcinoma: a multicenter study. BMC Cancer. 2023;23:416.
14. Wen H, Liang R, Liu X, Yu Y, Lin S, Song Z, et al. Predicting Pathological Response of Neoadjuvant Conversion Therapy for Hepatocellular Carcinoma Patients Using CT-Based Radiomics Model. J Hepatocell Carcinoma. 2024;11:2145–57.
15. Mosenthal M, Adams W, Cotler S, Ding X, Borge M, Malamis A, et al. Locoregional Therapies for Hepatocellular Carcinoma prior to Liver Transplant: Comparative Pathologic Necrosis, Radiologic Response, and Recurrence. J Vasc Interv Radiol. 2024;35:506–14.
16. Habibollahi P, Shamchi SP, Choi JM, Gade TP, Stavropoulos SW, Hunt SJ, et al. Association of Complete Radiologic and Pathologic Response following Locoregional Therapy before Liver Transplantation with Long-Term Outcomes of Hepatocellular Carcinoma: A Retrospective Study. J Vasc Interv Radiol. 2019;30:323–9.
17. Paik KY, Kim EK. Pathologic response to preoperative transarterial chemoembolization for resectable hepatocellular carcinoma may not predict recurrence after liver resection. Hepatobiliary Pancreat Dis Int. 2016;15:158–64.

Additional Declarations
No competing interests reported.
Supplementary Files SupplementaryMaterial.docx Cite Share Download PDF Status: Posted Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-6637416","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":475622871,"identity":"191f956d-6c71-4c8e-8066-ff84d5d6c199","order_by":0,"name":"Zhou Ye","email":"","orcid":"","institution":"Zhejiang University School of Medicine","correspondingAuthor":false,"prefix":"","firstName":"Zhou","middleName":"","lastName":"Ye","suffix":""},{"id":475622873,"identity":"29bec0c4-2f2f-4958-80f4-b6ca9b1e92a6","order_by":1,"name":"Menghui Zhang","email":"","orcid":"","institution":"Zhejiang University School of Medicine","correspondingAuthor":false,"prefix":"","firstName":"Menghui","middleName":"","lastName":"Zhang","suffix":""},{"id":475622875,"identity":"73179039-7092-41a9-8374-21df307ee252","order_by":2,"name":"Tao Zeng","email":"","orcid":"","institution":"Nanjing 
University","correspondingAuthor":false,"prefix":"","firstName":"Tao","middleName":"","lastName":"Zeng","suffix":""},{"id":475622876,"identity":"2c102861-5805-4164-b48e-5834c8a39ce4","order_by":3,"name":"Chuanhui Peng","email":"","orcid":"","institution":"Zhejiang University School of Medicine","correspondingAuthor":false,"prefix":"","firstName":"Chuanhui","middleName":"","lastName":"Peng","suffix":""},{"id":475622877,"identity":"8402c703-80c5-440e-919b-6c8b3a7c0e02","order_by":4,"name":"Yibo Zhang","email":"","orcid":"","institution":"Zhejiang University School of Medicine","correspondingAuthor":false,"prefix":"","firstName":"Yibo","middleName":"","lastName":"Zhang","suffix":""},{"id":475622878,"identity":"3b6aa35f-1769-4e37-a950-28517f6a9143","order_by":5,"name":"JunJun Jia","email":"","orcid":"","institution":"Zhejiang University School of Medicine","correspondingAuthor":false,"prefix":"","firstName":"JunJun","middleName":"","lastName":"Jia","suffix":""},{"id":475622879,"identity":"7583fe43-86bb-4de9-b073-34997db9614a","order_by":6,"name":"Songfeng Yu","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA9UlEQVRIie3RsWrDMBCA4ROCZFGq9URK20e4PUNexRCoFw+ZCoXWeFIWZ0+gL9Eto4shWTQXjc7SqUNMl3jrpZmrJFuh+uHQDfqQwQCx2B9O84gG1GFPziOmAJB0MenhcT9ByCdy166eUQ9n74/76xp0PyPoVkHSM0u3QfPiHjyqGkz5SWLuwkQO7Donn917YsILSWHDH/bFBMdMpgmT8RkEhgP7hITpGqrDK3iCGLe1ZmkrRJ9JU6hUofuYvs0D5GozqXetzVEv0m3blaMbPZu8Nl2A3FWi4KPmUQSiPP7M6ncAcFv8HDlPvwHYh+7GYrHYf+0bMS5SR1wV9bwAAAAASUVORK5CYII=","orcid":"","institution":"Zhejiang University School of Medicine","correspondingAuthor":true,"prefix":"","firstName":"Songfeng","middleName":"","lastName":"Yu","suffix":""}],"badges":[],"createdAt":"2025-05-11 
03:38:03","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6637416/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6637416/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":85645565,"identity":"2db3bce8-d16f-440e-9f9d-7e1561c4dc98","added_by":"auto","created_at":"2025-06-30 08:23:59","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":123026,"visible":true,"origin":"","legend":"\u003cp\u003eComparison of survival curves between the PCR and NPCR groups.\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-6637416/v1/330aea764008eb2e3d053310.png"},{"id":85643369,"identity":"7351da20-43b1-4d89-8c7e-a97930fb9e35","added_by":"auto","created_at":"2025-06-30 08:07:59","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":85402,"visible":true,"origin":"","legend":"\u003cp\u003ePerformance of RF model in test set. (\u003cstrong\u003eA)\u003c/strong\u003e The confusion matrix of the test set. True-positive：14 samples, false-negative：16 samples, true-negative：1 samples, false-positive：3 samples. (\u003cstrong\u003eB)\u003c/strong\u003e The ROC curve of the test set. Colors are used to validate the cutoff value. (\u003cstrong\u003eC)\u003c/strong\u003eThe Gini importance of variables in the RF model. Invasion refers to portal vein invasion. (\u003cstrong\u003eD)\u003c/strong\u003e The density plot of scores for each sample in the test set. 
The red dashed line represents the cutoff value.\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-6637416/v1/bcb345f77a9634ad34661893.png"},{"id":87168576,"identity":"e2a587f0-14d3-4550-a599-0cdc87228d4f","added_by":"auto","created_at":"2025-07-21 06:54:52","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":926175,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6637416/v1/4596f65a-0b82-4542-8c78-136b8356c45a.pdf"},{"id":85644660,"identity":"09332442-bed2-4372-8574-4c78687a5cd6","added_by":"auto","created_at":"2025-06-30 08:15:59","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":14012168,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryMaterial.docx","url":"https://assets-eu.researchsquare.com/files/rs-6637416/v1/9aef1d666a09ee37e8a2dad7.docx"}],"financialInterests":"No competing interests reported.","formattedTitle":"Prediction of Pathological Complete Response in Hepatocellular Carcinoma Using Machine Learning Models","fulltext":[{"header":"Introduction","content":"\u003cp\u003eHepatocellular carcinoma (HCC) is a common malignancy with high cancer-related mortality. Surgery remains the preferred treatment for achieving favorable survival outcomes in HCC patients. However, only a small proportion of patients are eligible for surgery, as the majority are diagnosed at an advanced stage[\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. For patients with unresectable HCC, conversion therapies, including locoregional and systematic treatments, are used to downstage the tumor[\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. 
Following successful conversion, surgical resection is still a recommended option for radical objectives[\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. When postoperative pathological examination confirms complete necrosis of the tumor, this is defined as a pathological complete response (PCR).\u003c/p\u003e \u003cp\u003ePCR not only reflects the efficacy of conversion therapy but also indicates a better prognosis compared to non-PCR(NPCR) [\u003cspan additionalcitationids=\"CR5\" citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. Previous studies have reported that for patients achieving PCR, there is no significant difference in outcomes whether or not they receive postoperative adjuvant therapy[\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. Furthermore, surgical resection may not be necessary in cases of complete tumor necrosis[\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. Therefore, the assessment of tumor response has crucial implications for treatment decision-making both preoperatively and postoperatively. Consequently, it is essential to explore methods for predicting the pathological response of tumors in advance.\u003c/p\u003e \u003cp\u003eAlgorithms based on machine learning have become increasingly prevalent in supporting clinical decision-making. Currently, several predictive models for liver cancer pathological response have been reported, mostly based on univariate and multivariate regression analysis[\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. 
However, these traditional linear models are limited in their ability to capture complex relationships or integrate multiple features. In contrast, more advanced models can offer improved performance. For example, the least absolute shrinkage and selection operator (LASSO) model is recognized for its robust feature selection and interpretability. Meanwhile, random forest (RF), extreme gradient boosting (XGBoost), and Decision Tree excel at handling complex nonlinear relationships and managing missing values.\u003c/p\u003e \u003cp\u003eThis study aims to develop more accurate predictive models using several machine learning algorithms to assess the likelihood of PCR in HCC patients. By comparing the predictive performance of these models, we seek to identify the best-performing one and build a convenient visualization tool on it.\u003c/p\u003e"},{"header":"Materials and methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003ePatient population\u003c/h2\u003e \u003cp\u003eThis single-center retrospective study analyzed clinical data collected between January 2019 and December 2021. The inclusion criteria were as follows: (1) an initial diagnosis of unresectable liver cancer, (2) preoperative conversion therapy, (3) subsequent liver resection with available postoperative pathological results, and (4) age 18\u0026ndash;75 years with Child-Pugh class A liver function.\u003c/p\u003e \u003cp\u003eThe pathological analysis was conducted on resected specimens by two independent pathologists. PCR was defined as the absence of viable tumor cells, while the presence of any viable tumor cells was categorized as NPCR. 
After propensity score matching (PSM) at a 1:1 ratio, a total of 110 patients were included in the analysis, comprising 55 patients in the PCR group and 55 in the NPCR group.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eData collection\u003c/h3\u003e\n\u003cp\u003eWe collected clinical data across three categories: 1) demographic data, such as age and gender; 2) radiographic data, including tumor diameter (\u0026lt;\u0026thinsp;5 cm, 5\u0026ndash;10 cm, \u0026gt;\u0026thinsp;10 cm), number of tumors (solitary or multiple), with or without portal vein invasion, and clinical tumor grade (I\u0026ndash;IV); and 3) laboratory data, such as HBV infection status, HBV DNA status, and AFP levels. Specifically, the clinical tumor grade was assessed based on the China Liver Cancer Staging (CNLC) system. Tumor diameter was determined by the largest diameter from imaging data, and the number of tumors was primarily identified through radiological review. Portal vein invasion presented as the discontinuity of the portal vein wall or the presence of tumor thrombus via radiology.\u003c/p\u003e\n\u003ch3\u003eMachine learning models\u003c/h3\u003e\n\u003cp\u003eFour machine learning algorithms, i.e., LASSO, RF, XGBoost, and Decision Tree were selected to develop predictive models for PCR, based on their ability to handle complex clinical data, address classification problems (NPCR and PCR), capture nonlinear relationships, and provide feature importance analysis.\u003c/p\u003e \u003cp\u003eThe importance of each feature varied across methods. In LASSO regression, feature importance was assessed based on the magnitude of the gradients. For decision tree, it was evaluated using the cumulative goodness-of-split measures. 
In XGBoost and RF, feature importance was determined using the Gini importance metric.\u003c/p\u003e \u003cp\u003eAfter model training, performance was evaluated on the test set using the confusion matrix, sensitivity, specificity, precision, F1 score, and the area under the receiver operating characteristic (ROC) curve (AUC). Additionally, the density plot of prediction scores gives an intuitive view of the model's performance: clearly separated score distributions for the two sample groups indicate good discriminative capability. All analyses were carried out in R (v4.4.1).\u003c/p\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003eStatistical analysis\u003c/h2\u003e \u003cp\u003eCategorical variables were summarized as numbers (percentages) and compared using the χ\u0026sup2; test or Fisher\u0026rsquo;s exact test. Continuous variables were expressed as mean\u0026thinsp;\u0026plusmn;\u0026thinsp;SD for normally distributed data or median (IQR) for non-normally distributed data and compared using the t-test or Mann-Whitney U test, respectively. The R package \u0026lsquo;tableone\u0026rsquo; (v0.13.2) was used for statistical analysis. A P value less than 0.05 was considered statistically significant. Survival curves were estimated using Kaplan-Meier analysis and compared with the log-rank test, implemented via the R packages \u0026lsquo;survival\u0026rsquo; (v3.7.0) and \u0026lsquo;survminer\u0026rsquo; (v0.4.9).\u003c/p\u003e \u003c/div\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\n \u003ch2\u003ePatient characteristics\u003c/h2\u003e\n \u003cp\u003eA total of 110 patients were included in this study after propensity score matching, with 55 patients assigned to the PCR group and 55 to the NPCR group. Baseline characteristics, including age, sex, HBV infection status, and HBV DNA levels, were well matched between the two groups (Table S1). 
The 110 samples were then randomly split into a training set (76 samples) and a test set (34 samples). No significant differences in baseline data were observed between the training and test sets (Table \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e\n \u003cdiv class=\"gridtable\"\u003e\u0026nbsp;\u003ctable id=\"Tab1\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eBaseline characteristics between test and train set.\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"5\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\u0026nbsp;\u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eOverall\u003c/p\u003e\n \u003cp\u003e(n\u0026thinsp;=\u0026thinsp;110)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003etrain_data\u003c/p\u003e\n \u003cp\u003e(n\u0026thinsp;=\u0026thinsp;76)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003etest_data\u003c/p\u003e\n \u003cp\u003e(n\u0026thinsp;=\u0026thinsp;34)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e\u003cem\u003eP\u003c/em\u003e value\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eAge (years)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e55.55\u0026thinsp;\u0026plusmn;\u0026thinsp;11.46\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e55.43\u0026thinsp;\u0026plusmn;\u0026thinsp;11.57\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e55.82\u0026thinsp;\u0026plusmn;\u0026thinsp;11.37\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003e0.87\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eGender (%)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMale\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e96 (87.3)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e66 (86.8)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e30 (88.2)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFemale\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e14 (12.7)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e10 (13.2)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e4 (11.8)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eNumber of Tumors (%)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSolitary\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e86 (78.2)\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd align=\"left\"\u003e\n \u003cp\u003e59 (77.6)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e27 (79.4)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMultiple\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e24 (21.8)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e17 (22.4)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e7 (20.6)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eDiameter (%)\u003c/strong\u003e \u003csup\u003e\u003cstrong\u003ea)\u003c/strong\u003e\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.17\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u0026lt;5cm\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e40 (36.4)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e32 (42.1)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e8 (23.5)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u0026ge;\u0026thinsp;5cm,\u0026le;10cm\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e34 (30.9)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e21 (27.6)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003e13 (38.2)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u0026gt;10cm\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e36 (32.7)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e23 (30.3)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e13 (38.2)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003ePortal Vein Invasion (%)\u003c/strong\u003e \u003csup\u003e\u003cstrong\u003eb)\u003c/strong\u003e\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAbsent\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e57 (51.8)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e39 (51.3)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e18 (52.9)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePresent\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e53 (48.2)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e37 (48.7)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e16 (47.1)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n 
\u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eGrade (%)\u003c/strong\u003e \u003csup\u003e\u003cstrong\u003ec)\u003c/strong\u003e\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.487\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eI\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e65 (59.1)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e45 (59.2)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e20 (58.8)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eII\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e19 (17.3)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e14 (18.4)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e5 (14.7)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eIII\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e25 (22.7)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e17 (22.4)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e8 (23.5)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eIV\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1 (0.9)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
align=\"left\"\u003e\n \u003cp\u003e0 (0.0)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1 (2.9)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eHBV status (%)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.152\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNegative\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e16 (14.5)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e14 (18.4)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2 (5.9)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePositive\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e94 (85.5)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e62 (81.6)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e32 (94.1)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eHBV-DNA status (%)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.536\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n 
\u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNegative\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e55 (50.0)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e40 (52.6)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e15 (44.1)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePositive\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e55 (50.0)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e36 (47.4)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e19 (55.9)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eAFP level (IQR)\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e(ng/mL)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1,470.04 (8,534.17)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e837.04\u003c/p\u003e\n \u003cp\u003e(4,099.31)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2,884.97 (14,120.54)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.587\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003ctfoot\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"5\"\u003eNote: The above clinical data were all collected at patients` first visit.\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"5\"\u003ePCR, pathological complete response; NPCR, non pathological complete response; AFP, alpha-fetoprotein; HBV, hepatitis b virus.\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"5\"\u003ea) 
Diameter: recorded as the maximum diameter. In case of multiple tumors, the largest tumor's maximum diameter is recorded.\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"5\"\u003eb) Portal Vein Invasion: identified by radiological review, presenting as the discontinuity of the portal vein wall or the presence of tumor thrombus.\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"5\"\u003ec) Grade: classified by the CNLC staging system. I: early stage; II: locally advanced; III: regional invasion; IV: metastatic.\u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tfoot\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003cp\u003eThe mean age of the patients was 55.55 years, and most were male (87.3%). Most patients had a solitary tumor (78.2%), and the distribution of tumor diameter was relatively even (36.4% vs. 30.9% vs. 32.7%). Approximately half of the patients had portal vein invasion (48.2% with invasion and 51.8% without). Most patients were CNLC grade I (59.1%). The vast majority of patients were positive for HBV infection (85.5%), and the median AFP level was 1,470.04 ng/mL (Table \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e\n\u003c/div\u003e\n\u003ch3\u003eSurvival analysis\u003c/h3\u003e\n\u003cp\u003eFigure \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e compares the overall survival (OS) and progression-free survival (PFS) rates between the PCR and NPCR groups. The 3-year OS rate in the NPCR group was 65.6%, and the 3-year PFS rate was 44.3%. In the PCR group, the 3-year OS rate was 90.2%, and the 3-year PFS rate was 79.8%.\u003c/p\u003e\n\u003ch3\u003eEvaluating performance of PCR predictive models\u003c/h3\u003e\n\u003cp\u003eThe RF model demonstrated excellent performance in predicting pathological outcomes, achieving an area under the ROC curve (AUC) of 0.962 (95% CI: 0.907\u0026ndash;1). 
Additionally, the colors in the ROC curve help verify the cutoff value, with the yellow color along the diagonal indicating that the optimal cutoff is near the midpoint. Given the balanced proportion of positive and negative samples in the dataset, the default cutoff value is 0.5(Fig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003eB).\u003c/p\u003e\n\u003cp\u003eComparatively, the AUC values for the LASSO, Decision Tree, and XGBoost models were 0.799, 0.874, and 0.929, respectively, highlighting strong predictive capabilities across all models. Among these, the RF model exhibited the highest predictive accuracy, followed by the XGBoost model (Table \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e).\u003c/p\u003e\n\u003cdiv class=\"gridtable\"\u003e\u0026nbsp;\u003ctable id=\"Tab3\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003ePerformance of PCR predictive models in the test set.\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"5\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eScore\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eLASSO\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eDecision Tree\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eRF\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eXGBoost\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eTP\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e13\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e12\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003e14\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e14\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFP\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eTN\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e15\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e15\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e16\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e16\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFN\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003esensitivity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.76470588\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.70588235\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.82352941\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.82352941\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd 
align=\"left\"\u003e\n \u003cp\u003especificity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.88235294\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.88235294\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.94117647\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.94117647\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eprecision\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.86666667\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.85714286\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.93333333\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.93333333\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eF1 score\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.8125\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.77419355\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.875\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.875\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAUC (95% CI)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.799\u003c/p\u003e\n \u003cp\u003e(0.637\u0026ndash;0.962)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.874\u003c/p\u003e\n \u003cp\u003e(0.758\u0026ndash;0.99)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.962\u003c/p\u003e\n \u003cp\u003e(0.907\u0026ndash;1)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.929\u003c/p\u003e\n 
\u003cp\u003e(0.848-1)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003ctfoot\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"5\"\u003eNote: TP, true-positive; FP, false-positive; TN, true-negative; FN, false-negative;\u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tfoot\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003eThe confusion matrix of the RF model on the test set (34 samples) shows that, among 17 actual NPCR patients, 16 were predicted as NPCR (true-negative) and 1 as PCR (false-positive). Among 17 actual PCR patients, 14 were correctly predicted as PCR (true-positive) and 3 were misclassified as NPCR (false-negative) (Fig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003eA). These results indicate strong alignment between the predicted and actual situations.\u003c/p\u003e\n\u003cp\u003eThe score density distribution of the RF model provides a clear visualization of its predictive performance. Most predictive scores in the PCR group are greater than 0.5, while the majority of scores in the NPCR group are less than 0.5(Fig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003eD). It demonstrates that the RF model effectively distinguishes between PCR and NPCR patients.\u003c/p\u003e\n\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\n \u003ch2\u003eFeature importance in PCR predictive models\u003c/h2\u003e\n \u003cp\u003eIn the RF model, feature importance was assessed using Gini importance analysis. Nine parameters were selected and ranked by their contribution to the model: portal vein invasion, AFP level, age, tumor diameter, grade, number of tumors, DNA level, HBV infection status, and gender (Fig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003eC).\u003c/p\u003e\n \u003cp\u003eSimilarly, portal vein invasion was the most important feature in the LASSO, XGBoost, and Decision Tree models, with the highest ranking in each. 
In contrast, age, which ranked third in the RF model, was considered the least important in the other three models (Supplementary Figs. S1\u0026ndash;S4). Finally, we established a public visualization platform based on the best-performing RF model, which is available at: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://datalinkx.shinyapps.io/PCRpredict/\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eAchieving PCR through preoperative conversion therapy reflects high tumor sensitivity to treatment. PCR is also associated with improved clinical outcomes, including enhanced survival and reduced recurrence rates[\u003cspan additionalcitationids=\"CR5\" citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. Similarly, our study showed significantly prolonged OS and PFS in the PCR group compared to the NPCR group. Given the clinical value of PCR, further research on the PCR subgroups is required.\u003c/p\u003e \u003cp\u003eLi et al.[\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e] proposed that for patients achieving radiological complete response (RCR), a watch-and-wait strategy yields OS and PFS outcomes comparable to surgical resection. Similarly, Choi et al.[\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e] reported no significant difference in long-term survival outcomes between patients who were predicted to achieve PCR but did not undergo surgery and those who underwent liver resection and had pathologically confirmed PCR. Together, these findings highlight the potential of a watch-and-wait approach for patients with complete tumor necrosis, offering an alternative to surgical intervention. 
Furthermore, our previous study demonstrated that, for patients who achieve PCR, receiving postoperative adjuvant therapy does not result in significantly different outcomes[\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. Therefore, accurate prediction of the pathological response in clinical practice would influence treatment decisions both preoperatively and postoperatively.\u003c/p\u003e \u003cp\u003eSeveral studies have explored predictive factors for pathological response in liver cancer. Yang et al.[\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e] performed a regression analysis and identified AFP\u0026thinsp;\u0026lt;\u0026thinsp;100 ng/mL and a single tumor as significant predictors of achieving PCR. Lin et al.[\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e] conducted univariable and multivariable logistic regression analyses, revealing that HBV DNA load, AFP levels, maximum tumor diameter, preoperative TACE session, and achieving a complete response according to modified Response Evaluation Criteria in Solid Tumors (mRECIST) were significant predictors of PCR. Based on these five factors, they developed a nomogram with a concordance index of 0.80. Huang et al.[\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e] reported that using radiographic response alone to predict PCR yielded an AUC of 0.727, while a combination of radiomics and AFP response yielded an AUC of 0.926. Consistent with most previous reports, our models also recognized AFP as an important predictive factor. Nonetheless, portal vein invasion was identified as the most significant factor by all four of our models, while previous reports on PCR prediction did not mention portal vein invasion as a factor.\u003c/p\u003e \u003cp\u003eAmong our four predictive models, RF performed the best, with an AUC value of 0.962. 
The other three models, ranked by performance from highest to lowest, are XGBoost (AUC 0.929), Decision Tree (AUC 0.874), and LASSO (AUC 0.799). Since both RF and XGBoost are algorithms capable of handling nonlinear relationships and high-dimensional data, whereas LASSO is designed to address linear relationships, the superior predictive performance of the former two algorithms is reasonable. To our knowledge, this is the first study to use four different machine learning algorithms to develop pathological prediction models of HCC, and the best of these models achieves the highest predictive accuracy among all previously reported studies.\u003c/p\u003e \u003cp\u003eIn clinical practice, the therapeutic efficacy of HCC treatment is primarily evaluated through radiological criteria, such as the mRECIST criteria and the WHO criteria. Wen et al.[\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e] utilized ten imaging parameters to construct a SMOTE model for predicting PCR, achieving an AUC value of 0.843. Nonetheless, a tumor response on radiology does not correlate with pathological response[\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. Additionally, no significant differences in survival outcomes were observed between patients with radiographic complete response (RCR) and those without RCR[\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]. Therefore, RCR alone is insufficient as a direct prognostic reference. In our study, the predictive factors derived from imaging are mainly portal vein invasion, number of tumors, diameter and CNLC staging. We combined demographic, laboratory and radiographic data to establish predictive models.\u003c/p\u003e \u003cp\u003eHowever, our study has several limitations. First, this is a retrospective study, which carries an inherent risk of bias in data collection and analysis. 
Second, the relatively small sample size and the single-center nature of this study may introduce selection bias. Third, molecular markers were not incorporated into the predictive models. Additionally, some studies use more detailed classifications of pathological responses, such as complete response, major response (pathological response\u0026thinsp;\u0026ge;\u0026thinsp;50%), and minor response (pathological response\u0026thinsp;\u0026lt;\u0026thinsp;50%)[\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. Our predictive procedures did not adopt such detailed classifications. Further research is required to optimize the model.\u003c/p\u003e"},{"header":"Conclusions","content":"\u003cp\u003eWe developed four models to predict PCR using different machine learning algorithms, with the RF model performing the best, achieving an AUC of 0.962. The model shows that the primary factor affecting pathological results is portal vein invasion, followed by AFP level. This finding suggests that machine learning can potentially improve the accuracy of prediction, thereby influencing treatment strategies.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthors\u0026rsquo; contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eZY and MZ were involved in conceptualization, writing \u0026ndash; original draft; TZ was involved in methodology, formal analysis; CP was involved in data curation; YZ and JJ were involved in validation and visualization; SY was involved in conceptualization, writing \u0026ndash; review and editing.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding \u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe datasets used and/or 
analyzed during the current study are available from the corresponding author on reasonable request.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe requirement for ethical approval and informed consent of patients has been\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;waived by the review board of the First Affiliated Hospital of Zhejiang University\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eSchool of Medicine.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare no competing financial interests.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eWang MD, Xu XJ, Wang KC, Diao YK, Xu JH, Gu LH, et al. Conversion therapy for advanced hepatocellular carcinoma in the era of precision medicine: Current status, challenges and opportunities. Cancer Sci. 2024;115:2159\u0026ndash;69.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhou H, Song T. Conversion therapy and maintenance therapy for primary hepatocellular carcinoma. Biosci Trends. 2021;15:155\u0026ndash;60.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSun HC, Zhou J, Wang Z, Liu X, Xie Q, Jia W, et al. Alliance of Liver Cancer Conversion Therapy. Chinese expert consensus on conversion therapy for hepatocellular carcinoma (2021 edition). Hepatobiliary Surg Nutr. 2022;11:227\u0026ndash;52.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYang K, Sung PS, You YK, Kim DG, Oh JS, Chun HJ, et al. Pathologic complete response to chemoembolization improves survival outcomes after curative surgery for hepatocellular carcinoma: predictive factors of response. HPB (Oxford). 
2019;21:1718\u0026ndash;26.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJia J, Ding C, Mao M, Gao F, Shao Z, Zhang M, et al. Pathological complete response after conversion therapy in unresectable hepatocellular carcinoma: a retrospective study. BMC Gastroenterol. 2024;24:242.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZeng ZX, Wu JY, Wu JY, Zhang ZB, Wang K, Zhuang SW, et al. Prognostic Value of Pathological Response for Patients with Unresectable Hepatocellular Carcinoma Undergoing Conversion Surgery. Liver Cancer. 2024;13:498\u0026ndash;508.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWu JY, Wu JY, Fu YK, Ou XY, Li SQ, Zhang ZB, et al. Outcomes of Salvage Surgery Versus Non-Salvage Surgery for Initially Unresectable Hepatocellular Carcinoma After Conversion Therapy with Transcatheter Arterial Chemoembolization Combined with Lenvatinib Plus Anti-PD-1 Antibody: A Multicenter Retrospective Study. Ann Surg Oncol. 2024;31:3073\u0026ndash;83.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi B, Wang C, He W, Qiu J, Zheng Y, Zou R, et al. Watch-and-wait strategy vs. resection in patients with radiologic complete response after conversion therapy for initially unresectable hepatocellular carcinoma: a propensity score-matching comparative study. Int J Surg. 2024;110:2545\u0026ndash;55.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLin J, Li X, Shi X, Zhang L, Liu H, Liu J, et al. Nomogram for predicting pathologic complete response after transarterial chemoembolization in patients with hepatocellular carcinoma. Ann Transl Med. 2021;9:1130.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen Q, Deng Y, Zhao C, Huang Z, Zhang W, Yang Y, et al. Nomogram for tumour response based on prospective cohorts of hepatocellular carcinoma patients receiving immunotherapy combined with targeted therapy: development and validation. Ann Transl Med. 
2023;11:199.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAgopian VG, Morshedi MM, McWilliams J, Harlander-Locke MP, Markovic D, Zarrinpar A et al. Complete pathologic response to pretransplant locoregional therapy for hepatocellular carcinoma defines cancer cure after liver transplantation: analysis of 501 consecutively treated patients. Ann Surg. 2015;262: 536\u0026thinsp;\u0026ndash;\u0026thinsp;45; discussion 543-5.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu L, Wang X, Feng J, Cheng S. Comment on Is liver resection still required for patients who have predictive factors for complete pathologic necrosis after downstaging treatments of locally advanced hepatocellular carcinoma? Eur J Surg Oncol. 2025;51(8):110032.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHuang C, Zhu XD, Shen YH, Xu B, Wu D, Ji Y, et al. Radiographic and alpha-fetoprotein response predict pathologic complete response to immunotherapy plus a TKI in hepatocellular carcinoma: a multicenter study. BMC Cancer. 2023;23:416.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWen H, Liang R, Liu X, Yu Y, Lin S, Song Z, et al. Predicting Pathological Response of Neoadjuvant Conversion Therapy for Hepatocellular Carcinoma Patients Using CT-Based Radiomics Model. J Hepatocell Carcinoma. 2024;11:2145\u0026ndash;57.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMosenthal M, Adams W, Cotler S, Ding X, Borge M, Malamis A, et al. Locoregional Therapies for Hepatocellular Carcinoma prior to Liver Transplant: Comparative Pathologic Necrosis, Radiologic Response, and Recurrence. J Vasc Interv Radiol. 2024;35:506\u0026ndash;14.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHabibollahi P, Shamchi SP, Choi JM, Gade TP, Stavropoulos SW, Hunt SJ, et al. 
Association of Complete Radiologic and Pathologic Response following Locoregional Therapy before Liver Transplantation with Long-Term Outcomes of Hepatocellular Carcinoma: A Retrospective Study. J Vasc Interv Radiol. 2019;30:323\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePaik KY, Kim EK. Pathologic response to preoperative transarterial chemoembolization for resectable hepatocellular carcinoma may not predict recurrence after liver resection. Hepatobiliary Pancreat Dis Int. 2016;15:158\u0026ndash;64.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Pathological complete response, Hepatocellular carcinoma, Machine learning models, Predictive factors, Random forest","lastPublishedDoi":"10.21203/rs.3.rs-6637416/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6637416/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBackground\u003c/h2\u003e \u003cp\u003ePathological complete response (PCR) in hepatocellular carcinoma (HCC) following conversion therapy is associated with improved prognosis and 
influences treatment decisions. This study aims to develop and validate machine learning-based predictive models for assessing PCR in HCC patients.\u003c/p\u003e\u003ch2\u003eMethods\u003c/h2\u003e \u003cp\u003eThis retrospective single-center study included 110 HCC patients after propensity score matching. Four machine learning models\u0026mdash;LASSO, RF, XGBoost, and Decision Tree\u0026mdash;were developed to predict PCR. After training models, the performance was assessed in the test set. Feature importance was analyzed, and a public visualization tool was developed.\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e \u003cp\u003eThe RF model demonstrated the highest predictive accuracy (AUC: 0.962), followed by XGBoost (AUC: 0.929), Decision Tree (AUC: 0.874), and LASSO (AUC: 0.799). Key predictive factors included tumor invasion, AFP levels, and tumor diameter. The RF model effectively distinguished PCR and NPCR groups, providing robust prediction capabilities.\u003c/p\u003e\u003ch2\u003eConclusion\u003c/h2\u003e \u003cp\u003eMachine learning models, particularly RF, significantly enhance the accuracy of PCR prediction in HCC patients. 
This approach highlights the potential of integrating demographic, laboratory, and radiographic data for personalized treatment planning.\u003c/p\u003e","manuscriptTitle":"Prediction of Pathological Complete Response in Hepatocellular Carcinoma Using Machine Learning Models","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-06-30 08:07:54","doi":"10.21203/rs.3.rs-6637416/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"19ceb3f9-a279-4d6a-9c59-3bcc5fad4f9e","owner":[],"postedDate":"June 30th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2025-07-21T06:54:36+00:00","versionOfRecord":[],"versionCreatedAt":"2025-06-30 08:07:54","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6637416","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6637416","identity":"rs-6637416","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
