Development and evaluation of a prognosis prediction model for hepatocellular carcinoma via multiomics integration and semisupervised machine learning | Research Square

Article

Xiaoling Xian, Xuanxiu Huang, Liu Qi, Huiping Cui, Chenting Zhang

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7455049/v1
This work is licensed under a CC BY 4.0 License
Status: Posted. Version 1 posted.

Abstract

Hepatitis B virus-related hepatocellular carcinoma (HBV-HCC) is a malignant tumor caused primarily by chronic hepatitis B virus infection and accounts for 75.09% of all HCC cases in China. However, treatment efficacy is low when the regimen is selected on the basis of the existing tumor staging system. 
In this study, we developed a multiomics semisupervised collaborative ensemble learning (SSCEL) framework by combining machine learning (ML), tumor microenvironment (TME) analysis, and reverse network pharmacology. Principal component (PC) dimensionality reduction was employed to establish a comprehensive model evaluation function. On the basis of this function, the optimal ensemble model scheme (decision tree (DT), AdaBoost, XGBoost, and random forest (RF)) was constructed. Adding HistGradientBoosting to achieve intergenerational complementarity yielded a five-model integration framework. This model, which is based on multiomics data integration and semisupervised learning (SSL), had an area under the curve (AUC) of 0.91 for predicting recurrence in the multiomics training set and an AUC > 0.75 for predicting recurrence in the external validation set, indicating strong stability. Unlike the existing staging systems, the framework is designed mainly to predict the recurrence risk of HBV-HCC on the basis of small multiomics datasets. The levels of five core genes in the model (CST3, HSPH1, RAB2A, WASHC4, and PLK1) were found to be significantly associated with clinical recurrence in patients. These genes are mainly involved in regulating the CDK5-related pathway, sensing DNA double-strand breaks, and modulating regulatory pathways of early pancreatic gene expression. Additionally, reverse network pharmacology analysis revealed 9 potential traditional Chinese medicine (TCM) compounds and 14 related target genes associated with recurrence signature genes. These findings provide new research directions for TCM-based HCC treatment and further exploration of recurrence mechanisms. Moreover, the prognostic model developed in this study was validated across populations with diverse features and HCC etiologies and demonstrated robust performance.
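
As a rough illustration of the evaluation described in the abstract, the following minimal sketch averages recurrence probabilities from several base models and scores the result with AUC. It is not the authors' implementation: the abstract does not state how the five base learners are combined, so unweighted averaging (soft voting) is an assumption, and the function names `soft_vote` and `roc_auc` as well as the toy labels and probabilities are hypothetical.

```python
from statistics import fmean


def soft_vote(per_model_probs):
    """Average the recurrence probabilities predicted by several models.

    per_model_probs: list of equal-length lists, one list per model.
    Unweighted averaging is an assumption; the abstract does not say how
    the base learners are combined.
    """
    n_samples = len(per_model_probs[0])
    if any(len(p) != n_samples for p in per_model_probs):
        raise ValueError("every model must score the same samples")
    # zip(*...) regroups the scores so each tuple holds one sample's
    # probabilities across all models.
    return [fmean(sample) for sample in zip(*per_model_probs)]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = recurrence, 0 = no recurrence.
labels = [1, 0, 1, 0, 1, 0]
model_probs = [
    [0.9, 0.2, 0.6, 0.4, 0.7, 0.1],  # e.g. scores from one base model
    [0.8, 0.3, 0.4, 0.5, 0.6, 0.2],  # a second base model
    [0.7, 0.1, 0.5, 0.6, 0.8, 0.3],  # a third base model
]
ensemble = soft_vote(model_probs)
auc = roc_auc(labels, ensemble)
print(f"ensemble AUC = {auc:.3f}")
```

The pairwise definition of AUC used here is quadratic in the number of samples, which is acceptable for the small cohorts the abstract refers to; a rank-based formulation would be preferable for larger datasets.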

Subject areas: Biological sciences/Cancer; Biological sciences/Computational biology and bioinformatics; Health sciences/Oncology

Keywords: HCC prediction, semi-supervised learning, multiomics analysis, single-cell target gene prediction therapy, reverse network pharmacology

Additional Declarations: No competing interests reported.

Author affiliations: Xiaoling Xian, Xuanxiu Huang, Liu Qi, and Huiping Cui (corresponding author), Guangdong Pharmaceutical University; Chenting Zhang, The First Affiliated Hospital of Guangzhou Medic Guangzhou

Posted: October 7th, 2025