TANK: A Variance-Based Framework for Identifying Heterogeneous Therapeutic Targets in Gastric Cancer

Research Article (Research Square preprint)

XIAOQI HU

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9183044/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

Background
Identifying therapeutically relevant tumor antigens from large-scale transcriptomic data remains a central challenge in precision oncology. Complex computational methods are often employed, yet simple statistical approaches remain underexplored.

Methods
We developed TANK (Tumor Antigen prioritization by variance-based raNKing), a simple variance-based framework for identifying highly heterogeneous tumor antigens. Using raw RNA-seq count data from TCGA-STAD (n = 448, 60,660 genes), genes were ranked by expression variance across patients.
Results were validated in an independent microarray dataset (GSE26942, n = 217).

Results
CLDN18.2 ranked 89th out of 60,660 genes (top 0.15%) in TCGA-STAD, and 24th out of 36,157 probes (top 0.07%) in GSE26942. Mean-variance analysis confirmed CLDN18.2 as a consistent outlier across both platforms. Kaplan-Meier and multivariate Cox regression analyses adjusting for stage and age showed no significant association between CLDN18.2 expression and overall survival (HR = 1.37, p = 0.11). CLDN18.2 expression remained stable across pathologic stages I-IV (ANOVA p = 0.71).

Conclusions
TANK identifies clinically validated therapeutic targets through variance alone. The dissociation between high variance and survival significance positions CLDN18.2 as a companion diagnostic target rather than a prognostic marker, a distinction critical for precision patient selection in targeted therapy.

Subject area: Cancer Biology
Keywords: CLDN18.2, gastric cancer, variance-based ranking, tumor heterogeneity, companion diagnostics, TANK method, TCGA

1. Introduction

Precision oncology depends on the accurate identification of tumor-specific antigens suitable for targeted therapy. Conventional approaches such as differential expression analysis identify genes with consistently high expression across patient cohorts, but may overlook antigens whose clinical value lies in their heterogeneous distribution across patients.

CLDN18.2 (Claudin 18.2) is a tight junction protein selectively expressed in gastric epithelium and has emerged as a validated therapeutic target in gastric cancer. Zolbetuximab, a CLDN18.2-targeting antibody, demonstrated significant survival benefit in Phase III clinical trials, establishing CLDN18.2 as a companion diagnostic target in gastric cancer.

Here we propose TANK (Tumor Antigen prioritization by variance-based raNKing), a simple variance-based ranking method designed to capture inter-patient heterogeneity as a proxy for therapeutic target potential.
We demonstrate that TANK independently identifies CLDN18.2 as a top-ranked gene using variance alone, and we characterize its relationship to patient survival and disease stage.

2. Methods

2.1 Data Sources

RNA-seq STAR count data for gastric adenocarcinoma (TCGA-STAD, n = 448) were obtained from the UCSC Xena hub (https://gdc-hub.s3.us-east-1.amazonaws.com). Overall survival and clinical staging data were downloaded from the TCGA Pan-Cancer Clinical Data Resource. Independent validation used microarray data from GSE26942 (n = 217, Illumina HumanHT-12 V3.0), retrieved from the NCBI Gene Expression Omnibus.

2.2 TANK Method

Gene expression variance was calculated across all patient samples using raw count values. Genes were ranked in descending order of variance across 60,660 genes. To reduce technical noise, sex-linked genes (XIST, RPS4Y1, KDM5D) and non-coding RNA (MALAT1) were excluded prior to final ranking.

2.3 Statistical Analysis

Mean-variance relationships were visualized on log-log axes following log2(count + 1) transformation. Survival analysis used Kaplan-Meier curves with log-rank testing and univariate/multivariate Cox proportional hazards regression adjusting for pathologic stage and age at diagnosis. Stage-stratified expression differences were assessed using one-way ANOVA. All analyses were performed in Python 3.14 using pandas, numpy, matplotlib, scipy, and lifelines libraries.

3. Results

3.1 CLDN18.2 ranks in the top 0.15% by variance

Applying TANK to TCGA-STAD (n = 448, 60,660 genes), CLDN18.2 ranked 89th overall (top 0.15%, variance = 12.91). For comparison, HER2 ranked 5,300th (top 8.74%) and EGFR ranked 10,364th (top 17.09%), demonstrating that CLDN18.2 exhibits substantially greater inter-patient expression heterogeneity than other established gastric cancer targets.

In the independent GSE26942 microarray dataset (n = 217), CLDN18.2 ranked 24th out of 36,157 probes (top 0.07%), confirming cross-platform reproducibility of the TANK ranking.
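
The ranking step of Section 2.2 and the percentile figures reported in Section 3.1 can be sketched as follows. This is a minimal illustration on a small synthetic matrix, not the author's released code: the helper name `tank_rank`, the genes-by-samples layout, and all toy values are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Genes excluded before ranking, as listed in Section 2.2.
EXCLUDED_GENES = ("XIST", "RPS4Y1", "KDM5D", "MALAT1")


def tank_rank(counts: pd.DataFrame, excluded=EXCLUDED_GENES) -> pd.DataFrame:
    """Rank genes by across-sample variance (hypothetical helper).

    `counts` is assumed to be a genes x samples matrix of expression values.
    """
    kept = counts.drop(index=list(excluded), errors="ignore")
    variance = kept.var(axis=1)  # sample variance (ddof=1) per gene
    ranked = variance.sort_values(ascending=False).to_frame("variance")
    ranked["rank"] = np.arange(1, len(ranked) + 1)
    ranked["top_percent"] = 100.0 * ranked["rank"] / len(ranked)
    return ranked


# Synthetic toy data: 5 genes x 4 samples (all values are made up).
toy = pd.DataFrame(
    {
        "s1": [0, 50, 10, 5, 900],
        "s2": [400, 52, 11, 5, 10],
        "s3": [5, 49, 9, 5, 880],
        "s4": [350, 51, 10, 5, 15],
    },
    index=["GENE_A", "GENE_B", "GENE_C", "GENE_D", "XIST"],
)

ranking = tank_rank(toy)
print(ranking)

# The reported percentile for rank 89 of 60,660 follows from the same formula.
print(round(100.0 * 89 / 60660, 2))  # 0.15
```

In this toy matrix the bimodal gene GENE_A ranks first and the constant gene GENE_D ranks last, while XIST is dropped before ranking. The same percentile formula reproduces the other reported figures, for example 24 of 36,157 probes gives roughly 0.07%.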

3.2 CLDN18.2 is a mean-variance outlier

Mean-variance analysis following log2(count + 1) transformation confirmed CLDN18.2, MUC2, and OLFM4 as consistent outliers, with variance substantially exceeding that of genes with comparable mean expression levels. HER2 and EGFR, while highly expressed, fell within the expected mean-variance trend, suggesting that their inter-patient variation is consistent with background transcriptional noise rather than true biological heterogeneity. [Figure 1]

3.3 CLDN18.2 expression is not associated with overall survival

Kaplan-Meier analysis showed no significant difference in overall survival between CLDN18.2 high and low expression groups (log-rank p = 0.92, n = 441). Multivariate Cox regression confirmed pathologic stage (HR = 1.78, p < 0.005) and age (HR = 1.03, p < 0.005) as independent prognostic factors, while CLDN18.2 expression was not independently associated with survival (HR = 1.37, 95% CI 0.94-2.00, p = 0.11). [Figure 2]

3.4 CLDN18.2 expression is stable across pathologic stages

One-way ANOVA across Stage I (n = 64), Stage II (n = 140), Stage III (n = 178), and Stage IV (n = 43) showed no significant difference in CLDN18.2 expression (F = 0.461, p = 0.71), indicating that its heterogeneous distribution is independent of disease progression. [Figure 3]

Table 1. Summary of key statistical results

Analysis         Dataset    Result           Statistic     p-value
TANK Ranking     TCGA-STAD  Rank 89/60,660   Top 0.15%     -
TANK Validation  GSE26942   Rank 24/36,157   Top 0.07%     -
KM Survival      TCGA-STAD  Not significant  Log-rank      0.92
Cox Regression   TCGA-STAD  HR = 1.37        Multivariate  0.11
Stage ANOVA      TCGA-STAD  Not significant  F = 0.461     0.71

4. Discussion

TANK identifies CLDN18.2 as a high-variance outlier in gastric cancer using variance ranking alone, independently recovering a target validated in Phase III clinical trials.
The simplicity of the method is a feature, not a limitation: variance is a parameter-free, assumption-free measure of inter-patient heterogeneity that requires no case-control design or normalization.

The absence of prognostic significance across KM, univariate Cox, and multivariate Cox analyses, combined with stage-independent expression, supports CLDN18.2's role as a companion diagnostic target rather than a driver of disease progression. This dissociation between high heterogeneity and survival impact may represent a general characteristic of actionable immunotherapy targets: genes whose clinical value lies in patient stratification, not prognosis.

Limitations include the use of bulk RNA-seq, which cannot resolve intra-tumoral heterogeneity, and the absence of protein-level validation. The current analysis is restricted to gastric cancer; pan-cancer extension and integration of normal tissue expression penalties (GTEx) are planned for future work.

5. Conclusion

TANK provides a simple, reproducible, and computationally accessible framework for identifying heterogeneous tumor antigens. Applied to gastric cancer, it independently identifies CLDN18.2 as a top-ranked target and characterizes it as a stage-independent companion diagnostic marker. The method is freely available and requires only standard RNA-seq count data. Pan-cancer extension of this framework across 33 TCGA cancer types, including a four-mode therapeutic antigen classification and prospective nomination of CLDN6, is presented in a companion analysis.
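
The mean-variance outlier idea used in Section 3.2 can be illustrated with a minimal sketch. The paper describes visualizing the relationship after log2(count + 1) transformation but does not state a numerical outlier rule, so the quadratic trend fit, the MAD-based robust z-score, the cutoff of 3, and the helper name `mean_variance_outliers` are all assumptions chosen for illustration. The data are synthetic.

```python
import numpy as np


def mean_variance_outliers(counts, gene_names, z_cutoff=3.0):
    """Flag genes whose variance sits far above a fitted mean-variance trend.

    Hypothetical helper: the quadratic trend and the robust z-score cutoff
    are illustrative choices, not taken from the paper.
    """
    logged = np.log2(np.asarray(counts, dtype=float) + 1.0)  # log2(count + 1)
    mean = logged.mean(axis=1)
    var = logged.var(axis=1, ddof=1)

    # Quadratic trend of variance as a function of mean expression.
    coeffs = np.polyfit(mean, var, deg=2)
    residual = var - np.polyval(coeffs, mean)

    # Robust scale from the median absolute deviation (MAD).
    centered = residual - np.median(residual)
    mad = np.median(np.abs(centered))
    robust_z = centered / (1.4826 * mad)
    return [g for g, z in zip(gene_names, robust_z) if z > z_cutoff]


# Synthetic data: 200 Poisson background genes plus one bimodal gene.
rng = np.random.default_rng(0)
n_samples = 60
lam = 2.0 ** rng.uniform(4, 11, size=(200, 1))  # spread of mean expression
background = rng.poisson(lam=lam, size=(200, n_samples))
bimodal = np.where(np.arange(n_samples) % 2 == 0, 5, 3000)[None, :]
counts = np.vstack([background, bimodal])
names = [f"BG_{i}" for i in range(200)] + ["GENE_X"]

flagged = mean_variance_outliers(counts, names)
print(flagged)
```

The bimodal gene GENE_X, which is either nearly absent or strongly expressed depending on the sample, should appear among the flagged genes because its variance far exceeds that of background genes with a similar mean. A robust trend such as a binned median could replace the polynomial fit if strong outliers distort it.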

Declarations

Data Availability
TCGA-STAD RNA-seq data: https://gdc-hub.s3.us-east-1.amazonaws.com/download/TCGA-STAD.star_counts.tsv.gz
Survival data: https://tcga-xena-hub.s3.us-east-1.amazonaws.com/download/survival%2FSTAD_survival.txt
GEO dataset: GSE26942 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE26942)
Code: https://github.com/ohahouhui/AI-CAR-Loop-1.0

References
1. Shitara K et al (2023) Zolbetuximab plus mFOLFOX6 in patients with CLDN18.2-positive, HER2-negative, untreated, locally advanced unresectable or metastatic gastric or gastro-oesophageal junction adenocarcinoma (SPOTLIGHT): a multicentre, randomised, double-blind, phase 3 trial. Lancet 401(10389):1655–1668
2. Goldman MJ et al (2020) Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol 38:675–678
3. Barrett T et al (2013) NCBI GEO: archive for functional genomics data sets - update. Nucleic Acids Res 41:D991–D995
4. Liu J et al (2018) An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics. Cell 173(2):400–416
5. Mauda-Havakuk M et al (2022) CLDN18.2 expression in gastrointestinal malignancies. Oncotarget 13:896–907

Additional Declarations
The authors declare no competing interests.

Figure legends

Figure 1. Mean-variance plot of all 60,660 genes in TCGA-STAD (n = 448) following log2(count + 1) normalization. Key therapeutic targets are highlighted. CLDN18.2, MUC2, and OLFM4 appear as consistent outliers above the mean-variance trend. HER2 and EGFR fall within the expected variance range.

Figure 2. Kaplan-Meier overall survival curves for CLDN18.2 high (n = 220) and low (n = 221) expression groups in TCGA-STAD. Log-rank p = 0.92. No significant survival difference was observed.

Figure 3. CLDN18.2 expression by pathologic stage (I-IV) in TCGA-STAD (n = 425). One-way ANOVA F = 0.461, p = 0.71. Expression levels are consistent across all disease stages.

Preprint posted March 24th, 2026 (Version 1).