Repurposing public sarcoma multi-omics for neoantigen discovery

Research Article
Panagiotis Mantas, Karen A. Krogfelt

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8854019/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 21 Apr, 2026 in Cancer Immunology, Immunotherapy.
Version 1 posted 7

Abstract

Background: Soft tissue sarcomas, particularly Complex Karyotype Sarcomas (CKS), are characterized as "immunologically cold" malignancies driven by structural instability rather than a high tumor mutational burden (TMB). Public “legacy” cohorts are a useful resource for uncovering immunotherapy biomarkers. This study used Whole Exome Sequencing (WES) and RNA-sequencing data from CKS patients to overcome technical limitations and to identify and prioritize neoantigens.
Methods: A systematic immunogenomic reanalysis was performed on a landmark cohort of CKS patients (Kim et al., 2018) with a custom bioinformatics workflow developed to uncover interpretable immunogenomic signals. This approach consisted of: (1) defining a quality-controlled "callable territory" and normalizing TMB metrics accordingly; (2) utilizing RNA-seq not only for expression filtering but also as an orthogonal validation check for variant transcription and to distinguish functional amplifications from technical depth artifacts; and (3) applying a multi-modal epitope prediction pipeline to identify and prioritize high-affinity neoantigens derived from somatic SNVs, indels, and expressed gene fusions.

Results: The reanalysis shows that standard genome-wide metrics frequently underestimated the immunogenic potential. Normalizing the TMB refined the identification of hypermutated and microsatellite instability-like phenotypes. Furthermore, integration of transcriptomic data facilitated the recovery of actionable targets in "low-TMB" tumors. A subset of fusion-derived peptides demonstrated predicted binding affinities competitive with SNV-derived candidates.

Conclusion: This study illustrates that technically constrained multi-omic datasets can be systematically re-analyzed to identify potential therapeutic targets. These data argue for looking beyond aggregate biomarkers; patient-specific, expressed neoepitopes may exist even in sarcomas typically described as immunologically “cold”.

Keywords: RNA-guided neoantigens, Complex Karyotype Sarcomas, Immunogenomics, Legacy sequencing

Introduction

Soft tissue sarcomas are rare cancers with diverse biology. They have historically shown limited responsiveness to immune checkpoint blockade (ICB), particularly when compared with "hot" tumors such as melanoma or lung cancer [1, 2].
Standard biomarkers, such as Tumor Mutational Burden (TMB), fall short in predicting immunotherapy response in cancers such as sarcomas because they overlook non-SNV neoantigen sources [3–5]. TMB is not well suited to sarcomas, which are broadly divided into two genomic classes: "Simple Karyotype" sarcomas (driven by specific translocations like EWSR1-FLI1) and "Complex Karyotype" sarcomas (CKS), which are driven by chaotic copy number alterations and structural rearrangements rather than high SNV loads [6]. Thus, CKS tumors are frequently characterized as "immunologically cold", despite possessing significant neoantigen potential driven by high genomic instability, copy number alterations and DNA damage response (DDR) pathway defects [7–9]. Personalized vaccines (or shared neoantigens) still remain a promising frontier [10–15]. Identifying immunogenic targets within this structural chaos could open new therapeutic paths for patients who currently have few options.

While a handful of genomic studies in sarcoma have explicitly evaluated neoantigens to inform survival associations or support vaccine design [16, 17], most large-scale paired WES and RNA-seq efforts have pursued different objectives. Cohorts have primarily focused on pinpointing core molecular drivers (e.g., PI3K/mTOR vulnerabilities in osteosarcoma or chromatin remodeling mutations in UPS) and on defining molecular subtypes through integrated clustering, as seen in rhabdomyosarcoma classification [18–20]. These analyses prioritized differentiation programs, copy number variations (e.g., CDK4/RB1 signatures in Complex Karyotype Sarcoma) and transcriptional states to nominate drug targets (like PDGFR) and explain disease biology, rather than performing systematic neoantigen discovery or immune-focused tumor characterization [21]. Yet these same datasets contain the exact raw data required for immunogenomics: somatic variant calls, expression evidence, and immune context.
As a result, neoantigen-level information in these cohorts has not been systematically extracted. The raw sequencing is already public; reanalyzing it with current neoantigen prediction and prioritization workflows can uncover immunogenic determinants that were outside the scope of the original driver- and subtype-centric analyses, possibly yielding actionable vaccine targets without new patient samples.

To bridge this translational gap, a systematic immunogenomic re-analysis of the Kim et al. CKS cohort was performed using a state-of-the-art bioinformatics framework [21]. By integrating thorough quality control measures, which successfully recapitulated the original molecular classifications (e.g., CDK4-amplified subtypes), this study established a foundation for extending the scope beyond driver identification. The presented approach specifically maps the neoantigen landscape across distinct genomic architectures, contrasting purported 'Immunogenic Hotspots' against the traditionally viewed 'Immunogenic Deserts' of Complex Karyotype Sarcomas, to isolate overlooked vaccine candidates. This is the first comprehensive profiling of the SNV- and fusion-derived neoantigen landscape in this specific CKS cohort. By applying a multi-modal epitope prediction pipeline, the results revealed that even these phenotypically 'cold' tumors, often excluded from immunotherapy trials, harbor high-affinity, patient-specific targets born from their profound genomic instability.

Materials and Methods

Data Acquisition

Raw whole-exome sequencing (WES) and RNA-seq paired-end FASTQ files were retrieved from the European Nucleotide Archive (ENA) under the accession number PRJEB23898. These data correspond to the cohort originally described by Kim et al. (2018), comprising 14 patients with complex karyotype soft tissue sarcomas (CKS) [21].
To enable deep multi-modal characterization while sustaining computational feasibility, a representative discovery set of six samples was selected, spanning the genomic subtypes defined in the original study. The samples were stratified into two groups: MSI-High/Hypermutators (n = 3; UT05, FT13 and LT02) and Chromosomal Stable ("Desert") tumors (n = 3; UT01, UT08 and LT01). This grouping captures the major biological axes identified by Kim et al. (hypermutation vs. chromosomal stability, treatment-naïve vs. chemotherapy-exposed) and allows for a direct comparison of neoantigen landscapes across mechanistically distinct tumor subtypes.

Genomic Preprocessing & Alignment

Whole-exome preprocessing and alignment

Raw WES FASTQ files were assessed with FastQC (v0.12.1) [22] to evaluate base quality, adapter contamination, and over-represented sequences. Illumina TruSeq adapter sequences and low-quality bases were trimmed using Cutadapt (v5.2) with a minimum Phred score of 20 and a minimum post-trimming read length of 50 bp [23]. Trimmed reads were aligned to the GRCh38/hg38 reference genome using BWA-MEM (v0.7.19) [24]. PCR and optical duplicates were marked with Picard MarkDuplicates (v3.4.0) from the Broad Institute (http://broadinstitute.github.io/picard). Following GATK Best Practices, the gnomAD resource was used for germline filtering in the Mutect2 and GetPileupSummaries tools. Because the original study used hg19, all analyses in this study were performed de novo on GRCh38 to ensure compatibility with current annotation resources and HLA/neoantigen tools. The GRCh38/hg38 human reference genome and transcriptome data were downloaded from the GENCODE Project website (https://www.gencodegenes.org/) [25].

RNA Alignment & Quantification

Tumor RNA-seq FASTQs underwent quality control with FastQC and adapter/quality trimming with Cutadapt as above.
Reads were then processed through two complementary methods:

a) STAR alignment for fusion detection and expression-aware mapping. Reads were aligned to GRCh38 (with the GENCODE transcriptome) using STAR (v2.7.10b) in two-pass mode with chimeric detection enabled to preserve fusion-relevant soft-clipped and discordant reads [26].
b) Transcript-level quantification for expression filtering. In parallel, pseudo-alignment and transcript quantification were performed with Kallisto (v0.51.1) using GENCODE-derived transcript indices [27]. Transcript-level TPMs were summed to gene-level TPMs and later used to filter neoantigen candidates by expression.

Somatic Variant Calling

Somatic mutations, encompassing SNVs and indels (including frameshift mutations), were identified using GATK Mutect2 (v4.6.2.0) in tumor–normal mode, with each tumor BAM compared to its patient-matched germline WES BAM [28]. Default Mutect2 filters were applied via FilterMutectCalls, followed by additional strict hard filters:

a) PASS in the Mutect2 FILTER field, i.e. the variant passed all GATK tests such as contamination, strand bias and generic bias.
b) Tumor depth ≥ 10× at the variant site.

Variants were functionally annotated with Ensembl VEP (v110), restricting downstream analysis to protein-coding missense SNVs, as well as in-frame and frameshift indels, while excluding synonymous and nonsense (stop-gain) mutations.

The dataset showed heavy capture bias, with a mean depth of 29.3x (range: 15.0–39.2x), so TMB was normalized carefully. The numerator was restricted to non-synonymous somatic mutations (missense, frameshift, and in-frame insertions/deletions) identified within the high-confidence intervals. To prevent TMB underestimation due to varying capture efficiency, the denominator was defined as the sample-specific 'callable coding territory', consisting of protein-coding CDS regions (GENCODE v49) that achieved a minimum of 10x depth.
This normalization avoided underestimating mutation load in poorly captured regions. The MSI status for samples came directly from Kim et al.'s original hypermutator analysis, which identified MSI/MMR-associated mutational signatures and expression patterns in the hypermutator subset. Transition-to-transversion (Ti/Tv) ratios were determined using bcftools stats (v1.17) based on somatic variants passing all quality filters (PASS) with a minimum tumor read depth of 10x, restricted to the exome capture regions.

Fusion detection from RNA-seq

To identify immunogenic targets derived from structural variations, fusion transcripts were characterized using Arriba (v2.4.0) on STAR-aligned BAMs. Default Arriba annotations (blacklists for recurrent artifacts, read-through events, and immunoglobulin artifacts) were used [29]. High-confidence fusions meeting the following criteria were retained:

a) Read support: at least 5 junction (split-read) reads and/or discordant read pairs.
b) Blacklist: not present in Arriba's internal blacklist (e.g., recurrent technical artifacts, rRNA, mitochondrial genes, or same-gene read-throughs).
c) Confidence: only fusions classified as 'high' confidence by Arriba were considered.
d) In-frame events: fusions were restricted to in-frame sequences to ensure a plausible open reading frame (ORF) for peptide generation.
e) Expression metric: fusion abundance was quantified using Fusion Fragments Per Million (FFPM), derived from fusion-specific support reads normalized by total mapped library depth.

HLA class I typing

Patient-specific HLA class I genotypes (HLA-A, -B, -C; 4-digit resolution) were inferred from tumor RNA-seq using OptiType (v1.3.5), which utilizes coverage patterns across exons 2 and 3 to reconstruct the most likely allele combination. HLA typing was performed once per patient and used for all downstream binding predictions [30].
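The fusion retention criteria listed in the fusion detection subsection can be sketched as a small filter. This is a minimal illustration, not the authors' implementation: the `Fusion` record and its field names are hypothetical (they are not Arriba's column names), the "and/or" read-support rule is interpreted here as combined support of at least 5, and the FFPM formula is an assumed reading of "support reads normalized by total mapped library depth".

```python
from dataclasses import dataclass

MIN_SUPPORT = 5  # threshold stated in the text


@dataclass
class Fusion:
    """Hypothetical record for one fusion call (illustrative field names)."""
    gene5: str
    gene3: str
    split_reads: int
    discordant_pairs: int
    confidence: str      # e.g. "high", "medium", "low"
    reading_frame: str   # e.g. "in-frame", "out-of-frame"


def keep_fusion(f: Fusion) -> bool:
    """Apply the read-support, confidence and in-frame criteria."""
    # Assumption: split reads and discordant pairs are pooled for the threshold.
    support = f.split_reads + f.discordant_pairs
    return (
        support >= MIN_SUPPORT
        and f.confidence == "high"
        and f.reading_frame == "in-frame"
    )


def ffpm(support_fragments: int, total_mapped_fragments: int) -> float:
    """Fusion fragments per million mapped fragments (assumed definition)."""
    if total_mapped_fragments <= 0:
        raise ValueError("total_mapped_fragments must be positive")
    return support_fragments / total_mapped_fragments * 1_000_000
```

Blacklist filtering is left out of the sketch because the text attributes it to Arriba's own built-in annotations rather than to a separate post-processing step.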
Peptide generation from somatic variants and fusions

SNV/indel-derived peptides: For each non-synonymous somatic variant passing filters, the corresponding mutant protein sequence was reconstructed based on VEP annotation. A sequence window centered on the mutated residue was extracted, wide enough to contain every possible 8- to 11-mer that includes the mutation (shorter where the event lies near a protein terminus). From this window, all overlapping 8-mer to 11-mer peptides that included the mutant residue were generated, as these lengths account for the majority of MHC class I ligands. Frameshift and in-frame indels were handled similarly, using the mutant reading frame downstream of the variant to generate overlapping 8–11-mers from the altered region.

Fusion-derived peptides: For each in-frame gene fusion, the predicted chimeric coding sequence spanning the fusion junction was obtained from Arriba’s annotation. This junction region was translated, and all possible 8–11-mer peptides that span the breakpoint (i.e., include amino acids from both fusion partners) were generated. Only peptides that included the junction were retained, ensuring that fusion-derived neoantigens were strictly non-self.

MHC binding prediction and neoantigen filtering

Peptide–MHC class I binding was predicted using NetMHCpan-4.2 with each patient’s OptiType-inferred HLA-A/B/C alleles. For each peptide–HLA pair, NetMHCpan reported an affinity rank percentile; peptides with rank > 2.0% were heavily penalized by the downstream logistic scoring function, effectively removing them from consideration.
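The peptide enumeration described above for substitutions and fusion junctions can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the mutant protein is assumed to be already reconstructed, `mut_index` is a 0-based position of the altered residue, and frameshift handling (which covers the whole altered downstream region) is not shown.

```python
def mutant_peptides(protein: str, mut_index: int, lengths=range(8, 12)) -> list[str]:
    """Return every k-mer (k = 8..11 by default) that contains position mut_index."""
    peptides = []
    for k in lengths:
        # A k-mer starting at s covers s..s+k-1, so it contains mut_index
        # when mut_index - k + 1 <= s <= mut_index, clipped to the protein ends.
        first = max(0, mut_index - k + 1)
        last = min(mut_index, len(protein) - k)
        for start in range(first, last + 1):
            peptides.append(protein[start:start + k])
    return peptides


def junction_peptides(left: str, right: str, lengths=range(8, 12)) -> list[str]:
    """Return every k-mer with at least one residue from each fusion partner."""
    chimera = left + right
    junction = len(left)  # index of the first residue contributed by `right`
    peptides = []
    for k in lengths:
        # The window must start inside `left` and end inside `right`.
        first = max(0, junction - k + 1)
        last = min(junction - 1, len(chimera) - k)
        for start in range(first, last + 1):
            peptides.append(chimera[start:start + k])
    return peptides
```

Away from the termini, a single substituted residue yields k windows per length (38 peptides in total for lengths 8 to 11), while a junction yields k - 1 windows per length (34 in total), because a junction-spanning window needs at least one residue on each side.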
To reduce false positives and prioritize clinically relevant targets, a MuPeXI-based multi-modal scoring framework was implemented in Python (adapted for Python 3.11 and NetMHCpan-4.2) [31]:

a) Binding affinity component: strong binding was defined as NetMHCpan rank < 0.5% and weak binding as 0.5–2.0%. Peptides with rank above 2.0% were assigned near-zero priority scores by the logistic function, effectively deprioritizing them from the final candidate list.
b) Expression component: source gene TPM ≥ 1 (gene-level expression from Kallisto). Peptides from genes with TPM < 1 were excluded, approximating insufficient antigen supply.
c) Clonality component: variant VAF was used as a linear weighting factor for clonality, deprioritizing subclonal events in the final ranking. For fusions, where VAF is not directly measurable from RNA-seq data, normalized fusion abundance (FFPM) was used as a proxy for expression within the scoring framework, and fusions were treated as clonal (VAF = 1.0) given their status as pathognomonic drivers.
d) Self-similarity filter: each candidate peptide was checked for exact matches elsewhere in the human reference proteome (GENCODE v49), and those with 100% identity to self-peptides were assigned a priority score of zero.

A neoantigen priority score was computed for ranking, as a weighted combination of:

a) Inverse binding rank (stronger binders score higher).
b) Source gene expression, incorporated using a hyperbolic tangent (tanh) transformation, as described in the MuPeXI framework.
c) VAF (for SNV/indels) or normalized junction read count (for fusions).
d) A foreignness component, calculated using an agretopicity index that compares the predicted binding rank of the mutant peptide with that of its wild-type counterpart.

This score was used to rank SNV- and fusion-derived neoantigens within each tumor, and the top-ranked peptides formed the focus of downstream interpretation.
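How the scoring components listed above could combine is sketched below. The text does not give the exact weights or functional form, so everything numeric here is an assumption made for illustration: the logistic midpoint and slope, the multiplicative combination, and the simple foreignness term are hypothetical and are not claimed to reproduce the priority scores reported in this study.

```python
import math
from typing import Optional


def binding_weight(rank_pct: float, midpoint: float = 2.0, slope: float = 5.0) -> float:
    """Logistic weight on %Rank: close to 1 for strong binders, 0.5 at the
    midpoint, close to 0 well above it (midpoint and slope are assumed)."""
    return 1.0 / (1.0 + math.exp(slope * (rank_pct - midpoint)))


def priority_score(
    mut_rank: float,
    expression: float,               # gene TPM, or FFPM for fusions
    clonality: float,                # VAF for SNV/indels, 1.0 for fusions
    wt_rank: Optional[float] = None, # wild-type rank; None when no counterpart exists
    self_match: bool = False,
    min_expression: float = 1.0,
) -> float:
    """Illustrative composite score in [0, 1]; higher ranks first."""
    # Exact self-peptides and unexpressed sources are removed outright.
    if self_match or expression < min_expression:
        return 0.0
    binding = binding_weight(mut_rank)
    supply = math.tanh(expression)
    # Hypothetical agretopicity-style term: penalize candidates whose
    # wild-type counterpart is also predicted to bind well.
    foreignness = 1.0 if wt_rank is None else 1.0 - 0.5 * binding_weight(wt_rank)
    return binding * supply * clonality * foreignness
```

With these assumed parameters, a peptide at 3.5% rank receives a binding weight below 0.001, which matches the qualitative behaviour described in the text (candidates above 2.0% are effectively deprioritized without a hard cutoff).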
Reproducibility & Data Availability

The entire analysis workflow was implemented in Snakemake (v9.14.5) to ensure reproducibility and scalability across cohorts [33]. Each major step (QC, alignment, variant calling, fusion detection, HLA typing, peptide generation, MHC binding prediction, scoring) is encapsulated as a separate rule with explicit input/output and software environment definitions. The conda environments were pinned to specific tool versions to maximize reproducibility [34]. The pipeline, along with configuration files and environment specifications, is publicly available at https://github.com/mantaspanos/sarcoma-multiomics-neoantigens.

Results

Genomic Quality and Strategic Repurposing of Legacy Data

Paired WES and RNA-seq data were repurposed to ask whether CKS tumors yield actionable neoantigens despite the low coverage of these technically constrained legacy datasets. After aggressive QC, it was still possible to call neoantigens with high variant allele frequency (VAF). Because CKS tumors are notoriously heterogeneous, candidates with low VAF were deprioritized to account for intratumoral heterogeneity.

Initial quality control of the WES data revealed a challenging sequencing output. The assay targeted an extensive 192 Mb exome footprint but achieved a modest mean depth of 39.2x with a highly uneven coverage distribution (max depth 4,735x). Only 31% of the target territory achieved ≥ 10x depth, reflecting a substantial capture bias across coding and non-coding regions.

Uneven Capture Efficiency and Coverage Strategy and Distinguishing Technical Bias from Biological Signal

The UT05 sample illustrated the problem starkly (Fig. 1a), since only 27% of the target remained usable at 10x depth. To preserve cohort-wide comparability while maintaining mutation sensitivity, the callable territory was thus defined using a ≥ 10x threshold for downstream analyses.
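The callable-territory definition and the TMB normalization described in Methods can be sketched in a few lines. This is a minimal illustration: the function names are hypothetical, per-base depths are assumed to be already extracted for the target (or CDS) positions, and reporting TMB per megabase is an assumption, since the text does not state the unit explicitly.

```python
from typing import Iterable

MIN_DEPTH = 10  # callability threshold stated in the text


def callable_bases(depths: Iterable[int], min_depth: int = MIN_DEPTH) -> int:
    """Count positions whose depth meets the callability threshold."""
    return sum(1 for d in depths if d >= min_depth)


def callable_fraction(depths: list[int], min_depth: int = MIN_DEPTH) -> float:
    """Fraction of the target that is callable at min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    return callable_bases(depths, min_depth) / len(depths)


def normalized_tmb(n_nonsynonymous: int, callable_cds_bp: int) -> float:
    """Non-synonymous mutations per megabase of callable CDS (assumed unit)."""
    if callable_cds_bp <= 0:
        raise ValueError("callable_cds_bp must be positive")
    return n_nonsynonymous / (callable_cds_bp / 1_000_000)
```

Using the callable CDS length as the denominator, rather than the full target footprint, keeps a poorly captured sample from appearing mutation-poor simply because most of its target could not be called.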
Figure 1b shows the depth distribution, which indicates that many loci with extreme coverage likely reflect technical capture bias rather than biological amplification, as evidenced by hyper-covered regions shared between tumor and matched normal samples. To identify candidate copy-number events, selected loci were evaluated for multi-omic concordance. In sample LT01, the CDK4 and MDM2 loci displayed a distinct, concordant outlier profile: elevated locus depth (~353x for CDK4) accompanied by extreme transcriptional abundance (64,344 TPM). This dual signal is consistent with the 12q amplicon-driven phenotype described in the CKS molecular classification (Fig. 2). In contrast, sporadic depth spikes observed at isolated exons in other samples lacked commensurate RNA overexpression, supporting their classification as artifacts.

To further mitigate the risk of underestimating immunogenicity in this uneven landscape, the Tumor Mutational Burden (TMB) was normalized to the sample-specific callable protein-coding territory (CDS regions with ≥ 10x coverage). This integrative approach "rescued" the immunogenomic classification of the cohort, correctly identifying hypermutators (FT13, LT02) and fusion-driven targets (UT08) that would have been obscured by standard genome-wide analysis.

Figure 2. DNA–RNA Concordance Validates the 12q15 Amplicon Signature. Integration of locus-specific WES depth and RNA-seq abundance reveals concordant outliers for both CDK4 and MDM2 in sample LT01, consistent with functional amplification. While other samples show sporadic depth increases (attributable to capture bias), only LT01 exhibits commensurate transcriptional overexpression.
TPM values are shown for qualitative concordance, not for cross-sample quantitative comparison.

Molecular stratification into immunogenic archetypes

Consistent with the molecular framework reported by Kim et al., the data suggest a hypermutator subset characterized by elevated nonsynonymous TMB and, in some cases, a transition-enriched SNV spectrum suggestive of mismatch repair deficiency-like mutagenesis (Table 1). Within the hypermutator group, however, mutational spectra were heterogeneous: FT13 and UT05 showed markedly elevated Ti/Tv ratios, whereas LT02 remained highly mutated by TMB yet exhibited a distinct Ti/Tv profile, indicating that “hypermutation” in this cohort is not mechanistically uniform.

To support downstream neoantigen interpretation without over-attributing biology to any individual metric, four features were integrated: mutation burden (nonsynonymous TMB), mutational spectrum (Ti/Tv as a transparency/QC descriptor), structural variation (expressed fusions), and driver context. Together they define three practical immunogenomic archetypes: (i) hypermutator (MSI-like or non–MSI-like), (ii) SNV-low/structural-driven, and (iii) SNV-low “desert” (including amplification-driven cases).

Cross-validation against Kim et al. confirmed that the archetypes align with the reported biology, such as TP53/ATRX/PTEN alterations enriched in hypermutators (Supplemental Table S1) and CDK4 amplification in LT01. This analysis broadens these conclusions by identifying expressed structural variants (BLTP3B::NF1, KDM2A::MYH9) as immunogenic sources in SNV-low contexts (Table S3). Notably, LT02 generated 41 fusion-derived candidates versus zero in naïve MSI-like samples (FT13/UT05), suggesting that therapy may enrich junction neoantigens with enhanced foreignness potential. Lastly, Table 1 shows that samples UT08 and LT01, although SNV-low, exhibit structural-driven potential.
Fusion neoepitopes and structural complexity provide complementary immunogenomic sources beyond the SNV burden alone.

Table 1. Immunogenomic profile of the cohort. The table summarizes cohort-level immunogenomic features, including nonsynonymous TMB normalized to sample-specific callable protein-coding CDS territory (≥ 10×), Ti/Tv ratios computed from PASS SNVs within capture targets (reported primarily for contextual interpretation and QC), representative driver events, and the distribution of predicted neoepitopes by source (SNV/indel versus fusion). Ti/Tv is sensitive to target definition and filtering stringency; therefore, it is not used as a standalone classifier of MSI status or treatment etiology but rather as supportive context alongside TMB and structural evidence.

Sample | Phenotype | TMB | Ti/Tv | Driver Events | Neoepitope Counts
FT13 | Hypermutator (MSI-like) | 16.43 | 4.05 | PTEN indel, OBSCN | 1088 (1088 SNV / 0 Fusion)
LT02 | Hypermutator (non-MSI-like) | 15.25 | 2.37 | ATRX (mut) | 933 (892 SNV / 41 Fusion)
UT05 | Hypermutator (MSI-like) | 6.09 | 7.54 | TP53, PTEN, RB1 (mut) | 903 (903 SNV / 0 Fusion)
UT08 | SNV-low / structural-driven | 0.83 | 1.42 | NF1-BLTP3B, KDM2A-MYH9 (fusions) | 65 (58 SNV / 7 Fusion)
UT01 | SNV-low / Desert | 0.90 | 1.79 | No recurrent driver detected (SNV/indel) | 83 (83 SNV / 0 Fusion)
LT01 | SNV-low / amplification-driven | 0.64 | 2.54 | RNA-high CDK4 | 37 (24 SNV / 13 Fusion)

RNA-Guided "Rescue" Identifies Expressed Target Reservoirs

RNA-seq provided orthogonal support for events in genomic regions where the DNA evidence was ambiguous, enabling the differentiation of true biological signals from technical or baseline variation. In sample FT13, the PTEN locus exhibited heterogeneous coverage, which challenged high-confidence conventional somatic calling; however, transcriptomic evidence rescued this high-priority driver. Even with poor WES depth, RNA-sequencing confirmed the mutant allele at high transcriptional abundance (107.5 TPM), validating expression of a frameshift (p.Asn323MetfsTer40).
This produced a strong-binding neoepitope (YLVLTLTKV; Rank 0.175%), demonstrating that evidence of mutant expression can prioritize actionable targets in spite of coverage gaps.

Similarly, the integration of WES depth and RNA abundance validated the functional copy-number drivers. Figure 2 highlights sample LT01 as a distinct outlier, exhibiting simultaneous elevation of genomic coverage and transcript levels for both CDK4 and MDM2. Comparing these syntenic genes strengthened the diagnosis of a functional 12q15 amplification in LT01, separating it from cases such as FT13, where high CDK4 expression occurred without MDM2 co-elevation or genomic gain, a pattern indicative of transcriptional regulation rather than chromosomal amplification. This concordance-based framework provides a strong foundation for ranking the genomic drivers.

Prioritization of candidate neoepitopes

Candidate prioritization integrated predicted HLA binding strength (NetMHCpan %Rank), expression support (TPM for SNV/indel-derived candidates; FFPM for fusion junctions), and a composite priority score. HLA genotyping confirmed that the allele distribution of the cohort mirrored the high-frequency haplotypes in the Korean population (for example, A*24:02, A*33:03-B*58:01), supporting the relevance of the results to the local demographics (Supplemental Table S2). The expression–affinity landscape (Fig. 3) summarizes the selection space and highlights candidates that combine detectable expression with strong predicted binding. Notably, fusion-derived candidates contributed significantly to the high-affinity tail in some samples. In LT02, CALD1::REV3L achieved an exceptional predicted binding affinity (0.005% Rank) with detectable fusion abundance, showing the potential of expressed junction peptides to rank alongside or above SNV-derived candidates.
In UT08, candidate ranking reflected an abundance-weighted rationale: KDM2A::MYH9 was retained because higher fusion abundance (7.85 FFPM) partially offset its intermediate binding rank, whereas lower-abundance junctions were deprioritized.

Table 2 lists a selection of high-priority neoepitopes (extended candidate lists and the top five candidates are provided in Supplemental Table S3). Two candidates are presented per sample, each identified through the current multi-omic discovery pipeline. These candidates, spanning all three immunogenomic archetypes, demonstrate the capacity of the workflow to prioritize effectively across diverse mutation burdens. High-quality fusion events emerged as leading candidates, with CALD1::REV3L (LT02) achieving an exceptional binding rank of 0.005%, and KDM2A::MYH9 (UT08) demonstrating robust expression support (7.85 FFPM). Furthermore, the prioritization framework defined in the Methods section linked immunogenicity to known driver biology, highlighting neoepitopes derived from PTEN (FT13) and ARID1A (LT02) alongside highly expressed targets such as HMGA2 (LT01; 43.66 TPM). Crucially, the approach proved effective even in 'neoantigen deserts,' recovering viable targets like IL13RA1 (UT01) and NCKAP5 (LT01) from low-TMB samples by using expression-weighted selection to rescue candidates that could otherwise be overlooked.
Sample | Event Type | Gene | HLA | Rank (%) | Priority Score | Expression
FT13 | SNV/Indel | OBSCN | HLA-B*15:11 | 1.162 | 88.16 | 6.04 TPM
FT13 | SNV/Indel | PTEN | HLA-A*02:01 | 0.175 | 80.76 | 107.49 TPM
LT01 | SNV/Indel | NCKAP5 | HLA-B*07:02 | 0.572 | 27.85 | 3.28 TPM
LT01 | SNV/Indel | HMGA2 | HLA-B*07:02 | 0.812 | 15.85 | 43.66 TPM
LT02 | Fusion | CALD1::REV3L | HLA-B*07:02 | 0.005 | 98.93 | 2.51 FFPM
LT02 | SNV/Indel | ARID1A | HLA-C*07:02 | 0.712 | 76.58 | 5.64 TPM
UT01 | SNV/Indel | IL13RA1 | HLA-A*03:01 | 0.746 | 81.66 | 16.94 TPM
UT01 | SNV/Indel | DHX15 | HLA-A*24:02 | 0.919 | 20.78 | 49.62 TPM
UT05 | SNV/Indel | CCT3 | HLA-A*30:11 | 1.178 | 73.79 | 38.02 TPM
UT05 | SNV/Indel | ASH1L | HLA-A*30:11 | 0.222 | 65.51 | 7.47 TPM
UT08 | Fusion | KDM2A::MYH9 | HLA-C*05:01 | 0.992 | 47.73 | 7.85 FFPM
UT08 | SNV/Indel | ITFG1 | HLA-B*48:01 | 0.313 | 13.42 | 21.91 TPM

Table 2. Curated neoepitope candidates selected for multi-omic validation. Two candidates per sample were selected based on integrated evidence from predicted HLA binding (NetMHCpan %Rank), a composite priority score, and expression support (TPM for SNVs/indels; FFPM for fusions). TPM, transcripts per million; FFPM, fusion fragments per million. NetMHCpan %Rank thresholds: < 0.5, strong binder; 0.5–2, weak binder.

Discussion

Legacy WES and RNA-seq sarcoma cohorts are still valuable, but their technical constraints (large target footprints, uneven capture, and variable depth) complicate standard immunogenomic readouts. In CKS, chromosomal chaos and subclonal heterogeneity compound these problems. The present analysis indicates that a conservative callability definition coupled with multi-omic concordance can recover interpretable immunogenomic structures from such datasets, while reducing reliance on any single metric.

Methodologically, isolated extreme WES depth is not sufficient evidence of amplification in legacy capture data. The tumor–normal depth dispersion suggests that capture bias contributes substantially to these spikes, so DNA–RNA concordance was treated as the deciding evidence for functional amplification.
Sample LT01 (concordant CDK4/MDM2 co-elevation) is a concrete example of how RNA helps resolve ambiguous DNA-only signals in uneven datasets.

The analysis also emphasizes the essential role of orthogonal transcriptomic evidence in rescuing driver events that may be obscured by heterogeneous genomic capture. The PTEN frameshift in sample FT13 serves as a prime example: despite variable DNA coverage across the gene locus, which challenged high-confidence allele counting, RNA-seq provided decisive validation. By confirming the robust expression of the mutant transcript (107.5 TPM), this event could be confidently prioritized as a functional neoantigen-generating driver, bypassing the ambiguity of the WES data. This demonstrates that RNA-seq acts not simply as a validation layer, but as an essential discovery tool for resolving actionable targets in regions of suboptimal genomic capture.

This study has several limitations. Candidate lists are computational predictions and do not establish immunogenicity or in vivo presentation; empirical confirmation (e.g., peptide–MHC presentation by immunopeptidomics and functional T cell assays) is required to confirm immune recognition. In addition, the cohort was small and reflected a particular legacy dataset, limiting extrapolation across sarcoma histologies and sequencing protocols. Nevertheless, the analytic pattern (QC-aware callability, concordance-based driver interpretation, and expression-informed prioritization) should be transferable to other archived multi-omic cohorts.

Overall, the analysis endorses a practical workflow for extracting interpretable immunogenomic signals and prioritized candidate lists from uneven legacy WES/RNA-seq data in genomically complex sarcomas, complementing established CKS molecular stratification based on copy-number and expression signatures.
Declarations

Acknowledgements
The authors gratefully acknowledge the authors of the original study (Kim et al., 2018) for sharing their WES/RNA-seq data publicly via the European Nucleotide Archive; without these data the current re-analysis would not have been possible.

Author contributions
P.M. planned the study design, performed the analysis of the data, drew the figures, interpreted the results and wrote the manuscript. K.A.K. contributed to the design of the study, discussed the findings and reviewed the manuscript. Paperpal was used for language editing only.

Competing interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could lead to a potential conflict of interest.

Data availability
Raw sequencing data are available via ENA accession PRJEB23898.

Funding
This study was supported by the Technical University of Denmark.

References
Anzar I et al (Sep. 2023) The interplay between neoantigens and immune cells in sarcomas treated with checkpoint inhibition. Front Immunol 14:1226445. 10.3389/fimmu.2023.1226445
Kiyotani K, Chan HT, Nakamura Y (Mar. 2018) Immunopharmacogenomics towards personalized cancer immunotherapy targeting neoantigens. Cancer Sci 109(3):542–549. 10.1111/cas.13498
Capietto A-H, Hoshyar R, Delamarre L (Sep. 2022) Sources of Cancer Neoantigens beyond Single-Nucleotide Variants. Int J Mol Sci 23(17):10131. 10.3390/ijms231710131
Crompton BD et al (Nov. 2014) The Genomic Landscape of Pediatric Ewing Sarcoma. Cancer Discov 4(11):1326–1341. 10.1158/2159-8290.CD-13-1037
Tirode F et al (Nov. 2014) Genomic Landscape of Ewing Sarcoma Defines an Aggressive Subtype with Co-Association of STAG2 and TP53 Mutations. Cancer Discov 4(11):1342–1353. 10.1158/2159-8290.CD-14-0622
Siozopoulou V et al (Jan. 2021) Immune Checkpoint Inhibitory Therapy in Sarcomas: Is There Light at the End of the Tunnel? Cancers 13(2):360. 10.3390/cancers13020360
Perry JA et al (Dec. 2014) Complementary genomic approaches highlight the PI3K/mTOR pathway as a common vulnerability in osteosarcoma. Proc Natl Acad Sci 111(51). 10.1073/pnas.1419260111
Song Y, Yang K, Sun T, Tang R (Dec. 2021) Development and validation of prognostic markers in sarcomas base on a multi-omics analysis. BMC Med Genomics 14(1):31. 10.1186/s12920-021-00876-4
Statz-Geary K et al (Jun. 2025) DNA Damage Repair Pathway Alterations and Immune Landscape Differences in Pediatric/Adolescent, Young Adult (AYA) and Adult Sarcomas. Cancers 17(12):1962. 10.3390/cancers17121962
Keskin DB et al (Jan. 2019) Neoantigen vaccine generates intratumoral T cell responses in phase Ib glioblastoma trial. Nature 565(7738):234–239. 10.1038/s41586-018-0792-9
Kiyotani K, Chan HT, Nakamura Y (Mar. 2018) Immunopharmacogenomics towards personalized cancer immunotherapy targeting neoantigens. Cancer Sci 109(3):542–549. 10.1111/cas.13498
Li J et al (Aug. 2023) The screening, identification, design and clinical application of tumor-specific neoantigens for TCR-T cells. Mol Cancer 22(1):141. 10.1186/s12943-023-01844-5
Luo Z et al (Apr. 2021) Self-Adjuvanted Molecular Activator (SeaMac) Nanovaccines Promote Cancer Immunotherapy. Adv Healthc Mater 10(7):2002080. 10.1002/adhm.202002080
Shi W et al (Feb. 2023) Advances in Tumor Antigen-Based Anticancer Immunotherapy: Recent Progress, Prevailing Challenges, and Future Perspective. Adv Ther 6(2):2200239. 10.1002/adtp.202200239
Xie N, Shen G, Gao W, Huang Z, Huang C, Fu L (Jan. 2023) Neoantigens: promising targets for cancer therapy. Signal Transduct Target Ther 8(1):9. 10.1038/s41392-022-01270-x
Anzar I et al (Sep. 2023) The interplay between neoantigens and immune cells in sarcomas treated with checkpoint inhibition. Front Immunol 14:1226445. 10.3389/fimmu.2023.1226445
Sha H et al (Feb. 2022) Case Report: Pathological Complete Response in a Lung Metastasis of Phyllodes Tumor Patient Following Treatment Containing Peptide Neoantigen Nano-Vaccine. Front Oncol 12:800484. 10.3389/fonc.2022.800484
Perry JA et al (Dec. 2014) Complementary genomic approaches highlight the PI3K/mTOR pathway as a common vulnerability in osteosarcoma. Proc Natl Acad Sci 111(51). 10.1073/pnas.1419260111
Seki M et al (Jul. 2015) Integrated genetic and epigenetic analysis defines novel molecular subgroups in rhabdomyosarcoma. Nat Commun 6(1):7557. 10.1038/ncomms8557
Shevkoplias A et al (Jun. 2025) Molecular subtyping and insights into sarcoma biology and prognosis. J Clin Oncol 43(16_suppl):11536. 10.1200/JCO.2025.43.16_suppl.11536
Kim J et al (Dec. 2018) Integrated molecular characterization of adult soft tissue sarcoma for therapeutic targets. BMC Med Genet 19:216. 10.1186/s12881-018-0722-6
Andrews S (2010) FastQC: A quality control tool for high throughput sequence data. [Online]. Available: http://www.bioinformatics.babraham.ac.uk/projects/fastqc
Martin M (May 2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17(1):10. 10.14806/ej.17.1.200
Li H, Durbin R (Jul. 2009) Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25(14):1754–1760. 10.1093/bioinformatics/btp324
Mudge JM et al (Jan. 2025) GENCODE 2025: reference gene annotation for human and mouse. Nucleic Acids Res 53(D1):D966–D975. 10.1093/nar/gkae1078
Dobin A et al (Jan. 2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29(1):15–21. 10.1093/bioinformatics/bts635
Bray NL, Pimentel H, Melsted P, Pachter L (May 2016) Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol 34(5):525–527. 10.1038/nbt.3519
McKenna A et al (Sep. 2010) The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20(9):1297–1303. 10.1101/gr.107524.110
Uhrig S et al (Mar. 2021) Accurate and efficient detection of gene fusions from RNA sequencing data. Genome Res 31(3):448–460. 10.1101/gr.257246.119
Szolek A, Schubert B, Mohr C, Sturm M, Feldhahn M, Kohlbacher O (Dec. 2014) OptiType: precision HLA typing from next-generation sequencing data. Bioinformatics 30(23):3310–3316. 10.1093/bioinformatics/btu548
Bjerregaard A-M, Nielsen M, Hadrup SR, Szallasi Z, Eklund AC (Sep. 2017) MuPeXI: prediction of neo-epitopes from tumor sequencing data. Cancer Immunol Immunother 66(9):1123–1130. 10.1007/s00262-017-2001-3
Mölder F et al (Jan. 2021) Sustainable data analysis with Snakemake. F1000Research 10:33. 10.12688/f1000research.29032.1
Anaconda Software Distribution (Nov. 2016) [Online]. Available: https://anaconda.com

Additional Declarations
No competing interests reported.

Supplementary Files
MantasNeoantigenSupplement1.pdf

Editorial decision: Revision requested 06 Mar, 2026
Reviews received at journal 25 Feb, 2026
Reviewers agreed at journal 17 Feb, 2026
Reviewers invited by journal 17 Feb, 2026
Editor assigned by journal 12 Feb, 2026
Submission checks completed at journal 12 Feb, 2026
First submitted to journal 11 Feb, 2026
We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-8854019","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":592950888,"identity":"bb334345-de38-45e4-ae05-550fd5c726f3","order_by":0,"name":"Panagiotis Mantas","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAoElEQVRIiWNgGAWjYPACG8YGICnZQIKWNMYeUrUcJkGLOfvZoxt+7jgvu18igfHmDGK0WPbkpd3sPXPbuEcigdlyAzFaDA7kmN3gbbudCNTCJvmAKC3n35jd/Nt2jhQtN3LMbvO2HYBoIcphljPemN2WbUs27jnzsNmSKO+b8+eY3XzbZifb3p588GYPUQ5DMMEJgDQto2AUjIJRMApwAABcRjbtIf9xqQAAAABJRU5ErkJggg==","orcid":"","institution":"Technical University of Denmark","correspondingAuthor":true,"prefix":"","firstName":"Panagiotis","middleName":"","lastName":"Mantas","suffix":""},{"id":592950889,"identity":"8784a6b9-fd39-4fdc-9e2d-f6a68a29a6e9","order_by":1,"name":"Karen A. 
Krogfelt","email":"","orcid":"","institution":"Roskilde University","correspondingAuthor":false,"prefix":"","firstName":"Karen","middleName":"A.","lastName":"Krogfelt","suffix":""}],"badges":[],"createdAt":"2026-02-11 16:38:36","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-8854019/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-8854019/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1007/s00262-026-04395-y","type":"published","date":"2026-04-21T15:59:57+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":103175936,"identity":"7eb43137-0fb5-4afd-b4e8-d7185e9d198b","added_by":"auto","created_at":"2026-02-22 16:28:48","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":128772,"visible":true,"origin":"","legend":"\u003cp\u003e(WES QC/coverage): (a) Cumulative breadth-of-coverage plot\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-8854019/v1/519f1caf088c87e6de972a14.png"},{"id":103175935,"identity":"e4529016-bc19-4459-91c3-cee1106ce586","added_by":"auto","created_at":"2026-02-22 16:28:48","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":79328,"visible":true,"origin":"","legend":"\u003cp\u003eFig. 1 (WES coverage): (b) per-sample mean depth for paired tumor and normal samples with a companion panel showing max depth for the respective samples\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-8854019/v1/4333d21a920011779a7efa70.png"},{"id":103505325,"identity":"ab343da7-6512-42bd-a810-32059b78c4a0","added_by":"auto","created_at":"2026-02-26 13:29:57","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":211034,"visible":true,"origin":"","legend":"\u003cp\u003eFig. 2 DNA–RNA Concordance Validates the 12q15 Amplicon Signature. 
Integration of locus-specific WES depth and RNA-seq abundance reveals concordant outliers for both CDK4 and MDM2 in sample LT01, consistent with functional amplification. While other samples show sporadic depth increases (attributable to capture bias), only LT01 exhibits commensurate transcriptional overexpression. TPM values are shown for qualitative concordance not cross-sample quantitative comparison though\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-8854019/v1/881507cc416a21f3c972c4ea.png"},{"id":103504410,"identity":"b81f2947-1164-492f-af9a-c1e4975b9634","added_by":"auto","created_at":"2026-02-26 13:19:44","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":104974,"visible":true,"origin":"","legend":"\u003cp\u003eFig. 3 Neoepitope landscape: Predicted MHC binding affinity versus expression level. Each point represents a candidate neoantigen colored by mutation type (SNV/indel: blue; Fusion: red) and sized proportional to the composite priority score. The x-axis shows log₁₀(TPM+1) for SNV-derived candidates and log₁₀(FFPM+1) for fusion junctions. Labeled points highlight high-confidence therapeutic candidates across archetypes, including the exceptional CALD1::REV3Lfusion (rank 0.005%) and driver-associated targets (ARID1A, HMGA2). Dashed lines indicate strong-binder (≤0.5%) and weak-binder (≥2%) thresholds, with the vertical line marking minimum expression (TPM/FFPM ≥1). 
This multi-dimensional prioritization reveals expressed, high-affinity candidates suitable for personalized immunotherapy despite technical constraints in legacy sequencing data\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-8854019/v1/6414cf198d4c29b88f999f0e.png"},{"id":107928561,"identity":"183df12b-24e0-46d7-b6fa-3ba44bb56fbe","added_by":"auto","created_at":"2026-04-27 16:11:19","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":769135,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8854019/v1/5749a504-87a9-46ec-9750-0264a1c4c0da.pdf"},{"id":103175938,"identity":"b85b49ec-4edf-452b-aa6f-29559735721c","added_by":"auto","created_at":"2026-02-22 16:28:48","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":581585,"visible":true,"origin":"","legend":"","description":"","filename":"MantasNeoantigenSupplement1.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8854019/v1/07a9f5ef2a4583cc0d0cc246.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Repurposing public sarcoma multi-omics for neoantigen discovery","fulltext":[{"header":"Introduction","content":"\u003cp\u003e Soft tissue sarcomas are rare cancers with diverse biology. They have historically shown limited responsiveness to immune checkpoint blockade (ICB), particularly when compared with \"hot\" tumors such as melanoma or lung cancer [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. 
Standard biomarkers, such as Tumor Mutational Burden (TMB), fall short on predicting immunotherapy response in cancers such as sarcomas because they overlook non-SNV neoantigen sources [\u003cspan additionalcitationids=\"CR4\" citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. This metric is not well-suited for sarcomas, which are broadly divided into two genomic classes: \"Simple Karyotype\" sarcomas (driven by specific translocations like EWSR1-FLI1) and \"Complex Karyotype\" sarcomas (CKS), which are driven by chaotic copy number alterations and structural rearrangements rather than high SNV loads [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. Thus, CKS tumors are frequently characterized as \"immunologically cold\", despite possessing significant neoantigen potential driven by high genomic instability, copy number alterations and DNA damage response (DDR) pathway defects [\u003cspan additionalcitationids=\"CR8\" citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. Personalized vaccines (or shared neoantigens) remain still a promising frontier [\u003cspan additionalcitationids=\"CR11 CR12 CR13 CR14\" citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. 
Identifying immunogenic targets within this structural chaos could open new therapeutic paths for patients who currently have few options.\u003c/p\u003e \u003cp\u003eWhile a handful of genomic studies in sarcoma have explicitly evaluated neoantigens to inform survival associations or support vaccine design [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e, \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e], most large-scale paired WES and RNA-seq efforts have pursued different objectives. Cohorts have primarily focused on pinpointing core molecular drivers (e.g., PI3K/mTOR vulnerabilities in osteosarcoma or chromatin remodeling mutations in UPS), defining molecular subtypes through integrated clustering as seen in rhabdomyosarcoma classification [\u003cspan additionalcitationids=\"CR19\" citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]. These analyses prioritized differentiation programs, copy number variations (e.g., CDK4/RB1 signatures in Complex Karyotype Sarcoma) and transcriptional states to nominate drug targets (like PDGFR) and explain disease biology rather than performing systematic neoantigen discovery or immune-focused tumor characterization [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. Yet these same datasets contain the exact raw data required for immunogenomics: somatic variant calls, expression evidence, and immune context. As a result, neoantigen-level information in these cohorts has not been systematically extracted. 
The raw sequencing is already public; reanalyzing it with current neoantigen prediction and prioritization workflows can uncover immunogenic determinants that were outside the scope of the original driver- and subtype-centric analyses, possibly yielding actionable vaccine targets without new patient samples.\u003c/p\u003e \u003cp\u003eTo bridge this translational gap, a systematic immunogenomic re-analysis was performed by Kim et al. CKS cohort using a state-of-the-art bioinformatics framework [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. By integrating thoroughly quality control measures, which successfully recapitulated the original molecular classifications (e.g., CDK4-amplified subtypes), this study established a foundation for extending the study\u0026rsquo;s scope beyond driver identification. The presented approach specifically maps the neoantigen landscape across distinct genomic architectures, contrasting purported 'Immunogenic Hotspots' against the traditionally viewed 'Immunogenic Deserts' of Complex Karyotype Sarcomas, to isolate overlooked vaccine candidates.\u003c/p\u003e \u003cp\u003eThis is the first comprehensive profiling of the SNV- and fusion-derived neoantigen landscape in this specific CKS cohort. By applying a multi-modal epitope prediction pipeline, the results revealed that even these phenotypically 'cold' tumors, often excluded from immunotherapy trials, harbor high-affinity, patient-specific targets born from their profound genomic instability.\u003c/p\u003e"},{"header":"Materials and Methods","content":"\u003cp\u003eData Acquisition\u003c/p\u003e \u003cp\u003eRaw whole-exome sequencing (WES) and RNA-seq paired-end FASTQ files were retrieved from the European Nucleotide Archive (ENA) under the accession number PRJEB23898. These data correspond to the cohort originally described by Kim et al. 
(2018), comprising 14 patients with complex karyotype soft tissue sarcomas (CKS) [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. To enable deep multi-modal characterization while sustaining computational feasibility, a representative discovery set of six samples were selected, spanning the genomic subtypes defined in the first study. Stratified group of MSI-High/Hypermutators (n\u0026thinsp;=\u0026thinsp;3): UT05, FT13 and LT02 and group of Chromosomal Stable (\"Desert\") (n\u0026thinsp;=\u0026thinsp;3): UT01, UT08 and LT01. This grouping captures the major biological axes identified by Kim et al. (hypermutation vs. chromosomal stability, treatment-na\u0026iuml;ve vs. chemotherapy-exposed) and allows for a direct comparison of neoantigen landscapes across mechanistically distinct tumor subtypes.\u003c/p\u003e \u003cp\u003eGenomic Preprocessing \u0026amp; Alignment\u003c/p\u003e \u003cp\u003e \u003cstrong\u003eWhole-exome preprocessing and alignment\u003c/strong\u003e \u003cp\u003eRaw WES FASTQ files were assessed with FastQC (v0.12.1) [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e] to evaluate base quality, adapter contamination, and over-represented sequences. Illumina TruSeq adapter sequences and low-quality bases were trimmed using Cutadapt (v5.2) with a minimum Phred score of 20 and minimum post-trimmed read length of 50 bp [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e].\u003c/p\u003e \u003c/p\u003e \u003cp\u003eTrimmed reads were aligned to the GRCh38/hg38 reference genome (updated from the original hg19 analysis to ensure compatibility with current genomic resources) using BWA-MEM (v0.7.19) [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]. 
PCR and optical duplicates were marked with Broad\u0026rsquo;s institute tool Picard MarkDuplicates(v3.4.0)(\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://broadinstitute.github.io/picard\u003c/span\u003e\u003cspan address=\"http://broadinstitute.github.io/picard\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). Following GATK Best Practices, the gnomAD resource was used for germline filtering in Mutect2 and GetPileupSummaries software.\u003c/p\u003e \u003cp\u003eBecause the original study used hg19, all analyses in this study were performed de novo on GRCh38 to ensure compatibility with current annotation resources and HLA/neoantigen tools. The GRCh38/hg38 human reference genome and transcriptome data were downloaded from the Gencode Project website (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.gencodegenes.org/\u003c/span\u003e\u003cspan address=\"https://www.gencodegenes.org/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003cstrong\u003eRNA Alignment \u0026amp; Quantification\u003c/strong\u003e \u003cp\u003eTumor RNA-seq FASTQs underwent quality control with FastQC and adapter/quality trimming with Cutadapt as above. Reads were then processed through two complementary methods\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eSTAR alignment used for fusion detection and expression-aware mapping. Reads were aligned to GRCh38 (with GENCODE transcriptome) using STAR (v2.7.10b) in two-pass mode with chimeric detection enabled to preserve fusion-relevant soft-clipped and discordant reads [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e].\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eTranscript-level quantification for expression filtering. 
In parallel, pseudo-alignment and transcript quantification were performed with Kallisto (v0.51.1) using GENCODE-derived transcript indices [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e]. Transcript-level TPMs were summed to gene-level TPMs and later used to filter neoantigen candidates by expression.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eSomatic Variant Calling\u003c/p\u003e \u003cp\u003eSomatic mutations encompass SNVs and indels, including frameshift mutations were identified using GATK Mutect2 (v4.6.2.0) in tumor\u0026ndash;normal mode, with each tumor BAM compared to its patient-matched germline WES BAM [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e]. Default Mutect2 filters were applied via FilterMutectCalls, followed by additional hard and strict filters: a) PASS in Mutect2 filter field i.e. passed all gatk tests such as contamination, strand bias and generic bias. b) Tumor depth\u0026thinsp;\u0026ge;\u0026thinsp;10\u0026times; at the variant site\u003c/p\u003e \u003cp\u003eVariants were functionally annotated with Ensembl VEP (v110), restricting downstream analysis to protein-coding missense SNVs, as well as in-frame and frameshift indels, while excluding synonymous and nonsense (stop-gain) mutations.\u003c/p\u003e \u003cp\u003eThe dataset showed heavy capture bias with mean depth of 29.3x (range: 15.0\u0026ndash;39.2x) so TMB was normalized carefully. The numerator was restricted to non-synonymous somatic mutations (missense, frameshift, and in-frame insertions/deletions) identified within the high-confidence intervals. To prevent TMB underestimation due to varying capture efficiency, the denominator was defined as the sample-specific 'callable coding territory, consisting of protein-coding CDS regions (GENCODE v49) that achieved a minimum of 10x depth. 
This normalization avoided underestimating mutation load in poorly captured regions.\u003c/p\u003e \u003cp\u003eThe MSI status for samples came directly from Kim et al\u0026rsquo;s original hypermutator analysis, which identified MSI/MMR-associated mutational signatures and expression patterns in the hypermutator subset. Transition-to-transversion (Ti/Tv) ratios were determined using bcftools stats (v1.17) based on somatic variants passing all quality filters (PASS) with a minimum tumor read depth of 10x, restricted to the exome capture regions.\u003c/p\u003e \u003cp\u003eFusion detection from RNA-seq\u003c/p\u003e \u003cp\u003eTo identify immunogenic targets derived from structural variations, fusion transcripts were characterized using Arriba (v2.4.0) on STAR-aligned BAMs. Default Arriba annotations (blacklists for recurrent artifacts, read-through events, and immunoglobulin artifacts) were used [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e]. High-confidence fusions meeting the following criteria were retained:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eAt least 5 junction (split-read) reads and/or discordant read pairs\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eNot present in Arriba's internal blacklist (e.g., recurrent technical artifacts, rRNA, mitochondrial genes, or same-gene read-throughs).\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eOnly fusions classified as 'high' confidence by Arriba were considered.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eIn-Frame Events: Fusions were restricted to in-frame sequences to ensure a plausible open reading frame (ORF) for peptide generation.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eExpression metric: Fusion abundance was quantified using Fragment Families Per Million (FFPM), derived from fusion-specific support reads normalized by total mapped library depth.\u003c/p\u003e \u003c/li\u003e 
\u003c/ul\u003e \u003c/p\u003e \u003cp\u003eHLA class I typing\u003c/p\u003e \u003cp\u003ePatient-specific HLA class I genotypes (HLA-A, -B, -C; 4-digit resolution) were inferred from tumor RNA-seq using OptiType (v1.3.5), which utilizes coverage patterns across exons 2 and 3 to reconstruct the most likely allele combination. HLA typing was performed once per patient and used for all downstream binding predictions [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e].\u003c/p\u003e \u003cp\u003ePeptide generation from somatic variants and fusions\u003c/p\u003e \u003cp\u003e \u003cb\u003eSNV/indel-derived peptides\u003c/b\u003e: For each non-synonymous somatic variant passing filters, the corresponding mutant protein sequence was reconstructed based on VEP annotation. A sliding window approach encompassing all possible 8- to 11-mers containing the mutation centered on the mutated residue was extracted where possible (shorter for events near termini). From this window, all overlapping 8-mer to 11-mer peptides that included the mutant residue were generated, as these lengths account for the majority of MHC class I ligands. Frameshift and in-frame indels were handled similarly, using the mutant reading frame downstream of the variant to generate overlapping 8\u0026ndash;11-mers from the altered region.\u003c/p\u003e \u003cp\u003e \u003cstrong\u003eFusion-derived peptides\u003c/strong\u003e \u003cp\u003eFor each in-frame gene fusion, the predicted chimeric coding sequence spanning the fusion junction was obtained from Arriba\u0026rsquo;s annotation. This junction region was translated, and all possible 8\u0026ndash;11-mer peptides that span the breakpoint (i.e., include amino acids from both fusion partners) were generated. 
Only peptides that included the junction were retained, ensuring that fusion-derived neoantigens were strictly non-self.\u003c/p\u003e \u003c/p\u003e \u003cp\u003eMHC binding prediction and neoantigen filtering\u003c/p\u003e \u003cp\u003ePeptide\u0026ndash;MHC class I binding was predicted using NetMHCpan-4.2, using each patient\u0026rsquo;s OptiType-inferred HLA-A/B/C alleles. For each peptide\u0026ndash;HLA pair, NetMHCpan reported an affinity rank percentile; peptides with rank\u0026thinsp;\u0026lt;\u0026thinsp;2.0% were prioritized. Peptides with binding\u0026thinsp;\u0026gt;\u0026thinsp;2.0% were heavily penalized by the downstream logistic scoring function, effectively removing them from consideration. To reduce false positives and prioritize clinically relevant targets, was MuPeXI-based multi-modal scoring framework was implemented in Python (adapted for Python 3.11 and NetMHCpan-4.2) [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e]:\u003c/p\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eBinding affinity component:\u003c/h2\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eStrong binding: NetMHCpan rank\u0026thinsp;\u0026lt;\u0026thinsp;0.5%\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eWeak binding: 0.5\u0026ndash;2.0%\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003ePeptides outside 2.0% were assigned near-zero priority scores by the logistic function, effectively deprioritizing them from the final candidate list\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eExpression component:\u003c/h3\u003e\n\u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eSource gene TPM\u0026thinsp;\u0026ge;\u0026thinsp;1 (gene-level expression from Kallisto).\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003ePeptides from genes with TPM\u0026thinsp;\u0026lt;\u0026thinsp;1 were excluded, approximating insufficient antigen 
supply.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e\n\u003ch3\u003eClonality component:\u003c/h3\u003e\n\u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eVariant VAF was used as a linear weighting factor for clonality, essentially deprioritizing subclonal events in the final ranking\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eFor fusions, where VAF is not directly measurable from RNA-seq data, normalized fusion abundance (FFPM) was used as a proxy for expression within the scoring framework, and variants were treated as clonal (VAF\u0026thinsp;=\u0026thinsp;1.0) given their status as pathognomonic drivers\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e\n\u003ch3\u003eSelf-similarity filter:\u003c/h3\u003e\n\u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eEach candidate peptide was checked for exact matches elsewhere in the human reference proteome (Gencode v49), and those with 100% identity to self-peptides were assigned a priority score of zero\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e\n\u003ch3\u003eNeoantigen priority score was computed for ranking, as a weighted combination of:\u003c/h3\u003e\n\u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eInverse binding rank (stronger binders score higher)\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eSource gene expression was incorporated using a hyperbolic tangent (tanh) transformation, as described in the MuPeXI framework\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eVAF (for SNV/indels) or normalized junction read count (for fusions)\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eA foreignness component was calculated using an agretopicity index, comparing the predicted binding rank of the mutant peptide with that of its wild-type counterpart.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThis score was used to rank SNV- and fusion-derived neoantigens within each tumor, 
and the top-ranked peptides formed the focus of downstream interpretation.\u003c/p\u003e \u003cp\u003eReproducibility \u0026amp; Data Availability\u003c/p\u003e \u003cp\u003eThe entire analysis workflow was implemented in Snakemake (v9.14.5) to ensure reproducibility and scalability across cohorts [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e]. Each major step (QC, alignment, variant calling, fusion detection, HLA typing, peptide generation, MHC binding prediction, scoring) is encapsulated as a separate rule with explicit input/output and software environment definitions. The conda environments were pinned to specific tool versions to maximize reproducibility [34]. The pipeline, along with configuration files and environment specifications, is publicly available at [\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/mantaspanos/sarcoma-multiomics-neoantigens\u003c/span\u003e\u003cspan address=\"https://github.com/mantaspanos/sarcoma-multiomics-neoantigens\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e].\u003c/p\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003eGenomic Quality and Strategic Repurposing of Legacy Data\u003c/h2\u003e \u003cp\u003ePaired WES and RNA-seq data were repurposed to ask whether CKS tumors yield actionable neoantigens despite the technically constrained legacy datasets with low coverage. After aggressive QC it was possible to stll call high variant Allele Frequency (VAF) neoantigens. Because CKS tumors are notoriously heterogeneous, candidates with low VAF were deprioritized to address the intratumoral heterogeneity. Initial quality control of the WES data revealed a challenging sequencing output. The assay targeted an extensive 192 Mb exome footprint but achieved a modest mean depth of 39.2x with a highly uneven coverage distribution (max depth 4,735x). 
Only 31% of the target territory achieved \u0026ge;\u0026thinsp;10x depth, reflecting a substantial capture bias across coding and non-coding regions.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eUneven Capture Efficiency, Coverage Strategy, and Distinguishing Technical Bias from Biological Signal\u003c/h3\u003e\n\u003cp\u003eThe UT05 sample illustrated the problem starkly (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e1\u003c/span\u003e-a), since only 27% of the target remained usable at 10x depth. To preserve cohort-wide comparability while maintaining mutation sensitivity, the callable territory was thus defined using a \u0026ge;\u0026thinsp;10x threshold for downstream analyses. Figure\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e1\u003c/span\u003e-b shows the depth distribution, which indicates that many loci with extreme coverage likely reflect technical capture bias rather than biological amplification, as evidenced by hyper-covered regions shared between tumor and matched normal samples.\u003c/p\u003e \u003cp\u003eTo identify candidate copy-number events, selected loci were evaluated for multi-omic concordance. In sample LT01, the CDK4 and MDM2 loci displayed a distinct, concordant outlier profile: elevated locus depth (~\u0026thinsp;353x for CDK4) accompanied by extreme transcriptional abundance (64,344 TPM). This dual signal is consistent with the 12q amplicon-driven phenotype described in the CKS molecular classification (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e2\u003c/span\u003e). 
In contrast, sporadic depth spikes observed at isolated exons in other samples lacked commensurate RNA overexpression, supporting their classification as artifacts.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eTo further mitigate the risk of underestimating immunogenicity in this uneven landscape, the tumor mutational burden (TMB) was normalized to the sample-specific callable protein-coding territory (CDS regions with \u0026ge;\u0026thinsp;10x coverage). This integrative approach reliably \"rescued\" the immunogenomic classification of the cohort, correctly identifying hypermutators (FT13, LT02) and fusion-driven targets (UT08) that would have been obscured by standard genome-wide analysis.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e2\u003c/span\u003e DNA\u0026ndash;RNA Concordance Validates the 12q15 Amplicon Signature. Integration of locus-specific WES depth and RNA-seq abundance reveals concordant outliers for both CDK4 and MDM2 in sample LT01, consistent with functional amplification. While other samples show sporadic depth increases (attributable to capture bias), only LT01 exhibits commensurate transcriptional overexpression. TPM values are shown for qualitative concordance, not for cross-sample quantitative comparison.\u003c/p\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eMolecular stratification into immunogenic archetypes\u003c/h2\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eConsistent with the molecular framework reported by Kim et al., the data indicate a hypermutator subset characterized by elevated nonsynonymous TMB and, in some cases, a transition-enriched SNV spectrum suggestive of mismatch repair deficiency-like mutagenesis (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). 
Within the hypermutator group, however, mutational spectra were heterogeneous: FT13 and UT05 showed markedly elevated Ti/Tv ratios, whereas LT02 remained highly mutated by TMB yet exhibited a distinct Ti/Tv profile, indicating that \u0026ldquo;hypermutation\u0026rdquo; in this cohort is not mechanistically uniform. To support downstream neoantigen interpretation without over-attributing biology to any single metric, four features were integrated: mutation burden (nonsynonymous TMB), mutational spectrum (Ti/Tv as a transparency/QC descriptor), structural variation (expressed fusions), and driver context. Together, these defined three practical immunogenomic archetypes: (i) hypermutator (MSI-like or non\u0026ndash;MSI-like), (ii) SNV-low/structural-driven, and (iii) SNV-low \u0026ldquo;desert\u0026rdquo; (including amplification-driven cases).\u003c/p\u003e \u003cp\u003eCross-validation against Kim et al. confirmed that the archetypes aligned with known biology, such as TP53/ATRX/PTEN alterations enriched in hypermutators (Supplemental Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e) and CDK4 amplification in LT01. This analysis broadens these conclusions by identifying expressed structural variants (BLTP3B::NF1, KDM2A::MYH9) as immunogenic sources in SNV-low contexts (Table S3). Notably, LT02 generated 41 fusion-derived candidates versus zero in na\u0026iuml;ve MSI-like samples (FT13/UT05), suggesting that therapy may enrich junction neoantigens with enhanced foreignness potential.\u003c/p\u003e \u003cp\u003eLastly, Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e shows that samples UT08 and LT01, although SNV-low, exhibit structural-driven potential. 
Fusion neoepitopes and structural complexity provide complementary immunogenomic sources beyond the SNV burden alone.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eImmunogenomic profile of the cohort: summarizes cohort-level immunogenomic features, including nonsynonymous TMB normalized to sample-specific callable protein-coding CDS territory (\u0026ge;\u0026thinsp;10\u0026times;), Ti/Tv ratios computed from PASS SNVs within capture targets (reported primarily for contextual interpretation and QC), representative driver events, and the distribution of predicted neoepitopes by source (SNV/indel versus fusion). Ti/Tv is sensitive to target definition and filtering stringency; therefore, it is not used as a standalone classifier of MSI status or treatment etiology but rather as supportive context alongside TMB and structural evidence.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSample\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePhenotype\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" 
colname=\"c3\"\u003e \u003cp\u003eTMB\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eTiTv\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eDriver Events\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eNeoepitope\u003c/p\u003e \u003cp\u003eCounts\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFT13\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eHypermutator (MSI-like)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e16.43\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e4.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003ePTEN indel, OBSCN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1088 (1088 SNV / 0 Fusion)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLT02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eHypermutator (non-MSI-like)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e15.25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e2.37\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eATRX (mut)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e933 (892 SNV / 41 Fusion)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUT05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eHypermutator (MSI-like)\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e6.09\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e7.54\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eTP53, PTEN, RB1 (mut)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e903 (903 SNV / 0 Fusion)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUT08\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSNV-low / structural-driven\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1.42\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNF1-BLTP3B, KDM2A-MYH9 (fusions)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e65 (58 SNV / 7 Fusion)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUT01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSNV-low / Desert\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.90\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1.79\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNo recurrent driver detected (SNV/indel)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e83 (83 SNV / 0 Fusion)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLT01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSNV-low / 
amplification-driven\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e2.54\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eRNA-high CDK4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e37 (24 SNV / 13 Fusion)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eRNA-Guided \"Rescue\" Identifies Expressed Target Reservoirs\u003c/h2\u003e \u003cp\u003eRNA-seq provided orthogonal support for events in ambiguous genomic regions, enabling the differentiation of true biological signals from technical or baseline variation. In sample FT13, the PTEN locus exhibited heterogeneous coverage, which challenged high-confidence conventional somatic calling; however, transcriptomic evidence rescued this high-priority driver. Even with poor WES depth, RNA-sequencing confirmed the mutant allele at high transcriptional abundance (107.5 TPM), validating expression of a frameshift (p.Asn323MetfsTer40). This produced a strong-binding neoepitope (YLVLTLTKV; Rank 0.175%), demonstrating that mutational expression can prioritize actionable targets despite coverage gaps.\u003c/p\u003e \u003cp\u003eSimilarly, the integration of WES depth and RNA abundance effectively validated the functional copy-number drivers. Figure\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e2\u003c/span\u003e highlights sample LT01 as a distinct outlier, exhibiting simultaneous elevation of genomic coverage and transcript levels for both CDK4 and MDM2 genes. 
Comparing these syntenic genes strengthened the diagnosis of a functional 12q15 amplification in LT01, separating it from cases such as FT13, where high CDK4 expression occurred without MDM2 co-elevation or genomic gain - a pattern indicative of transcriptional regulation rather than chromosomal amplification. This concordance-based framework provides a strong foundation for ranking the genomic drivers.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003ePrioritization of candidate neoepitopes\u003c/h2\u003e \u003cp\u003eCandidate prioritization integrated predicted HLA binding strength (NetMHCpan %Rank), expression support (TPM for SNV/indel-derived candidates; FFPM for fusion junctions), and a composite priority score. HLA genotyping confirmed that the allele distribution of the cohort mirrored the high-frequency haplotypes in the Korean population (for example, A*24:02, A*33:03-B*58:01), supporting the relevance of the results to the local demographics (Supplemental Table S2). The expression\u0026ndash;affinity landscape (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e3\u003c/span\u003e) summarizes the selection space and stresses candidates that combine detectable expression with strong predicted binding.\u003c/p\u003e \u003cp\u003eNotably, fusion-derived candidates contributed significantly to the high-affinity tail in some samples. In LT02, CALD1::REV3L achieved an exceptional predicted binding affinity (0.005% Rank) with detectable fusion abundance, showing the potential of expressed junction peptides to rank alongside or above SNV-derived candidates. 
In UT08, candidate ranking reflected an abundance-weighted rationale: KDM2A::MYH9 was retained because higher fusion abundance (7.85 FFPM) partially offset intermediate binding rank, whereas lower-abundance junctions were deprioritized.\u003c/p\u003e\u003cp\u003eTable 2 lists a selection of high-priority neoepitopes (extended candidate lists and the top five candidates are provided in Supplemental Table S3). Two candidates per sample are presented, each identified through the current multi-omic discovery pipeline. These candidates, spanning all three immunogenomic archetypes, demonstrated the capacity of the workflow to prioritize effectively across diverse mutation burdens. High-quality fusion events emerged as leading candidates, with CALD1::REV3L (LT02) achieving an exceptional binding rank of 0.005%, and KDM2A::MYH9 (UT08) demonstrating robust expression support (7.85 FFPM). Furthermore, the prioritization framework, defined in the Methods section, successfully linked immunogenicity to known driver biology, highlighting neoepitopes derived from PTEN (FT13) and ARID1A (LT02) alongside highly expressed targets such as HMGA2 (LT01; 43.66 TPM). 
Crucially, the approach proved effective even in \u0026apos;neoantigen deserts,\u0026apos; recovering viable targets like IL13RA1 (UT01) and NCKAP5 (LT01) from low-TMB samples by using expression-weighted selection to rescue candidates that could otherwise be overlooked.\u003c/p\u003e\n\u003cdiv align=\"left\" class=\"colspec\"\u003e\u003cbr\u003e\u003c/div\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"351\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 38px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eSample\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 49px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eEvent Type\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 82px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eGene\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 62px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eHLA\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 39px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eRank (%)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 38px;\"\u003e\n \u003cp\u003e\u003cstrong\u003ePriority Score\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 42px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eExpression\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 38px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eFT13\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 49px;\"\u003e\n \u003cp\u003eSNV/Indel\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 82px;\"\u003e\n \u003cp\u003eOBSCN\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 62px;\"\u003e\n 
\u003cp\u003eHLA-B*15:11\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 39px;\"\u003e\n \u003cp\u003e1.162\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 38px;\"\u003e\n \u003cp\u003e88.16\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 42px;\"\u003e\n \u003cp\u003e6.04 TPM\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 38px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eFT13\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 49px;\"\u003e\n \u003cp\u003eSNV/Indel\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 82px;\"\u003e\n \u003cp\u003ePTEN\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 62px;\"\u003e\n \u003cp\u003eHLA-A*02:01\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 39px;\"\u003e\n \u003cp\u003e0.175\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 38px;\"\u003e\n \u003cp\u003e80.76\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 42px;\"\u003e\n \u003cp\u003e107.49 TPM\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 38px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eLT01\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 49px;\"\u003e\n \u003cp\u003eSNV/Indel\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 82px;\"\u003e\n \u003cp\u003eNCKAP5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 62px;\"\u003e\n \u003cp\u003eHLA-B*07:02\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 39px;\"\u003e\n \u003cp\u003e0.572\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 38px;\"\u003e\n \u003cp\u003e27.85\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 
42px;\"\u003e\n \u003cp\u003e3.28 TPM\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 38px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eLT01\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 49px;\"\u003e\n \u003cp\u003eSNV/Indel\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 82px;\"\u003e\n \u003cp\u003eHMGA2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 62px;\"\u003e\n \u003cp\u003eHLA-B*07:02\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 39px;\"\u003e\n \u003cp\u003e0.812\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 38px;\"\u003e\n \u003cp\u003e15.85\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 42px;\"\u003e\n \u003cp\u003e43.66 TPM\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 38px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eLT02\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 49px;\"\u003e\n \u003cp\u003eFusion\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 82px;\"\u003e\n \u003cp\u003eCALD1::REV3L\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 62px;\"\u003e\n \u003cp\u003eHLA-B*07:02\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 39px;\"\u003e\n \u003cp\u003e0.005\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 38px;\"\u003e\n \u003cp\u003e98.93\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 42px;\"\u003e\n \u003cp\u003e2.51 FFPM\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 38px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eLT02\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 49px;\"\u003e\n 
\u003cp\u003eSNV/Indel\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 82px;\"\u003e\n \u003cp\u003eARID1A\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 62px;\"\u003e\n \u003cp\u003eHLA-C*07:02\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 39px;\"\u003e\n \u003cp\u003e0.712\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 38px;\"\u003e\n \u003cp\u003e76.58\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 42px;\"\u003e\n \u003cp\u003e5.64 TPM\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 38px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eUT01\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 49px;\"\u003e\n \u003cp\u003eSNV/Indel\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 82px;\"\u003e\n \u003cp\u003eIL13RA1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 62px;\"\u003e\n \u003cp\u003eHLA-A*03:01\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 39px;\"\u003e\n \u003cp\u003e0.746\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 38px;\"\u003e\n \u003cp\u003e81.66\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 42px;\"\u003e\n \u003cp\u003e16.94 TPM\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 38px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eUT01\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 49px;\"\u003e\n \u003cp\u003eSNV/Indel\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 82px;\"\u003e\n \u003cp\u003eDHX15\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 62px;\"\u003e\n \u003cp\u003eHLA-A*24:02\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" 
style=\"width: 39px;\"\u003e\n \u003cp\u003e0.919\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 38px;\"\u003e\n \u003cp\u003e20.78\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 42px;\"\u003e\n \u003cp\u003e49.62 TPM\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 38px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eUT05\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 49px;\"\u003e\n \u003cp\u003eSNV/Indel\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 82px;\"\u003e\n \u003cp\u003eCCT3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 62px;\"\u003e\n \u003cp\u003eHLA-A*30:11\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 39px;\"\u003e\n \u003cp\u003e1.178\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 38px;\"\u003e\n \u003cp\u003e73.79\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 42px;\"\u003e\n \u003cp\u003e38.02 TPM\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 38px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eUT05\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 49px;\"\u003e\n \u003cp\u003eSNV/Indel\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 82px;\"\u003e\n \u003cp\u003eASH1L\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 62px;\"\u003e\n \u003cp\u003eHLA-A*30:11\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 39px;\"\u003e\n \u003cp\u003e0.222\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 38px;\"\u003e\n \u003cp\u003e65.51\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 42px;\"\u003e\n \u003cp\u003e7.47 TPM\u003c/p\u003e\n \u003c/td\u003e\n 
\u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 38px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eUT08\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 49px;\"\u003e\n \u003cp\u003eFusion\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 82px;\"\u003e\n \u003cp\u003eKDM2A::MYH9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 62px;\"\u003e\n \u003cp\u003eHLA-C*05:01\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 39px;\"\u003e\n \u003cp\u003e0.992\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 38px;\"\u003e\n \u003cp\u003e47.73\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 42px;\"\u003e\n \u003cp\u003e7.85 FFPM\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 38px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eUT08\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 49px;\"\u003e\n \u003cp\u003eSNV/Indel\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 82px;\"\u003e\n \u003cp\u003eITFG1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 62px;\"\u003e\n \u003cp\u003eHLA-B*48:01\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 39px;\"\u003e\n \u003cp\u003e0.313\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 38px;\"\u003e\n \u003cp\u003e13.42\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 42px;\"\u003e\n \u003cp\u003e21.91 TPM\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003eTable \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e. Curated neoepitope candidates selected for multi-omic validation. 
Two candidates per sample were selected based on integrated evidence from predicted HLA binding (NetMHCpan %Rank), a composite priority score, and expression support (TPM for SNVs/indels; FFPM for fusions). TPM, transcripts per million; FFPM, fusion fragments per million. NetMHCpan %Rank thresholds: \u0026lt;0.5, strong binder; 0.5\u0026ndash;2, weak binder.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eLegacy WES and RNA-seq sarcoma cohorts are still valuable, but their technical constraints (large target footprints, uneven capture, and variable depth) complicate standard immunogenomic readouts. In CKS, chromosomal instability and subclonal heterogeneity compound these difficulties. The present analysis indicates that a conservative callability definition coupled with multi-omic concordance can recover interpretable immunogenomic structures from such datasets, while reducing reliance on any single metric.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eMethodologically, isolated extreme WES depth is not sufficient evidence of amplification in legacy capture data. The tumor\u0026ndash;normal depth dispersion suggests that capture bias contributes substantially to these spikes, so DNA\u0026ndash;RNA concordance was treated as the deciding evidence for functional amplification. Sample LT01 (concordant CDK4/MDM2 co-elevation) is a concrete example of how RNA helps resolve ambiguous DNA-only signals in uneven datasets.\u003c/p\u003e \u003cp\u003eThe analysis also emphasizes the essential role of orthogonal transcriptomic evidence in rescuing driver events that may be obscured by heterogeneous genomic capture. The PTEN frameshift in sample FT13 serves as a prime example: despite variable DNA coverage across the gene locus, which challenged high-confidence allele counting, RNA-seq provided decisive validation. 
Because RNA-seq confirmed robust expression of the mutant transcript (107.5 TPM), this event could be confidently prioritized as a functional neoantigen-generating driver, bypassing the ambiguity of the WES data. This demonstrates that RNA-seq acts not simply as a validation layer, but as an essential discovery tool for resolving actionable targets in regions of suboptimal genomic capture.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThis study has several limitations. Candidate lists are computational predictions and do not establish immunogenicity or in vivo presentation; empirical confirmation (e.g., peptide\u0026ndash;MHC presentation by immunopeptidomics and functional T cell assays) is required to confirm immune recognition. In addition, the cohort size was small and reflected a particular legacy dataset, limiting the extrapolation across sarcoma histologies and sequencing protocols. Nevertheless, the analytic pattern (QC-aware callability, concordance-based driver interpretation, and expression-informed prioritization) should be transferable to other archived multi-omic cohorts.\u003c/p\u003e \u003cp\u003eOverall, the analysis supports a practical workflow for extracting interpretable immunogenomic signals and prioritized candidate lists from uneven legacy WES/RNA-seq data in genomically complex sarcomas, complementing established CKS molecular stratification based on copy-number and expression signatures.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors gratefully acknowledge the authors of the original study (Kim et al., 2018) for sharing their WES/RNA-seq data publicly via the European Nucleotide Archive; without these data, the current re-analysis would not have been possible.\u003c/p\u003e\n\u003cp\u003eAuthor contributions\u003c/p\u003e\n\u003cp\u003eP.M. 
planned the study design, performed the analysis of the data, drew the figures, interpreted the results and wrote the manuscript. K.A.K. contributed to the design of the study, discussed the findings, and reviewed the manuscript.\u003c/p\u003e\n\u003cp\u003ePaperpal was used for language editing only.\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;Competing interest statement\u003c/p\u003e\n\u003cp\u003eThe authors declare that the research was conducted in the absence of any commercial or financial relationships that could lead to a potential conflict of interest.\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;\u003cstrong\u003eData availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eRaw sequencing data are available via ENA accession PRJEB23898.\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;\u003cstrong\u003eFunding\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study was supported by the Technical University of Denmark.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eAnzar I et al (Sep. 2023) The interplay between neoantigens and immune cells in sarcomas treated with checkpoint inhibition. Front Immunol 14:1226445. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3389/fimmu.2023.1226445\u003c/span\u003e\u003cspan address=\"10.3389/fimmu.2023.1226445\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKiyotani K, Chan HT, Nakamura Y (2018) \u0026lsquo;Immunopharmacogenomics towards personalized cancer immunotherapy targeting neoantigens\u0026rsquo;, Cancer Sci., vol. 109, no. 3, pp. 542\u0026ndash;549, Mar. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1111/cas.13498\u003c/span\u003e\u003cspan address=\"10.1111/cas.13498\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCapietto A-H, Hoshyar R, Delamarre L (2022) \u0026lsquo;Sources of Cancer Neoantigens beyond Single-Nucleotide Variants\u0026rsquo;, Int. J. Mol. Sci., vol. 23, no. 17, p. 10131, Sep. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3390/ijms231710131\u003c/span\u003e\u003cspan address=\"10.3390/ijms231710131\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCrompton BD et al (Nov. 2014) The Genomic Landscape of Pediatric Ewing Sarcoma\u0026rsquo;, Cancer Discov. 4(11):1326\u0026ndash;1341. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1158/2159-8290.CD-13-1037\u003c/span\u003e\u003cspan address=\"10.1158/2159-8290.CD-13-1037\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTirode F et al (Nov. 2014) Genomic Landscape of Ewing Sarcoma Defines an Aggressive Subtype with Co-Association of STAG2 and TP53 Mutations\u0026rsquo;, Cancer Discov. 4(11):1342\u0026ndash;1353. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1158/2159-8290.CD-14-0622\u003c/span\u003e\u003cspan address=\"10.1158/2159-8290.CD-14-0622\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSiozopoulou V et al (2021) \u0026lsquo;Immune Checkpoint Inhibitory Therapy in Sarcomas: Is There Light at the End of the Tunnel?\u0026rsquo;, Cancers, vol. 13, no. 2, p. 360, Jan. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3390/cancers13020360\u003c/span\u003e\u003cspan address=\"10.3390/cancers13020360\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePerry JA et al (2014) \u0026lsquo;Complementary genomic approaches highlight the PI3K/mTOR pathway as a common vulnerability in osteosarcoma\u0026rsquo;, Proc. Natl. Acad. Sci., vol. 111, no. 51, Dec. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1073/pnas.1419260111\u003c/span\u003e\u003cspan address=\"10.1073/pnas.1419260111\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSong Y, Yang K, Sun T, Tang R (Dec. 2021) Development and validation of prognostic markers in sarcomas base on a multi-omics analysis. BMC Med Genomics 14(1):31. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s12920-021-00876-4\u003c/span\u003e\u003cspan address=\"10.1186/s12920-021-00876-4\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStatz-Geary K et al (2025) \u0026lsquo;DNA Damage Repair Pathway Alterations and Immune Landscape Differences in Pediatric/Adolescent, Young Adult (AYA) and Adult Sarcomas\u0026rsquo;, Cancers, vol. 17, no. 12, p. 1962, Jun. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3390/cancers17121962\u003c/span\u003e\u003cspan address=\"10.3390/cancers17121962\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKeskin DB et al (Jan. 2019) Neoantigen vaccine generates intratumoral T cell responses in phase Ib glioblastoma trial. Nature 565(7738):234\u0026ndash;239. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41586-018-0792-9\u003c/span\u003e\u003cspan address=\"10.1038/s41586-018-0792-9\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKiyotani K, Chan HT, Nakamura Y (2018) \u0026lsquo;Immunopharmacogenomics towards personalized cancer immunotherapy targeting neoantigens\u0026rsquo;, Cancer Sci., vol. 109, no. 3, pp. 542\u0026ndash;549, Mar. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1111/cas.13498\u003c/span\u003e\u003cspan address=\"10.1111/cas.13498\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi J et al (Aug. 2023) The screening, identification, design and clinical application of tumor-specific neoantigens for TCR-T cells. Mol Cancer 22(1):141. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s12943-023-01844-5\u003c/span\u003e\u003cspan address=\"10.1186/s12943-023-01844-5\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLuo Z et al (Apr. 2021) Self-Adjuvanted Molecular Activator (SeaMac) Nanovaccines Promote Cancer Immunotherapy. Adv Healthc Mater 10(7):2002080. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1002/adhm.202002080\u003c/span\u003e\u003cspan address=\"10.1002/adhm.202002080\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShi W \u0026lsquo;Advances in Tumor Antigen-Based Anticancer Immunotherapy: Recent Progress, Prevailing Challenges, and, Perspective\u0026rsquo; F et al (2023) Adv. Ther., vol. 6, no. 2, p. 2200239, Feb. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1002/adtp.202200239\u003c/span\u003e\u003cspan address=\"10.1002/adtp.202200239\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXie N, Shen G, Gao W, Huang Z, Huang C, Fu L (Jan. 2023) Neoantigens: promising targets for cancer therapy. Signal Transduct Target Ther 8(1):9. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41392-022-01270-x\u003c/span\u003e\u003cspan address=\"10.1038/s41392-022-01270-x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAnzar I et al (Sep. 2023) The interplay between neoantigens and immune cells in sarcomas treated with checkpoint inhibition. Front Immunol 14:1226445. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3389/fimmu.2023.1226445\u003c/span\u003e\u003cspan address=\"10.3389/fimmu.2023.1226445\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSha H et al (Feb. 2022) Case Report: Pathological Complete Response in a Lung Metastasis of Phyllodes Tumor Patient Following Treatment Containing Peptide Neoantigen Nano-Vaccine. Front Oncol 12:800484. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3389/fonc.2022.800484\u003c/span\u003e\u003cspan address=\"10.3389/fonc.2022.800484\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePerry JA et al (2014) \u0026lsquo;Complementary genomic approaches highlight the PI3K/mTOR pathway as a common vulnerability in osteosarcoma\u0026rsquo;, Proc. Natl. Acad. Sci., vol. 111, no. 51, Dec. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1073/pnas.1419260111\u003c/span\u003e\u003cspan address=\"10.1073/pnas.1419260111\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSeki M et al (Jul. 2015) Integrated genetic and epigenetic analysis defines novel molecular subgroups in rhabdomyosarcoma. Nat Commun 6(1):7557. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/ncomms8557\u003c/span\u003e\u003cspan address=\"10.1038/ncomms8557\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShevkoplias A et al (2025) \u0026lsquo;Molecular subtyping and insights into sarcoma biology and prognosis.\u0026rsquo;, J. Clin. Oncol., vol. 43, no. 16_suppl, pp. 11536\u0026ndash;11536, Jun. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1200/JCO.2025.43.16_suppl.11536\u003c/span\u003e\u003cspan address=\"10.1200/JCO.2025.43.16_suppl.11536\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKim J et al (Dec. 2018) Integrated molecular characterization of adult soft tissue sarcoma for therapeutic targets. BMC Med Genet 19:216. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s12881-018-0722-6\u003c/span\u003e\u003cspan address=\"10.1186/s12881-018-0722-6\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAndrews S (2010) FastQC: A quality control tool for high throughput sequence data. [Online]. 
Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://www.bioinformatics.babraham.ac.uk/projects/fastqc\u003c/span\u003e\u003cspan address=\"http://www.bioinformatics.babraham.ac.uk/projects/fastqc\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMartin M (May 2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17(1):10. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.14806/ej.17.1.200\u003c/span\u003e\u003cspan address=\"10.14806/ej.17.1.200\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi H, Durbin R (2009) \u0026lsquo;Fast and accurate short read alignment with Burrows\u0026ndash;Wheeler transform\u0026rsquo;, Bioinformatics, vol. 25, no. 14, pp. 1754\u0026ndash;1760, Jul. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/bioinformatics/btp324\u003c/span\u003e\u003cspan address=\"10.1093/bioinformatics/btp324\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMudge JM et al (Jan. 2025) GENCODE 2025: reference gene annotation for human and mouse. Nucleic Acids Res 53(D1):D966\u0026ndash;D975. 10.1093/nar/gkae1078\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDobin A et al (2013) \u0026lsquo;STAR: ultrafast universal RNA-seq aligner\u0026rsquo;, Bioinformatics, vol. 29, no. 1, pp. 15\u0026ndash;21, Jan. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/bioinformatics/bts635\u003c/span\u003e\u003cspan address=\"10.1093/bioinformatics/bts635\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBray NL, Pimentel H, Melsted P, Pachter L (May 2016) Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol 34(5):525\u0026ndash;527. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/nbt.3519\u003c/span\u003e\u003cspan address=\"10.1038/nbt.3519\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMcKenna A et al (2010) \u0026lsquo;The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data\u0026rsquo;, Genome Res., vol. 20, no. 9, pp. 1297\u0026ndash;1303, Sep. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1101/gr.107524.110\u003c/span\u003e\u003cspan address=\"10.1101/gr.107524.110\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eUhrig S et al (2021) \u0026lsquo;Accurate and efficient detection of gene fusions from RNA sequencing data\u0026rsquo;, Genome Res., vol. 31, no. 3, pp. 448\u0026ndash;460, Mar. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1101/gr.257246.119\u003c/span\u003e\u003cspan address=\"10.1101/gr.257246.119\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSzolek A, Schubert B, Mohr C, Sturm M, Feldhahn M, Kohlbacher O (2014) \u0026lsquo;OptiType: precision HLA typing from next-generation sequencing data\u0026rsquo;, Bioinformatics, vol. 30, no. 23, pp. 3310\u0026ndash;3316, Dec. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/bioinformatics/btu548\u003c/span\u003e\u003cspan address=\"10.1093/bioinformatics/btu548\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBjerregaard A-M, Nielsen M, Hadrup SR, Szallasi Z, Eklund AC (Sep. 2017) MuPeXI: prediction of neo-epitopes from tumor sequencing data. Cancer Immunol Immunother 66(9):1123\u0026ndash;1130. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s00262-017-2001-3\u003c/span\u003e\u003cspan address=\"10.1007/s00262-017-2001-3\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eM\u0026ouml;lder F et al (2021) \u0026lsquo;Sustainable data analysis with Snakemake\u0026rsquo;, F1000Research, vol. 10, p. 33, Jan. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.12688/f1000research.29032.1\u003c/span\u003e\u003cspan address=\"10.12688/f1000research.29032.1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAnaconda Software Distribution (Nov. 2016) [Online]. 
Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://anaconda.com\u003c/span\u003e\u003cspan address=\"https://anaconda.com\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"cancer-immunology-immunotherapy","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"ciim","sideBox":"Learn more about [Cancer Immunology, Immunotherapy](http://link.springer.com/journal/262)","snPcode":"262","submissionUrl":"https://submission.nature.com/new-submission/262/3","title":"Cancer Immunology, Immunotherapy","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"RNA-guided neoantigens, Complex Karyotype Sarcomas, Immunogenomics, Legacy sequencing","lastPublishedDoi":"10.21203/rs.3.rs-8854019/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8854019/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003e\u003cstrong\u003eBackground\u003c/strong\u003e: Soft tissue sarcomas, particularly Complex Karyotype Sarcomas (CKS), are characterized as \"immunologically cold\" malignancies driven by structural instability rather than a high tumor mutational burden (TMB). Public “legacy” cohorts are a useful resource to uncover immunotherapy biomarkers. This study used the Whole Exome Sequencing (WES) and RNA-sequencing of CKS patients, to overcome technical limitations and to identify and prioritize neoantigens.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMethods\u003c/strong\u003e: The systematic immunogenomic reanalysis was performed on a landmark cohort of CKS patients (Kim et al., 2018) with a custom bioinformatics workflow which was developed to uncover interpretable immunogenomic signals. 
This approach consisted of: (1) defining a quality-controlled \"callable territory\" and normalizing TMB metrics respectively; (2) utilizing RNA-seq not only for expression filtering but as an orthogonal validation check for variant transcription and to distinguish functional amplifications from technical depth artifacts; and (3) applying a multi-modal epitope prediction pipeline to identify and prioritize high-affinity neoantigens derived from both somatic SNVs indels and expressed gene fusions.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eResults\u003c/strong\u003e: The reanalysis shows that standard genome-wide metrics frequently underestimated the immunogenic potential. Normalizing the TMB refined the identification of hypermutated and microsatellite instability-like phenotypes. Furthermore, integration of transcriptomic data facilitated the recovery of actionable targets in \"low-TMB\" tumors. A subset of fusion-derived peptides demonstrated predicted binding affinities competitive with SNV-derived candidates.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConclusion\u003c/strong\u003e: This study illustrates that technically constrained multi-omic datasets can be systematically re-analyzed to identify potential therapeutic targets. 
These data argue for looking beyond aggregate biomarkers; patient-specific, expressed neoepitopes may exist even in sarcomas typically described as immunologically “cold”.\u003c/p\u003e","manuscriptTitle":"Repurposing public sarcoma multi-omics for neoantigen discovery","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-02-22 16:28:43","doi":"10.21203/rs.3.rs-8854019/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2026-03-06T16:14:37+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-02-26T00:17:53+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"129389776706502247666156947468416719016","date":"2026-02-17T16:31:36+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2026-02-17T16:14:53+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2026-02-13T03:43:58+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2026-02-13T03:43:17+00:00","index":"","fulltext":""},{"type":"submitted","content":"Cancer Immunology, Immunotherapy","date":"2026-02-11T16:16:15+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"cancer-immunology-immunotherapy","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"ciim","sideBox":"Learn more about [Cancer Immunology, Immunotherapy](http://link.springer.com/journal/262)","snPcode":"262","submissionUrl":"https://submission.nature.com/new-submission/262/3","title":"Cancer Immunology, Immunotherapy","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"e618bcf3-f7eb-4540-8984-1be91a4e2a95","owner":[],"postedDate":"February 22nd, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[],"tags":[],"updatedAt":"2026-04-27T16:08:16+00:00","versionOfRecord":{"articleIdentity":"rs-8854019","link":"https://doi.org/10.1007/s00262-026-04395-y","journal":{"identity":"cancer-immunology-immunotherapy","isVorOnly":false,"title":"Cancer Immunology, Immunotherapy"},"publishedOn":"2026-04-21 15:59:57","publishedOnDateReadable":"April 21st, 2026"},"versionCreatedAt":"2026-02-22 16:28:43","video":"","vorDoi":"10.1007/s00262-026-04395-y","vorDoiUrl":"https://doi.org/10.1007/s00262-026-04395-y","workflowStages":[]},"version":"v1","identity":"rs-8854019","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8854019","identity":"rs-8854019","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}