ButterflyVI: enabling high-throughput variant interpretation and biomarker discovery with functional genomics

doi:10.64898/2026.01.20.700339

2026 · doi:10.64898/2026.01.20.700339

preprint OA: closed CC-BY-NC-ND-4.0

Introduction

Precision oncology aims to tailor therapeutic protocols to the features of each tumor and patient. To this end, an increasing diversity of molecular assays has entered the clinic. In particular, deep targeted sequencing of selected gene panels allows the identification of cancer-associated variants that may be either clinically actionable through selected inhibitors or biomarkers of treatment sensitivity. However, each gene and patient can exhibit a wide variety of unique variants, with only a handful of them being functional and/or clinical biomarkers. Variant interpretation consists of determining the functional consequences and clinical implications, if any, of each mutation, and it represents today a major bottleneck for the successful implementation of precision oncology. Currently, variant interpretation largely relies on prior knowledge and expert data curation [1–4], which are limited and time-consuming. On the one hand, predictive approaches have been proposed to prioritize putative functional variants by leveraging either their recurrence across large patient cohorts [5–7], the evolutionary conservation of the mutated residues [8–10], or the predicted biochemical and physical alterations to the protein structure, recently adopting cutting-edge deep learning models [11]. While these approaches can rapidly scale to millions of variants, they lack functional validation, which rapidly becomes unfeasible for large numbers of mutations. On the other hand, genome-wide genetic screenings have been used to systematically assess the functional effect of targeting a given gene across multiple in vitro and in vivo models [12,13], with the largest examples to date generated by the Cancer Dependency Map consortium (DepMap) [14–16]. These resources provide high-throughput functional datasets that could be used by computational models to predict and validate oncogenic dependencies induced by specific variants.
Here, we explored the possibility of using results from large-scale functional screenings to systematically classify putative oncogenic and neutral mutations, efficiently providing statistical and functional evidence to interpret thousands of cancer-associated variants.

Results

The rationale behind the use of genetic screenings for variant interpretation relies on the following hypothesis: if a mutation is functional, i.e. it alters the function of the mutated protein, then selective loss of the corresponding gene has a different effect when the gene is mutated than when it is wild-type (Fig. 1A). This hypothesis has been demonstrated for mutations activating known oncogenes. In these cases, oncogenic mutations have been shown to constitute tumor dependencies [17]; that is, tumor cells depend on the mutated oncogene, and its loss leads to reduced cell proliferation and viability (a.k.a. cell fitness), a phenomenon also referred to as oncogene addiction or dependency. Conversely, loss of the same oncogene in cell lines where it does not harbor activating mutations will not affect cell fitness. The identification of oncogene dependencies has provided the rationale for the development of several targeted therapies [18] and, thus, it has been the main focus of functional genetic screenings [19,20]. However, differential responses to gene loss in mutated vs. wild-type cancer cells can reflect other functional relationships. Beyond oncogene activation, cancer cells rely on the inactivation of tumor suppressor genes, typically through loss-of-function (LoF) variants. As a result, loss of tumor suppressors is expected to increase cell fitness when these genes are wild-type, while no fitness changes are expected when they already harbor functional LoF variants. Beyond oncogene and tumor suppressor dependencies (OGD and TSD, Fig. 1B), two additional types of differential response are possible, which we defined as: mutation tolerance (MTO), when loss of the mutated gene increases cell fitness, suggesting the mutation may be disadvantageous to the cell, whereas loss of the wild-type gene has no effect; and bypass of essentiality (BYE), when loss of the wild-type gene decreases cell fitness, which happens for essential genes, whereas loss of the mutated gene has no effect, suggesting that once mutated the gene is no longer essential (Fig. 1B). Notably, while oncogene and tumor suppressor dependencies have been previously documented for known cancer genes, the existence and systematic assessment of mutation tolerance and bypass of essentiality among cancer variants are still largely unexplored.

bioRxiv preprint doi: https://doi.org/10.64898/2026.01.20.700339; this version posted January 22, 2026. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

The ButterflyVI annotation catalog

To test, in an unbiased manner, functional dependencies for a large number of unique variants beyond those affecting known oncogenes and tumor suppressors, we integrated and re-annotated molecular data and cell fitness readouts for two large-scale datasets from DepMap: a genome-wide CRISPR knock-out screening dataset [21], out of which we retained 1178 cell lines, and a genome-wide RNAi knock-down screening dataset [22], out of which we retained 646 cell lines (Suppl. Table 1). In total, these datasets include 375,687 and 186,037 unique variants, respectively. Differential responses to gene loss (i.e., differential dependency scores) between cell lines that harbored a specific mutation in that gene (altered group) and cell lines that were wild-type (wild-type group) were assessed by their Cohen's D effect size and ANOVA testing, correcting for tumor type. For each test, we required at least 5 cell lines in each group.
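The effect size and the four response types described above can be illustrated with a minimal sketch. It assumes that dependency scores are negative when gene loss reduces fitness, that Cohen's D uses a pooled standard deviation, and that the response type is read off the group means following the convention given in the Fig. 2A legend (OGD and MTO when the wild-type mean is closer to zero, TSD and BYE otherwise). The function names are hypothetical, the ANOVA correction for tumor type is not reproduced, and this is not necessarily the authors' implementation.

```python
from math import sqrt
from statistics import mean, variance

MIN_GROUP_SIZE = 5  # minimum number of cell lines per group, as stated in the text


def cohens_d(altered, wild_type):
    """Cohen's d (altered minus wild-type) using a pooled standard deviation."""
    n_alt, n_wt = len(altered), len(wild_type)
    if min(n_alt, n_wt) < MIN_GROUP_SIZE:
        raise ValueError("each group needs at least 5 cell lines")
    pooled_var = ((n_alt - 1) * variance(altered)
                  + (n_wt - 1) * variance(wild_type)) / (n_alt + n_wt - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero")
    return (mean(altered) - mean(wild_type)) / sqrt(pooled_var)


def response_type(altered, wild_type):
    """Label the direction of a differential response.

    Assumption: significance has already been established elsewhere;
    this function only looks at the two group means.
    """
    m_alt, m_wt = mean(altered), mean(wild_type)
    if abs(m_alt) > abs(m_wt):
        # Gene loss mainly affects the altered group.
        return "OGD" if m_alt < m_wt else "MTO"
    # Gene loss mainly affects the wild-type group.
    return "BYE" if m_wt < m_alt else "TSD"


# Toy dependency scores (invented for illustration only).
mutant = [-1.0, -1.2, -0.8, -1.1, -0.9]
wt = [0.0, 0.1, -0.1, 0.05, -0.05]
effect = cohens_d(mutant, wt)
label = response_type(mutant, wt)
```

In this toy example the mutant group is strongly dependent on the gene while the wild-type group is not, so the variant is labeled OGD and the effect size is large and negative under the sign convention assumed here.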
To test the largest possible number of query variants, we defined 5 levels of resolution, according to which cell lines were considered to harbor the query variant if they harbored a mutation that was either exactly the same as the query (L1, highest resolution), a “similar” amino-acid substitution, based on a positive BLOSUM62 matrix score [23] (L2, missense mutations only), a mutation occurring at the same residue position (L3) or at adjacent positions (L4, ± 3 residues), or the same type of mutation as the query variant (L5, lowest resolution), with missense mutations being tested only up to L4 (Fig. 1C, see Methods). Cell lines were assigned to the wild-type group if they did not harbor any kind of mutation or copy number alteration in the gene of interest. Each variant was tested, and the results retained, at the highest resolution where at least 5 cell lines were found in each group. Note that since we tested only mutations that were present in the datasets, at least one cell line harbored each queried variant at L1 resolution. Hence, if a variant was found significant at a resolution lower than L1, we further required that the mean dependency score of the cell lines harboring the query variant at L1 resolution either (i) was closer to the mean dependency score of the altered group than to the mean dependency score of the wild-type group, or (ii) fell below the 0.2 quantile or above the 0.8 quantile of the wild-type dependency score distribution, depending on the direction of the effect (see Methods). Even allowing for multiple levels of resolution, only about 6% and 4% of the total number of mutations were testable, corresponding to 22,234 mutations in the CRISPR dataset and 7,554 mutations in the RNAi dataset, out of which 1,713 and 608, respectively, induced a significant differential response to gene loss (Fig. 1D, Suppl. Table 2).
Once stratified based on the type of differential response, we found that in both datasets most significant variants were either OGD (45% and 52% for CRISPR and RNAi, respectively) or TSD (28% and 35%). Although a minority, in both datasets we identified multiple variants whose differential response to gene loss reflected either mutation tolerance or bypass of essentiality (MTO: 13.5% and 7%, BYE: 14% and 6%) (Fig. 1E). In the rest of the study, we will refer to the set of annotations that we derived as the Butterfly catalog for variant interpretation (ButterflyVI), drawing an analogy between the 4 wings of a butterfly and the 4 types of differential response to gene loss. Consistent with our initial hypothesis, OGD and TSD variants were significantly enriched in oncogenes and tumor suppressors, respectively, in both datasets (Fig. 1F). Intriguingly, BYE variants were also enriched in tumor suppressors, although only in the CRISPR dataset. Among the 885 variants that were testable in both the CRISPR and RNAi datasets and significant in at least one of the two, 397 (45%) exhibited concordant annotations, and these included annotations across all types of differential response, except MTO (Fig. 1G). Moreover, of the 2045 variants that were significant in at least one dataset, regardless of whether they were testable in both, 818 (40%) were classified as “oncogenic” or “likely oncogenic” by OncoKB [24] (Fig. 1H, Suppl. Fig. 1A-B), indicating a high agreement between our results and curated experimental and clinical evidence.
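The five resolution levels, the group-size rule, and the additional L1 consistency check described above can be sketched as follows. This is only an illustration under several assumptions: the `Variant` record and all function names are hypothetical, `POSITIVE_BLOSUM62` is a small excerpt of amino-acid pairs with positive BLOSUM62 scores rather than the full matrix, levels are treated as cumulative (a cell line matching at L1 also counts at L2 to L5), and the quantile rule is applied below the 0.2 quantile when the altered group is more negative than the wild-type group and above the 0.8 quantile otherwise. The exact rules are given in the Methods, which are not reproduced here.

```python
from dataclasses import dataclass
from statistics import mean, quantiles

# Excerpt of pairs with a positive BLOSUM62 score; a real analysis would use the full matrix.
POSITIVE_BLOSUM62 = {frozenset("IV"), frozenset("KR"), frozenset("DE"),
                     frozenset("FY"), frozenset("IL")}


@dataclass(frozen=True)
class Variant:
    gene: str
    position: int  # residue position
    alt: str       # substituted amino acid for missense variants, "" otherwise
    kind: str      # e.g. "missense", "frameshift", "nonsense"


def match_level(query, other, window=3):
    """Highest resolution (1 = L1 ... 5 = L5) at which `other` matches `query`, else None."""
    if query.gene != other.gene:
        return None
    same_pos = query.position == other.position
    if same_pos and query.kind == other.kind and query.alt == other.alt:
        return 1
    if (same_pos and query.kind == other.kind == "missense"
            and frozenset((query.alt, other.alt)) in POSITIVE_BLOSUM62):
        return 2
    if same_pos:
        return 3
    if abs(query.position - other.position) <= window:
        return 4
    if query.kind == other.kind and query.kind != "missense":
        return 5  # missense queries are only tested up to L4
    return None


def best_testable_level(query, mutations_by_cell_line, n_wild_type, min_n=5):
    """Return (level, altered cell lines) for the highest testable resolution, or None."""
    if n_wild_type < min_n:
        return None
    best = {}
    for cell_line, variants in mutations_by_cell_line.items():
        hits = [lv for lv in (match_level(query, v) for v in variants) if lv is not None]
        if hits:
            best[cell_line] = min(hits)
    for level in range(1, 6):
        altered = [cl for cl, lv in best.items() if lv <= level]
        if len(altered) >= min_n:
            return level, altered
    return None


def l1_consistent(l1_scores, altered_scores, wt_scores):
    """Check that cell lines carrying the exact query variant follow the altered group."""
    m_l1, m_alt, m_wt = mean(l1_scores), mean(altered_scores), mean(wt_scores)
    if abs(m_l1 - m_alt) < abs(m_l1 - m_wt):
        return True
    cuts = quantiles(wt_scores, n=5)  # cut points at the 0.2, 0.4, 0.6 and 0.8 quantiles
    return m_l1 < cuts[0] if m_alt < m_wt else m_l1 > cuts[-1]


query = Variant("KRAS", 12, "D", "missense")
cell_lines = {
    "CL1": [Variant("KRAS", 12, "D", "missense")],
    "CL2": [Variant("KRAS", 12, "D", "missense")],
    "CL3": [Variant("KRAS", 12, "E", "missense")],
    "CL4": [Variant("KRAS", 12, "V", "missense")],
    "CL5": [Variant("KRAS", 13, "D", "missense")],
    "CL6": [Variant("KRAS", 61, "H", "missense")],
}
result = best_testable_level(query, cell_lines, n_wild_type=10)
```

With this toy input only two cell lines carry the exact query variant, so the altered group reaches five cell lines only at L4, and `CL6` is never included because missense queries are not extended to L5.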
Importantly, 1221 variants were previously considered of unknown significance and represent novel variant annotations introduced by our study. In a parallel study, OGD were estimated for rare variants using the same datasets but an independent approach (Savino et al., co-submitted manuscript). Reassuringly, the OGD estimated by the two studies were significantly concordant, mutually corroborating each other's results (Suppl. Fig. 2A-D). The most significant dependencies were OGD induced by activating mutations of Ras/Raf oncogenes (KRAS, NRAS, HRAS, BRAF) and, to a lesser extent, PIK3CA, as well as TSD induced by mutations targeting TP53, the significance of which was further driven by the high number of altered cell lines for each variant, and other frequently altered tumor suppressors such as PTEN and RB1 (Fig. 2A-B, Suppl. Fig. 3A-B). Beyond these well-known oncogenic variants, our analyses revealed significant yet still uncharacterized dependencies, such as OGD site-specific variants at RHOA (T19K), TERT (H762 splice-site mutation), IFNA4 (G60E), ZFP64 (P552/Q553), and POU3F3 (H311/S312) (Fig. 2B). Interestingly, a subset of mutations in tumor suppressor genes were also classified as significant OGD events, including PIK3R1 in-frame and splice-region variants and the BRCA2 frameshift hotspot mutation N1784fs. PIK3R1 variants aligned with two major in-frame mutation clusters that are frequent in uterine endometrial cancer and brain malignancies, as observed in human cohorts from The Cancer Genome Atlas (TCGA) (Suppl. Fig. 4A). In the CRISPR dataset, loss of fitness upon knock-out of PIK3R1 was significant in cell lines harboring in-frame deletions or insertions, and although this signal was more moderate in the RNAi cohort, it remained significant for in-frame insertions (Suppl. Fig. 4B). Importantly, these effects were independent of the approach that was used to normalize gene dependency scores in the CRISPR screening (Suppl. Fig. 4C).
Although these variants are predicted to destabilize inhibitory interactions with PIK3CA and favor tumor growth [24], their functional characterization is still incomplete, and the oncogenic dependency observed here may suggest secondary effects.

Fig. 1 - Systematic annotation of mutations in four functional groups. A, Schematic of the comparison of the effect of knocking out/down a given gene in cell lines where that gene is altered (red) and where it is wild-type (gray). B, Schematics of the four differential responses to gene loss. C, Schematics of the five levels of resolution for mutation analysis. D, Left: number of mutations per type considered for analysis in the CRISPR and RNAi datasets. Right: number of testable and significant mutations in each dataset. E, Number of significant mutations in each functional group (OGD, TSD, MTO, BYE) and distribution of mutation types within each group. Colors correspond to the legends in panels C and D. F, Enrichment of oncogenes (OG) and tumor suppressor genes (TSG) within each mutation group. The numbers indicate the count of mutations in OG or TSG relative to the total mutations in that group. G, Comparison of mutation annotations between the CRISPR and RNAi datasets. The plot includes all mutations that are significant in at least one of the CRISPR or RNAi datasets and are testable in both. H, Distribution of significant mutations based on functional group and OncoKB annotation. The pie chart depicts mutations that are significant in at least one dataset, regardless of testability in the other.
BRCA2 N1784fs mutations were recurrent in gastric tumor samples from TCGA that exhibited microsatellite instability (MSI) (Fig. 2C) and, consistently, these variants occurred exclusively in MSI cancer cell lines (Fig. 2C). Interestingly, in MSI models, BRCA2 knock-out was deleterious independently of its mutation status, although dependency scores were significantly lower when the gene harbored the N1784fs hotspot variant (Fig. 2D). This difference may be due to gene dosage effects, where loss of one allele through a loss-of-function (LoF) variant increases the dependency of a cell on the remaining allele. A similar case was observed for frameshift mutations targeting the Werner helicase WRN, which has been shown to be an MSI dependency [25]. Consistent with a vulnerability induced by loss of heterozygosity, WRN frameshift mutations in MSI cancer cells induced a stronger dependency on WRN (Suppl. Fig. 5). Deeper investigation of this phenomenon revealed several putative truncating mutations that were classified as OGD, as they induced higher sensitivity to knock-out of genes whose loss was also harmful in normal cells, but to a lesser degree (Fig. 2E). A large fraction of these variants affected common essential genes, increasing the dependency on the remaining wild-type allele. While deeper investigation will be required for individual cases, these results may reveal novel tumor vulnerabilities associated with heterozygous LoF variants at common or subtype-specific essential genes [26].
Putative LoF mutations were also common among BYE variants and affected multiple tumor suppressor genes, seemingly at odds with the observed loss of fitness upon gene knock-out in wild-type cell lines. For example, significant BYE variants comprised multiple LoF mutations affecting the VHL and BAP1 tumor suppressors, which are frequently inactivated in renal carcinoma. Knock-out of VHL was deleterious in wild-type cells both at the pan-cancer level and within renal cancer cell lines only, consistent with previous experimental evidence demonstrating loss of cell proliferation and/or senescence upon VHL loss [27,28], while it had no effect in mutated cells, suggesting acquisition of genetic or epigenetic alterations bypassing VHL essentiality (Suppl. Fig. 6A). BAP1 differential responses were observed for both frameshift and splice-region variants (Fig. 2F). Among its many functions, BAP1 is the catalytic component of the polycomb repressive deubiquitinase complex, and it catalyzes deubiquitination of histone 2A (H2A). Interestingly, BYE dependencies were uniquely enriched for LoF mutations affecting histone modifiers, like BAP1, and chromatin remodeling factors, in particular components of the SWI/SNF complex such as ARID1A, ARID2, SMARCA4, and ATRX (Fig. 2B). Wild-type cell lines or cell lines harboring likely-passenger missense mutations in these genes typically exhibited loss of fitness upon gene knock-out (KO) (Fig. 2G, Suppl. Fig. 6B), unlike cell lines already harboring LoF events. Similar trends were observed in the CRISPR and RNAi datasets (Suppl. Fig. 6B-C) and, importantly, these effects were independent of secondary mutations at SWI/SNF components (Fig. 2G, Suppl. Fig. 6B), indicating they could not be explained by synthetic lethal interactions [29,30].
Despite the deleterious effect of knocking out these genes across a wide variety of wild-type cell lines, LoF mutations at these putative tumor suppressor genes are recurrent and considered oncogenic across multiple human tumors [31]. A possible explanation can be found in the “transcriptional numbness” that was recently reported upon KO of epigenetic modifiers in lung cancer and melanoma models [32], which provided a fitness advantage only in stress conditions. Our results suggest that KO of chromatin modifiers may even be initially deleterious, at least in vitro, but eventually promote a cell state where these mutations are tolerated or advantageous. Importantly, we observed such bypass of essentiality across multiple independent models and for multiple genes beyond chromatin remodeling factors.

Fig. 2 - Characterization of the four functional groups of mutations. A, Butterfly plot for the CRISPR dataset: systematic comparison of gene dependency scores between cell lines carrying a specific mutation and those that are wild-type for the corresponding gene. Each point represents a mutation, with its position determined by the Cohen's D effect size and the associated -log10(p-value) computed at the best testable level of resolution. The p-value is directional: positive for OGD (red) and MTO (dark blue), where wild-type cell lines have mean dependency scores closer to zero than mutant cell lines; and negative for TSD (orange) and BYE (light blue), where mutant cell lines have mean dependency scores closer to zero than wild-type cell lines. B, Zoomed-in view of the butterfly plot in panel A (corresponding to the grey-shaded area).
C, Incidence of BRCA2 frameshift mutations in the TCGA Pan-Cancer Atlas (top) and in the DepMap cell lines for the hotspot at position 1784 (bottom). D, BRCA2 dependency scores in MSI cell lines that are either wild-type for BRCA2 or carry a frameshift mutation at the hotspot position 1784. E, Mean dependency score of altered cell lines (x-axis) versus wild-type cell lines (y-axis) in the CRISPR dataset. Each point represents a significant truncating mutation (frameshift or nonsense) classified as OGD, TSD, MTO, or BYE. OGD variants appear in the bottom-left region above the diagonal, indicating increased sensitivity to knockout in altered cell lines compared with wild-type cell lines. F, BAP1 dependency scores in the CRISPR dataset for cell lines that are either wild-type for BAP1 or carry frameshift or splice-site mutations. G, CRISPR dataset dependency scores for four SWI/SNF complex genes in cell lines that are either wild-type for the respective gene or carry a BYE mutation. The WT group includes cell lines wild-type for all SWI/SNF genes. The “Other SWI/SNF” group includes cell lines that are wild-type for the gene considered but carry alterations in other SWI/SNF complex genes. H, DDX3X dependency scores in the CRISPR dataset for cell lines that are either wild-type for DDX3X or carry a BYE missense mutation in the hotspot region spanning positions 472–475.
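The directional p-value used on the vertical axis of the butterfly plot (Fig. 2A) can be illustrated with a short sketch. It follows the sign convention stated in the legend (positive for OGD and MTO, negative for TSD and BYE); the function name is hypothetical, and how the underlying p-value is computed is outside the scope of this example.

```python
from math import log10

POSITIVE_GROUPS = {"OGD", "MTO"}  # wild-type mean closer to zero than mutant mean
NEGATIVE_GROUPS = {"TSD", "BYE"}  # mutant mean closer to zero than wild-type mean


def directional_log_p(p_value, group):
    """Signed -log10(p-value) for one variant, given its functional group."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    if group in POSITIVE_GROUPS:
        sign = 1.0
    elif group in NEGATIVE_GROUPS:
        sign = -1.0
    else:
        raise ValueError(f"unknown group: {group}")
    return sign * -log10(p_value)


ogd_y = directional_log_p(0.001, "OGD")  # plotted in the upper half
bye_y = directional_log_p(0.01, "BYE")   # plotted in the lower half
```

Combined with the Cohen's D effect size on the horizontal axis, this signed value places the four groups in four different regions of the plot.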
Indeed, significant BYE variants included recurrent LoF mutations at ATP11B, which was among the most significant BYE dependencies, the E3 ubiquitin ligases LTN1 and UBR2, and the RNA processing genes RBM10 and DDX3X, the latter of which included newly annotated hotspot variants R475C and H472Y/R/L (Fig. 2H). As several of these genes have been previously proposed to act as tumor suppressors, our results indicate that their loss may be advantageous only under specific conditions or cell states, which will need to be characterized to understand the oncogenic competence of these variants.

Biomarker discovery with ButterflyVI annotations

Leveraging our ButterflyVI catalog of putative functional variants, we performed a biomarker discovery analysis to identify predictors of sensitivity or resistance to individual gene loss. Briefly, for each gene, we retained variants annotated as functional in either OncoKB, ButterflyVI, or both, and designed a machine learning strategy based on ElasticNet penalized regression to determine which mutated genes were significantly predictive of the dependency scores in DepMap. Beyond mutated genes, features in the model included the tumor type of each cell line, microsatellite instability (MSI) status, and gene copy number alterations (see Methods) (Fig. 3A). Features were retained as significant predictors if they were selected in at least 50% of multiple ElasticNet runs and obtained a mean ElasticNet coefficient greater than 0.05 in absolute value, in both the RNAi and CRISPR datasets (Fig. 3A). First, we compared our results with those obtained upon applying the same procedure using only OncoKB-annotated variants, or using all variants reported for each gene without any filtering. Even though ElasticNet coefficients were highly correlated among the different analyses, using all mutations led to a significantly smaller number of genetic predictors (Fig. 3B), supporting the relevance of variant annotations and filtering for functional and clinical studies.
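The retention rule for predictors can be illustrated with a minimal sketch that starts from already fitted coefficients, so the ElasticNet fitting itself is not shown. It applies the thresholds stated in the text and in the Fig. 4 legend (selected in at least 50% of runs, mean coefficient above 0.05 in absolute value, same direction in both datasets). It assumes that a feature counts as selected in a run when its coefficient is nonzero and that the mean is taken over all runs; the function names and the toy coefficients are hypothetical.

```python
from statistics import mean


def stable_predictors(runs, min_fraction=0.5, min_abs_coef=0.05):
    """Keep features selected often enough and with a large enough mean coefficient.

    `runs` is a list of dicts mapping feature name to its coefficient in one run;
    a missing or zero coefficient means the feature was not selected in that run.
    """
    if not runs:
        raise ValueError("at least one run is required")
    features = {name for run in runs for name in run}
    kept = {}
    for name in features:
        coefs = [run.get(name, 0.0) for run in runs]
        selected_fraction = sum(c != 0.0 for c in coefs) / len(runs)
        avg = mean(coefs)
        if selected_fraction >= min_fraction and abs(avg) > min_abs_coef:
            kept[name] = avg
    return kept


def shared_predictors(crispr_runs, rnai_runs):
    """Predictors retained in both datasets with the same effect direction."""
    crispr = stable_predictors(crispr_runs)
    rnai = stable_predictors(rnai_runs)
    return {name: (crispr[name], rnai[name])
            for name in crispr.keys() & rnai.keys()
            if crispr[name] * rnai[name] > 0}


# Toy coefficients for one target gene over 10 runs per dataset (invented values).
crispr_runs = [{"geneA_mut": -0.2, "geneB_mut": 0.01, "geneC_amp": 0.3}] * 6 + [{}] * 4
rnai_runs = [{"geneA_mut": -0.1, "geneC_amp": -0.3}] * 10
shared = shared_predictors(crispr_runs, rnai_runs)
```

In this toy example only `geneA_mut` is retained: `geneB_mut` has a mean coefficient that is too small, and `geneC_amp` has opposite directions in the two datasets.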
Interestingly, upon complementing OncoKB annotations with ButterflyVI, we identified new predictors, several of which did not exhibit any association in the OncoKB-only analysis (Fig. 3C). Among these, the strongest associations were between mutations at BMPR2 and COL7A1 and response to WRN and RPL22L1 knock-out, respectively; both targets have been previously implicated as vulnerabilities in MSI tumors [25]. Notably, the ElasticNet analysis including all mutations did not identify any trend or significant association between these genes (Fig. 3B), indicating that this dependency is specifically associated with those variants that were annotated as functional by ButterflyVI. WRN encodes the Werner helicase, which is involved in DNA repair, and it was found to be the strongest dependency in MSI tumors, prompting the development of selective inhibitors [33]. BMPR2 encodes a bone morphogenetic protein (BMP) receptor that binds ligands of the TGF-beta superfamily and activates downstream SMAD transcriptional regulators. BMPR2 frameshift mutations were classified as functional by ButterflyVI, including a hotspot frameshift mutation at residue N583, which is recurrent across gastrointestinal MSI human tumors (Fig. 3D).

Fig. 3 - ElasticNet analysis for biomarker discovery integrating functional variants from OncoKB and ButterflyVI. A, Schematics of the ElasticNet analysis. B, Comparison of ElasticNet scores from the analysis done using all mutations (x-axis) or filtered mutations from OncoKB and ButterflyVI (y-axis). Each point represents a target–biomarker pair.
Negative scores indicate sensitivity to the target knockout, while positive scores indicate resistance. C, Comparison of ElasticNet scores from the analysis done using filtered mutations only from OncoKB (x-axis) or filtered mutations from OncoKB and ButterflyVI (y-axis). D, Frequency of BMPR2 frameshift mutations across tumor types in the TCGA Pan-Cancer Atlas, highlighting the hotspot mutation N583Tfs*44. E, WRN dependency scores for cell lines that are either wild-type for BMPR2 (or carry a neutral mutation) or carry a functional mutation in BMPR2 (as defined by OncoKB or ButterflyVI). Left: all cell lines (CLs); centre: MSI CLs only; right: colorectal cancer (CRC) CLs only. F, RPL22L1 dependency scores for cell lines that are either wild-type for COL7A1 (or carry a neutral mutation) or carry a functional mutation in COL7A1 (as defined by OncoKB or ButterflyVI). Left: all CLs; right: MSI CLs only. G, Expression levels of RPL22L1 in cell lines that are either wild-type for RPL22 or altered (carrying a mutation or deletion).

Importantly, the differential response to WRN knock-out between BMPR2-mutated and wild-type cell lines was independent of MSI status, and it was indeed consistently observed across all cell lines, among MSI models only, and among MSI colorectal cancer models, where these mutations were most frequent (Fig. 3E). Similar results were obtained for COL7A1 mutations and RPL22L1 knock-out. RPL22L1 encodes a ribosomal large subunit protein that is paralogous to RPL22, while COL7A1 encodes type VII collagen, a structural component of anchoring fibrils in epithelial basement membranes.
RPL22L1 was also reported as a potential MSI dependency, although sensitivity to its loss was attributed to LoF variants in its paralog RPL22 [25]. Interestingly, cell lines harboring ButterflyVI-annotated COL7A1 mutations lacked LoF events in RPL22 and exhibited a strong dependency on RPL22L1 both among all cell lines and within MSI models only (Fig. 3F). Consistently, RPL22L1 was overexpressed in COL7A1-mutated cell lines, independently of MSI status (Fig. 3G). Although residual confounders due to higher-order interactions between MSI tumor subtypes and other variants may still exist, our analyses consistently showed increased WRN or RPL22L1 dependency in BMPR2- or COL7A1-mutated cell lines, respectively, suggesting that these alterations may confer heightened sensitivity to WRN and RPL22L1 therapeutic inhibition, even among MSI tumors. Overall, we selected 2449 genes that were testable and exhibited variable response to gene loss in both datasets, and identified 299 genetic features (mutations or copy number alterations) that were significant predictors of response to the knock-out and knock-down of 177 genes (Fig. 3A, Suppl. Fig. 7, Suppl. Table 3). Target genes associated with a high number of predictors included several cell cycle regulators such as CDK4, CDK6, and CCND1, with several predictors corresponding to genes in the same pathway; and oncogenes such as PIK3CA and CTNNB1, which were themselves among the significant sensitivity predictors, consistent with multiple OGD variants at these genes (Suppl. Fig. 7). To explore potentially actionable therapeutic biomarkers, we focused on target genes that can be directly or indirectly inhibited through either currently approved cancer therapies or drug compounds in clinical trials. Among these, our ElasticNet model identified multiple known in cis and in trans associations, as well as potentially new biomarkers (Fig. 4A).
Previously unreported biomarkers included NOTCH3 amplification, which was predictive of sensitivity to BCL2L1 knock-out (Suppl. Fig. 8A), supporting synergistic inhibition of the two oncogenes [34]; SMAD3 mutations predicting sensitivity to CDK4 KO (Suppl. Fig. 8B), suggesting these variants impair SMAD3-mediated cell cycle arrest [35]; and truncating mutations of RPL5, which predicted sensitivity to MDM2 inhibition (Fig. 4B).

Fig. 4 - New biomarkers of druggable genes. A, Summary plot of ElasticNet biomarkers of druggable genes. Each point shows a biomarker shared between the CRISPR and RNAi datasets that was selected at least 5 times out of 10 ElasticNet runs, had a mean coefficient greater than 0.05 in absolute value, and exhibited the same effect direction in both datasets. A negative weighted mean ElasticNet score indicates sensitivity, whereas a positive score indicates resistance. B–D, MDM2 (B) and MDM4 (C) dependency scores, and MDM2 mRNA expression (D) in cell lines classified by RPL5 status: wild-type (or carrying a neutral mutation) versus carrying a functional mutation (as defined by OncoKB or ButterflyVI). E, Relative expression of RPL5 in cells transfected with control siRNA (siCtrl), GAPDH siRNA (siGAPDH), or RPL5 siRNA (siRPL5). Data are presented as the mean ± standard error of the mean (SEM). Differences between groups were evaluated using an unpaired Student's t-test. The level of statistical significance is indicated by asterisks: ns, not significant (P > 0.05); * P ≤ 0.05; ** P ≤ 0.01. F, Cellular viability in response to DMSO (control) or Nutlin-3a treatment at varying doses in RT-4 (left) and KU-19-19 (right) models. Cellular viability is expressed as a percentage. Each point represents a single experimental data point (replicate). The colored lines indicate the mean viability for each knockdown condition (siCtrl, siGAPDH, siRPL5). The shaded bands represent the standard error of the mean (SEM).

RPL5, in particular, encodes the ribosomal protein L5, which has been reported to inhibit MDM2 and activate p53 in response to perturbation of ribosome biogenesis [36]. In the CRISPR dataset, RPL5 mutations were also predictive of sensitivity to MDM4 knock-out (Fig. 4C), and both MDM2 and MDM4 were highly expressed in cell lines exhibiting LoF RPL5 mutations (Fig. 4D). To validate this biomarker, we selected two independent bladder cancer cell lines, RT-4 and KU-19-19, which did not exhibit alterations to the p53 pathway, and we tested their sensitivity to a 48-hour treatment with the MDM2 selective inhibitor Nutlin-3a (Suppl. Fig. 9B). Next, we inhibited RPL5 via siRNA in both cell lines and compared response to treatment against scrambled siRNA or an siRNA targeting GAPDH as control (Fig. 4E, Suppl. Fig. 9C). In both cell lines, RPL5 knock-down increased sensitivity to Nutlin-3a (Fig. 4F), with differential drug response observed already at low doses of the drug (~1 μM). Overall, these results indicate that RPL5 loss sensitizes tumors to MDM2 inhibition and, thus, RPL5 LoF mutations may represent a novel biomarker of response to MDM2 inhibitors in p53 wild-type tumors.
The ButterflyVI data portal

To visualize and explore all the results presented in this study, we have developed the ButterflyVI Portal (https://butterflyvi.unil.ch/). The portal is organized in gene-centric pages, which can be searched and accessed by the corresponding HUGO symbol of the gene (Fig. 5A-B) and where results are presented in the Variant Interpretation (Fig. 5C) and Biomarker Discovery (Fig. 5D) pages. Variant Interpretation provides an overview of all tested variants and their corresponding ButterflyVI annotation (Fig. 5C), as well as a detailed boxplot representation of the dependency scores in the altered and wild-type groups for each variant (Fig. 5E). The Biomarker Discovery page summarizes the results of our ElasticNet analysis, reporting for each gene the variants that were found to be predictive of resistance or sensitivity to loss of that gene (Fig. 5D). All data can be downloaded in table format (a), and all plots can be saved in common exportable formats (PNG and SVG). ButterflyVI is built using state-of-the-art frameworks. ButterflyVI annotations are stored in a MongoDB NoSQL database for efficient data access. The front-end application is developed using the Angular framework, and the overview plots are rendered with D3.js. The boxplots are generated using Plotly and ggplot2.

(a) The downloadable files are provided as supplementary material to this manuscript and will be made available on the website upon publication.

Fig. 5 - The ButterflyVI data portal. A-B, Search page view of the portal. Genes can be queried using their HUGO symbols. Summary gene tabs are displayed underneath for genes with results available from either the variant interpretation or biomarker discovery analyses. C, Detailed view of the variant interpretation analysis. Annotated variants are represented as colored rectangles according to their functional category (OGD, TSD, MTO, BYE, or neutral). Their horizontal position corresponds to their location in the protein sequence, while vertical placement reflects the resolution level of testing. A switch allows users to select between CRISPR or RNAi results. D, Detailed view of the biomarker discovery analysis. Biomarkers of sensitivity (green) or resistance (red) are shown for both CRISPR and RNAi. By default, only common predictors are displayed, but individual dataset predictors can be revealed using the switch at the bottom. Circle size corresponds to the Weighted Mean ElasticNet score. E, Overview of all tested variants and their ButterflyVI annotation in the CRISPR and RNAi datasets. The number of colored squares indicates the level of resolution (ranging from 5 for same mutation to 1 for the same type level), while the color denotes the functional category of the annotated mutation. Each variant tab can be expanded to show a detailed boxplot of dependency scores in the altered versus wild-type groups. A slider on the right allows adjustment of the resolution level to display the corresponding boxplots.

Discussion

In this study we presented a systematic framework for functionally annotating cancer-associated variants using two independent genome-wide loss-of-function screenings. We propose an unbiased classification accounting for four distinct types of differential responses to gene loss, which, beyond oncogene and tumor suppressor dependencies, revealed dependencies associated with mutation tolerance and bypass of essentiality. Surprisingly, among variants associated with bypass of essentiality, we found several loss-of-function mutations at tumor suppressor genes, such as VHL, ARID1A, ATRX, and RBM10. Although LoF variants in these genes are recurrent across multiple tumor types, their loss in vitro was deleterious and, for some of them, similar observations were made in vivo. This paradox can be explained by context-specific oncogenic competence, where such LoF variants provide an advantage only under certain conditions or cell states 32, similarly to what has been previously reported for certain oncogenes 37. Our findings generalized these observations across multiple models and for a broad set of tumor suppressors. We should stress, however, that our analysis relies solely on functional screenings using in vitro models, which do not recapitulate the complexity of the tumor microenvironment in terms of cell diversity, interactions, and nutrient availability. Moreover, these screenings only measure the impact of gene loss on cell viability, and thus other functionally relevant consequences may not be captured. Despite these limitations, our results and annotations stress the importance of framing functional and clinical studies of variants of unknown significance in the proper context and may help discriminate between “universal” tumor suppressors (e.g., TP53 and RB1), which induced tumor suppressor dependencies, and “context-specific” tumor suppressors (e.g., ARID1A and RBM10), which are associated with bypass of essentiality.
To account for interactions among different types of variants and tumor subtypes, we performed a broad biomarker discovery analysis leveraging OncoKB and ButterflyVI variant annotations. Here, we aimed at identifying genetic variants that predicted sensitivity and resistance to the loss of a broad set of genes, focusing in particular on therapeutically actionable targets. First, this analysis highlighted the importance of filtering variants based on their predicted or validated functional impact. Indeed, biomarker discovery using all variants that were detected for a given gene missed nearly half of the significant associations, despite the higher number of retained mutations and therefore greater statistical power. A large fraction of variants within each gene is expected to be neutral, and retaining such variants in cancer genomics analyses will inevitably hinder statistical and functional studies. Second, leveraging our annotation catalog, we were able to identify novel candidate biomarkers that could be used to identify the patients most likely to benefit from a given treatment. Recently, therapeutic inhibition of the Werner helicase WRN has been proposed for microsatellite unstable (MSI) tumors 33. Here, we showed that tumors exhibiting BMPR2 loss may exhibit an exquisite sensitivity to this treatment. Similarly, we identified multiple candidate biomarkers for already approved treatments and, among these, we demonstrated that RPL5 loss increases sensitivity to Nutlin-3a, a selective MDM2 inhibitor. Overall, our study demonstrates the relevance of genome-wide functional genomics screenings to enhance variant interpretation and discover therapeutically actionable biomarkers. Nevertheless, these functional assays are still limited in size and, for example, allowed us to test only a small fraction of the total number of variants in our cohort. Efforts should be made to scale the generation of these datasets to a larger number of more diverse tumor models, including co-culture assays, tissue explants, and other patient avatar models that could allow testing of a broader set of variants and phenotypes. In combination with multi-modal molecular profiling and robust computational models, these approaches will provide an invaluable reference to interpret tumor molecular alterations and translate them into clinically actionable strategies.

Data availability

The input dataset used to conduct all analyses in this study is available at https://doi.org/10.5281/zenodo.17339253.

Code availability

The code used to conduct all analyses in this study is available at https://github.com/CSOgroup/ButterflyVI.

Methods

Mutation-specific analysis

Datasets

Cell line (CL) data were downloaded from the DepMap portal 15. Specifically, we used CRISPR-Cas9 gene dependency scores from CRISPRGeneEffect.csv (DepMap 24Q4), RNAi-based dependency scores from D2_combined_gene_dep_scores.csv (DEMETER2 Data v6), somatic mutation data from OmicsSomaticMutations.csv (DepMap 24Q4), and absolute gene-level copy number alterations (CNA) from OmicsAbsoluteCNGene.csv (DepMap 24Q4). Additional metadata, including model annotations and microsatellite instability (MSI) status, were obtained from Model.csv and OmicsSignatures.csv, respectively (DepMap 24Q4). For our analysis, we included only those cell lines that had both gene dependency scores and mutational profiles. This resulted in 1,178 cell lines for the CRISPR dataset and 646 for the RNAi dataset, with 537 cell lines shared between the two. However, CNA data were not available for all selected cell lines: CNA profiles were present for 948 CRISPR-associated and 507 RNAi-associated cell lines. The handling of missing CNA values is described in detail in the subsequent sections. Mutation data in human cancers, including the TCGA MAF file and corresponding sample annotations, were obtained from Mina et al. 38.

Sample annotations

We used the OncotreeLineage column from the DepMap file Model.csv to define the tumor type of each cell line. Tumor lineages represented by fewer than five CLs in either the CRISPR or RNAi datasets were grouped under the category “other” to ensure statistical robustness. MSI status was determined using the MSIsensor2 score (MSIScore column) from the DepMap file OmicsSignatures.csv, with a threshold of 20% to define MSI-high samples (MSIsensor2 score ≥ 20), in accordance with the recommendations provided by the developers 39,40.

MAF file manual curation

The MAF file from DepMap includes variants annotated with multiple consequence types (e.g., missense_variant&splice_region_variant).
To enable consistent mutation-specific analysis, we manually curated the data to assign a single variant type to each mutation. In cases of multiple annotations, we applied a prioritization scheme to systematically resolve conflicts, ensuring unambiguous classification for downstream analyses. Our curation process involved the following steps:

1. Any annotation containing the term “splice” was considered as splice_region_variant.
2. Specific compound annotations involving NMD_transcript_variant were simplified by retaining the primary protein-coding consequence:
   - missense_variant&NMD_transcript_variant → missense_variant
   - frameshift_variant&NMD_transcript_variant → frameshift_variant
   - stop_gained&NMD_transcript_variant → stop_gained
   - start_lost&NMD_transcript_variant → start_lost
3. The compound annotation frameshift_variant&start_lost&start_retained_variant was simplified as frameshift_variant.
4. Any annotation containing the term “start_lost” was considered as start_lost.
5. Any annotation containing the term “start_gained” was considered as start_gained.
6. Any annotation containing the term “stop_lost” was considered as stop_lost.
7. The two remaining compound annotations were simplified as protein_altering_variant:
   - protein_altering_variant&incomplete_terminal_codon_variant
   - coding_sequence_variant&5_prime_UTR_variant

MAF file OncoKB annotation

We annotated the MAF file using the OncoKB Annotator 41 to assess the known oncogenic impact of each mutation.
Specifically, we performed annotation using the union of the GenomicChange and ProteinChange options, prioritizing GenomicChange, to ensure comprehensive capture of all annotated mutations and to avoid missing variants due to differences in transcript usage or consequence types. In particular, in cases where the GenomicChange annotation was labeled as Unknown, but the corresponding ProteinChange annotation was classified as Oncogenic, Likely Oncogenic, or Resistance, we retained the ProteinChange annotation to maximize inclusion of known functionally relevant variants.

Selection of mutations to study

To define the set of mutations to analyze, we collected all the protein-coding region variants present in at least one CL in each dataset. These consist of missense, frameshift, stop gained, splice region, inframe insertion, inframe deletion, start lost, stop lost, and protein altering variants.

Copy number alterations

We binarized the copy number profiles of CLs as follows: a gene was considered amplified if the corresponding copy number exceeded 6 copies, and deleted if it was below 1 copy. These thresholds were selected to be consistent with those used in cBioPortal.

Test at five levels of resolution

For each selected mutation, we compared the gene dependency scores between CLs that were wild-type for the gene of interest (i.e., without mutations or copy number alterations) and those that carried the mutation. CLs with missing copy number data were excluded from the wild-type group. In contrast, inclusion in the mutated group was based solely on mutation status, regardless of copy number information. To evaluate the statistical association between mutational status and dependency score, we performed an ANOVA while controlling for tumor type. This analysis was conducted using the R implementation of ANOVA (function ‘Anova’ from the ‘car’ package).
For each mutation, we also computed the Cohen’s D (D) effect size, which represents the standardized mean difference in dependency scores between wild-type and mutated CLs. A positive D indicates a lower dependency score in the mutated CLs compared to the wild-type, while a negative D indicates a higher dependency score in the mutated CLs relative to the wild-type. We required a minimum of five mutated CLs to perform the test. To accommodate this, we defined five levels of resolution for the analysis.

1. L1: same mutation. This is the highest level of resolution and only CLs with the exact same amino acid substitution were included in the mutated group.
2. L2: similar mutation (for missense mutations only). If fewer than five CLs met the previous criterion, we expanded the analysis to include CLs with amino acid substitutions deemed similar based on the BLOSUM62 matrix (i.e., substitutions with a non-negative score).
3. L3: same position. If the number of qualifying CLs was still insufficient, or for all non-missense mutations not testable at the same mutation level, we broadened the analysis to include CLs harboring any variant of the same type occurring (for missense) or starting at the same amino acid position as the mutation of interest.
4. L4: adjacent position. When needed, we further expanded the analysis to include variants of the same type occurring or starting within ±3 amino acids of the position of interest.
5. L5: same type (for non-missense mutations only). Finally, we defined a fifth level of resolution, which included all CLs with the same type of variant anywhere in the gene.
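The effect-size convention and the fallback across resolution levels described above can be sketched as follows. This is a minimal Python illustration, not the authors' R code: the pooled-standard-deviation form of Cohen's D is an assumption (the text only states the sign convention), and the function names and the `carriers_by_level` input are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

MIN_MUTATED = 5  # minimum number of mutated cell lines required to run the test


def cohens_d(wt_scores, mut_scores):
    """Standardized mean difference, wild-type minus mutated.

    Assumption: pooled-standard-deviation form. With this sign, a positive
    value means the mutated group has the lower dependency score.
    """
    n_wt, n_mut = len(wt_scores), len(mut_scores)
    pooled_var = (
        (n_wt - 1) * variance(wt_scores) + (n_mut - 1) * variance(mut_scores)
    ) / (n_wt + n_mut - 2)
    return (mean(wt_scores) - mean(mut_scores)) / sqrt(pooled_var)


def pick_resolution_level(carriers_by_level, is_missense):
    """Return the first level, from L1 to L5, with enough mutated cell lines.

    carriers_by_level (hypothetical input) maps "L1".."L5" to the set of cell
    lines in the mutated group at that level. L2 applies to missense variants
    only and L5 to non-missense variants only. Returns None if no level
    reaches MIN_MUTATED.
    """
    for level in ("L1", "L2", "L3", "L4", "L5"):
        if level == "L2" and not is_missense:
            continue
        if level == "L5" and is_missense:
            continue
        if len(carriers_by_level.get(level, ())) >= MIN_MUTATED:
            return level
    return None
```

For example, a missense variant carried by two cell lines at L1, three at L2, and five at L3 would be tested at L3 under this sketch.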
Representative examples of the first three resolution levels are shown in the table below:

gene | type | same mutation | similar mutation | same position
PIK3CA | missense_variant | E545K | E545K, E545Q, E545R | Any amino acid substitution at position 545
ARID1A | frameshift_variant | F2141SfsTer59 | - | Any frameshift mutation starting at position 2141 (e.g., F2141LfsTer9)
ARID1A | stop_gained | R1989Ter | - | R1989Ter (for stop_gained the same position is equivalent to same mutation)

Each mutation was tested at the highest possible level of resolution based on data availability.

Annotation

We annotated each testable mutation based on its ANOVA p-value and Cohen’s D. Using a cutoff of 0.05 for the p-value and 0.5 for the absolute value of the effect size, we identified five possible scenarios reflecting different effects of the mutation on cell viability. We annotated each mutation as one of the following:

1. Oncogenic dependency: if p<0.05 and D>0.5 and wild-type CLs exhibit a mean dependency score closer to 0 than that of mutated CLs. This case reflects significantly decreased viability of the mutated CLs following gene knock-out/down (whereas wild-type CLs are less affected).
2. Tumor suppressor dependency: if p<0.05 and D>0.5 and mutated CLs exhibit a mean dependency score closer to 0 than that of wild-type CLs. This case reflects significantly increased viability of the wild-type CLs following gene knock-out/down (whereas mutated CLs are less affected). Note that the direction of the effect size remains the same as in the previous case; what differs is the dependency score's distribution relative to 0, reflecting increased or decreased cell viability following gene knock-out or knock-down.
3. Mutation tolerance: if p<0.05 and D<-0.5 and wild-type CLs exhibit a mean dependency score closer to 0 than that of mutated CLs. This case reflects significantly increased viability of the mutated CLs following gene knock-out/down (whereas wild-type CLs are less affected).
4. Bypass of essentiality: if p<0.05 and D<-0.5 and mutated CLs exhibit a mean dependency score closer to 0 than that of wild-type CLs. This case reflects significantly decreased viability of the wild-type CLs following gene knock-out/down (whereas mutated CLs are less affected).
5. Neutral: otherwise. In this case no significant difference in the dependency score is observed between the wild-type and mutated CLs.

Refinement step

To annotate mutations that cannot be tested at the highest resolution L1 (i.e., the same mutation level), we introduced a refinement procedure that enhances our confidence at lower resolution levels. If a mutation is deemed significant, the refinement process is initiated. This allows us to distinguish cases where CLs with the same mutation of interest align with the behavior of the broader group of altered CLs from cases where the signal is solely driven by CLs with different mutations, while CLs with the same mutation behave similarly to wild-type CLs. Specifically, the refinement process involves two distinct steps.

1. Determine whether the mean dependency score of CLs harboring the same mutation ($\mathit{dep}_{\text{same mut}}$) is more similar to that of other altered CLs ($\mathit{dep}_{\text{ALT}}$) than to wild-type CLs ($\mathit{dep}_{\text{WT}}$). This is assessed by evaluating whether:

   $$\left|\mathit{dep}_{\text{same mut}} - \mathit{dep}_{\text{ALT}}\right| < \left|\mathit{dep}_{\text{same mut}} - \mathit{dep}_{\text{WT}}\right|$$

   If this condition is met, the annotation is retained; otherwise, the process proceeds to the second step.
2. Determine whether the mean dependency score of CLs with the same mutation lies at the extreme ends of the wild-type CLs dependency score distribution. Specifically, let $Q^{\text{WT}}_{0.2}$ and $Q^{\text{WT}}_{0.8}$ denote the 0.2 and 0.8 quantiles, respectively, of the wild-type CLs dependency score distribution. We assessed whether:

   $$\mathit{dep}_{\text{same mut}} \le Q^{\text{WT}}_{0.2} \quad \text{if } D > 0$$

   $$\mathit{dep}_{\text{same mut}} \ge Q^{\text{WT}}_{0.8} \quad \text{if } D < 0$$

   If this condition is satisfied, the annotation is retained; otherwise, the annotation is changed to neutral. This approach allows us to retain mutations for which the mean dependency score of CLs with the same mutation is closer to that of wild-type CLs yet still exhibits a significant effect by lying at the extreme of the wild-type dependency score distribution.

Trend of neutral mutations

We focused on all mutations that were significant in the CRISPR dataset, the RNAi dataset, or both. To better assess the concordance of the annotations between the CRISPR and RNAi datasets, for mutations significant in only one dataset, we assessed their trend in the other dataset using a Cohen’s D threshold of 0.25 and applied the refinement steps. For example, a neutral mutation with a positive Cohen’s D > 0.25 that passed the refinement steps was classified as showing an oncogene or tumor suppressor dependency trend, depending on the relative position of the dependency score distribution compared to zero.

Comparison with DAMs from Savino et al.

To enable a direct comparison with the results from Savino et al., we first aligned the ProteinChange annotations, as the two analyses were based on differently processed MAF files.
For instance, we standardized the representation of stop codons by replacing asterisks (*) with ‘Ter’. Since Savino et al. performed their analysis in a tumor-type-specific manner, a given mutation could appear as significant in multiple tumor types. To ensure consistency in interpretation, we compared their set of unique annotated mutations with our oncogenic dependency results, as their study focused specifically on dependencies with this kind of effect.

Selection of the strongest significant annotations

We further filtered the significant mutations to retain only the strongest candidates for subsequent co-alteration analysis. Specifically, we retained mutations that met at least one of the following criteria:

1. Concordant annotation between the CRISPR and RNAi datasets (i.e., the mutation is significant with the same annotation in both datasets, or significant in one dataset and shows the same trend in the other).
2. Strong effect in at least one dataset, defined as having a p-value 1.

Co-alteration analysis

Selection of genes to study

We selected the genes that have at least 5 dependent (dependency score -0.2) in both the CRISPR and RNAi datasets. This resulted in 2,449 genes.

Definition of the genomic alteration matrices

We constructed a binary Genomic Alteration Matrix (GAM) capturing the MSI status, tumor type, and the mutational and copy number status of each CL. A gene was considered mutated if it harbored a variant classified as Oncogenic, Likely Oncogenic, or Resistance in OncoKB, or if it had a strong, significant annotation as defined in the previous paragraph.
For mutational status, we included all genes with annotations from either OncoKB or TumorScreen. In contrast, for the copy numbers, to limit dimensionality and prevent overfitting, we restricted the set of considered genes. Specifically, we used the union of genes frequently amplified or deleted in cancer (as reported in Sanchez-Vega et al. 42) and genes with mutations annotated in OncoKB. Additionally, for each individual elastic net analysis, we included the respective knocked-out/down gene, if not already included. Amplifications and deletions were encoded as separate binary variables for each gene. We assumed diploid status for CLs with missing copy number data to retain them in the analysis and preserve valuable mutational information. To ensure sufficient statistical power, we included only features altered in at least five CLs. The resulting GAM for the CRISPR dataset includes 1,178 CLs and 669 variables: 1 MSI status, 26 tumor types, 334 mutations, 269 amplifications, and 39 deletions. The GAM for the RNAi dataset includes 646 CLs and 489 variables: 1 MSI status, 21 tumor types, 258 mutations, 186 amplifications, and 23 deletions. We also defined two alternative GAMs by modifying the criteria for the mutational columns. In the first alternative GAM, a gene was considered mutated based solely on OncoKB annotations (i.e., only if the variant was classified as Oncogenic, Likely Oncogenic, or Resistance), excluding annotations from our method. In the second alternative GAM, a gene was considered mutated if it harbored any somatic mutation, regardless of its annotation. In both cases, we restricted the analysis to the same set of genes included in the original GAM (based on the union of OncoKB and TumorScreen annotations), ensuring that the number of features remained unchanged. The MSI status, tumor type, and copy number alteration columns were kept consistent across all GAMs.
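The construction of the mutation and copy number columns of the GAM can be illustrated with a short Python sketch. It is not the authors' implementation: the function name, the input dictionaries, and the `_mut`/`_amp`/`_del` feature suffixes are hypothetical, and the MSI and tumor type columns are omitted for brevity. The thresholds (amplified above 6 copies, deleted below 1 copy, at least five altered CLs per feature) and the diploid assumption for missing copy number data are taken from the Methods.

```python
AMP_THRESHOLD = 6  # a gene is amplified if its copy number exceeds 6
DEL_THRESHOLD = 1  # a gene is deleted if its copy number is below 1
MIN_ALTERED = 5    # keep only features altered in at least five cell lines
DIPLOID = 2.0      # value assumed when copy number data are missing


def build_gam(cell_lines, functional_mutations, copy_number, cna_genes):
    """Build binary mutation/amplification/deletion features per cell line.

    functional_mutations: dict cell line -> set of genes with a functional variant
    copy_number: dict cell line -> dict gene -> absolute copy number
    cna_genes: genes for which copy number features are built
    Returns dict feature name -> dict cell line -> 0/1.
    """
    features = {}
    mutated_genes = set().union(*functional_mutations.values())
    for gene in sorted(mutated_genes):
        features[f"{gene}_mut"] = {
            cl: int(gene in functional_mutations.get(cl, set())) for cl in cell_lines
        }
    for gene in sorted(cna_genes):
        amp, dele = {}, {}
        for cl in cell_lines:
            # Missing copy number data are treated as diploid (neither event).
            cn = copy_number.get(cl, {}).get(gene, DIPLOID)
            amp[cl] = int(cn > AMP_THRESHOLD)
            dele[cl] = int(cn < DEL_THRESHOLD)
        features[f"{gene}_amp"] = amp
        features[f"{gene}_del"] = dele
    # Drop features altered in fewer than MIN_ALTERED cell lines.
    return {
        name: column
        for name, column in features.items()
        if sum(column.values()) >= MIN_ALTERED
    }
```

Encoding amplification and deletion as two separate columns, as the text specifies, lets the regression assign them independent coefficients.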
Elastic net analysis

For each selected gene, each dataset, and each GAM, we performed an Elastic Net (EN) analysis, regressing the gene’s continuous dependency score on the binary variables contained in the GAM. The EN was implemented using the ‘glmnet’ R package, with the mixing parameter set to α = 0.5. The regularization parameter λ was selected via 10-fold cross-validation using the ‘cv.glmnet()’ function, choosing the value that minimized the mean cross-validation error. To ensure robustness, we repeated the EN analysis 10 times. For each variable, we then computed the mean regression coefficient across the 10 runs, as well as the number of times the variable was selected (i.e., assigned a non-zero coefficient).

Filtering of results

To identify the strongest predictors of gene dependency, we selected, for each gene, those predictors that had an absolute mean coefficient greater than 0.05, were selected in at least 5 out of 10 runs in both datasets, and had a concordant sign in CRISPR and RNAi. For each predictor we computed a final score indicating its strength as:

$$\mathit{score} = \left(\mathit{coeff}_{\text{CRISPR}} \cdot n_{\text{CRISPR}} + \mathit{coeff}_{\text{RNAi}} \cdot n_{\text{RNAi}}\right) / 2$$

where $\mathit{coeff}$ denotes the mean coefficient across the 10 runs, and $n$ represents the number of times the variable was selected (i.e., assigned a non-zero coefficient).
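The filtering rule and the final score can be written compactly. The following Python sketch uses hypothetical function names, and it assumes that the 0.05 coefficient threshold applies to each dataset separately, which the text does not state explicitly.

```python
def weighted_score(coef_crispr, n_crispr, coef_rnai, n_rnai):
    """Final score: mean coefficient times selection count, averaged over datasets."""
    return (coef_crispr * n_crispr + coef_rnai * n_rnai) / 2


def passes_filter(coef_crispr, n_crispr, coef_rnai, n_rnai,
                  min_abs_coef=0.05, min_runs=5):
    """Apply the three criteria stated in the Methods.

    Assumption: the absolute-coefficient threshold is applied per dataset.
    """
    strong = abs(coef_crispr) > min_abs_coef and abs(coef_rnai) > min_abs_coef
    stable = n_crispr >= min_runs and n_rnai >= min_runs
    concordant = coef_crispr * coef_rnai > 0  # same sign in CRISPR and RNAi
    return strong and stable and concordant
```

Under this definition, a predictor with mean coefficients of -0.2 (selected in 10 runs) in CRISPR and -0.1 (selected in 8 runs) in RNAi passes the filter and receives a score of -1.4, which indicates sensitivity.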
Experimental validation in bladder cancer cell line models

The human bladder cancer cell lines KU-19-19 and RT-4 (ACC 395 and ACC 412; DSMZ) were cultured in RPMI 1640, GlutaMAX (Gibco, 61870010) supplemented with 10% FBS (Thermo Fisher Scientific, A5256701) and 1% Penicillin-Streptomycin (Thermo Fisher Scientific, 15070063) at 37 °C in a humidified atmosphere with 5% CO₂. FAM-labeled siCTRL No.1 (AM4620), siGAPDH (AM4650), and siRPL5 (ID s56731) (Invitrogen) were reverse-transfected at 25 nM using Lipofectamine RNAiMAX (Invitrogen) in OptiMEM Reduced Serum Medium (Gibco) according to the manufacturer’s instructions. To verify knockdown efficiency, total RNA was extracted using the RNeasy Mini Kit (Qiagen) and reverse-transcribed with the Superscript III First-Strand Synthesis System (Thermo Fisher Scientific). Quantitative PCR (qPCR) was carried out using SYBR Green Real-Time PCR Master Mix (Thermo Fisher Scientific) and gene-specific primers for GAPDH (Forward: 5′–CTCTGCTCCTCCTGTTCGAC–3′; Reverse: 5′–ATGGTGTCTGAGCGATGTGG–3′), RPL5 (Forward: 5′–CCAAATACAGGATGATAGTTCGTG–3′; Reverse: 5′–TTGGCAGTTCGTGTGCATACGC–3′), and the housekeeping gene HPRT (Forward: 5′–GTTATGGCGACCCGCAG–3′; Reverse: 5′–ACCCCTTCCAAATCCTCAGC–3′) on a StepOnePlus Real-Time PCR System (Applied Biosystems). Prior to the main experiment, a Nutlin-3a (Selleckchem, S8059) dose–response assay was performed to determine the sensitivity of KU-19-19 and RT-4 cells, using concentrations based on IC₅₀ values from the Genomics of Drug Sensitivity in Cancer (GDSC) database for KU-19-19 (2.51 µg/mL ≈ 12.18 µM) and RT-4 (2.50 µg/mL ≈ 12.18 µM), including doses below (0.15, 0.3, 0.6, 1.2, 3, 6, and 9 µM) and above (25 and 50 µM) the IC₅₀. For experiments involving RPL5 knockdown, cells were treated 24 h post-transfection with selected Nutlin-3a concentrations or corresponding DMSO volumes (vehicle control) for 48 h in complete growth medium.
At the end of treatment, cell viability was measured using the Cell Counting Kit-8 (WST-8/CCK8; Abcam, ab228554). Cells were incubated with the ready-to-use reagent for 1 h at 37 °C, and absorbance was measured at 460 nm using a SpectraMax ID3 microplate reader. Absorbance values from blank wells containing only medium were subtracted from sample readings, and cell viability was normalized to untreated controls. Experiments were performed in triplicate for each condition. Data processing and visualization were performed in RStudio.
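The blank subtraction and normalization to untreated controls described above amount to a simple calculation, sketched here in Python. The function name and the use of the mean blank and mean control absorbance are assumptions; the authors performed their processing in R and do not detail these steps.

```python
from statistics import mean


def percent_viability(sample_abs, blank_abs, control_abs):
    """Blank-subtracted absorbance, expressed as a percentage of the control.

    sample_abs: absorbance readings of treated wells
    blank_abs: readings of wells containing only medium
    control_abs: readings of untreated (vehicle) wells
    """
    blank = mean(blank_abs)
    control = mean(a - blank for a in control_abs)
    return [100.0 * (a - blank) / control for a in sample_abs]
```

A treated well whose blank-subtracted absorbance is half that of the control wells is reported as 50% viable.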

References

1. OncoKB: A Precision Oncology Knowledge Base | JCO Precision Oncology. https://ascopubs.org/doi/full/10.1200/PO.17.00011.
2. Griffith, M. et al. CIViC is a community knowledgebase for expert crowdsourcing the clinical interpretation of variants in cancer. Nat Genet 49, 170–174 (2017).
3. Tate, J. G. et al. COSMIC: the Catalogue Of Somatic Mutations In Cancer. Nucleic Acids Res 47, D941–D947 (2019).
4. Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res 46, D1062–D1067 (2018).
5. Chang, M. T. et al. Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nat Biotechnol 34, 155–163 (2016).
6. Lawrence, M. S. et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501 (2014).
7. Gao, J. et al. 3D clusters of somatic mutations in cancer reveal numerous rare mutations as functional targets. Genome Medicine 9, 4 (2017).
8. Ng, P. C. & Henikoff, S. Predicting Deleterious Amino Acid Substitutions. Genome Res 11, 863–874 (2001).
9. Ramensky, V., Bork, P. & Sunyaev, S. Human non-synonymous SNPs: server and survey. Nucleic Acids Res 30, 3894–3900 (2002).
10. Reva, B., Antipin, Y. & Sander, C. Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic Acids Res 39, e118 (2011).
11. Cheng, J. et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492 (2023).
12. Shalem, O. et al. Genome-Scale CRISPR-Cas9 Knockout Screening in Human Cells. Science 343, 84–87 (2014).
13. Fallon, T. K. & Knouse, K. A. A roadmap toward genome-wide CRISPR screening throughout the organism. Cell Genomics 5, 100777 (2025).
14. Tsherniak, A. et al. Defining a Cancer Dependency Map. Cell 170, 564-576.e16 (2017).
15. DepMap: The Cancer Dependency Map Project at Broad Institute. https://depmap.org/portal/.
16. Arafeh, R., Shibue, T., Dempster, J. M., Hahn, W. C. & Vazquez, F. The present and future of the Cancer Dependency Map. Nat Rev Cancer 25, 59–73 (2025).
17. Pagliarini, R., Shao, W. & Sellers, W. R. Oncogene addiction: pathways of therapeutic response, resistance, and road maps toward a cure. EMBO reports 16, 280–296 (2015).
18. Min, H.-Y. & Lee, H.-Y. Molecular targeted therapy for anticancer treatment. Exp Mol Med 54, 1670–1694 (2022).
19. Nguyen, L. V. & Caldas, C. Functional genomics approaches to improve pre-clinical drug screening and biomarker discovery. EMBO Molecular Medicine 13, e13189 (2021).
20. Behan, F. M. et al. Prioritization of cancer therapeutic targets using CRISPR–Cas9 screens. Nature 568, 511–516 (2019).
21. DepMap 24Q4 Public. Figshare+ https://doi.org/10.25452/figshare.plus.27993248.v1 (2024).
22. McFarland, J. M. et al. Improved estimation of cancer dependencies from large-scale RNAi screens using model-based normalization and data integration. Nat Commun 9, 4610 (2018).
23. Henikoff, S. & Henikoff, J. G. Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences 89, 10915–10919 (1992).
24. Dsouza, N. R. et al. Structural and Dynamic Analyses of Pathogenic Variants in PIK3R1 Reveal a Shared Mechanism Associated among Cancer, Undergrowth, and Overgrowth Syndromes. Life 14, 297 (2024).
25. Chan, E. M. et al. WRN helicase is a synthetic lethal target in microsatellite unstable cancers. Nature 568, 551–556 (2019).
26. Nichols, C. A. et al. Loss of heterozygosity of essential genes represents a widespread class of potential cancer vulnerabilities. Nat Commun 11, 2517 (2020).
27. Young, A. P. et al. VHL loss actuates a HIF-independent senescence programme mediated by Rb and p400. Nat Cell Biol 10, 361–369 (2008).
28. Ge, J. et al. Mechanisms of resistance to VHL loss-induced genetic and pharmacological vulnerabilities. 2025.06.14.659649 Preprint at https://doi.org/10.1101/2025.06.14.659649 (2025).
29. Helming, K. C. et al. ARID1B is a specific vulnerability in ARID1A-mutant cancers. Nat Med 20, 251–254 (2014).
30. Centore, R. C., Sandoval, G. J., Soares, L. M. M., Kadoch, C. & Chan, H. M. Mammalian SWI/SNF Chromatin Remodeling Complexes: Emerging Mechanisms and Therapeutic Strategies. Trends in Genetics 36, 936–950 (2020).
31. Mittal, P. & Roberts, C. W. M. The SWI/SNF complex in cancer — biology, biomarkers and therapy. Nat Rev Clin Oncol 17, 435–448 (2020).
32. Loukas, I. et al. Selective advantage of epigenetically disrupted cancer cells via phenotypic inertia. Cancer Cell 41, 70-87.e14 (2023).
33. Ferretti, S. et al. Discovery of WRN inhibitor HRO761 with synthetic lethality in MSI cancers. Nature 629, 443–449 (2024).
34. Li, M. et al. Combined Inhibition of Notch Signaling and Bcl-2/Bcl-xL Results in Synergistic Antimyeloma Effect. Mol Cancer Ther 9, 3200–3209 (2010).
35. CDK4 and CDK6 kinases: From basic science to cancer therapy | Science. https://www.science.org/doi/10.1126/science.abc1495.
36. Liu, Y., Deisenroth, C. & Zhang, Y. RP–MDM2–p53 Pathway: Linking Ribosomal Biogenesis and Tumor Surveillance. Trends in Cancer 2, 191–204 (2016).
37. Developmental chromatin programs determine oncogenic competence in melanoma | Science. https://www.science.org/doi/10.1126/science.abc1048.
38. Mina, M., Iyer, A., Tavernari, D., Raynaud, F. & Ciriello, G. Discovering functional evolutionary dependencies in human cancers. Nat Genet 52, 1198–1207 (2020).
39. Niu, B. et al. MSIsensor: microsatellite instability detection using paired tumor-normal sequence data. Bioinformatics 30, 1015–1016 (2014).
40. GitHub - niu-lab/msisensor2: Microsatellite instability (MSI) detection for tumor only data. https://github.com/niu-lab/msisensor2.
41. oncokb/oncokb-annotator. OncoKB™ (2025).
42. Sanchez-Vega, F. et al. Oncogenic Signaling Pathways in The Cancer Genome Atlas. Cell 173, 321-337.e10 (2018).

bioRxiv preprint doi: https://doi.org/10.64898/2026.01.20.700339; this version posted January 22, 2026. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Suppl. Fig. 1 - Concordance between single dataset mutation annotation and OncoKB. Comparison of mutation annotations between the CRISPR (A) or RNAi (B) datasets and OncoKB annotations. The plots include all mutations that are significant in the respective dataset.
[Supplementary Figure 1, panels A-B. Panel titles: CRISPR annotation vs OncoKB; RNAi annotation vs OncoKB. Dataset classes: BYE, MTO, TSD, OGD. OncoKB classes: Oncogenic, Likely Oncogenic, Likely Neutral, Inconclusive, Unknown.]
Suppl. Fig. 2 - Concordance between OGDs and DAMs from Savino et al. A, Number of unique and shared variants tested in the two studies. B, Number of significant variants in the two studies. Only OGD variants from our study are considered for the comparison. Variants tested in both studies are shown in orange. C, Shared significant variants among the commonly testable. D, Number of variants per gene among the shared significant variants. Colors indicate the OncoKB annotation of each variant.
[Supplementary Figure 2, panels A-D. Panel titles: Number of tested variants; Number of significant variants, DAMs vs OGDs (CRISPR); Shared significant variants among the commonly testable; Number of variants per gene (common hits). Legend: OncoKB Likely Oncogenic, Oncogenic, Unknown.]

Suppl. Fig. 3 - RNAi dataset butterfly plot. A, Butterfly plot for the RNAi dataset: systematic comparison of gene dependency scores between cell lines carrying a specific mutation and those that are wild type for the corresponding gene. Each point represents a mutation, with its position determined by the Cohen's D effect size and the associated -log10(p-value) computed at the best testable level of resolution. The p-value is directional: positive for OGD (red) and MTO (dark blue), where wild-type cell lines have mean dependency scores closer to zero than mutant cell lines; and negative for TSD (orange) and BYE (light blue), where mutant cell lines have mean dependency scores closer to zero than wild-type cell lines. B, Zoomed-in view of the butterfly plot in panel A (corresponding to the grey-shaded area).
[Supplementary Figure 3, panels A-B. Axes: Cohen's D (x); Directed p-value, -log10 (y).]

Suppl. Fig. 4 - PIK3R1 inframe mutations as OGD. A, Left: Incidence of PIK3R1 in-frame mutations in the TCGA Pan-Cancer Atlas (top) and DepMap cell lines (bottom). Right: Tumor type distribution of samples harboring PIK3R1 in-frame mutations. B, PIK3R1 dependency scores in cell lines with either wild-type or mutated PIK3R1 for the DepMap data version 24Q4, i.e. the version used for the main analysis. C, PIK3R1 CRISPR dependency scores in cell lines with either wild-type or mutated PIK3R1 for the DepMap 21Q2 data version (left) and uncorrected scores from DepMap 24Q4 (right).
[Supplementary Figure 4, panels A-C. Panel titles: PIK3R1 in-frame mutations (TCGA Pan-cancer Atlas); PIK3R1 in-frame mutations (DepMap cell lines); CRISPR; RNAi; CRISPR - CERES 21Q2; CRISPR - uncorrected. Groups: wild_type, stop_gained, frameshift, splice, inframe_deletion, inframe_insertion. Y axis: PIK3R1 dependency score.]

Suppl. Fig. 5 - WRN frameshift mutations as OGD. WRN dependency scores for cell lines that are either wild type for WRN or carry frameshift mutations.
[Supplementary Figure 5. Panels: CRISPR, RNAi. Groups: WT, MUT. Y axis: WRN dependency score.]

Suppl. Fig. 6 - BYE mutations in VHL and SWI/SNF complex genes. A-B, VHL dependency scores for all cell lines (A) or for kidney cell lines only (B) that are either wild type for VHL or carry frameshift mutations. C, Dependency scores for four SWI/SNF complex genes in cell lines that are either wild type for the respective gene or carry a specific type of alteration (frameshift, nonsense, splice-site, or deletion). The “swi_snf_wt” group includes cell lines wild type for all SWI/SNF genes. The “swi_snf_alt” group includes cell lines wild type for the gene considered but carrying alterations in other SWI/SNF complex genes. The “other_alt” group includes cell lines carrying other types of alterations (different from the one considered) in the gene of interest. D, Zoomed-in view of butterfly plots. BYE mutations are shown in light blue, while neutral mutations with a BYE-like trend (as defined in Methods) are shown in grey.
[Supplementary Figure 6, panels A-D. Panel titles: VHL frameshift variants - all CLs; VHL frameshift variants - Kidney CLs; ARID1A; SMARCA4; ARID2; ATRX. Each panel shows CRISPR and RNAi. Butterfly plot axes: Cohen's D (x); Directed p-value, -log10 (y).]

[Supplementary Figure 7. Axes: Knock-out gene; Weighted Mean ElasticNet Score; n. predictors.]
Suppl. Fig. 7 - Common biomarkers among CRISPR and RNAi. Summary plot of ElasticNet biomarkers. Each point shows a biomarker shared between the CRISPR and RNAi datasets that was selected at least 5 times out of 10 ElasticNet runs, had a mean coefficient greater than 0.05 in absolute value, and exhibited the same effect direction in both datasets. A negative weighted mean ElasticNet score indicates sensitivity, whereas a positive score indicates resistance.

Suppl. Fig. 8 - Association between biomarker alterations and druggable gene dependency. A-B, Dependency scores for BCL2L1 (A) and CDK4 (B) in cell lines stratified by biomarker alteration status: wild type (or with a neutral alteration) versus those harboring an amplification (e.g., NOTCH3 in panel A) or a functional mutation (as defined by OncoKB or ButterflyVI; e.g., SMAD3 in panel B).
[Supplementary Figure 8, panels A-B. Panels: CRISPR, RNAi. Groups: WT vs AMP (NOTCH3); WT vs MUT (SMAD3). Y axes: BCL2L1 dependency; CDK4 dependency.]

Suppl. Fig. 9 - Dose–response curves and siRNA knockdown validation. A-B, Dose-response curves for control DMSO (A) or Nutlin-3a (B) treatment of the RT-4 and KU-19-19 models. Cellular viability is expressed as a percentage relative to the untreated control. Each point represents a single experimental replicate. The blue line shows the fit of the four-parameter log-logistic model. The light blue shaded band represents the 95% confidence interval for the model. The vertical red dashed line indicates the calculated IC50, and the pink shaded band represents the 95% confidence interval for the IC50 value. The specific IC50 value and its confidence interval are indicated on the graph. C, Relative expression of GAPDH in cells transfected with control siRNA (siCtrl), GAPDH siRNA (siGAPDH), or RPL5 siRNA (siRPL5). Data are presented as the mean ± standard error of the mean (SEM). Differences between groups were evaluated using an unpaired Student's t-test. The level of statistical significance is indicated by asterisks: ns, not significant (P > 0.05); * P ≤ 0.05; ** P ≤ 0.01.
[Supplementary Figure 9, panels A-C. Axes: Equivalent DMSO (µM) or Dose (µM, log scale) vs Viability (%); GAPDH relative expression for siCtrl, siGAPDH, siRPL5 in RT-4 and KU-19-19. Plot annotations: RT-4 IC50 6.35 µM, CI [5.09, 7.61] µM; KU-19-19 IC50 11.09 µM, CI [−10.25, 32.42].]
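The four-parameter log-logistic model named in the Suppl. Fig. 9 legend can be illustrated with a minimal sketch. It is not the fitting procedure used for the figure: the function names, the grid-search strategy, and all numeric values below are assumptions chosen for illustration, and the data are synthetic. In this parameterization, `ic50` is the dose at which the curve lies halfway between its lower and upper plateaus. For fixed `ic50` and `hill`, the model is linear in the two plateaus, so they can be solved in closed form while a coarse grid covers the two nonlinear parameters.

```python
def four_pl(dose, bottom, top, ic50, hill):
    """Four-parameter log-logistic curve: viability as a function of dose."""
    return bottom + (top - bottom) / (1.0 + (dose / ic50) ** hill)


def fit_four_pl(doses, viability, ic50_grid, hill_grid):
    """Illustrative fit: grid search over (ic50, hill), with bottom and top
    obtained by ordinary least squares at each grid point."""
    n = len(doses)
    y_mean = sum(viability) / n
    best_sse, best_params = None, None
    for ic50 in ic50_grid:
        for hill in hill_grid:
            # g goes from 1 (low dose) to 0 (high dose), so the model reads
            # y = bottom + (top - bottom) * g, a straight line in g.
            g = [1.0 / (1.0 + (d / ic50) ** hill) for d in doses]
            g_mean = sum(g) / n
            sgg = sum((gi - g_mean) ** 2 for gi in g)
            if sgg == 0.0:
                continue  # degenerate grid point, g is constant
            sgy = sum((gi - g_mean) * (yi - y_mean)
                      for gi, yi in zip(g, viability))
            slope = sgy / sgg                   # estimate of top - bottom
            intercept = y_mean - slope * g_mean  # estimate of bottom
            sse = sum((intercept + slope * gi - yi) ** 2
                      for gi, yi in zip(g, viability))
            if best_sse is None or sse < best_sse:
                best_sse = sse
                best_params = {"bottom": intercept, "top": intercept + slope,
                               "ic50": ic50, "hill": hill}
    return best_params


# Synthetic, noise-free example (hypothetical values, not data from the paper).
doses = [0.3, 1.0, 3.0, 10.0, 30.0]
viability = [four_pl(d, bottom=40.0, top=100.0, ic50=6.0, hill=1.5)
             for d in doses]

ic50_grid = [1.0 + 0.5 * i for i in range(40)]   # 1.0 to 20.5 µM
hill_grid = [0.5 + 0.25 * i for i in range(11)]  # 0.5 to 3.0
fit = fit_four_pl(doses, viability, ic50_grid, hill_grid)
print(fit)
```

A grid search keeps the sketch dependency-free; in practice a nonlinear least-squares routine would refine the estimate and supply the confidence intervals shown in the figure, which this sketch does not compute.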


License: CC-BY-NC-ND-4.0