Exon inclusion signatures enable accurate estimation of splicing factor activity

doi:10.1101/2024.06.21.600051

2024 · doi:10.1101/2024.06.21.600051

preprint OA: closed



ABSTRACT

Splicing factors control exon inclusion in messenger RNAs, shaping transcriptome and proteome diversity. Their catalytic activity is regulated at multiple layers, so single-omic measurements alone fall short of identifying which splicing factors underlie a phenotype. Here, we posit that splicing factor activity can be estimated from changes in exon inclusion. To test this hypothesis, we benchmarked methods for constructing splicing factor→exon networks and for estimating splicing factor activity. We found that combining RNA-seq perturbation-based networks with VIPER (Virtual Inference of Protein Activity by Enriched Regulon analysis) accurately captures splicing factor activation as modulated by multiple regulatory layers. This approach integrates splicing factor regulation into a single score derived solely from exon inclusion signatures, allowing functional interpretation of heterogeneous conditions. As a proof of concept, we identify recurrent cancer splicing programs, revealing oncogenic- and tumor suppressor-like splicing factors missed by conventional methods. These programs correlate with patient survival and key cancer hallmarks: initiation, proliferation, and immune evasion. Altogether, we show that splicing factor activity can be accurately estimated from exon inclusion changes, enabling comprehensive analyses of splicing regulation with minimal data requirements.

Competing Interest Statement

Dr. Califano is founder, equity holder, and consultant of DarwinHealth Inc., a company that has licensed some of the algorithms used in this manuscript from Columbia University. Columbia University is also an equity holder in DarwinHealth Inc. The rest of the authors declare no conflicts of interest.
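The idea in the abstract of scoring a splicing factor from the exons it regulates can be illustrated with a toy calculation. The sketch below is not VIPER and not the authors' pipeline: it is a simplified weighted-mean score loosely inspired by regulon enrichment, and every name in it (`rank_to_z`, `activity_score`, the exon IDs, the delta-PSI values, the `(mode, weight)` target encoding) is a hypothetical chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def rank_to_z(signature):
    """Map each exon's inclusion change to a normal quantile of its rank."""
    ordered = sorted(signature, key=signature.get)  # most skipped first
    n = len(ordered)
    nd = NormalDist()
    # (i + 0.5) / n keeps every quantile strictly inside (0, 1).
    return {exon: nd.inv_cdf((i + 0.5) / n) for i, exon in enumerate(ordered)}


def activity_score(signature, targets):
    """Toy activity score for one splicing factor.

    signature: {exon: change in inclusion (delta PSI)}
    targets:   {exon: (mode, weight)}; mode in [-1, 1] is the assumed
               direction of regulation, weight > 0 the edge confidence.
    """
    z = rank_to_z(signature)
    shared = [exon for exon in targets if exon in z]
    if not shared:
        return 0.0
    numerator = sum(targets[e][1] * targets[e][0] * z[e] for e in shared)
    denominator = sqrt(sum(targets[e][1] ** 2 for e in shared))
    return numerator / denominator


# Hypothetical exon inclusion signature (condition vs. control).
signature = {"e1": 0.40, "e2": 0.30, "e3": -0.35,
             "e4": -0.20, "e5": 0.05, "e6": -0.02}
# Hypothetical network edges: factor A promotes e1 and e2, represses e3.
factor_a = {"e1": (1, 1.0), "e2": (1, 1.0), "e3": (-1, 1.0)}
# Factor B has the opposite effect on the same exons.
factor_b = {exon: (-mode, w) for exon, (mode, w) in factor_a.items()}

print(activity_score(signature, factor_a))  # positive: consistent with activation
print(activity_score(signature, factor_b))  # negative: consistent with inactivation
```

In this toy setting the score is positive when promoted exons gain inclusion and repressed exons lose it, which is the intuition behind reading activity from exon inclusion signatures. A real analysis would also need a null model and the benchmarked networks described in the abstract.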
Footnotes
Second revision, with multi-omic experimental characterization of carcinogenesis and major revision of technical sections to broaden the scope of the paper.

DATA AVAILABILITY

In this study
The raw proteomics and phosphoproteomics data have been deposited in the PRIDE [76] repository via ProteomeXchange with identifier PXD066623.

Intermediate files generated from data analyses
Intermediate files generated throughout this study are available in this Figshare repository: https://doi.org/10.6084/m9.figshare.27835518

Previously published datasets used

List of splicing factors from Papasaikas et al. [18]
Supplementary Table 2 in the publication.

List of splicing factors from Hegele et al. [21]
Supplementary Table 1 in the publication.

List of splicing factors from Seiler et al. [19]
Supplementary Table 1 in the publication.

List of splicing factors from Rogalska et al. [8]
Supplementary Table 1 in the publication.

List of splicing factors from Head et al. [20]
Supplementary Table 1 in the publication.

RNA sequencing data of splicing factor perturbations from ENCORE project
We downloaded raw FASTQ files from the ENCORE website for all shRNA and CRISPR experiments targeting RNA binding proteins: https://www.encodeproject.org/encore-matrix/?type=Experiment&status=released&internal_tags=ENCORE

RNA sequencing data of splicing factor perturbations from ENA
We downloaded from the ENA website the raw FASTQ files listed in Supplementary Table 2 of this manuscript.

POSTAR3 CLIP data to construct CLIP-based splicing factor networks

Observational RNA sequencing data from human samples from diverse molecular contexts
We downloaded raw FASTQ files for Cardoso-Moreira et al. [60] through the ENA website.

Perturbing RBM39 with Indisulam
For the Nijhuis et al. [23] study, we downloaded raw FASTQ files from the ENA website (PRJNA673205) and processed LFQ proteomics data from the PRIDE website (PXD022164). Metadata for proteomics experiments were kindly provided by the authors.
For the Lu et al. [24] study, we downloaded raw FASTQ files from the ENA website (PRJNA683080).

Combinatorial knockdowns of splicing factors
We downloaded raw FASTQ files for four different studies from the ENA website: PRJNA223244 [28], PRJNA498529 [27], PRJNA587741 [26], and PRJNA321560 [25].

Protein-protein interaction network
We downloaded STRING DB’s human protein interaction network from their webpage [77]: https://stringdb-static.org/download/protein.links.full.v11.5/9606.protein.links.full.v11.5.txt.gz

List of genes in SF3b complex
https://signor.uniroma2.it/relation_result.php?id=SIGNOR-C442

List of genes in U2snRNP complex
https://signor.uniroma2.it/relation_result.php?id=SIGNOR-C479

Perturbing SF3b complex reaction step with splicing drugs
We downloaded raw FASTQ files for six different studies from the ENA website: PRJNA371421 [30], PRJNA685790 [65], PRJNA662572 [66], PRJNA380104 [67], PRJNA354957 [68], and PRJNA292827 [69].

Splicing event information
We downloaded splicing event information and the predicted protein impact of each event from VastDB: https://vastdb.crg.eu/wiki/Downloads#Homo_sapiens_.28hg38.29

Molecular and clinical data from The Cancer Genome Atlas (TCGA)
We used the GDC portal to download the clinical, survival, and raw RNA-seq data.

SpliceosomeDB complexes
We obtained annotations of splicing factors to splicing reaction complexes from the SpliceosomeDB website: http://spliceosomedb.ucsc.edu/

RBP domain families
We obtained annotations of splicing factors to RBP domain families from the RBPDB website: http://rbpdb.ccbr.utoronto.ca/

Reactome pathways processed by MSigDB
We obtained ReactomeDB pathways from the MSigDB website: https://www.gsea-msigdb.org/gsea/msigdb/human/genesets.jsp?collection=CP:REACTOME

Immune evasion CRISPR screen from Dubrot et al. [71]
We obtained the immune evasion score for each gene from Supplementary Table 13 in the publication.

Human to mouse gene name translations
We obtained tables of human-to-mouse gene name translations from BioMart.

Data from patients treated with ICB from Riaz et al. [43]
We obtained raw FASTQ files from the ENA website (PRJNA356761). We obtained patient clinical metadata from Supplementary Table 2 in the publication.

Molecular data from the Cancer Cell Line Encyclopedia (CCLE)
We downloaded raw RNA-seq .fastq files for the cancer cell lines in the CCLE from the ENA website (PRJNA523380). We downloaded cell line metadata (“sample_info.csv”) and processed mutation mapping across CCLE cell lines (“CCLE_mutations.csv”) from DepMap’s figshare repository (download link: https://ndownloader.figshare.com/articles/13681534/versions/1).

RNA sequencing data of carcinogenesis from Danielsson et al. [46]
We downloaded raw FASTQ files from the ENA website (PRJNA193487).

DEMETER2 gene dependencies from DepMap
https://ndownloader.figshare.com/articles/6025238/versions/6

CCLE cell line metadata
https://ndownloader.figshare.com/articles/13681534/versions/1
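
Several datasets above are identified only by an ENA project accession. The source does not describe the download tooling, so the following is a minimal sketch under assumptions: it builds a query against the ENA Portal API file report endpoint and parses the tab-separated response into FASTQ locations per run. The helper names `filereport_url` and `parse_filereport` are hypothetical, and the sketch assumes the `fastq_ftp` field holds semicolon-separated paths without a URL scheme.

```python
import csv
import io
from urllib.parse import urlencode

# ENA Portal API file report endpoint (assumed unchanged; check the ENA docs).
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession):
    """Build a file report query listing FASTQ files for a project accession."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    }
    return f"{ENA_FILEREPORT}?{urlencode(params)}"


def parse_filereport(tsv_text):
    """Parse a TSV file report into {run_accession: [FASTQ URLs]}."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    runs = {}
    for row in reader:
        # Paired-end runs list two files separated by ';'; empty means none.
        paths = [p for p in row["fastq_ftp"].split(";") if p]
        runs[row["run_accession"]] = ["ftp://" + p for p in paths]
    return runs


# Query URL for one of the project accessions listed above.
print(filereport_url("PRJNA673205"))
```

The response text could then be fetched with `urllib.request.urlopen` and passed to `parse_filereport`; the network call is left out so the sketch runs offline.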
