ABSTRACT
Splicing factors control exon inclusion in messenger RNAs, shaping transcriptome and proteome diversity. Their catalytic activity is regulated at multiple layers, so single-omic measurements on their own fall short of identifying which splicing factors underlie a phenotype. Here, we posit that splicing factor activity can be estimated from changes in exon inclusion. To test this hypothesis, we benchmarked methods for constructing splicing factor→exon networks and estimating splicing factor activity. We found that combining RNA-seq perturbation-based networks with VIPER (Virtual Inference of Protein Activity by Enriched Regulon analysis) accurately captures splicing factor activation as modulated by multiple regulatory layers. This approach integrates splicing factor regulation into a single score derived solely from exon inclusion signatures, allowing functional interpretation of heterogeneous conditions. As a proof of concept, we identify recurrent cancer splicing programs, revealing oncogenic- and tumor suppressor-like splicing factors missed by conventional methods. These programs correlate with patient survival and key cancer hallmarks: initiation, proliferation, and immune evasion. Altogether, we show that splicing factor activity can be accurately estimated from exon inclusion changes, enabling comprehensive analyses of splicing regulation with minimal data requirements.
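To make the idea of scoring a splicing factor from exon inclusion changes concrete, the following is a minimal sketch, not the method used in this study. VIPER itself relies on a rank-based enrichment statistic; here a plain likelihood-weighted, sign-aware mean stands in for it. The function name `estimate_activity`, the exon identifiers, the regulon layout, and all numbers are hypothetical and chosen only for illustration.

```python
def estimate_activity(signature, regulon):
    """Toy activity score for one splicing factor (illustrative only).

    signature: dict mapping exon id -> change in exon inclusion
               (for example, a delta PSI between two conditions).
    regulon:   dict mapping exon id -> (mode, likelihood), where mode is
               +1 if the factor promotes inclusion and -1 if it represses
               it, and likelihood is a non-negative edge weight.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for exon, (mode, likelihood) in regulon.items():
        if exon not in signature:
            continue  # target exon not measured in this signature
        weighted_sum += mode * likelihood * signature[exon]
        total_weight += likelihood
    if total_weight == 0:
        return 0.0  # no measured targets: no evidence either way
    return weighted_sum / total_weight


# Hypothetical exon inclusion signature and two hypothetical regulons.
signature = {"EX1": 0.2, "EX2": -0.3, "EX3": 0.1}
regulon_a = {"EX1": (1, 1.0), "EX2": (-1, 1.0)}
regulon_b = {"EX1": (-1, 1.0), "EX3": (-1, 0.5)}

activity_a = estimate_activity(signature, regulon_a)
activity_b = estimate_activity(signature, regulon_b)
```

In this toy setting, a factor whose promoted exons gain inclusion and whose repressed exons lose inclusion receives a positive score, while a factor whose targets move against its mode of regulation receives a negative score.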
Competing Interest Statement
Dr. Califano is founder, equity holder, and consultant of DarwinHealth Inc., a company that has licensed some of the algorithms used in this manuscript from Columbia University. Columbia University is also an equity holder in DarwinHealth Inc. The rest of the authors declare no conflicts of interest.
Footnotes
Second revision, with multi-omic experimental characterization of carcinogenesis and a major revision of the technical sections to broaden the scope of the paper.
DATA AVAILABILITY
In this study
The raw proteomics and phosphoproteomics data have been deposited in the PRIDE repository (ref. 76) via ProteomeXchange with identifier PXD066623.
Intermediate files generated from data analyses
Intermediate files generated throughout this study are available in this Figshare repository: https://doi.org/10.6084/m9.figshare.27835518
Previously published datasets used
List of splicing factors from Papasaikas et al.18
Supplementary Table 2 in the publication.
List of splicing factors from Hegele et al.21
Supplementary Table 1 in the publication.
List of splicing factors from Seiler et al.19
Supplementary Table 1 in the publication.
List of splicing factors from Rogalska et al.8
Supplementary Table 1 in the publication.
List of splicing factors from Head et al.20
Supplementary Table 1 in the publication.
RNA sequencing data of splicing factor perturbations from ENCORE project
We downloaded raw FASTQ files from the ENCORE website for all shRNA and CRISPR experiments targeting RNA binding proteins: https://www.encodeproject.org/encore-matrix/?type=Experiment&status=released&internal_tags=ENCORE.
RNA sequencing data of splicing factor perturbations from ENA
We downloaded from the ENA website the raw FASTQ files listed in Supplementary Table 2 of this manuscript.
POSTAR3 CLIP data to construct CLIP-based splicing factor networks
Observational RNA sequencing data from human samples from diverse molecular contexts
We downloaded raw FASTQ files for Cardoso-Moreira et al.60 through the ENA website.
Perturbing RBM39 with Indisulam
For the Nijhuis et al.23 study, we downloaded raw FASTQ files from the ENA website (PRJNA673205) and processed LFQ proteomics data from the PRIDE website (PXD022164). Metadata for proteomics experiments were kindly provided by the authors.
For the Lu et al.24 study, we downloaded raw FASTQ files from the ENA website (PRJNA683080).
Combinatorial knockdowns of splicing factors
We downloaded raw FASTQ files for four different studies from the ENA website: PRJNA223244 (ref. 28), PRJNA498529 (ref. 27), PRJNA587741 (ref. 26), and PRJNA321560 (ref. 25).
Protein-protein interaction network
We downloaded STRING DB’s human protein interaction network from their webpage (ref. 77): https://stringdb-static.org/download/protein.links.full.v11.5/9606.protein.links.full.v11.5.txt.gz
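As an illustration of how such a network file might be read, here is a hedged sketch that parses a space-delimited links table and keeps edges above a score cutoff. The inline sample rows, the reduced set of columns, the helper name `load_edges`, and the cutoff of 700 are all assumptions made for this example; they are not taken from the study, and the real file contains additional evidence columns.

```python
import csv
import io

# Made-up rows in a STRING-like layout (space-delimited, header line,
# integer scores). The protein identifiers are placeholders.
SAMPLE = """protein1 protein2 experiments combined_score
9606.ENSP00000000001 9606.ENSP00000000002 600 900
9606.ENSP00000000001 9606.ENSP00000000003 0 150
9606.ENSP00000000002 9606.ENSP00000000003 300 720
"""


def load_edges(handle, min_score=700):
    """Return (protein1, protein2, score) tuples with score >= min_score.

    min_score is a hypothetical cutoff chosen for illustration.
    """
    reader = csv.DictReader(handle, delimiter=" ")
    edges = []
    for row in reader:
        score = int(row["combined_score"])
        if score >= min_score:
            edges.append((row["protein1"], row["protein2"], score))
    return edges


edges = load_edges(io.StringIO(SAMPLE))
```

For the downloaded archive, the same function could be given a text-mode handle such as `gzip.open(path, "rt")`, where `path` points to the local copy of the file.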
List of genes in SF3b complex
https://signor.uniroma2.it/relation_result.php?id=SIGNOR-C442
List of genes in U2snRNP complex
https://signor.uniroma2.it/relation_result.php?id=SIGNOR-C479
Perturbing SF3b complex reaction step with splicing drugs
We downloaded raw FASTQ files for six different studies from the ENA website: PRJNA371421 (ref. 30), PRJNA685790 (ref. 65), PRJNA662572 (ref. 66), PRJNA380104 (ref. 67), PRJNA354957 (ref. 68), and PRJNA292827 (ref. 69).
Splicing event information
We downloaded splicing event annotations and their predicted protein impact from VastDB.
https://vastdb.crg.eu/wiki/Downloads#Homo_sapiens_.28hg38.29
Molecular and clinical data from The Cancer Genome Atlas (TCGA)
We used the GDC portal to download the clinical, survival, and raw RNA-seq data.
SpliceosomeDB complexes
We obtained annotations of splicing factors to splicing reaction complexes from the SpliceosomeDB website. http://spliceosomedb.ucsc.edu/
RBP domain families
We obtained annotations of splicing factors to RBP domain families from the RBPDB website. http://rbpdb.ccbr.utoronto.ca/
Reactome pathways processed by MSigDB
We obtained ReactomeDB pathways from the MSigDB website.
https://www.gsea-msigdb.org/gsea/msigdb/human/genesets.jsp?collection=CP:REACTOME
Immune evasion CRISPR screen from Dubrot et al.71
We obtained the immune evasion score for each gene from Supplementary Table 13 in the publication.
Human to mouse gene name translations
We obtained tables of human-to-mouse gene name translations from BioMart.
Data from patients treated with ICB from Riaz et al.43
We obtained raw FASTQ files from the ENA website (PRJNA356761).
We obtained patient clinical metadata from Supplementary Table 2 in the publication.
Molecular data from the Cancer Cell Line Encyclopedia (CCLE)
We downloaded raw RNA-seq .fastq files for the cancer cell lines in the CCLE from the ENA website (PRJNA523380).
We downloaded cell line metadata (“sample_info.csv”) and processed mutation mapping across CCLE cell lines (“CCLE_mutations.csv”) from DepMap’s figshare repository (download link: https://ndownloader.figshare.com/articles/13681534/versions/1).
RNA sequencing data of carcinogenesis from Danielsson et al.46
We downloaded raw FASTQ files from the ENA website (PRJNA193487).
DEMETER2 gene dependencies from DepMap
https://ndownloader.figshare.com/articles/6025238/versions/6
CCLE cell line metadata
https://ndownloader.figshare.com/articles/13681534/versions/1