Abstract
Mammalian cancer cell lines are essential model systems in biomedical research. We
conducted multi-level proteomics analyses on 54 widely used cancer cell lines derived
from various tissue -of-origins using two prominent proteomics technologies: mass
spectrometry (MS) and reverse -phase protein array (RPPA). Our analysis identified
10,088 proteins, 33,609 phosphorylation sites across 7,289 phosphoproteins, and
56,350 site-specific glycans on 16,296 glycosylation sites from 5,966 glycoproteins ,
along with 305 drug-relevant protein and phosphoprotein targets. Our results reveal
both consistent and distinct patterns in protein expression and modification between
MS and RPPA, underscoring their complementary strengths as discovery tools.
Additionally, we identified protein features that distinguish tissue origins across
different cell line lineages. This dataset supports model system selection for drug target-
related studies in vitro and provides valuable insights into key signaling pathways.
Overall, this comprehensive reso urce enables new opportunities for exploration in
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
cancer biology and offers significant value to research communities focused on
biomarker profiling, drug target discovery, and understanding mechanisms across
diverse cancer types.
Introduction
Human cancer cell lines serve as essential models for uncovering the molecular
mechanisms and cellular behaviors associated with oncogenesis and disease
progression. Past studies have leveraged various proteomics approaches to establish
proteoform-oriented expression profiles in these cell lines, providing critical insights
into cancer systems biology. Most of these studies utilized large-scale analyses covering
proteome-wide and post -translational modification (PTM) levels, leading to the
creation of datab ases such as the Cancer Cell Line Encyclopedia (CCLE) and the
National Cancer Institute’s 60 (NCI-60) cancer cell line collection. Using 10-plex TMT
quantitative proteomics, Gygi’s team profiled 375 cell lines across 22 lineages,
quantifying over 12,000 pr oteins across wide dynamic ranges 1. In line with that ,
Aebersold’s team conducted quantitative mass spectrometry (MS) proteomics
experiments on NCI-60 cancer cells using a SWATH/DIA-based approach, generating
over 8,000 unique proteins within which about 3,171 proteins were compar ed across
cell lines 2. This proteotypic landscape underscores the value of protein -level data in
interpreting cellular phenotypes, inferring protein coregulatory networks, and
predicting proteo-transcriptomic-informed drug responses. Complementary studies on
NCI-60 cell lines employed label -free MS to analyze proteome and kinome profiles,
while others have used MS to characterize the phosphoproteome, illuminating the
mechanisms of action (MOA) of cancer drugs3,4. The DepMap Project further integrates
proteomic data with pharmacological profiles, deepening our understanding of
molecular vulnerabilities and therapeutic targets in cancer (https://depmap.org/portal/).
In addition to MS -based proteomics, affinity -based protein quantification methods,
such as the Reverse Phase Protein Array (RPPA), allow high-throughput quantification
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
of proteins and modified proteoforms using specific antibodies5,6. Cell-based RPPA
experiments, including those by our team, are commonly used to investigate biological
mechanisms under perturbation7,8. This technique has characterized NCI-60 cell lines,
generating drug response networks and signaling pathway activity profiles that impact
cellular fates 9-11. A key advantage of RPPA is its ability to quantify low -abundance
proteins and PTM . For example, Davies and colleagues used a panel of 222 protein
features, including total and phosphorylated proteins, to categorize NCI -60 cells into
five clusters, each associated with specific mutations linked to drug respons e10. Other
studies identified protein -inferred drug response , pattern ing the
activation/phosphorylation states of 135 proteins and defining six core cancer signaling
modules related to therapeutic responses 12. Hundreds of cancer cell lines with more
than 200 protein targets have since been profiled by RPPA, with data repositories such
as CCLE, PRIDE ( https://www.ebi.ac.uk/pride/), and MD Anderson’s Cell Lines
Project (MCLP: https://tcpaportal.org/mclp/#/) offering comprehensive datasets that
allow researchers to investigate protein functions, drug targets, and biomarkers across
cancer types13-15. Recent RPPA studies expanded these resources, adding 447 clinically
relevant dual-actionable targets in the cancer field16.
MS-based proteomics and RPPA each offer distinct advantages in proteomics research. MS-
based proteomics digests proteins into peptides, identifies them by comparing MS/MS
spectra with theoretical protein sequences, and enables large -scale characterization of
the entire proteome and modified forms, whereas affinity detection-based methods such
as RPPA employ antibodies to measure protein levels across hundreds of samples
simultaneously5,17-21. While several studies have integrated MS and RPPA data, most
focus on clinical samples. For example, MS and RPPA have been combined to provide
quantitative proteomic landscapes of breast cancer tissue22. Proteomics data derived
from The Cancer Genome Atlas (TCGA) using RPPA (quantifying 150 –200 protein
forms) have also been compared with global MS proteomics from the Clinical
Proteomics Tumor Analysis Consortium (CPTAC,
https://proteomics.cancer.gov/programs/cptac), highlighting the role of protein -driven
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
therapy development across multiple cancer types 23. Despite these advances, deep
characterization of cancer cell lines using multiple proteomics methods remains limited,
with only a few studies performing such analyses on a systemic level7.
Beyond total protein profiling, PTMs, mainly phosphorylation and glycosylation, are
critical cellular f unction governors in cancer 24-27. Phosphoproteomics, which
characterizes site-specific phosphorylation on proteins , is invaluable for uncovering
dysfunctional kinase activities and thus aiding the development of therapeutic drugs
such as kinase inhibitors4,17,28,29. Databases such as PhosphoSitePlus provide extensive
data on phosphorylation and other PTM data for cancer cell lines, supporting targeted
research (https://www.phosphosite.org/). Similarly, glycoproteomics, which examines
glycosylation modifications, has been extensively studied due to its potential as a
diagnostic biomarker for cancer and other diseases 30,31. Recent advances in MS
technology have enabled large-scale glycosylation profiling of cancer cell lines 32,33,
offering mechanistic insights into glycosylation on a glob al scale during oncogenesis
and disease progression.
To address the gap in multi -dimensional proteomic analysis of cancer cell lines , we
applied label-free MS together with RPPA to obtain quantitative proteomics data on 54
representative cancer cell lines , predominantly from NCI -60. This study provides a
comprehensive proteomics landscape, incorporating the total proteome,
phosphoproteome, and glycoproteome, which reinforces our understanding of protein-
level regulation in cancer cells . By comparing MS and RPPA datasets, our study
underscores the complementary strengths of these two methods: MS offers a broad view
of protein abundance and PTMs, wh ile RPPA facilitates targeted quantification of
specific proteins and PTMs, even at low abundances. This combined approach
highlights the value of multi -dimensional proteomic data in discovering therapeutic
targets, identifying biomarkers for cancer subtyping, and predicting cellular responses,
supporting advances in precision oncology.
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
Methods
Protein extraction and trypsin digestion for MS analysis
Cells were minced and lysed in lysis buffer (8 M urea, 100 mM Tris hydrochloride, pH
8.0) containing protease and phosphatase inhibitors (Thermo Scientific) followed by 1
min of sonication (3 s on and 3 s off pulse, amplitude 25%). The lysate was centrifuged
at 14,000 × g for 10 min, and the supernatant was collected as whole tissue extract.
Protein concentration was determined by the bicinchoninic acid (BCA) protein assay.
Extracts from each sample (1 mg proteins) were reduced with 10 mM dithiothreitol at
56°C for 30 min and alkylated with 10 mM iodoacetamide at room temperature (RT)
in the dark for an additional 30 min. Samples were then digested using the filter -aided
proteome preparation (FASP) method with trypsin. Briefly, samples were transferred
into a 30 kD Microcon filter (Millipore) and centrifuged at 14,000 × g for 20 min. The
precipitate on the filter was washed twice by adding 300 μL washing buffer (8 M urea
in 100 mM Tris, pH 8.0) into the filter and centrifuged at 14,000 × g for 20 min. The
precipitate was resuspended in 200 μL 100 mM NH4HCO3. Trypsin with a protein-to-
enzyme ratio of 50:1 (w/w) was added to the filter. Proteins were digested at 37°C for
16 h. After tryptic digestion, peptides were collected by centrifugation at 14,000 × g for
20 min and dried in a vacuum concentrator (Thermo Scientific). 5% peptides were used
to detect the proteome, and 95% peptides were used to prepare phospho -peptides with
Fe-NTA enrichment kit (Thermo Scientific, A32992).
Enrichment of phosphopeptides and glycopeptides
The peptide sample was fully dissolved in 100 µL of 80% acetonitrile and 0.1% TFA
solution following vortex mixing. Subsequently, the precipitation was removed by
centrifugation at 16,000 × g for 10 min. The enrichment column was retrieved from the
kit (Thermo Scientific, A32992) , and the preservation solution was removed by
centrifugation at 1,000 × g for 1 min. This was followed by the addition of 100 µL of
80% acetonitrile and 0.1% TFA solution, which was gently mixed and incubated at
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
room temperature for 3 min. After incubation, the solution was removed by
centrifugation at 1,000 × g for 1 min, and the peptide solution supernatant was added
to the column. The mixture was incubated at room temperature for 30 min, with gentle
mixing every 10 min. Following the incubation period, the peptide solution was
removed by centrifugation at 1,000 × g for 1 min. To wash the column, 100 µL of 80%
acetonitrile and 0.1% TFA solution was added, and the washing solution was removed
by centrifugation, repeating the wash step three times. A clean 1.5 mL Eppendorf tube
was then placed beneath the enrichment column, and 100 µL of 50% acetonitrile and
9% ammonia solution was added for elution. The elution was collected by
centrifugation at 1,000 × g for 1 min, resulting in the enriched phosphorylated peptide
solution. The phosphorylated peptide solution was vacuum -dried, and the peptide
powder was stored at -80°C until analysis.
RPPA sample processing
The Reverse Phase Protein Array (RPPA) was performed following a standardized
workflow to ensure consistency and quality. Protein lysates were initially mixed with a
sample dilution buffer (comprising 50% glycerol, 4X SDS buffer, and 6 ml of beta -
mercaptoethanol) to achieve a final concentratio n of 1.5 mg/ml. Normalized samples
were then further diluted 2-fold in sample dilution buffer, consisting of lysis buffer, 50%
glycerol, and 4X SDS buffer with 6 ml of beta -mercaptoethanol in a 3:4:1 ratio. Five
serial dilutions (1, 1/2, 1/4, 1/8, 1/16) were performed using automated liquid handling
workstations (Tecan Fluent series). The processed samples were loaded into low -
binding 384-well plates (Molecular Devices) and then deposited onto nitrocellulose -
coated glass slides (Grace Bio -Labs ONCYTE superN OVA) using a Quanterix 2470
solid pin contact printer.
To maintain quality control (QC), on-slide controls, including treated and untreated cell
lines as well as a lysate mixture from various cell lines and tonsil tissue, were applied
for staining and quantitative QC checks. Approximately 400 identical slides were
prepared for the study.
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
Each slide was then processed for colorimetric signal quantification using a validated
panel of 305 antibodies, targeting 227 total proteins and 78 phosphoproteins or other
post-translational modifications (PTMs) (detailed in Supplementary S1). The slides
were blocked with Re -Blot (Millipore) at room temperature, followed by I -block
(Fisher) and antigen retrieval with hydrogen peroxide (Fisher). Sequential blocking
steps with avidin, biotin, and protein block (DAKO) were conducted before a 1 -hour
primary an tibody incubation at room temperature. Secondary antibodies (DAKO)
specific for rabbit or mouse were applied, followed by Tyramide Signal Amplification
(TSA, Akoya) and DAB colorimetric visualization (DAKO). Staining was fully
automated using the DAKO Link 48 Autostainer (Agilent), and the slides were then
scanned on a high-throughput slide scanner to capture and analyze colorimetric signals
accurately.
LC-MS/MS analysis
Dried peptide samples were re-dissolved in Solvent A (0.1% formic acid in water) and
loaded to a trap column (100 μm × 2 cm, home -made; particle size, 3 μm; pore size,
120Å; SunChrom , USA) with a max pressure of 280 bar using Solvent A, then
separated on a home-made 150 μm× 12 cm silica microcolumn (particle size, 1.9 μm;
pore size, 120Å; SunChrom, USA) with a gradient of 5 -35% mobile phase B
(acetonitrile and 0.1% formic acid) at a flow rate of 300 nL/min for 120 min.
The eluted peptides were ionized under 2.2 kV . MS was operated under a data -
dependent acquisition (DDA) mode . For detection with Orbitrap Eclipse mass
spectrometer, a precursor scan was carried out in the Orbitrap by scanning m/z 300 -
1,500 with a resolution of 60,000. Then, MS/MS scanning was carried out in the
Orbitrap by scanning m/z 200-1,400 with a resolution of 15,000. The most intense ions
selected under top-speed mode were isolated in Quadrupole with a 1.6 m/z window and
fragmented by higher energy collisional dissociation (HCD) with normalized collision
energy of 32%. Max injection time set 40 ms for full scans and 30 ms for MS/MS scans.
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
Dynamic exclusion time was set as 30 s. For phosphopeptide, a precursor scan was
carried out in the Orbitrap by scanning m/z 300-1,500 with a resolution of 60,000. Then,
MS/MS scanning was carried out in the Orbitrap by scanning m/z 200 -1,400 with a
resolution of 30,000. The most intense ions selected under top -speed mode were
isolated in Quadrupole with a 1.6 m/z window and fragmented by higher energy
collisional dissociation (HCD) with a normalized collision energy of 27%. Max
injection time set 30ms for full scans and 54 ms for MS/MS scans.
MS database search
All the MS data were processed in the Maxquant (V1.6.17.0) platform. Raw files were
searched against the human National Center for Biotechnology Information (NCBI) ref-
seq protein database (updated on 07 -04-2013, 32,015 entries). Mass tolerances were
10ppm for precursor and 0.05 Da for product ions. Up to two missed cleavages were
allowed. The data were also searched against a decoy database so that protein
identifications were accepted at a false discovery rate (FDR) of 1%.
Carbamidomethylation (C) was set in the search engine as a fixed modification; Acetyl
(Protein N-term) and Oxidation (M) as variable modifications. For phospho-proteome,
the variable modifications included phosphorylation on serine, threonine, and tyrosine.
The qualitative analysis of N-glycopeptides from the MS/MS data was conducted using
Byonic software version 5.0.3 (Protein Metrics Inc., USA). Employing full tryptic
digestion with allowance for up to two missed cleavages, the precursor mass tolerance
was fixed at 10 ppm, whereas the fragment -mass tolerance was set at 0.02 Da.
Carbamidomethylation of cysteines was set as a fixed modification, and variable
modifications were set as acetyl (N-terminus), methionine oxidation (M), and phospho
(S, T, Y). The maximum dynamic modification number was restricted to 2. The analysis
utilizes the built-in database 132 N -Glycan of human in Byonic. The false discovery
rates (FDRs) at the protein level remained below 1%.
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
RPPA analysis
Image data were digitally processed using MicroVigene software (version 5.6.0.8),
producing both text (.txt) and image (.tiff) files for each slide. These files were then
analyzed using SuperCurve fitting via the R package SuperCurve to produce expression
data (rawlog2 files) and quality control (QC) metrics for each slide. Correction factors
(CF) were calculated to identify outliers both within and across experiments. For data
normalization (to adjust for loading differences), median subtraction was applied: first,
each antibody column was median subtracted, followed by median subtraction for each
sample row. This resulted in a normalized log2 file, which was squared to create a linear
dataset (Normlinear), detailed in Supplementary S2. These processed data sets were
then prepared for quantitative comparisons and graphical visualization in downstream
analyses.
Data processing
MS-based proteomics and phosphoproteomics data were analyzed using MaxQuant's
label-free quantitation (LFQ) approach to assess protein and phosphorylation -site
abundances. Prior to further processing, proteins , and phosphorylation sites present in
less than 5% of samples were filtered to account for the broad lineage diversity of cell
lines, thus applying a more lenient threshold than usual. Missing values were imputed
with the minimum values from each respective sample. Data were manually examined
using the R package "DEP" (v1.16.0). Data visualization and preprocessing were
performed with "Seurat" (v5.0.1), using the "NormalizeData" function with the
"LogNormalize" method. The R package "COSG" was utilized to i dentify marker
proteins, setting a COSG score threshold of 0.5. For calculating the fold change (FC)
of phosphorylation sites, the "FindAllMarkers" function in "Seurat" was applied.
For mass spectrometry-based glycoproteomics, each glycopeptide is annotated with its
glycan composition, which includes sialic acid (NeuAc), fucose (Fuc), N -
acetylhexosamine (HexNAc), and hexose (Hex). A site-specific glycan is defined as a
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
particular type of glycan located at a specific glycosylation site on a protein. This
definition encompasses peptides of varying lengths, including those with missed
cleavages and differing charge states of the peptide ions. The abundance of each site -
specific glycan is estimated using spectral counting, which involves summing the
number of spectra corresponding to all peptide ions that contain this type of glycan at
the designated sites.
For RPPA data, antibody probes were mapped to Uniprot Accession IDs of target
proteins. In cases of multiple mappings, a representative protein was selected to
facilitate comparisons between RPPA and MS data.
To enable comparisons across data modalities, the data were processed systematically
for consistency across platforms. For MS proteomics and phosphoproteomics, raw
abundance values were log2(x+1)-transformed, followed by averaging replicate values
to quant ify each cell line. In the case of MS glycoproteomics, site -specific peptide
counts for each modification, site, or protein were averaged across replicates for cell
line quantification. For RPPA proteomics and phosphoproteomics, mean normalized
expression values were calculated from replicates to quantify each cell line. Using these
standardized quantified values, correlation analyses were performed to examine
relationships across cell lines.
Functional annotation, Enrichment analysis, and KSEA analysis
GO annotations were performed using the "org.Hs.eg.db" R package (v3.18.0), while
KEGG pathway data were accessed through "KEGGREST" (October 11, 2024). GO
enrichment analysis was carried out using "clusterProfiler" (v4.10.0).
KSEA (Kinase -Substrate Enrichment Analysis) was conducted using "KSEAapp "
(v0.99.0) to estimate kinase activity changes by averaging multiple substrate
measurements rather than single -substrate dependence. Fold change (FC) values for
phosphorylation sites in each cell line were computed as previously described. Kinase-
substrate relationships were sourced from PhosphoSitePlus® (January 15, 2024). A Z-
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
score was calculated by comparing the mean log2(FC) of each kinase’s substrate
phosphosites against the mean log2(FC) of all phosphosites in that cell line. Kinases
with significant Z-scores (FDR < 0.05) and more than five substrates were considered
to exhibit altered activity.
Results
Overview of the study
In this study, we investigated 54 widely used tumor cell lines representing various
human tissues. Cultured cells were harvested and prepared separately for MS and RPPA
proteomics analysis. The MS -based analysis was comprehensive, encompassing the
whole proteome, phosphoproteome, and glycoproteome (Fig. 1a). These cell lines were
derived from tissues on breast (n=7), esophagus/stomach (n=8), lymphoid (n=8), lung
(n=5), ovary/fallopian tube (n=5), large bowel (n=5), and other origins (n=16) (Fig. 1b,
Supplemental Table S1).
After applying the filtering criteria described in the Methods, we identified 10,088
proteins in the MS-based dataset, with a median of 6,330 proteins detected per sample
(Fig. 1c, Supplemental Table S2). In the phosphoproteomics dataset, we detected 9,380
phosphoproteins with 33,609 phosphorylation sites (location probability > 0.75), and
median values of ~3,000 phosphoproteins and 6,000 phospho -sites per sample
(Supplemental Table S3). Furthermore, we identified 56,350 site-specific glycans from
16,296 glycosylation sites, spanning 5,966 glycoproteins (Supplemental Table S4). The
RPPA panel included 231 whole -protein and 74 phosphosite -specific antibodies
(Supplemental Table S5).
Principal component analysis (PCA) of protein expression and phosphorylation (Fig.
1d) demonstrated strong reproducibility in the cell line replicates, confirming the
consistency of the data. Phosphorylation reflects the cell's different states and can vary
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
under external perturbations; in this study, the data represents the basal phosphorylation
levels of the cell lines. The analysis revealed significant differences based on tissue
origin, with a clear distinction between cell lines derived from nonsolid tum ors and
solid tumors.
Additionally, a heatmap of whole protein expression (Fig. 1e) highlighted the diversity
of expression across cell lines. We identified 292 marker proteins (COSG scores > 0.5)
enriched in specific tissues, which demonstrated distinct expression profiles
(Supplemental Table S2). These marker proteins were useful in differentiating tissue -
specific expression patterns.
This comprehensive proteomics dataset provides a multi -layered understanding of the
proteome, phosphoproteome, and glycoproteome across a diverse set of cancer cell
lines, offering valuable insights into cancer biology and the molecular underpinnings
of tumor-specific behaviors.
Phosphoproteomics revealed phosphorylation dynamics of cancer cell lines
Phosphorylation, a key post -translational modification (PTM), plays a crucial role in
regulating cellular signaling pathways. Oncogenicity is often linked to dysregulated
molecular signaling, and understanding phosphorylation patterns can provide insights
into cancer mechanisms and therapeutic targets. As of January 2024, the FDA has
approved 80 small-molecule kinase inhibitors targeting 24 kinases that regulate protein
phosphorylation on serine, threonine, and tyrosine residues, with more therapeutics
under development24,25. Studying phosphorylation patterns of human cancer cell lines
may help estimate the disease mechanisms, select suitable models, and evaluate drug
responses4. In our MS phospho -proteome analysis, we identified 33609
phosphorylation sites, with 90.9 % of them previously reported in the PhosphoSite Plus
database (www.phosphosite.org) (Fig. 2a). Due to the use of immobilized metal affinity
chromatography (IMAC) affinity for phosphopeptide enrichment, the distribution of
serine, threonine, and tyrosine phosphorylation in this study mirrored the natural
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
occurrence of these PTMs, with serine and threonine being the most common and
tyrosine being the least prevalent. In addition, RPPA provided a detailed
characterization of specific phosphorylation sites, especially for tyrosine, based on 74
site-specific antibodies, complementing the broader MS -based overview with more
targeted data.
Our data revealed significant variations in phosphorylation levels across different cell
lines, which could be useful for identifying abnormal kinase activity. For example,
Abelson 1 (ABL1) phosphorylation was notably higher in the chronic myeloid
leukemia (CML) cell line SPI-801 (a derivative of K-562) compared to other lines (Fig.
2b). CML is driven by the BCR-ABL1 fusion gene, which results in abnormal tyrosine
kinase activity. Targeted therapies, such as imatinib, inhibit BCR-ABL and can induce
apoptosis in CML cells 28. Thus, our data hitlier phosphorylation patterns, which may
offer insights for drug development in specific cancer types.
We also generated a heatmap of signaling pathways across different cell lines (Fig. 2c),
showing no clear correlation between tissue origin and phosphorylation patterns,
suggesting that even cancers from the same tissue may require personalized treatments.
Notably, hyperphosphorylation of the PI3K/Akt/mTOR pathway was observed in the
gastric cancer cell lines MKN45 and MKN7, consistent with reports that these cell lines
are sensitive to inhibitors targeting this pathway from the CancerRxGene database
(www.cancerrxgene.org/celllines). Additionally, Akt2 was blotted across cell lines, and
its overexpression at the protein level and high phosphorylation level was clearly
observed in ovarian cancer cell line OVCAR-3, consistent with the literature picturing
its role in cancer cell signaling and potential as a therapeutic target34-38 (Fig. 2b).
Further, we performed a kinase-substrate enrichment analysis (KSEA) , represented
through a dot plot (Fig. 2d) and kinase tree (Fig. 2e). Noteworthy findings included the
high activity of CDK4 substrates in the Pfeiffer cell line, a diffuse large B -cell
lymphoma line, aligning with reports that CDK4/6 inhibitors are effective against
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
aggressive B -cell lymphomas 39. In the CCRF-CEM cell li ne from T -cell acute
lymphoblastic leukemia, substrates of CDK4 and CDK6 showed significant
phosphorylation (Fig. 2f) , reflecting the potential for CDK4/6 inhibitors in this
disease40,41. We also noted the overexpression of CSNK2A1, a subtype of CK2, across
multiple cancer cell lines, including SNU-16 (gastric adenocarcinoma) and HepG2
(hepatoblastoma). CK2 is known to be overexpressed in various cancers, and inhibitors
like CX-4945 show therapeutic potential in gastric and liver cancers42-44.
In summary, our phosphoproteomics analysis provides a comprehensive view of kinase
activities, activated signaling pathways, and the relationships between total protein
levels and phosphorylation. These insights are valuable for selecting appropriate cancer
cell line models for drug and cellular signaling research, as well as for predicting the
sensitivities of cancer cell lines to kinase -targeting therapies. This understanding can
help identify key kinase -driven processes and improve the precision of thera peutic
interventions in cancer research.
Glycoproteome landscape of cancer cell lines
Protein glycosylation is a complex post -translational modification (PTM) that
significantly influences protein structure, stability, function, and intracellular
signaling45. Recent advancements in mass spectrometry and computational tools have
enabled the large-scale analysis of intact glycopeptides. In this study, we characterized
56350 site-specific glycans at 16296 glycosylation sites from 5966 glycoproteins. The
IMAC can enrich glycopeptides carrying sialic acids, as well as phosphorylated glycans
such as mannose -6-phosphate and extra acidic amino acid peptides 46. Among the
identified glycans, 49.72% contained sialic acid, 25.11% were high-mannose types, and
25.16% contained fucose (Fig. 3a). This diversity, including glycopeptides found at the
same or different protein sites, highlights the complexity of glycosylation across
different cell lines (Fig. 3b). The variability in glycosylation patterns between cell lines
underscores the challenges in glycoproteomics research. Gene Ontology (GO) analysis
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
of proteins with high-frequency glycosylation (>75% of samples) revealed significant
enrichment in cadherin binding pathways (Fig. 3c). Notably, proteins such as EGFR,
CLINT1, ESYT2, ITGB1, EIF4G2, PLP4, TNND1 , and PICALM, all heavily
glycosylated, are involved in these pathways (Fig. 3d). For example, Epidermal Growth
Factor Receptor (EGFR), a receptor tyrosine kinase and an important target for cancer
therapies, was found to have 12 glycosylation sites (N1044, N1094, N128, N352, N361,
N398, N413, N444, N5 26, N528, N592, N603, N623) (Fig. 3e). Among these, N361
and N352, located within the extracellular domain of EGFR , have been previously
reported to be essential for maintaining EGF binding sites 31,47,48. Additionally, N528
exhibits significant glycosylation, with 73 site-specific glycans and 44 different glycan
types identified at this site. These modifications may play a critical role in the structure
and function of EGFR, influencing its activity and int eractions. Overall, our
glycoproteomics into the glycosylation patterns of cancer cell lines helps to better
understand critical drug targets and signaling pathways. This data presents an
opportunity to explore how glycosylation and other PTMs, such as phosphorylation,
interact to modulate protein function and cellular behavior, especially in receptor
kinases like EGFR. Further targeted analyses could enhance our understanding of these
interactions.
In conclusion, the glycosylation profiling of cancer cell lines provides a comprehensive
view of glycoprotein status, offering a powerful tool for understanding the molecular
behaviors of cancer cells and informing therapeutic strategies.
Comparison of MS and RPP A proteomics data
The comparison between Mass Spectrometry (MS) and Reverse Phase Protein Array
(RPPA) proteomics data reveals key insights into their performance and correlation.
The Spearman correlation coefficients (Fig. 4a) show high intra-group consistency for
both technologies, with values ranging from 0.85 to 0.95 , indicating strong
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
reproducibility within the same experimental conditions. Inter -group correlations are
notably weaker, especially in phosphorylation datasets, where MS phosphoproteome
inter-group correlations are 0.40, and RPPA phosphorylation sites are -0.007, showing
greater variability across different cell lines. MS provides a detailed and comprehensive
analysis of proteins and post -translational modifications (PTMs), including
phosphorylation. However, it can exhibit higher variability between experimental
groups and o ccasionally have missing data points. Conversely, RPPA is more
standardized and reproducible, especially for high-throughput applications
In this study, we used RPPA technology to analyze 231 proteins. Out of these, 212
proteins (91.8%) were successfully quantified by MS, but 19 proteins (8.2%) were
missed by MS (Fig. 4c). Due to missing values in the MS data, only 146 proteins
(~63.2%) were quantified by MS in more than half of the samples, while 66 proteins
(28.6%) were quantified in fewer than half. Functional analysis of these proteins
revealed significant involvement in kinase and ubiquitase activity (Fig. 4d). When
comparing the quantit ative data from MS and RPPA, 90% of the proteins showed a
positive correlation between the two methods, with a median correlation coefficient of
0.6 (Fig. 4e), indicating strong agreement between the technologies. Additionally, a
comparison of protein fold changes between myeloid and tumor epithelial cells
demonstrated a stronger correlation, with a Spearman coefficient of 0.79 (Fig. 4f), than
the correlation between protein expression levels.
In summary, MS and RPPA provide complementary insights into proteomics data. MS
offers broad coverage of proteins and PTMs, though with occasional missing data
points, while RPPA delivers more focused, consistent data based on antibody
availability. Together, these methods validate and complement each other, confirming
their reliability in quantitative proteomic analysis.
Discussion
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
Previous studies have extensively profiled the whole proteome landscape of pan-cancer
cell lines, offering valuable insights for biological research. However, these studies
have largely overlooked the role of post -translational modifications (PTMs) such as
phosphorylation and glycosylation, which are critical for processes like signal
transduction, tumor progression, and drug responses. Our study addresses this gap by
presenting comprehensive datasets focused on these key PTMs, creating a pan -cancer
PTM landscape. With our phosphoproteomics data, we can identify activated signaling
pathways and kinases across different cell lines, offering foundational data for selecting
suitable models for experimental validation and clinical pharmacology. Additionally,
our glycoproteomics data reveals glycosylation sites and glycan types on key signaling
proteins, particularly on transmembrane receptors, enhancing our understanding of
cellular regulation and protein functionality.
We used IMAC enrichment to capture both phosphorylated and glycosylated peptides,
as IMAC is effective in capturing peptides containing phosphorous and sialic acid
groups. The specific enrichment conditions favored phosphorylated peptides in this
study. While the glycopeptides are less abundant than phosphopeptides, they serve as a
valuable supplementary database without requiring extensive enrichment efforts or
additional MS instrument time.
To facilitate quantifying site-specific glycans, we developed a novel method to quantify
glycosylated protein expression based on spectral counting , the code for which is
available in Supplemental Material 6. This code allows users to extract quantitative
information from Byonic ™ search results, bypassing the limitations of traditional
glycopeptide identification tools that lack direct quantitation functionality. Our method
simplifies the quantitation of relative abundances of site -specific glycans across
different cancer cell lines.
We also compared MS with RPPA proteomics and found a strong correlation in the
observed changes in protein expression between the two platforms. This consistency
suggests that conclusions regarding the upregulation or downregulation of proteins
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
across different sample groups remain reliable regardless of the technology used. MS
offers broader coverage, detecting a wide range of proteins and PTMs, but can miss
data for some proteins. RPPA, while more targeted, sensitive, and reproducible, is
dependent on the availability of specific antibodies for the proteins and PTMs of interest,
preventing it from being used in global proteome study.
In conclusion, our study systematically constructed the proteome, phosphoproteome,
and glycoproteome landscape of human cancer cell lines, revealing consistent protein
expression patterns across cancer types, identifying tissue -specific biomarkers,
highlighting active signaling pathways, and detailing important protein modifications.
These datasets are invaluable for studying cancer mechanisms, identifying biomarkers,
and discovering potential therapeutic targets. Additionally, our comparison of MS and
RPPA technologies provides guidance for choosing the most appropriate method in
cancer proteomics research, depending on the experimental focus.
Technical Validation
To ensure data reliability, we implemented stringent validation steps across the
proteomics workflow:
1. High Intra-Group Correlation: The Spearman correlation coefficients calculated
within the same cell lines show strong consistency for both MS and RPPA, with
values ranging from 0.85 to 0.95. This indicates robust reproducibility under
identical experimental conditions. In contrast, the correlations between different
groups were low (Fig. 4a).
2. Mass Spectrometry (MS) Calibration: MS performance was validated daily
using a HeLa cell lysate digest as a reference standard (Fig. 4b), ensuring
consistent signal intensity and retention time stability throughout the study.
3. Correlation Validation: Protein fold changes between MS and RPPA showed
strong agreement, with a Spearman correlation of 0.79 for protein perturbation
(Fig. 4f), validating the accuracy across different technology platforms.
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
These validation steps confirm the robustness of our datasets, ensuring their suitability
for future research in cancer biology.
Authors and Affiliations
1. Chemistry Department, Tsinghua University, 30 Shuangqing Road, Haidian
District, Beijing 100084 China
Yiying Zhu, Wenhao Shi
2. Bioscience and Biomedical Engineering Thrust, Systems Hub, The Hong Kong
University of Science and Technology (Guangzhou), Guangzhou, 511453 China.
Division of Emerging Interdisciplinary Areas, Center for Aging Science, The
Hong Kong University of Science and Technology, Clear Water Bay, Hong
Kong, China.
Tang Shaojun, He Tianlong, Annan Qian, Yuqiao Liu
3. Cosmos Wisdom Biotech C o. Ltd, Building 10th, No. 617 Jiner Road,
Hangzhou, 311215, China.
Nan Wang (
[email protected]) [This author was a previous employee of
Mills Institute for Personalized Cancer Care, Fynn Biotechnologies, Jinan,
China]
Author contributions
Study conception and supervision: Y.Z., N.W., S.T., Investigation and acquisition of
MS and RPPA proteomics data: W.S., N.W. Data analysis, integration, and
interpretation: T.H., W.S., A.Q., Y .L., Y .Z. Supervision of the bioinformatics analysis:
S.T. Writing: Y .Z., N.W., W.S., T.H. Reviewing and editing: Y .Z., N.W.
Acknowledgment
We thank PRECEDO Biotechnologies (Hefei, China) for providing valuable cell lines
for this work. We also thank other team members at Fynn Biotechnologies Co., Ltd
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
(Shangdong, China) for conducting the RPPA.
Funding
This study is supported by the Innovation Funding from the Office of Laboratory
Management at Tsinghua University (53101001124 Y . Z.).
Competing interests
The authors declare no conflict of interest.
References
1 Nusinow, D. P . et al. Quantitative Proteomics of the Cancer Cell Line Encyclopedia. Cell 180,
387-402 e316 (2020). https://doi.org:10.1016/j.cell.2019.12.023
2 Guo, T. et al. Quantitative Proteome Landscape of the NCI -60 Cancer Cell Lines. iScience 21,
664-680 (2019). https://doi.org:10.1016/j.isci.2019.10.059
3 Gholami, A. M. et al. Global proteome analysis of the NCI -60 cell line panel. Cell Rep 4, 609-
620 (2013). https://doi.org:10.1016/j.celrep.2013.07.018
4 Frejno, M. et al. Proteome activity landscapes of tumor cell lines determine drug responses.
Nat Commun 11, 3639 (2020). https://doi.org:10.1038/s41467-020-17336-9
5 Coarfa, C. et al. Reverse-Phase Protein Array: Technology, Application, Data Processing, and
Integration. J Biomol Tech 32, 15-29 (2021). https://doi.org:10.7171/jbt.21-3202-001
6 Ding, Z., Wang, N., Ji, N. & Chen, Z. S. Proteomics technologies for cancer liquid biopsies. Mol
Cancer 21, 53 (2022). https://doi.org:10.1186/s12943-022-01526-8
7 Wang, N. et al. Parallel Analyses by Mass Spectrometry (MS) and Reverse Phase Protein Array
(RPPA) Reveal Complementary Proteomic Profiles in Triple -Negative Breast Cancer (TNBC)
Patient Tissues and Cell Cultures. 2024.2005.2030.596640 (2024).
https://doi.org:10.1101/2024.05.30.596640 %J bioRxiv
8 Lei, Z. N. et al. ABCB1-dependent collateral sensitivity of multidrug-resistant colorectal cancer
cells to the survivin inhibitor MX106 -4C. Drug Resist Updat 73, 101065 (2024).
https://doi.org:10.1016/j.drup.2024.101065
9 Shankavaram, U. T. et al. Transcript and protein expression profiles of the NCI -60 cancer cell
panel: an integromic microarray study. Mol Cancer Ther 6, 820 -832 (2007).
https://doi.org:10.1158/1535-7163.MCT-06-0650
10 Park, E. S. et al. Integrative analysis of proteomic signatures, mutations, and drug
responsiveness in the NCI 60 cancer cell line set. Mol Cancer Ther 9, 257 -267 (2010).
https://doi.org:10.1158/1535-7163.MCT-09-0743
11 Tian, Q. et al. Integrated genomic and proteomic analyses of gene expression in Mammalian
cells. Mol Cell Proteomics 3, 960-969 (2004). https://doi.org:10.1074/mcp.M400055-MCP200
12 Federici, G. et al. Systems analysis of the NCI -60 cancer cell lines by alignment of protein
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
pathway activation modules with " -OMIC" data fields and therapeutic response signatures.
Mol Cancer Res 11, 676-685 (2013). https://doi.org:10.1158/1541-7786.MCR-12-0690
13 Li, J. et al. Characterization of Human Cancer Cell Lines by Reverse-phase Protein Arrays. Cancer
Cell 31, 225-239 (2017). https://doi.org:10.1016/j.ccell.2017.01.005
14 Ghandi, M. et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature
569, 503-508 (2019). https://doi.org:10.1038/s41586-019-1186-3
15 Vizcaino, J. A. et al. The PRoteomics IDEntifications (PRIDE) database and associated tools:
status in 2013. Nucleic Acids Res 41, D1063-1069 (2013). https://doi.org:10.1093/nar/gks1262
16 Li, J. et al. A protein expression atlas on tissue samples and cell lines from cancer patients
provides insights into tumor heterogeneity and dependencies. Nat Cancer 5, 1579-1595 (2024).
https://doi.org:10.1038/s43018-024-00817-x
17 Meissner, F., Geddes-McAlister, J., Mann, M. & Bantscheff, M. The emerging role of mass
spectrometry-based proteomics in drug discovery. Nat Rev Drug Discov 21, 637-654 (2022).
https://doi.org:10.1038/s41573-022-00409-3
18 Nan Wang, Y . Z., Lianshui Wang, Wenshuang Dai, Taobo Hu, Zhentao Song, Xia Li, Qi Zhang,
Jianfei Ma, Qianghua Xia, Jin Li, Yiqiang Liu, Mengping Long, Zhiyong Ding. (bioRxiv,
2024).
19 Tibes, R. et al. Reverse phase protein array: validation of a novel proteomic technology and
utility for analysis of primary leukemia specimens and hematopoietic stem cells. Mol Cancer
Ther 5, 2512-2521 (2006). https://doi.org:10.1158/1535-7163.MCT-06-0334
20 Wang, N. et al. A reverse phase protein array based phospho -antibody characterization
approach and its applicability for clinical derived tissue specimens. Sci Rep 12, 22373 (2022).
https://doi.org:10.1038/s41598-022-26715-9
21 Abrey, L. E. et al. Report of an international workshop to standardize baseline evaluation and
response criteria for primary CNS lymphoma. J Clin Oncol 23, 5034 -5043 (2005).
https://doi.org:10.1200/JCO.2005.13.524
22 Johansson, H. J. et al. Breast cancer quantitative proteome and proteogenomic landscape. Nat
Commun 10, 1600 (2019). https://doi.org:10.1038/s41467-019-09018-y
23 Savage, S. R. et al. Pan-cancer proteogenomics expands the landscape of therapeutic targets.
Cell 187, 4389-4407 e4315 (2024). https://doi.org:10.1016/j.cell.2024.05.039
24 Roskoski, R., Jr. Properties of FDA-approved small molecule protein kinase inhibitors: A 2024
update. Pharmacol Res 200, 107059 (2024). https://doi.org:10.1016/j.phrs.2024.107059
25 Roskoski, R., Jr. Cost in the United States of FDA -approved small molecule protein kinase
inhibitors used in the treatment of neoplastic and non-neoplastic diseases. Pharmacol Res 199,
107036 (2024). https://doi.org:10.1016/j.phrs.2023.107036
26 Wang, Y. et al. FDA-approved small molecule kinase inhibitors for cancer treatment (2001 -
2015): Medical indication, structural optimization, and binding mode Part I. Bioorg Med Chem
111, 117870 (2024). https://doi.org:10.1016/j.bmc.2024.117870
27 Pinho, S. S. & Reis, C. A. Glycosylation in cancer: mechanisms and clinical implications. Nat Rev
Cancer 15, 540-555 (2015). https://doi.org:10.1038/nrc3982
28 Cohen, M. H. et al. Approval summary for imatinib mesylate capsules in the treatment of
chronic myelogenous leukemia. Clin Cancer Res 8, 935-942 (2002).
29 Rikova, K. et al. Global survey of phosphotyrosine signaling identifies oncogenic kinases in lung
cancer. Cell 131, 1190-1203 (2007). https://doi.org:10.1016/j.cell.2007.11.025
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
30 He, K. et al. Decoding the glycoproteome: a new frontier for biomarker discovery in cancer. J
Hematol Oncol 17, 12 (2024). https://doi.org:10.1186/s13045-024-01532-x
31 Lih, T. M., Cho, K. C., Schnaubelt, M., Hu, Y . & Zhang, H. Integrated glycoproteomic
characterization of clear cell renal cell carcinoma. Cell Rep 42, 112409 (2023).
https://doi.org:10.1016/j.celrep.2023.112409
32 Hu, Y ., Shah, P ., Clark, D. J., Ao, M. & Zhang, H. Reanalysis of Global Proteomic and
Phosphoproteomic Data Identified a Large Number of Glycopeptides. Anal Chem 90, 8065-
8071 (2018). https://doi.org:10.1021/acs.analchem.8b01137
33 Cho, K. C., Chen, L., Hu, Y ., Schnaubelt, M. & Zhang, H. Developing Workflow for Simultaneous
Analyses of Phosphopeptides and Glycopeptides. ACS Chem Biol 14, 58 -66 (2019).
https://doi.org:10.1021/acschembio.8b00902
34 Meng, Q., Xia, C., Fang, J., Rojanasakul, Y . & Jiang, B. H. Role of PI3K and AKT specific isoforms
in ovarian cancer cell migration, invasion and proliferation through the p70S6K1 pathway. Cell
Signal 18, 2262-2271 (2006). https://doi.org:10.1016/j.cellsig.2006.05.019
35 Khabele, D. et al. Preferential effect of akt2 -dependent signaling on the cellular viability of
ovarian cancer cells in response to EGF. J Cancer 5, 670 -678 (2014).
https://doi.org:10.7150/jca.9688
36 Huang, Q. et al. Akt2 kinase suppresses glyceraldehyde-3-phosphate dehydrogenase (GAPDH)-
mediated apoptosis in ovarian cancer cells via phosphorylating GAPDH at threonine 237 and
decreasing its nuclear translocation. J Biol Chem 286, 42211 -42220 (2011).
https://doi.org:10.1074/jbc.M111.296905
37 Noske, A. et al. Specific inhibition of AKT2 by RNA interference results in reduction of ovarian
cancer cell proliferation: increased expression of AKT in advanced ovarian cancer. Cancer Lett
246, 190-200 (2007). https://doi.org:10.1016/j.canlet.2006.02.018
38 Yuan, Z. Q. et al. Frequent activation of AKT2 and induction of apoptosis by inhibition of
phosphoinositide-3-OH kinase/Akt pathway in human ovarian cancer. Oncogene 19, 2324-2330
(2000). https://doi.org:10.1038/sj.onc.1203598
39 Tanaka, Y. et al. Abemaciclib, a CDK4/6 inhibitor, exerts preclinical activity against aggressive
germinal center -derived B -cell lymphomas. Cancer Sci 111, 749 -759 (2020).
https://doi.org:10.1111/cas.14286
40 Pikman, Y . et al. Synergistic Drug Combinations with a CDK4/6 Inhibitor in T -cell Acute
Lymphoblastic Leukemia. Clin Cancer Res 23, 1012-1024 (2017). https://doi.org:10.1158/1078-
0432.CCR-15-2869
41 Bride, K. L. et al. Rational drug combinations with CDK4/6 inhibitors in acute lymphoblastic
leukemia. Haematologica 107, 1746 -1757 (2022).
https://doi.org:10.3324/haematol.2021.279410
42 Borgo, C., D'Amore, C., Sarno, S., Salvi, M. & Ruzzene, M. Protein kinase CK2: a potential
therapeutic target for diverse human diseases. Signal Transduct Target Ther 6, 183 (2021).
https://doi.org:10.1038/s41392-021-00567-7
43 Prins, R. C. et al. CX-4945, a selective inhibitor of casein kinase -2 (CK2), exhibits anti -tumor
activity in hematologic malignancies including enhanced activity in chronic lymphocytic
leukemia when combined with fludarabine and inhibitors of the B -cell receptor pathway.
Leukemia 27, 2094-2096 (2013). https://doi.org:10.1038/leu.2013.228
44 Zhang, L. et al. Comprehensive landscape of gastric cancer-targeted therapy and identification
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
of CSNK2A1 as a potential target. Heliyon 10, e36205 (2024).
https://doi.org:10.1016/j.heliyon.2024.e36205
45 Schjoldager, K. T., Narimatsu, Y ., Joshi, H. J. & Clausen, H. Global view of human protein
glycosylation pathways and functions. Nat Rev Mol Cell Biol 21, 729 -749 (2020).
https://doi.org:10.1038/s41580-020-00294-x
46 Riley, N. M., Bertozzi, C. R. & Pitteri, S. J. A Pragmatic Guide to Enrichment Strategies for Mass
Spectrometry-Based Glycoproteomics. Mol Cell Proteomics 20, 100029 (2021).
https://doi.org:10.1074/mcp.R120.002277
47 Lam, D., Arroyo, B., Liberchuk, A. N. & Wolfe, A. L. Effects of N361 Glycosylation on Epidermal
Growth Factor Receptor Biological Function. bioRxiv (2024).
https://doi.org:10.1101/2024.07.12.603279
48 Azimzadeh Irani, M., Kannan, S. & Verma, C. Role of N-glycosylation in EGFR ectodomain ligand
binding. Proteins 85, 1529-1549 (2017). https://doi.org:10.1002/prot.25314
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
Figure 1. Experimental Design and Overview of the Data
(a) Workflow of proteomics analysis of cancer cell lines: Cultivated cells were divided into separate
aliquots for MS and RPPA sample preparation. After analysis via mass spectrometry or Array -Pro
analyzer, MS data was processed separately for the total pr oteome, phosphoproteome, and
glycoproteome, while RPPA data was processed to measure protein concentrations.
(b) Cell line classification: Cell lines are grouped based on their tissue of origin.
(c) Number of detections per cell line: The left side shows t otal proteins detected, and the right side
shows phosphosites detected across cell lines. The outer circle lists cell line names, with bars representing
the number of detections for each cell line. Bar colors indicate the tissue origin of the cell lines.
(d) PCA of cell lines: Principal component analysis (PCA) displays the proteome (left) and
phosphoproteome (right) expression profiles for each cell line. Each dot represents a sample, and the
percentage value indicates the variance explained.
(e) Heatmap of protein expression: The heatmap shows relative protein expression levels across cell lines,
with the color scheme representing standardized scores. Yellow indicates over-expressed proteins.
a
10088
Proteins
Esophagus/Stomach (n = 8)
Lymphoid (n = 8)
Breast (n = 7)
Ovary/Fallopian Tube (n = 5)
Bowel (n = 5)
Lung (n = 5)
other (n = 16)
c
ed
b
33609
Phosphosites
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
Figure 2. Phosphoproteome Landscape of Cell Lines
(a) Distribution of Phosphorylation Types: The modifications (uncurated) are compared against the PhosphoSite Plus
database (www.phosphosite.org), with reported modifications shown in blue and unreported modifications in red.
(b) Whole Proteome and Total Phosphorylation of ABL1 and AKT2 in Cell Lines: This study presents the abundance
levels of both ABL1 and AKT2 proteins, along with their phosphorylation levels in various cell lines. Each dot
represents a replicate. Protein abundance is calculated by summing the values of the top three unique peptides, while
phosphorylation abundance is determined by the total of all identified phosphopeptides.
(c) Heatmap of Signaling Pathways in Cell Lines: The phosphorylation levels of proteins across various cell lines
are shown, with the number of detected phosphorylation sites representing these levels. The block colors indicate
pathway enrichment, and the top of the heatmap is color-coded to represent the tissue origins of the cell lines.
(d) Kinase Activity Inferred from Substrates: Kinase activity is depicted based on the number of phosphorylated
substrates across different cell lines. The circle size corresponds to the number of phosphorylated substrates for each
kinase, and the color intensity reflects the Z scores, representing the relative abundance of phosphorylated substrates.
(e) Kinase Detection in Cell Lines Mapped on the Kinase Tree: This panel maps detected kinases across different
categories on a kinase tree. Circle depth represents the number of detections, with darker colors indicating higher
detection frequencies.
(f) Kinase Activation in CCRF-CEM Cell Line: Kinase activation is illustrated based on the number of substrates
phosphorylation and numbers in the CCRF-CEM cell line.
a b
c e
d
f
SPI-801
OVCAR-3 OVCAR-3
SPI-801
Protein Expression
(Normalized log abundance)
Protein Phosphorylation
(Normalized log abundance) Protein Phosphorylation
(Normalized log abundance)
Protein Expression
(Normalized log abundance)
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
Figure 3. Glycoproteome Landscape of Cell Lines
(a) Distribution of glycosylation types: Site-specific glycans were classified into sialic acid, fucose, and
high-mannose types. A pie chart displays the percentage distribution of these glycan types across the cell
lines.
(b) Distribution of site-specific glycans: The Y-axis shows the counts of site-specific glycans identified
by MS in the cell lines, while the X -axis represents the number of samples in which each glycopeptide
was detected.
(c) Enriched GO te rms of heavily glycosylated proteins: Heavily glycosylated proteins were selected
based on their occurrence in multiple cell lines. The p -value reflects the significance of the biological
functions associated with these glycosylated proteins.
(d) Glycosylated proteins involved in enriched functions: Highlights the specific proteins contributing to
the enriched biological functions identified in panel (c).
(e) Detected glycosylation sites on epidermal growth factor receptor (EGFR): Shows the glycosylation
sites on EGFR detected across the different cell lines.
b
e
a d
c
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
Figure 4. Comparison between RPPA and MS Whole Proteomics Data.
(a) Spearman correlation coefficients: Comparison of intra - and inter-group correlations for MS
whole proteome, MS phosphoproteome, RPPA whole proteins, and RPPA phosphorylation sites,
shown from left to right.
(b) MS instrument monitoring: Daily tracking of MS performance using whole -cell lysate (Hela
cells) tryptic digest standards.
(c) Protein detection comparison: Analysis of 231 proteins in the RPPA panel. Cyan indicates
proteins detected by MS in more than 50% of samples, orange for proteins detected in fewer than
50%, and dark orange for proteins not detected by MS.
(d) GO term enrichment: Functional analysis of proteins detected in RPPA proteomics, highlighting
major biological processes associated with the RPPA panel.
(e) Protein quantitation correl ation: Spearman’s R -values show the correlation of protein levels
between MS and RPPA data. The X-axis represents R-values, and the Y-axis shows the number of
proteins within each range.
(f) Fold change correlation: Comparison of protein expression fold ch anges across different cell
lines as measured by MS and RPPA, based on datasets from myeloid and tumor epithelial cells.
a cb
e
d
f
preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for thisthis version posted November 1, 2024. ; https://doi.org/10.1101/2024.10.29.620541doi: bioRxiv preprint
Text is read by the "Ask this paper" AI Q&A widget below.
Extraction quality varies by source — PMC NXML preserves structure
cleanly, OA-HTML may include some navigation residue, and OA-PDF can
have broken hyphenation. The publisher copy
(via DOI)
is the canonical version.