Intro
The proteostasis network coordinates the synthesis, folding, trafficking, and degradation of proteins [ 1 – 3 ]. This intricate network encompasses molecular chaperones, degradation pathways, and regulatory systems that collectively ensure the proper maintenance of the proteome [ 1 – 3 ]. Maintaining the balance between these processes is essential for optimal cellular function, and its disruption has been implicated in numerous diseases, including cancer, neurodegenerative disorders, and autoimmune conditions [ 4 – 7 ]. As proteins misfold, are damaged and fail to be degraded, cells face increased stress and dysfunction, contributing to disease pathogenesis [ 4 – 7 ]. Understanding the mechanisms underlying proteostasis impairment is critical for developing therapeutic strategies that restore cellular balance [ 4 – 7 ].
Proteostasis dysregulation has been shown to manifest in distinct patterns across different diseases, reflecting the diversity of underlying mechanisms [ 1 – 5 ]. For example, neurodegenerative conditions such as Alzheimer’s and Parkinson’s diseases are characterized by the progressive aggregation of misfolded proteins, whereas cancers exploit components of the proteostasis network, such as the ubiquitin-proteasome pathway, to sustain rapid cell division [ 1 – 5 ], and molecular chaperones, which have been implicated in multiple hallmarks of cancer [ 8 ] and correlated with poorer prognosis [ 9 – 12 ]. While these disease-specific patterns are well recognized, their broader significance as systematic signatures of proteostasis dysfunction has yet to be fully elucidated.
Inspired by the impact of the study of mutational signatures in cancer research [ 13 – 15 ], which have identified the molecular drivers of tumor biology and guided targeted therapies, we investigated a similar approach for understanding proteostasis disruption across diseases. Proteostasis pathways are known to be intricately linked to many diseases, each characterized by unique patterns of cellular damage [ 1 – 5 , 16 , 17 ]. To capture these distinct molecular alterations systematically, we describe the concept of proteostasis signatures, which provides a framework for linking specific proteostasis pathway disruptions to disease mechanisms.
By characterising proteostasis signatures, we mapped proteostasis dysregulation across diseases, with the aim of providing a systematic framework for understanding how proteostasis is disrupted in different disease contexts and stages. This framework has the potential to bridge gaps in our knowledge by linking specific proteostasis pathways to their functional consequences in health and disease.
Results
We first asked whether the proteins involved in proteostasis are preferentially associated with disease. To this end, we analysed a recent comprehensive map of the human proteostasis network [ 18 , 19 ] and used the proteins within the network as our reference set of proteostasis proteins. We computed their association with 32 diseases from 7 disease groups ( S1 Table ) by studying the prevalence of proteostasis proteins within the gene set of each disease (disease gene set). The method for generating each disease gene set is described in Materials & Methods. Our results show that proteostasis proteins are closely associated with disease, as they are significantly over-represented in disease protein sets ( Fig 1 ). Over-representation analysis of each of the 4 protein groups within the top 500 disease-associated genes for every disease was computed with the hypergeometric test.
The relative disease association of proteostasis proteins (PN) was quantified and benchmarked against 3 control groups: kinases, transcription factors, and ion channels. Disease-association was determined by relative over-representation of a protein group within disease gene sets. This was done using the hypergeometric test measuring the statistical significance of their prevalence within each disease gene set. P-values were plotted on a -log(p-value) scale, with higher values representing stronger significance. Based on this quantification, proteostasis proteins are significantly over-represented in all the disease groups studied. They are relatively more disease associated than transcription factors, and in some cases even than kinases.
We then compared the disease association of the proteostasis proteins against 3 well characterized disease-associated functional protein groups: kinases, transcription factors, and ion channels. Kinases [ 20 , 21 ] and transcription factors [ 22 , 23 ] are essential regulatory proteins controlling diverse events in cellular signalling and gene transcription. They were selected as positive control groups, as they have been widely reported to be implicated in a range of diseases [ 20 – 23 ]. Ion channels are membrane proteins that regulate signal transduction across cell membranes [ 24 , 25 ], and were selected as a negative control group, as only 2 of our 7 disease groups studied (cardiovascular and neurodegenerative) are commonly associated with ion channels [ 24 – 26 ]. As expected, kinases and transcription factors are highly over-represented across the disease groups, while ion channels are over-represented in the neurodegenerative and cardiovascular disease groups ( Fig 1 ).
Our analysis reveals a strong relevance of proteostasis proteins in disease, with a disease association almost comparable to that of kinases ( Fig 1 ), which are a key group of drug targets [ 27 ].
Proteostasis
Based on the observations from the proteostasis network profiles of the diseases, we identified 3 distinct proteostasis states in disease ( Fig 3A ). These proteostasis states describe disease in terms of characteristic perturbations of the proteostasis network. The most important pathways of the proteostasis network for the definition of these states are ALP, UPS and extracellular proteostasis. The first proteostasis state is characterized by significant UPS perturbation but limited involvement of extracellular proteostasis ( Fig 3A ). This state is characteristic of cancer ( Fig 3B ). The second proteostasis state involves extensive perturbation of both UPS and extracellular proteostasis ( Fig 3A ). This state is predominantly present in neurodegenerative diseases ( Fig 3B ). The third proteostasis state involves a distinctive deregulation of extracellular proteostasis but limited UPS involvement ( Fig 3A ). This state is more common and less discriminatory, with autoimmune, endocrine, cardiovascular, reproductive, and respiratory diseases all presenting this trend.
(A) Three generalised proteostasis perturbation states are capable of discriminating disease types: (i) ALP + UPS + ER- (cancers), (ii) ALP + UPS + ER+ (neurodegenerative diseases), and (iii) ALP + UPS- and ER+ (other disease types analysed in this study). (B) Distinct patterns in enriched proteostasis network pathways (red spider plots) and functional classes (blue spider plots) reflect disease-relevant trends. Notably, cancers and neurodegenerative diseases have distinct enrichment patterns, whereas cardiovascular, autoimmune, reproductive, respiratory, and endocrine diseases have fairly similar patterns. The spider plots depict trends of over-representation of the relevant proteostasis network pathways and functional classes across all 7 disease types. Over-representation was determined using the hypergeometric test (p-value < 0.01).
To study the perturbation of the proteostasis network in disease and identify generalizable gene-wise disease signatures for characterizing disease types, we defined the disease-specific proteostasis signatures. We first considered 4 groups of disorders with distinct proteostasis states: cancer, neurodegenerative diseases, autoimmune diseases, and cardiovascular diseases. Clustering these diseases based on their disease-associated genes resulted in 4 clusters. The diseases clustered largely according to their proteostasis states, with a notable exception, in which kidney cancer and pancreatic cancer clustered with autoimmune diseases ( Fig 4A ), a finding consistent with the bidirectional association between cancer and autoimmune disorders reported in the literature [ 28 , 29 ]. Extracting generalized gene-wise disease signatures for ALP, UPS, and extracellular proteostasis identified similar proteostasis signatures of cancers and autoimmune disorders, indicating that similar genes are perturbed in related directions ( Fig 4B ). In addition, we found that proteostasis proteins associated with neurodegenerative diseases are often perturbed in opposite directions to cancers and autoimmune diseases ( Fig 4B ).
(A) Unsupervised clustering of cancers, neurodegenerative diseases, autoimmune diseases, and cardiovascular disease resulted in 4 clusters. The clusters were mostly by disease type, with the notable exception of pancreatic cancer and kidney cancer clustering with autoimmune diseases. (B) Proteostasis signature trends reveal that cancers and autoimmune diseases have a large proportion of common genes perturbed in similar patterns. In contrast, neurodegenerative diseases are perturbed in opposite directions. Each bar represents a gene from the relevant proteostasis network pathway. (C) Functional implications of the proteostasis signatures. The top enriched pathways (up to 10 each) for upregulated (red bars) and downregulated (blue bars) genes for each cluster type are shown.
To gain insight into the functional pathways implicated due to proteostasis alteration in disease, we carried out pathway enrichment analysis. Our results revealed disease pathways modulated by proteostasis ( Fig 4C ). In cancer, upregulated proteostasis proteins were found to be enriched in cell cycle pathways; in autoimmune diseases, cell cycle activation was also observed along with hyperactivation of the innate immune system; in neurodegenerative diseases, the unfolded protein response (UPR) was seen to be activated, likely as a natural reaction to the accumulation of aggregation-prone and damaged proteins. Other dysregulated pathways in neurodegenerative diseases included those involved in protein clearance, such as lysosome formation, autophagy, endocytosis, and recycling, alongside MITF-M-regulated pathways critical for maintaining brain function.
Next, we studied proteostasis perturbations on a temporal scale over the course of disease staging in 6 diseases (3 neurodegenerative diseases and 3 cancers) for which staging data are available: Alzheimer’s disease (AD), Parkinson’s disease (PD), Huntington’s disease (HTT), lung cancer, kidney cancer, and pancreatic cancer ( Fig 5 ). For each disease, differential gene expression analysis was carried out by disease stage against healthy controls. Our results reveal that proteostasis perturbations, regardless of upregulation or downregulation, occur progressively in neurodegenerative diseases but early in cancers ( Fig 5 ). These trends were conserved across all diseases included in this analysis for both disease types.
Patient samples from each disease were compared against healthy controls. Differential gene expression analysis reveals that perturbation of the proteostasis network (PN) occurred progressively in neurodegenerative diseases but early in cancers. Each point represents a gene significantly perturbed in disease compared to controls, coloured by its direction and magnitude of change in disease conditions.
Further findings from this disease progression analysis revealed that although ALP and UPS perturbations are indicative of disease states in both neurodegenerative diseases and cancer, they occur at different stages of disease progression. A larger proportion of affected ALP and UPS proteins are perturbed in early stages of the 3 cancers studied, but only at late stages for the 3 neurodegenerative diseases studied ( Fig 6 ). This result is in line with existing observations that damaged proteins accumulate over the course of aging in neurodegenerative diseases, leading to toxicity, while cancer cells hijack the proteostasis network to enable survival and proliferation. We also examined genes affected in the early stages (Braak 1/2) of AD ( S1 Fig ). Our workflow enables the identification of previously reported early-stage AD genes such as YAP1 [ 30 ].
The proportion of the ALP and UPS genes affected in each stage of disease is calculated and depicted. While ALP and UPS perturbations are indicative of disease states in both cancers and neurodegenerative diseases, a large proportion of ALP and UPS genes are affected only in the later stages of neurodegenerative diseases, in contrast to the early involvement of these genes in cancers.
We further studied how proteostasis perturbations spread across disease progression. We hypothesized that the proteostasis proteins perturbed at earlier stages are central regulators of the proteostasis network, resulting in downstream disarray. To test this possibility, we mapped the perturbation of the proteostasis network in AD ( Fig 7 ). We then quantified and compared the degree and betweenness centrality of the proteostasis proteins involved in each stage. The degree measures the number of connections of a protein within the network, and the betweenness measures the extent to which a protein lies on the shortest paths between protein pairs within the network. Proteins with high degree and betweenness are likely to play key regulatory roles in their functional networks. Based on these metrics, we found that, contrary to our initial hypothesis, proteostasis proteins perturbed in mid-stage AD (Braak 2/3) are the most central in the AD proteostasis network ( Fig 7 ). This result prompts further investigation into whether early-stage AD proteostasis proteins could be seeds that affect regulatory proteins contributing to proteostasis collapse in later stages.
Genes affected in the early stage are depicted as blue nodes; mid stage as beige nodes; and late stage as red nodes. A stage-wise quantification of the degree and betweenness centralities of the AD-associated proteostasis network genes within the network presented reveals that proteostasis proteins perturbed in mid-stage AD (Braak 2/3) are most central in the AD proteostasis network. The degree centrality measures the number of connections of a protein within the network, and the betweenness measures the extent to which a protein lies on the shortest path between protein pairs within the network. Proteins with high degree and betweenness are likely to play key regulatory roles in their functional networks. The upper and lower bounds of the boxplots represent the interquartile range of degree/betweenness for genes associated with each disease stage. The line contained in the box represents the 50th percentile of degree/betweenness for genes associated with each disease stage. Whiskers represent non-outlying extreme points while data points beyond the whiskers are plotted individually.
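The two centrality measures can be illustrated on a toy network. The following Python sketch is not the code used in this study: it assumes an unweighted, undirected interaction network, uses placeholder protein names (P1 to P5) and hypothetical stage labels, and computes unnormalised betweenness with Brandes' algorithm.

```python
from collections import deque
from statistics import median

def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (protein, protein) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def degree_centrality(adj):
    """Number of direct connections of each protein."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def betweenness_centrality(adj):
    """Unnormalised betweenness (Brandes' algorithm, unweighted undirected graph)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}

# Hypothetical toy network and stage labels (placeholders, not data from this study).
edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P2", "P5")]
stage = {"P1": "early", "P2": "mid", "P3": "mid", "P4": "late", "P5": "late"}

adj = build_adjacency(edges)
deg = degree_centrality(adj)
btw = betweenness_centrality(adj)

# Stage-wise summary, analogous to the medians shown in a boxplot.
for s in ("early", "mid", "late"):
    members = [p for p in adj if stage[p] == s]
    print(s, median(deg[p] for p in members), median(btw[p] for p in members))
```

In practice a dedicated graph library would normally be used; the explicit implementation is given here only to make the definitions concrete.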
Given that proteostasis perturbations can be observed at early stages of disease, we further investigated whether disease risk factors alter the proteostasis network, thereby promoting disease susceptibility. Many existing studies have shown that the proteostasis network is altered with age, which is a major risk factor for many chronic diseases. Building on these results, we extended our analysis to smoking, another key risk factor for many chronic diseases that may increase disease risk via the proteostasis network. Smoking is particularly interesting because it has been reported to increase risk for some diseases while decreasing risk for others. Smoking is a risk factor for the development of chronic obstructive pulmonary disease (COPD) [ 31 , 32 ], lung cancer [ 33 ], breast cancer [ 34 , 35 ], and coronary heart disease [ 36 , 37 ]. However, it has also been reported to reduce the risk for ulcerative colitis [ 38 , 39 ], endometrial cancer [ 40 – 42 ], endometriosis [ 43 – 45 ], and PD [ 46 – 48 ].
The differentially expressed genes in smokers were compared against 8 disease gene sets: COPD, lung cancer, breast cancer, coronary heart disease, ulcerative colitis, endometrial cancer, endometriosis, and PD. These genes were obtained by carrying out differential gene expression analysis between smokers and non-smokers with no reported diseases. According to our hypothesis, we expected to find a higher similarity in perturbed proteostasis genes between smokers and diseases with increased risk due to smoking (hereafter referred to as ‘at-risk’ diseases), and a lower similarity between smokers and diseases with lowered risk due to smoking (hereafter referred to as ‘reduced-risk’ diseases). Our results show that proteostasis perturbations are indeed more similar between smokers and patients with at-risk diseases, and less similar between smokers and patients with reduced-risk diseases ( Fig 8A ). The Jaccard index was used to quantify the similarity between the proteostasis proteins perturbed by smoking and the proteostasis proteins within each disease gene set ( Materials and Methods ), and was then normalized for plotting in Fig 8A . We found that the similarity of smoking-impacted proteostasis proteins was more indicative of disease risk than that of smoking-impacted kinases or transcription factors ( Fig 8A ), both of which are protein groups strongly associated with disease, as discussed earlier. We then compared the pairwise directional similarity between smoking-perturbed and disease-associated proteostasis perturbations for the overlapping genes.
Our comparison revealed that smoking results in a larger directional similarity with at-risk diseases (i.e., proteostasis genes upregulated due to smoking are also upregulated in disease, and proteostasis genes downregulated due to smoking are also downregulated in disease) than with reduced-risk diseases, which have a large proportion of their proteostasis genes perturbed in opposite directions ( Fig 8B ).
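One simple way to express directional similarity is the fraction of overlapping genes whose change has the same sign in both comparisons. The Python sketch below illustrates this idea under stated assumptions: the function name, the use of log fold changes as input, and the placeholder gene names and values (G1 to G4) are all hypothetical, and the exact scoring used for Fig 8B may differ.

```python
def directional_similarity(lfc_a, lfc_b):
    """Fraction of shared genes perturbed in the same direction.

    lfc_a, lfc_b: dicts mapping gene name -> log fold change
    (positive = upregulated, negative = downregulated).
    Returns None when no shared gene has a non-zero change in both inputs.
    """
    shared = [g for g in set(lfc_a) & set(lfc_b)
              if lfc_a[g] != 0 and lfc_b[g] != 0]
    if not shared:
        return None
    same = sum(1 for g in shared if (lfc_a[g] > 0) == (lfc_b[g] > 0))
    return same / len(shared)

# Placeholder values for illustration only (not results from this study).
smoking = {"G1": 1.2, "G2": -0.8, "G3": 0.5}
disease = {"G1": 0.9, "G2": -1.1, "G3": -0.4, "G4": 2.0}

score = directional_similarity(smoking, disease)
print(f"same-direction fraction: {score:.2f}")
```

A score close to 1 corresponds to the pattern described for at-risk diseases, and a score close to 0 to perturbation in predominantly opposite directions.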
(A) Smokers present a higher similarity of proteostasis perturbations with at-risk diseases compared to reduced-risk diseases. Computing similarities of proteostasis proteins is more indicative of disease risk as compared to smoking-impacted kinases, transcription factors, or a random sample of differentially expressed genes. (B) At-risk diseases have a higher directional similarity of their perturbed proteostasis proteins with smoking. In contrast, reduced-risk diseases have a large proportion of perturbed proteostasis proteins that are deregulated in the opposite direction. (C) Genes encoding proteostasis proteins are perturbed similarly in smokers and patients with COPD. Genes similarly perturbed between smoking and COPD are likely to be contributive toward increasing COPD risk and onset. (D) Proteostasis proteins corresponding to genes perturbed in smokers and patients with PD. Proteostasis proteins oppositely perturbed between smoking and PD are likely to be protective against PD.
These observations suggest that the proteostasis proteins similarly perturbed between smoking and at-risk diseases are likely to contribute toward increasing disease risk and the development of disease pathologies. Given this observation, we further investigated the proteostasis proteins perturbed similarly in smokers and patients with COPD (the at-risk disease with the highest similarity). For example, CCL2, an extensively studied protein target in COPD [ 49 , 50 ], is upregulated at both the transcriptomic ( Fig 8C ) and proteomic level [ 51 ] due to smoking. Similarly, proteostasis proteins oppositely perturbed between smoking and reduced-risk diseases are likely to be protective against disease development. Hence, we investigated PD (the reduced-risk disease with the largest dissimilarity) in more detail. Our analysis identifies UCHL1 as over-expressed in smokers ( Fig 8D ), mirroring earlier reports of smoking-dependent upregulation at the proteomic level [ 52 ]. This upregulation directly contrasts with the downregulation of the UCHL1 gene observed in PD patients ( Fig 8D ), which correlates with the decreased risk for PD found in smokers. UCHL1 has been reported to be a susceptibility gene for PD and proposed as a potential target for therapy [ 52 ], and its downregulation contributes to protein aggregation in Lewy bodies [ 53 ], a hallmark of PD pathology. It is therefore possible that the upregulation of UCHL1 due to smoking protects against the UCHL1 loss-of-function that predisposes cells to PD-related symptoms. Moving forward, it will be interesting to further explore the other genes ( Fig 8C , D ) that present this trend to uncover key contributors to disease vulnerability, thus supporting efforts in preventive care.
Discussion
In this work, we quantified the involvement of the proteostasis network across diseases. We thus found specific disease signatures that characterise disruptions in protein homeostasis in various diseases. Upon further analysis, we found temporal patterns of proteostasis network involvement across disease development that differentiate similar static presentations of proteostasis network disturbances in disease states. In addition, we uncovered how risk factors such as smoking greatly impact the proteostasis network, likely priming cells with increased vulnerability for disease environments. We contextualise our findings within the current literature in the following.
Based on our profiling of proteostasis network functions associated with 32 diseases, we proposed 3 generalised proteostasis network disease perturbation states: (i) ALP + UPS + ER- (cancers), (ii) ALP + UPS + ER+ (neurodegenerative diseases), and (iii) ALP + UPS- and ER+ (other disease types analysed in this study). From this analysis, we observed that perturbations in autophagy/ALP represent a general state of disease. This finding is consistent with the widespread implication of ALP in many disease types [ 54 – 58 ], which has been studied extensively in cancer [ 59 ], neurodegenerative diseases [ 60 – 62 ], autoimmune diseases [ 57 , 63 – 67 ], respiratory diseases [ 68 – 70 ], cardiovascular diseases [ 71 – 74 ], and endocrine disorders [ 75 – 78 ]. Similarly, the UPS has been widely reviewed in its links to cancer and neurodegenerative diseases [ 79 – 84 ]. Studies have explored the targeting of UPS to modulate autoimmune diseases [ 85 , 86 ] and investigated the relevance of UPS in other disease groups such as cardiovascular diseases [ 87 ] and diabetes [ 88 , 89 ], albeit much less extensively. Furthermore, perturbations in extracellular proteostasis have been reported in disease types such as cardiovascular diseases [ 90 ], as also observed in our analysis, and have more recently been discussed in the context of the extracellular matrix in cancer [ 91 ], where our analysis did not identify them as a key proteostasis network signature. The fact that our analysis identified UPS perturbation as a representative signature only for cancers and neurodegenerative diseases, and extracellular proteostasis perturbation only for non-cancer diseases, but not for the other disease types explored in the literature, may be the result of methodological choices, for example the generation of the disease gene sets (using the top 500 most disease-associated genes) and/or the p-value cut-offs (we used a strict p-value cut-off of 0.01).
Alternatively, the analysis may also suggest the possibility that while multiple proteostasis network processes may play a role in altering disease states, some processes might have a stronger signal than others hence showing up upon filtering for top disease-associated genes and stricter p-values. Indeed, a comparison against alternative disease-gene sets or generating a consensus gene set from multiple established sources such as DisGeNET [ 92 ] or COSMIC ( https://cancer.sanger.ac.uk/cosmic ) for cancer can help establish the robustness of the identified proteostasis network disease signatures and disease states.
Deconvolution of the static disease states over progression revealed differences in proteostasis network disturbances over the course of disease. We reported that ALP and UPS are implicated early in disease for all studied cancers, while they are progressively altered in neurodegenerative diseases. This finding is supported by reports describing the role of autophagy in tumour initiation in cancer. An early study identified the contribution of decreased Beclin 1 levels (an autophagy regulator) in tumorigenesis [ 93 ], and was followed by multiple reviews studying the mechanisms of autophagy in the initial stages of cancer [ 94 – 97 ] and later in metastasis [ 98 – 102 ]. Like ALP, UPS has also been reported to play a role in tumorigenesis in early cancer stages [ 103 , 104 ]. Components of the UPS have been identified as potential targets for cancer therapies, with agents including bortezomib [ 105 ], carfilzomib [ 106 ], b-AP15 [ 107 ], and VLX1570 [ 108 ] (proteasome inhibitors), pevonedistat [ 109 ] (E1 ligase inhibitor), and the nutlins [ 110 ] (E3 ligase inhibitors). In contrast, neurodegenerative diseases are characterised by a gradual build-up of misfolded proteins and their aggregates, as well as a progressive loss of neurons. The decline of proteostasis functions with time, such as in aging, is a likely contributor to this phenomenon. This has been exemplified in a study that found that the progressive decrease in cellular proteostasis contributes to ALS onset [ 111 ].
An extension of the workflow in this study toward small patient cohorts would also be a unique application worth exploring. In this study, we illustrated the potential of identifying proteostasis network genes affected in early-stage AD, which is often characterised by small patient cohorts as it is difficult to identify and diagnose AD at early stages. In a similar way, this workflow can in principle be applied to other small patient cohorts, for example in rare diseases. However, such workflows may still be limited in effectiveness for some small datasets such as single/paired samples.
Finally, we established a link between smoking and disease states via the proteostasis network, showing that smoking affects proteostasis network components in a way similar to that observed in diseases for which smoking is a risk factor. This result is supported by previous studies showing that inhaled smoke can impair protein folding, resulting in ER stress [ 112 ], and that smoke-induced aggresome formation contributes to COPD [ 113 ], amongst others. We note that other risk factors for disease, such as obesity, can also have significant impacts on the proteostasis network and promote disease vulnerability; for example, loss of proteostasis due to obesity leads to cardiovascular disease [ 114 ] or hypothalamic dysfunction [ 115 ]. Given these observations, further exploration of risk factors may allow us to uncover mechanisms that raise disease susceptibility, thereby strengthening preventive efforts.
We highlight that caution must be exercised when interpreting these signatures, as association does not necessarily imply causation. Extensive effort needs to go into validating these mechanisms to develop robust and effective interventions. At present, disease signature studies are still predominantly research-based due to limitations such as the interpretability of the disease signatures themselves. In addition, the disease signatures do not always have a one-to-one mapping with disease aetiologies; for example, stress-response patterns often overlap across diseases, and pancreatic and kidney cancer cluster with autoimmune diseases. These multiple mappings make assigning proteostasis disruptions to specific diseases extremely complex, so that fitting diseases into known signatures could lead to incorrect conclusions. Given this complexity, differentiation of proteostasis network disease patterns needs to be done rigorously and interpreted accordingly to avoid unreliable associations.
Materials and Methods
A comprehensive list of proteostasis network components [ 18 , 19 ] was obtained from the Proteostasis Consortium ( https://www.proteostasisconsortium.com/ ).
The top 500 genes for each disease, based on PandaOmics [ 116 ], were considered to be disease-associated and made up the disease gene set. This ranking was derived by comparing disease samples to tissue-matched controls in PandaOmics v2.0. The final scores used for ranking are a result of aggregating multiple omics inputs for each gene using a neural network. The multi-omics inputs include: (i) mRNA expression (level of differential gene expression in disease versus control), (ii) interactome community (the density of known targets, disease-related genes, and differentially expressed genes in its protein-protein interaction network), (iii) causal inference (estimating the number of genes regulated by similar transcription factors), (iv) overexpression (characterizing the effects of gene knock-in/knock-out on cell lines), (v) mutated disease sub-modules (assessing gene relevance based on OMIM, ClinVar, and Open Targets data), (vi) mutations (a combined score from genome and transcriptome-wide association studies), (vii) pathway analysis (evaluating a gene’s role in Reactome pathways using iPANDA and transcriptomic data), (viii) network neighbors (based on the number of directly connected differentially expressed genes in the protein interaction network), and (ix) disease relevance (aggregated scores from OMIM, ClinVar, and Open Targets). All patient datasets used in this study are documented in S2 Table . The proteostasis subset of each disease gene set was obtained by finding the intersection of our full list of proteostasis components with each disease gene set. All disease gene sets are available in S3 Table .
A list of kinases, transcription factors, and ion channels ( S4 Table ) was obtained from the PandaOmics database that annotated more than 20,000 proteins. Over-representation analysis was carried out using the hypergeometric test that measures the statistical significance of each group of proteins being over-represented in the disease gene set. P-values were plotted on a -log(p-value) scale, with higher values representing stronger significance.
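As an illustration of the over-representation test, the following Python sketch computes the upper-tail hypergeometric probability of observing at least a given overlap between a protein group and a disease gene set, and converts it to the -log scale used for plotting. It is a minimal sketch rather than the code used in this study: the function names are hypothetical, the background of 20,000 proteins is an assumption based on the size of the annotated database, base 10 is assumed for the logarithm, and the group size and overlap in the example are placeholder numbers.

```python
from math import comb, log10

def hypergeom_enrichment_p(overlap, background_size, group_size, set_size):
    """P(X >= overlap) for a hypergeometric draw.

    background_size: number of annotated proteins in the background
    group_size:      number of proteins in the group (e.g. proteostasis proteins)
    set_size:        number of genes in the disease gene set (e.g. 500)
    overlap:         group members observed in the disease gene set
    """
    upper = min(group_size, set_size)
    total = comb(background_size, set_size)
    tail = sum(
        comb(group_size, i) * comb(background_size - group_size, set_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total  # exact integer arithmetic, converted to float at the end

def neg_log10(p):
    """-log10(p), with infinity returned if p underflows to zero."""
    return float("inf") if p == 0 else -log10(p)

# Placeholder example: a group of 1,000 proteins, 40 of which fall within
# a 500-gene disease set drawn from an assumed background of 20,000 proteins.
p = hypergeom_enrichment_p(overlap=40, background_size=20000,
                           group_size=1000, set_size=500)
print(f"p = {p:.3g}, -log10(p) = {neg_log10(p):.2f}")
```

Here the expected overlap under the null is 500 × 1000 / 20000 = 25, so an observed overlap of 40 yields a small p-value and a correspondingly large value on the -log scale.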
The hypergeometric test was used to quantify the enrichment of every pathway/functional class within the disease gene set for each disease. A cut-off of p-value < 0.01 was used in the profiling presented in Fig 2 .
The Jaccard index was used to calculate similarity scores between diseases. The resulting similarity matrix was used for hierarchical clustering (hclust function in R), which generated 4 clusters ( Fig 4A ).
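The clustering step can be sketched as follows in Python, using only the standard library. This is an illustrative reimplementation rather than the R workflow itself: it assumes that the Jaccard similarity is converted to a distance as 1 minus the similarity and that complete linkage is used (the default of hclust, although the linkage method is not stated here), and the disease and gene names (D1 to D4, g1 to g10) are placeholders.

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def agglomerative_clusters(gene_sets, k):
    """Naive complete-linkage agglomerative clustering cut at k clusters.

    gene_sets: dict mapping disease name -> set of genes
    Distance between two diseases is 1 - Jaccard similarity (an assumption).
    """
    names = list(gene_sets)
    dist = {(a, b): 1.0 - jaccard(gene_sets[a], gene_sets[b])
            for a in names for b in names}
    clusters = [[n] for n in names]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between the farthest members.
                d = max(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Placeholder disease gene sets for illustration only.
gene_sets = {
    "D1": {"g1", "g2", "g3"},
    "D2": {"g1", "g2", "g4"},
    "D3": {"g7", "g8", "g9"},
    "D4": {"g7", "g8", "g10"},
}
clusters = agglomerative_clusters(gene_sets, k=2)
print(clusters)
```

The same jaccard function applies to the smoking analysis, where the two sets are the proteostasis genes perturbed by smoking and the proteostasis genes within a disease gene set.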
Pathway enrichment analysis was carried out by calculating the probability of a disease gene set being over-represented in a pathway. The Reactome pathways were used for this analysis. A false discovery rate (FDR) cut-off of < 0.05 and a requirement of more than 2 genes per pathway were applied. For plotting, the top enriched pathways for upregulated and downregulated genes (up to 10 each) were included.
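The filtering step can be illustrated with a short Python sketch. The FDR correction procedure is not specified above, so the Benjamini-Hochberg procedure is used here purely as an assumption; the function names, pathway names, p-values and gene counts are placeholders.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_pathways(results, fdr_cutoff=0.05, min_genes=2, top_n=10):
    """Keep pathways with FDR < cutoff and more than min_genes genes.

    results: list of (pathway_name, p_value, n_genes) tuples
    Returns up to top_n (pathway_name, fdr) pairs, most significant first.
    """
    fdr = benjamini_hochberg([p for _, p, _ in results])
    kept = [(name, q) for (name, _, n), q in zip(results, fdr)
            if q < fdr_cutoff and n > min_genes]
    return sorted(kept, key=lambda item: item[1])[:top_n]

# Placeholder enrichment results for illustration only.
results = [
    ("pathway_A", 0.001, 12),
    ("pathway_B", 0.020, 2),   # dropped: not more than 2 genes
    ("pathway_C", 0.030, 5),
    ("pathway_D", 0.600, 8),   # dropped: FDR too high
]
selected = select_pathways(results)
print(selected)
```

Applying the function separately to the upregulated and downregulated gene lists gives the two sets of up to 10 pathways described above.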
Only diseases for which transcriptomic datasets with disease staging information were readily available were included in this study. The datasets used were: GSE48350 and GSE84422 (AD); GSE49036 and GSE42966 (PD); GSE64810 and GSE79666 (HTT); GSE30219 (lung cancer); GSE53757 , GSE76207 , and GSE126964 (kidney cancer); GSE62452 (pancreatic cancer).
Datasets that included samples from healthy controls that were smokers and non-smokers were obtained to study the impact of smoking on the proteostasis network. Datasets obtained and used include GSE22047 and GSE108134 .
Differential expression analysis was used to quantify expression differences between conditions: disease stage vs healthy control, or smokers vs non-smokers, for disease staging and smoking transcriptomic datasets respectively. All analyses were performed using the Limma method [ 117 ].
The Jaccard index was used to calculate similarity scores. It measures the ratio of the overlap between the perturbed proteostasis genes due to smoking and the proteostasis genes within each disease gene set to their union.
The proteostasis profiles of diseases ( Fig 2 ) were created in Excel. All other plots were created in R using the ggplot2 package.