Pathway level subtyping identifies a slow-cycling and transcriptionally lethargic biological phenotype associated with poor clinical outcomes in colon cancer independent of genetics

doi:10.21203/rs.3.rs-3891488/v1

Pathway level subtyping identifies a slow-cycling and transcriptionally lethargic biological phenotype associated with poor clinical outcomes in colon cancer independent of genetics

2024 · doi:10.21203/rs.3.rs-3891488/v1

preprint OA: closed

Full text JSON View at publisher

Full text 203,815 characters · extracted from preprint-html · click to expand

Pathway level subtyping identifies a slow-cycling and transcriptionally lethargic biological phenotype associated with poor clinical outcomes in colon cancer independent of genetics | Research Square window.SnipcartSettings = { analytics: { enabled: false } }; (function() { var accessVector = localStorage.getItem('access_vector') || ''; window.dataLayer = window.dataLayer || []; if (accessVector) { window.dataLayer.push({ user: { profile: { profileInfo: { snid: accessVector } } } }); } })(); (function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src='https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);})(window,document,'script','dataLayer','GTM-K279D39R'); Browse Preprints In Review Journals COVID-19 Preprints AJE Video Bytes Research Tools Research Promotion AJE Professional Editing AJE Rubriq About Preprint Platform In Review Editorial Policies Our Team Advisory Board Help Center Sign In Submit a Preprint Cite Share Download PDF Research Article Pathway level subtyping identifies a slow-cycling and transcriptionally lethargic biological phenotype associated with poor clinical outcomes in colon cancer independent of genetics Sudhir B Malla, Ryan M Byrne, Maxime Lafarge, Shania M Corry, and 37 more This is a preprint; it has not been peer reviewed by a journal. https://doi.org/ 10.21203/rs.3.rs-3891488/v1 This work is licensed under a CC BY 4.0 License Status: Published Journal Publication published 12 Feb, 2024 Read the published version in Nature Genetics → Version 1 posted You are reading this latest preprint version Abstract Molecular stratification, across many tumour types, has used gene-level transcriptional data to identify subtypes associated with distinct genotypes and biological traits, as exemplified by the consensus molecular subtypes (CMS), and more recently the intrinsic CMS (iCMS), in colorectal cancer. In an attempt to develop molecular subtypes that more closely align to cancer-relevant phenotypic traits in KRAS mutant tumours, here we present an approach that uses gene ontology and biological activation state information, rather than gene-level data, for the initial stages of class discovery. In doing so, we define three unique pathway-derived subtypes (PDS); where PDS1 tumours are highly proliferative and display good prognosis, PDS2 tumours are stroma/immune-rich with intermediate prognosis. The final subtype, PDS3, represent a previously overlooked subset of tumours within CMS2, which display a ‘lethargic’ biological phenotype with neural-like traits and the worst prognosis. Remarkably, these biological and clinical features remain consistent across tumour samples independent of KRAS mutational status, supporting the use of PDS for defining cancer-relevant phenotypes regardless of genetics. Cancer Biology Molecular Classification Colorectal Cancer Biological Phenotype Figures Figure 1 Figure 2 Figure 3 Figure 4 Figure 5 Figure 6 Figure 7 Figure 8 Introduction A goal of molecular subtyping in cancer is to identify biomarkers that stratify tumours according to biological and clinical phenotypes, providing improved understanding of signalling cascades that underpin tumour development and treatment response. Numerous studies have leveraged transcriptional data in the form of individual gene-level expression values to identify tumour subtypes 1 – 3 , followed by downstream characterisation using collections of gene ontology pathway-level signatures that represent biologically important phenotypes 4 . The value of this approach is exemplified by the Consensus Molecular Subtypes (CMS) 5 and ColoRectal Intrinsic Subtypes (CRIS) 3 of colorectal cancer (CRC). A recent study proposed a refinement of CMS classification through use of individual gene-level data derived from single cell RNA sequencing (scRNAseq), revealing two further intrinsic subtypes, iCMS2 and iCMS3, associated with different prognostic outcomes within the previously-defined CMS4 stromal-rich tumour type 6 . While this study has provided improved resolution to this stromal-rich group, remarkably it did not identify any heterogeneity or provide additional resolution within the largest subtype of epithelial-rich tumours, CMS2. Gene-level approaches to molecular classification align to the model of biological signalling defined by the central dogma, whereby a functional hierarchical order along DNA-RNA-Protein is well-established 7 , and are dominated by associations with genetic alterations underpinning the Vogelstein paradigm 8 , 9 . Using this gene-level approach, focussed specifically on BRAF V300E mutant ( BRAF mut) CRC tumours, Barras and colleagues identified heterogeneity within this genotype, enabling development of two unique biological subtypes within BRAF mut tumours 10 . The clinical predictive value of this BRAF mut-specific approach in selecting patients for targeted therapies was subsequently validated using tumour samples from an independent phase II clinical study 11 . Although subtype discovery using gene-level data represents the most commonly-deployed approach, pathway-level data has been shown to provide a closer link with molecular mechanisms and clinical phenotypes 12 . Therefore, we hypothesise that by using pathway-level data, initially in KRAS mutant ( KRAS mut) CRC tumours, we can identify a more comprehensive view of biological signalling related to disease phenotypes within this important genotype. Using transcriptional data from a series of colorectal tumours, we identify, validate, and characterise a set of novel pathway derived subtypes (PDS), based on mechanistic signalling cascades. Following class discovery performed exclusively in KRAS mut CRC, remarkably this classifier provides new insights into tumour biology regardless of mutational status, across several independent human and mouse tumour cohorts. This approach reveals a previously unseen continuum of intrinsic biology within epithelial-rich CMS2 tumours, which display neural-like traits and can be defined along a polarised gradient of epithelial cell states, including cell cycle, transcriptional activity and stress response. In summary, this PDS classification system identifies tumours defined by near-identical biological traits independent of genetic events, reinforcing the fundamental value of PDS in classifying tumours based on cancer-relevant biological phenotypes. Results Pathway-level class discovery in KRASmut tumours define unique biological subtypes We first generated a matrix of pathway-level single sample scores (n = 1783 per tumour), from the Molecular Signature Database (MSigDB), using a subset of the FOCUS clinical trial cohort (n = 360 tumours; SCORT cohort) to create an initial framework for mapping of the tumour activation status across n = 640,000 + combinations of biological processes, including BIOCARTA, PID, KEGG and REACTOME. Following this pathway-level conversion, KRAS mut samples (n = 165) were used for unsupervised class discovery, which through k -means clustering revealed three distinct pathway-derived subtypes, PDS1, 2 and 3, with distribution of PDS1 = 27%, PDS2 = 38% and PDS3 = 35% (Fig. 1A, Supplementary Fig. 1A,1B and 1C, Supplementary Table 1A). Assessment of mutations according to PDS revealed no distinct mutational type for KRAS itself (Fig. 1B) or a number of other driver mutations (Fig. 1C). Visualisation of the n = 626 gene sets most significantly associated with PDS revealed how transcriptionally distinct these groups are (Fig. 1D; Supplementary Table 1B). These analyses also indicated that CMS2 tumours, and to a lesser extent CMS3 tumours, were split across PDS1 and PDS3, whereas the CMS1/4 inflammatory/stromal tumours were merged within PDS2 (Fig. 1E, 1F, Supplementary Fig. 1D). We next set out to determine the key biological pathways most associated with PDS, using the Hallmarks collection 4 . Consistent with CMS1 and CMS4 associations, PDS2 tumours were enriched for signalling related to inflammatory/immune pathways, such as interferon-α and interferon-γ response, as well as stromal-related epithelial-to-mesenchymal (EMT) and TGF-β activation (Fig. 1G). PDS1 tumours were significantly upregulated in cell cycle-related pathways, including MYC targets, E2F targets, G2M checkpoint, whereas there was near universal repression of cancer-associated hallmarks signalling in PDS3 (Fig. 1G). Overall, these three distinct transcriptional classes confirm the presence and extent of signalling heterogeneity within KRAS mut CRC. Furthermore, given the potential therapeutic value associated with some of the signalling cascades employed, these findings suggest that PDS classification provides a novel basis for forward and reverse translation studies. To ensure that PDS classification can be performed on additional datasets, we explored different classification algorithms, where the PDS-specific n = 626 gene set signatures (Fig. 1D) were used as feature selection for training and developing the classification model (Fig. 1H). Given the best overall accuracy displayed by the support vector machine via Radial Basis Function (svmRBF) algorithm (Supplementary Fig. 1E and 1F), the PDS classifier was based on this model. During the development of the PDS classification model, the prediction probability scoring system was introduced as a way to enumerate intratumoural subtype heterogeneity, where we defined 0.6 as the default threshold for PDS classification, with tumours that do not reach this threshold termed a ‘mixed’ sample (Fig. 1I; Supplementary Fig. 1G and 1H). To ensure this classifier can be easily utilised by the wider research community, an R-based classification package was developed; PDSclassifier R package (see ‘Data and Code Availability’), which accurately classified tumours into the same clusters identified during class discovery. Pathway-derived subtypes identify tumours with conserved signalling regardless of mutational status While the PDS classifier was developed exclusively within KRAS mut colorectal tumours, we next tested its ability to classify samples into biologically similar groups regardless of KRAS mutational status within the entire FOCUS cohort (n = 360). When samples of all mutational backgrounds were included, it classified 87% of samples as PDS1 = 26%, PDS2 = 31%, PDS3 = 30%; and the rest as ‘Mixed’ (13%) (Fig. 2A and 2B). Remarkably, the PDSclassifier stratified tumour samples into robust groups that were entirely independent of KRAS mutational status (Fig. 2C). Furthermore, the same PDS-CMS associations remained evident; while PDS2 were predominantly CMS4/CMS1, CMS2/CMS3 were again distributed across PDS1 and PDS3 (Fig. 2D and 2E; Supplementary Fig. 2A). We next assessed the clinical and molecular landscape of these tumours, which revealed no significant differences for clinical factors (gender, sidedness) between PDS (Supplementary Fig. 2B). Assessment of known driver mutations in each PDS indicated that, in line with CMS associations, there was significant enrichment of BRAF mutations, and fewer APC mutations, in PDS2 compared to PDS1/3 (Fig. 2F, Supplementary Fig. 2C). Importantly, there were no differences in the rates of mutations across key genes within the WNT, MAPK, PIK3CA, cell cycle or TGF-β pathways assessed (Fig. 2F, Supplementary 2C). This trend was also observed when copy number estimates were assessed (Fig. 2G, Supplementary Fig. 2D and 2E). Overall, these findings indicate that despite being developed within KRAS mut tumours, our novel pathway-derived subtypes identify biology independent of KRAS mutational status and are applicable to all CRC tumours, regardless of genetics. Transcriptional landscape across pathway-derived subtypes Using both the Hallmarks and DoRothEA 13 algorithms, we performed the same characterisation in the entire FOCUS cohort (n = 360), alongside colon cancer (CC) samples from two additional independent non-randomised cohorts, including the publicly available cohort used in the development of CMS (GSE39582; n = 566) and a new independent CRC cohort profiled within the S:CORT programme (n = 258; SPINAL S:CORT cohort), with exclusion of ‘mixed’ PDS samples. As observed within KRAS mut tumours, there is a clear transcriptional distinction between each subtype, with the most prominent biological signalling cascades representing cell cycle-related activity in PDS1 and stromal/inflammatory signalling in PDS2, but only signal elevated in PDS3 being KRAS signalling down, which itself represents an indication of signalling repression (Fig. 2H, 2I; Supplementary Fig. 3A and 3B). Remarkably, these findings were again consistent across PDS groups, regardless of KRAS status. We next investigated if these unique transcriptional landscapes were underpinned by distinct transcription factor (TF) activity, using DoRothEA database 14 , which revealed that consistent with our pathway analysis results, cell-cycle related TFs, such as E2F2, were significantly active in PDS1 (Fig. 2J; Supplementary Fig. 3C). PDS2 tumours were significantly activated for n = 32 TFs, mainly related to stromal and inflammatory phenotypes, including SMAD3, STAT3, IRF1, ERG. While repressed for biological pathways within the Hallmarks collection in general, PDS3 exhibited activation of TFs relating to a diverse set of hormonal and developmental biologies, including FOXA2 (Fig. 2J; Supplementary Fig. 3C). Clinical and prognostic relevance of PDS classification To test the clinical relevance of PDS, we assessed relapse-free survival (RFS) rates according to PDS using clinical data from the non-randomised GSE39582 cohort. These analyses revealed that PDS2 and PDS3 displayed poor prognosis compared to PDS1 (Fig. 2K) regardless of KRAS mutational status (Supplementary Fig. 2F). Importantly, these findings were independently validated in the randomised PETACC-3 adjuvant clinical trial 15 , which further reinforced the poor prognosis of PDS3 compared to PDS1 and PDS2 (Fig. 2K). Given that the majority of PDS1 and PDS3 tumours were classified as CMS2, we next sub-stratified these tumours accordingly, which confirmed the clinical value of PDS classification in detecting novel aggressive subset of tumours, with CMS2-PDS1 having a significantly better outcome compared to CMS2-PDS3 tumours (Supplementary Fig. 2G). Remarkably, these analyses revealed that PDS3 tumours have as poor an outcome as the previously identified stroma-rich CMS4 tumour subtype (Fig. 2K; Supplementary Fig. 2G). Cancer-related signalling associated with PDS tumours In line with the Hallmarks and CMS associations, we observe increased immune and stromal populations in PDS2 compared to PDS1 and increased epithelial content in PDS1 compared to PDS2, with PDS3 tumours being intermediate for both with the ESTIMATE method (Fig. 3A; Supplementary Fig. 3D and 3E). Furthermore, quantitative cellular image analyses using H&E slides indicate that, in line with their associations with CMS2/3 classification, PDS1 and PDS3 tumours have similar tumour area characteristics, whereas PDS2 tumours contain reduced tumour (epithelium) area (Fig. 3B). We also assessed for the transcriptional-based proliferative index scores 16 and replication stress signature according to PDS. Intriguingly, given the poor prognostic outcomes for patients with PDS3 tumours, these assessments revealed significantly increased proliferation and high replication stress in PDS1 compared to PDS2 and PDS3 in a stepwise fashion (Fig. 3C). These findings were validated using Ki67 + immunohistochemistry (IHC) in the FOCUS cohort, which in combination with the transcriptional analyses, confirm that PDS3 classification identifies a group of transcriptionally inert and slow cycling tumours (Fig. 3D and 3E). Image-based H&E classifier for rapid PDS tumour classification While transcriptional classifiers provide valuable information for translational studies, their use for patient stratification can be undermined by the costs and rapid-turnaround timeframes required in clinical diagnostics. This led us to develop an image-based PDS machine learning classifier to detect PDS-associated morphological patterns in H&E digital slides, which itself was motivated by prior studies that successfully demonstrated the ability for such models to automatically capture associations between morphological patterns and molecular signatures across several cancer types 17 , including the image-based CMS (imCMS) classifier 18 . Here, we trained a convolutional neural network to predict PDS classes from input H&E digital slides, based on a methodology similar to imCMS, using PDS-labelled data from the FOCUS and SPINAL cohorts (Fig. 4A). We trained several models and assessed their performances under a 5-fold cross-validation protocol (Fig. 4B) with the aggregate confusion matrix of the classification results of 5 trained models on 5 different test partitions shown in (Fig. 4C). These results suggest that images from PDS1 and PDS2 tumours contain visual features that our models used to discriminate between these two classes (PDS1-vs-PDS2-3: AUROC = .740 ± .019; F1-score = .574 and PDS2-vs-PDS1-3: AUROC = .810 ± .033; F1-score = .618). However, our trained models were not able to identify PDS3-specific features as the learned features that were discriminative for PDS1 and PDS2 were also prominent in PDS3-labeled images, as shown by the poor cross-validation performances for this class (PDS3-vs-PDS1-2: AUROC = .557 ± .026; F1-score = 0.338). This result suggests that PDS3 has an intermediate morphology, which itself is aligned with the intermediate description of PDS3 via ESTIMATE and MCP-counter, making our models erroneously classify PDS3 images as either PDS1 or PDS2. Visual inspection of the regions that contributed the most to correct classification of PDS1 and PDS2, and to the misclassification of PDS3, confirmed the presence of PDS1-specific and PDS2-specific morphology in PDS3 images (Fig. 4D; Supplementary Fig. 4). Distinct stem-like and precursor traits associated with PDS tumour classification. Given that our AI-guided imPDS classifier was unable to detect robust morphological patterns associated with PDS3 tumours, in a further attempt to characterise the features associated with PDS-specific biology, we next examined a collection of established stem and precursor signatures in our bulk tumour transcriptional data according to PDS. These analyses revealed an elevation for tubular histology 19 and canonical crypt-base columnar (CBC) stem cell signatures 20 in PDS1, alongside elevated serrated histology 19 and regenerative stem cell (RSC) signatures 20 in PDS2 tumours (Fig. 5A; Supplementary 5A-E). PDS3, however, did not show any clear association with either stem or polyp signatures. To validate these stem cell associations, we selected a subset of cases (n = 20) from the SPINAL cohort for multiplex ISH (Fig. 5B-C), which confirmed elevation of epithelial LGR5 (red) in PDS1 tumours, elevation of epithelial ANXA1 (surrogate marker for Sca1; yellow) in PDS2 tumours and an absence of staining for both markers in PDS3 tumours (Fig. 5B-C). To confirm the associations between PDS and precursor signatures, we next applied the PDS classifier to a cohort of transcriptionally profiled polyp samples (S:CORT polyp cohort), pathologically defined as tubulovillous adenomas (TVA), sessile serrated lesions (SSL) and traditional serrated adenomas (TSA), which further confirmed the TVA association with PDS1, SSL association with PDS2 and lack of precursor association with PDS3 (Fig. 5D). Overall, these results indicate the presence of well characterised molecular and biological features associated with both PDS1 and PDS2 but provided limited insight into the biology underpinning PDS3. PDS1 and PDS3 tumours align along an intrinsic pluripotent-to-differentiated axis To identify traits underpinning PDS3 tumour biology, we interrogated the gene sets used for PDS classification, to identify genes specifically associated with PDS3 (n = 961), followed by assessment of potential biological interactions with Enrichr and STRING (Fig. 6A). There was a significant association between PDS3 biological signalling and components of the polycomb repressor complex (PRC); particularly EZH2, SUZ12 and REST (Fig. 6B, Supplementary Fig. 6A), which displayed a gradient of high to low gene expression across PDS1-2-3 (Fig. 6C). When components of PRC are activated/elevated, they repress a range of downstream targets linked to cellular differentiation 21 (Supplementary Fig. 6B). Using these previously defined PRC targets in line with an overall lower level of PRC genes in PDS3, we reveal a de-repression (and therefore significant elevation) of PRC targets in PDS3 compared to PDS1 and PDS2, using both pairwise and single sample GSEA in FOCUS (Fig. 6D). Given the relationship between PRC and MYC, we assessed a previously defined gene set of MYC targets that when elevated maintain a pluripotent state 21 , which revealed an inverse relationship between MYC target and PRC targets according to PDS classification (Fig. 6E). Remarkably, when assessed together across our bulk tumour data, there was a near perfect linear negative correlation between individual tumour scores along a MYC-PRC axis, with PDS1 tumours displaying high-MYC/low-PRC target expression and PDS3 tumour displaying the inverse low-MYC/high-PRC target expression (Fig. 6F). These findings were replicated in the SPINAL cohort (Supplementary Fig. 6C-D). These data suggest that, in combination with our data indicating lower levels of cell cycle, transcriptional activity, canonical and regenerative stem cells (Fig. 5), PDS3 tumours may be more differentiated when compared to more stem-rich and proliferative PDS1 and PDS2 tumours. To test this, we next analysed transcriptional data from an independent cohort of mouse colonic epithelial cells across six differentiation states (GSE143915 22 ), which revealed a significant association between Myc-PRC target gene expression levels and differentiation status; with stem cells displaying high-Myc/low-PRC target gene expression and enteroendocrine cells (EEC), displaying high-PRC/low-Myc target gene expression (Fig. 6G-H, Supplementary Fig. 6E). Specific assessment of gene associated within the PRC and MYC targets, alongside genes associated with neuroactive ligands and enteroendocrine biology confirm the upregulation of these biological and differentiation modules in human PDS3 tumours (Fig. 6K). Although the tumours employed throughout our study are pathologically-confirmed adenocarcinomas and not neuroendocrine tumours, we do see elevation of separate EEC and neuron cell transcriptional signatures in PDS3 tumours in our bulk tumour data (Fig. 6L). An alignment between PDS3 tumours and a neural-like identity was also observed using the TissueEnrich package 23 , which found an enrichment for cerebral cortex tissue-identity within the gene sets that classify PDS3 (Fig. 6M). Finally, using whole face tissue sections from our SPINAL cohort, we observed that protein expression for the differentiated EEC marker, such as chromogranin A (CRGA) is restricted to surrounding normal glands in PDS1 tumours, however these markers are expressed in small focal regions of neoplastic glands and/or surrounding neural tissue in PDS3 tumours (Fig. 6N); Dual-species disease positioning of PDS classification in genetically engineered mouse models and patient-derived xenografts. As dual-species options are available for the pathway collections used to develop the PDS classifier, we utilised bulk tumour transcriptional data from n = 51 GEMMs across n = 6 genotypes – Apc fl/+ (A); Apc fl/+ Kras G12/+ (AK); Braf V600E/+ Trp53 fl/fl (BP) and Braf V600E/+ Trp53 fl/fl Notch1 Tg/+ (BPN), Kras G12D/+ Trp53 fl/fl Notch1 Tg/+ (KPN); Kras G12D/+ Trp53 fl/fl (KP), to assess how well mouse models align to human PDS classification and characterisation (Fig. 7A). Following PDS classification, A and AK models were exclusively PDS1, while KP, KPN, and BPN models were divided into PDS2 and PDS3 (n = 1 BPN as PDS1), with BP models aligning with PDS3 (Fig. 7A; Supplementary Fig. 7A). There was a clear alignment of the biological hallmarks and CMS classifications associated with PDS1 and PDS2 between human and mouse tumours; with enrichment for CMS2 alongside MYC and cell cycle hallmarks in PDS1, while PDS2 GEMMs tumours were classified as CMS1/4 and had elevated inflammatory/stromal signalling (Fig. 7B). However, in contrast to human biology associated with PDS3, GEMM tumours classified as PDS3 were not associated with CMS2 or the transcriptional repression for hallmarks signalling observed during the characterisation of human tumours, displaying signalling similar to GEMMs classified as having a mixed subtype (Fig. 7B). An alignment between TF activation states in human and mouse PDS1 and PDS2 tumours was confirmed; however these analyses again indicated that the mouse models used did not accurately represent PDS3 biology at a regulon level (Fig. 7C). Assessment of proliferation index and replication stress according to both PDS classification and genotype demonstrated elevation of both phenotypes in PDS1, but no further suppression was observed in PDS3 compared to PDS2 (Fig. 7D; Supplementary Fig. 7B). Finally, while elevation for the Myc-PRC targets signalling continuum is maintained in PDS1 GEMMs, there was again little distinction between PDS2 and PDS3 (Fig. 7E-F). These findings suggest that tumours from the PDS3 mouse models do not faithfully replicate the phenotypic biology underpinning human PDS3 tumours, indicating a critical limitation for pre-clinical development and testing of PDS3-specific therapeutic options. Overall, these data indicate the availability of appropriate GEMM models for human PDS1 and PDS2 evaluation, however the genotypes and models used here were inconsistent with human PDS3 biology. Assessment of bulk tumour-derived PDS subtype-specific phenotypes in CRC epithelial single cell RNA data. Leveraging single cell RNA sequencing data (scRNAseq) from n = 49,155 epithelial cells derived from n = 63 CRC tumours 6 , we next set out to test how well our bulk data analyses translate into this more granular data type. In line with PDS analyses in bulk tumours and epithelial-specific intestinal lineages, these analyses revealed that neoplastic cells maintain a polarised state along a MYC-PRC targets axis, when viewed according to individual patient clusters (Fig. 8A-C, Supplementary Fig. 8A), or when clustered according to the ontology-level classification method used for PDS development (Fig. 8D). Cell cycle annotation revealed that individual PDS1-like neoplastic cells that display higher MYC target signalling were more likely to be in S or G2M phase, whereas PDS3-like cells elevated for PRC target expression were more likely to be in G1 phase (Fig. 8E). These results are more pronounced when assessed in cells at either end of the MYC-PRC continuum, which are most representative of PDS1 and PDS3 biology (Fig. 8F, Supplementary Fig. 8B-D). Additional characterisation of these clusters further confirms faithful alignment between bulk and scRNAseq, as the relations are maintained between phenotypes of elevated proliferation and replication stress in MYC target-high PDS1-like cells, alongside elevation of the KRAS repression signature (KRAS signal DN) and neuroactive ligand receptor interactions in PRC target-high PDS3-like cells (Fig. 8G-H, Supplementary Fig. 8E). Additionally, while the associations between enteroendocrine cell signalling and PDS biology observed in bulk RNA data and IHC assessments were again evident in single tumour epithelial cells, it was clear that not all PDS3-like cells were EECs (Fig. 8I, Supplementary Fig. 8F). Finally, assessment of iCMS labels for tumour epithelial cells, originally derived within this scRNAseq cohort, confirmed findings from our bulk tumour analyses, with a non-exclusive yet significant association between PDS1 and PDS3 biological classification and iCMS (Fig. 8J, Supplementary Fig. 8G). Discussion Numerous studies, including those employing CMS and CRIS signatures, have used individual gene-level data for molecular subtyping class discovery, followed by correlation of identified groups with existing phenotype data 3 , 5 . However, in this study we leverage pathway-level data as an input for class discovery, using pre-defined sets of experimentally validated signatures that mechanistically underpin important cancer-relevant functions, to provide a more direct link between cancer phenotypes in a way that transcends mutational status and current transcriptional subtyping approaches. This approach identified three pathway-derived subtypes, PDS1-3, merging the inflammatory/stromal tumours together as PDS2, while at the same time sub-stratifying epithelial-rich tumours into two distinct pathway-derived subtypes, PDS1 and PDS3. Biological and molecular characterisation reveal that PDS3 lesions display significantly lower cell cycle, stress response and proliferation rates, alongside increased cellular differentiation compared to PDS1/2 tumours. Remarkably, given the repression for many cancer-related biological phenotypic features that dominate PDS3 tumour biology, these lesions display the worst relapse-free survival rates in multiple adjuvant cohorts, including the randomised phase III PETACC-3 clinical trial 15 . These data reveal a novel subtype of highly aggressive tumours, PDS3, that display relapse rates equivalent to those in the stroma-rich CMS4 subtype, previously associated with having the worst outcomes in this setting (summarised in Fig. 8K). Data presented here are consistent with a role for EZH2 in maintaining an EMT regulatory network and stem-rich state through H3K27-mediated repression, similar to that recently described in bladder cancer 24 . However, this does not explain the poor outcomes associated with PDS3 tumours in the adjuvant setting; a point that will certainly require further investigation. It is still possible that the repressed and differentiated patterns that dominate PDS3 bulk tumour profiles are concealing small subsets of aggressive and/or chemotherapy-resistant subclones, similar to mesenchymal subpopulations recently described in melanoma 25 . A recent study identified how FOXA2 orchestrates lineage plasticity driving an adeno-to-neuroendocrine lineage transition in prostate cancer, all of which align with the neural-like traits and activation of FOXA2 we observe in in PDS3 tumours 26 . Importantly, unlike in the prostate setting, the IHC and morphology data presented in our current study only indicate the presence of focal reactivity for neuroendocrine markers and would not support a classification of PDS3 colon tumours as cases of adenocarcinoma with partial neuroendocrine differentiation and certainly not colonic neuroendocrine tumour (NETs). Furthermore, the poor outcomes associated with PDS3 align with a recently reported role for PRC components in determining how well tumours, and tumour cells, respond, adapt and survive during sustained environmental stress; where cells deficient in epigenetic regulators, particularly EZH2, have a superior fitness advantage over EZH2-proficient tumours 27 . These stress-resistant subpopulations were characterised by low cell cycling, “transcriptional numbness” and overall phenotypic inertia, indicating there was a non-genetic basis for development of this fitness advantage, which is remarkably similar to the characteristics of PDS3 tumours presented here. Intriguingly, the presence of senescence-like physiology may itself expose a therapeutic opportunity in PDS3 tumours 28 . Finally, as our study has focussed on molecular information, there may be a series of unaccounted for epidemiological or microbial/viral factors that underpin both the biological traits of the PDS classes and their associated clinical outcomes. A recent single cell RNAseq study in tumour epithelial cells used gene-level data for class discovery, resulting in sub-stratification of stroma-rich CMS4 tumours into the distinct prognostic groups based on intrinsic biology; iCMS2 and iCMS3. However this study did not identify different subtypes within epithelial-rich CMS2 tumours 6 . In data presented here, we clearly demonstrate that by using pathway-level data for class discovery in bulk data, rather than a gene-level method in single cell neoplastic lineages, CMS2 tumours can be robustly and reproducibly sub-stratified into clearly distinct prognostic groups driven by opposing biological signalling. Importantly, these traits are evident across multiple bulk transcriptomic cohorts or when applied to the epithelial scRNAseq data used for development of the iCMS classifier. The biological pathway activation approach that underpins PDS classification will also provide a novel basis for forward and reverse translation studies, which are essential to enable tumour subtyping to move beyond providing prognostic information alone, and to deliver real clinical impact through improved prediction of chemotherapeutic or targeted therapeutic responses. Although the CMS, iCMS, CRIS and PDS approaches may be seen as competing, classification in parallel can reveal important granularity in both epithelial-rich and stroma-rich tumours, which is critical for the development of new treatment approaches. In data presented here, we demonstrate how well represented human PDS biology is within a series of large pre-clinical models, revealing that PDS1 and PDS2 align well with a range of available GEMMs. Importantly, GEMM tumours classified as PDS3 failed to faithfully align to the biological signalling that characterise this subtype in human tumours, which, until addressed, will limit more comprehensive mechanistic studies. The lack of accurate PDS3 models is intriguing but perhaps not surprising, based on how current systems are developed; given the relatively short lifespan of mice, there is a bias towards the development of fast growing lesions that thrive following an “all at once” induction of driver genes 29 . This scenario does not replicate the sequential accumulation of genetic alterations and environmental changes seen in human tumour development, which can occur over a 15 + year period 30 . As such, if PDS3 biology is to be modelled in a way that more faithfully aligns to human tumours, methodologies underpinning model development require urgent attention. Modelling PDS may require the use of sequential and/or temporally inducible alleles, in combination with non-genetic and environmental factors that mimic precursor and tumour developmental latencies, or indeed variations in cell-of-origin during tumour induction. Genetic models of cancer development have defined much of our understanding of tumour biology 8 ; however it is increasingly clear that substantial phenotypic and clinical heterogeneity persist within genetically identical tumours. The PDS classifier was developed in KRAS mut colorectal tumours, yet we demonstrate how it can be used to stratify tumour samples in a genotype-agnostic manner across numerous independent cohorts of colorectal, lung and pancreatic cancer. Using a co-culture system, Tape and colleagues 32 demonstrated how extrinsic factors can trigger intrinsic signalling in non-mutated organoids that mimic signalling states induced by mutations in KRAS +/- loss of TP53 . In line with this reductionist model, data presented here provide evidence that many cancer-relevant signalling cascades can be activated/repressed independent of mutational status, meaning that genotype alone cannot explain the signalling and phenotypes observed. As such, our study provides an additional informative link between gene expression data and clinical/biological phenotypes. Mapping the ongoing interplay between intrinsic and extrinsic signalling within each individual tumour may explain how activation patterns for each pathway are reached, and detangling of such nuanced networks of inter-compartmental crosstalk may ultimately result in the identification of unique tumour-specific explanations, or perhaps more likely, multiple individual cellular-specific findings. Additional studies are now clearly needed to mechanistically explain the development, evolution and progression patterns underpinning PDS3 in particular. Until such time that spatially-resolved multi-omics information can be converted into clinical tools, the classifier developed within this study, alongside existing classifiers such as CRIS, CMS and iCMS, provide the basis for stratification of tumours into discrete biological groups that can be used as a template for mechanistic and translational studies. Methods and Materials Datasets. The majority of the datasets utilised in this study were accessed through the S:CORT programme 33 including FOCUS (GSE156915 34 ), SPINAL (Internal S:CORT) and Polyp Dataset (Internal S:CORT), where microarray expression profiles, mutation, clinical, immunohistochemistry (IHC), tissue blocks and tumour microarrays (TMAs) were available. FOCUS: MRC-funded randomised trial cohort consisting of 360 formalin-fixed paraffin-embedded primary tumour samples for metastatic CRC. SPINAL: 258 FFPE samples from CRC patients, mixed stages. Other publicly available datasets were accessed from Gene Expression Omnibus (GEO) with accession number: GSE39582 2 , GSE76402 3 and GSE143915 22 . The validation of clinical association was carried out PETACC-3 cohort 15 . Gene Expression – Microarray and RNA-Sequencing. In all microarray datasets, the probes-to-genes were collapsed using the collapseRows function in WGCNA R package (v1.70-3), where the probe with the highest average value per gene was selected 39 . For TCGA COREAD RNA-Seq data, TCGAbiolinks R package (v2.16.1) was used to downloaded HT-Seq counts 38 . Using varFilter in genefilter R package (v1.70.0), low variance genes (var.cutoff = 0.25) were excluded. The quantile normalisation and Log2 transformation was applied to the count matrix. STAR counts for TCGA LUAD and TCGA PAAD were also downloaded using the TCGAbiolinks R package (v2.25.2). The publicly available microarray gene expression profiles (GEP) utilised in this study were downloaded from Gene Expression Omnibus (GEO), including, GSE39582, FOCUS, GSE39582, SPINAL, PETACC-3, FOxTROT, TCGA (COAD, LUAD, PAAD), Polyp Validation, GEMMs, GSE143915. Genetically engineered mouse models (GEMMs). All animal experiments were performed according to a UK Home Office licence (Project License 70/8646) and were reviewed by the animal welfare and ethical board of the University of Glasgow. Mice of both genders aged for about 6 to 12 weeks and were systematically induced with a single injection of 2mg tamoxifen (Sigma-Aldrich, T5648) using intraperitoneal injection and sampled at clinical endpoint, which was defined as weight loss and/or hunching and/or cachexia. All experiments were performed on mice with C57BL/6 background. RNA was extracted using either a RNeasy mini kit (Qiagen) or TRIzol reagent (Thermo Fisher Scientific) and its concentrations were assessed using a NanoDrop 200c spectrophotometer (ThermoScientific). RNA quality was evaluated using an Agilent 220 Tapestation, screentape and sequenced using an Illumina TruSeq RNA sample prep kit, then run on an Illumina NextSeq using the High Output 75 cycles kit (2 x 36 cycles, paired end reads, single index). The quality of raw sequence was assessed using the FastQC algorithm (v0.11.8). Sequences were trimmed to remove adaptors and low-quality base calls, defined as those with a Phred score of < 20, using the Trim Galore tool (v0.6.4). Thereafter the trimmed sequences were aligned to the mouse genome build GRCm38.98 using HISAT2 (v2.1.0), then FeatureCounts (v1.6.4) was used to determine raw counts per gene. Two mouse cohorts, that were used in our previous study 40 , namely small cohort (n = 18) and large cohort (n = 39) were amalgamated to create a larger cohort of GEMM. APN and AP models were excluded, as the batch they were sequenced in was deeply confounded by genotype, resulting in a collection of n = 51 tumour samples, including 6 genotypes: Apc fl/+ (A); Apc fl/+ Kras G12/+ (AK); Braf V600E/+ Trp53 fl/fl (BP) and Braf V600E/+ Trp53 fl/fl Notch1 Tg/+ (BPN), Kras G12D/+ Trp53 fl/fl Notch1 Tg/+ (KPN); Kras G12D/+ Trp53 fl/fl (KP). After batch correction using ComBat _seq function in sva R package (v3.40.0), vst function in DESeq2 R package (v1.32.0) was applied to normalize the read counts. Unsupervised Discovery of Pathway Derived Subtype (PDS). Four well-curated publicly available gene set collections: 1) BIOCARTA (gene set: n = 289), 2) KEGG (n = 186), 3) PID (n = 196), and 4) REACTOME (n = 1499), were accessed from the Molecular Gene Signature Database (MSigDB) via msigdbr R package (v7.0.1) and were utilised to generate a matrix of single sample ontology scores from gene expressions of the FOCUS cohort using GSVA R package (v1.26.0). The parameters min.sz = 10 , method=‘ssgsea’ and ssgsea.norm = T was set, that resulted with ontology scores for 1783 gene sets. A subset of 165 KRAS mut primary tumour CRC samples were selected from the single sample score matrix (excluding KRAS wild-types, BRAF and NRAS mutants) for unsupervised class discovery phase. The score matrix was subjected to dimensionality reduction t -SNE analysis, and the two continuous variables (Dim1 and Dim2) were obtained using Rtsne R package (v0.15). The variables were scaled before applying unsupervised k -means clustering. The silhouette width and elbow methods determined k = 3 as an optimal number of clusters ( cluster R package; v2.1.2, and factorextra R package; v1.0.7), and bootstrap resampling method from fpc R package (v2.2.3) coincided k = 3 as the highly stable number for clustering. Following the unsupervised k -means clustering, three groups of clusters were named as Pathway Derived Subtypes (PDS): PDS1, PDS2 and PDS3. Development and application of PDS Classification System. Utilising the discovery subset (n = 165) where PDS classes were defined previously, the discovery set was randomly divided into a training set (n = 125) and a test set (n = 40) based on the bootstrap resampling method using caret R package (v6.0-90). Three different classification algorithms were tested, including the Nearest Shrunken Centroid (NSC; or pam – Prediction Analysis of Microarrays), Lasso and Elastic-Net Regularised Generalised Linear Model (glmnet) and Support Vector Machine via Radial Basis Function (svmRBF), implemented in caret R package (v6.0-90). As a feature selection step to reduce the number of gene sets and draw out only subgroup specific gene sets, the scores above average specific to each subgroup were selected and further highly correlated gene sets (> 0.9) were excluded, resulting in 626 gene sets in total. To make it feasible for users, the test run was performed on gene expression matrix and the ontology score conversion steps were implemented within the classification model. The gene expression matrix of the test samples was first converted to the single sample scores with gsva function along with the parameters ssgsea.norm = F that generated unscaled scores. The unscaled scores were scaled using the min-max scaling method employed by the gsva function, where the minimum value of -8554.114, and maximum value of 12374.819, determined during the class discovery phase, was used. Once the score matrix of the test samples was generated, it was batch-corrected against the training set as the reference batch using the ComBat function from sva R package (v3.42.0). Before running the classification algorithms, leave-one out cross-validation (LOOCV) was employed in the training set to minimise overfitting of classification. The classification algorithms were trained on the training set and the corresponding hyperparameters were adjusted to finalise the models. Out of the three classification models, the svmRBF algorithm displayed a high classification performance on the test data. Thus, the svmRBF was selected to develop the PDS classification system. The PDSclassifier R package has been developed for the PDS classification model that is available to share (see ‘Data and Code Availability’). Subsequently, the PDSclassifier was employed to classify samples from other transcriptomic validation datasets. Gene Set Enrichment Analysis. To define subtype-specific biological association, gene set enrichment analysis (GSEA) was employed using fgsea 41 R package (v1.21.0) with eps = 0 and nPerSimple = 10000 . For fgsea , a ranked gene list was first obtained for each comparison in each dataset via limma R package (v3.50.3). The comparison between subtypes were made in a grouped pairwise manner. Statistical significance was measured with Benjamini-Hochberg (BH) False Discovery Rate (FDR) < 0.05 and normalised enrichment score (NES) indicates the upregulation (positive value) and down-regulation (negative value). Additionally, single sample enrichment scores were generated via GSVA R package (v1.42.0) with ‘ ssgsea’ method 42 . Gene Signatures. The majority of the gene set signatures, including the ‘Hallmarks’ gene set collection, was accessed from the Molecular Signature Database (MSigDB) using msigdbr R package (v7.4.1) for both human and mouse species 4 . Other gene set signatures include, (i) stem-related crypt-base columnar (CBC) and regenerative stem cells (RSC) 20 (ii) precursor polyp-related tubular and serrated signatures obtained from differential gene expression analysis between tubular and serrated adenomas from GSE45270 19 using limma 43 R package (v3.46.0), (iii) enteroendocrine cell (EEC) cell markers from PanglaoDB database 44 and, (iv) MYC and PRC target modules 21 , used for the enrichment analyses. These gene signatures were also applied to mouse data where directly applicable via MSigDB or, the gene signatures were converted to the mouse orthologues via the ensembl using biomaRt R package (v2.50.3) 45 . Proliferative Index, Replication Stress, and Intestinal Stem Cell Index. Replication stress was calculated from a collection of transcriptional signatures associated with cell-cycle and DNA repair extracted from the MSigDB for both species, where GSVA scores were generated using ‘gsva’ method from GSVA 42 R package (v1.42.0), followed by the total sum of the GSVA scores per sample across gene sets, as previously described 46 . The transcriptomic measure of proliferation was measured with ProliferativeIndex 16 R package (v1.0.1) which calculates proliferative index (PI) from a list of proliferative cell nuclear antigen (PCNA)-associated genes. For mouse, the PCNA-associated gene signature was converted to mouse orthologues as mentioned previously before applying the signature to calculate the PI score. Intestinal Stem Cell Index (ISCindex) provides a continuum score from gene expression dataset that represents stem cell phenotype with the extreme ends of scoring scale to be either strongly conventional crypt-base columnar stem cell (CBC) or a regenerative stem cell (RSC) phenotype. The ISCindex 20 R package (v0.0.0.9) was downloaded and used as previously described. Transcription Factor Activity Analysis. For the quantification of transcription factor (TF) activity from gene expression profiles, the collection of well-curated TFs and their targets in DoRothEA 13 database was accessed where the TF-target interaction with high confidence A and B were selected for the analyses on both human and mouse using dorothea R package (v1.6.0). The statistically significant difference on TF activity between subtypes with p-value of < 0.05 was determined using ‘ rowTtest’ function from viper 47 R package (v1.28.0). A list uniquely activated TFs per subtype across cohorts were identified and visually presented in a heatmap. Cell Lineage Analysis. The ESTIMATE was applied to produce immune and stromal fractions as well as tumour purity scores using estimate R package (v1.0.13) 48 . Microenvironment Cell Populations-counter (MCP-counter) was used to quantify the cellular abundance in a heterogenous tissue from transcriptomic data that generates infiltration estimate scores for two stromal and eight immune cell populations 49 . MCP-counter analysis was performed using MCPcounter R package (v1.2.0). Moreover, to explore the wider immune and stromal contexture for finer granularity, xCell was applied to calculate infiltration enrichment scores for 64 different immune and stromal cell populations using xCell R package (v1.1.0) 50 . CRC Molecular Subtyping. Tumour samples in each cohort were classified into CMS using the random forest (RF) method from CMSclassifier (v1.0.0) at default threshold with exception to the FOCUS and SPINAL cohorts where the CMS posterior probability threshold levels were reduced to 0.4 due to large number of non-consensus ‘unknown’ samples 5 . CMS on mouse data were called using ‘Option C’ classification in MmCMS R package (v0.1.0) 40 . CRIS classification was made using CRISclassifier (v1.0.0) at default setting 3 . Unclassified samples were determined using the recommended Benjamini-Hochberg (BH) False Discovery Rate (FDR) of > 0.2. For iCMS classification, the iCMS gene signatures were extracted to create the iCMS template which was subsequently used with the nearest template prediction method embedded in the CMScaller (v2.0.1) 51 , as previously described 6 . Samples above 0.05 FDR were classed as ‘unknown’. Immunohistochemistry (IHC) and Digital Histology Scoring. Tissue microarrays (TMAs) were scanned at x20 magnification and imported into QuPath 52 (v0.2.3). The suitability of individual Ki67 + IHC-stained cores for inclusion was determined by manual visual assessment of the scanned images, after application of the QuPath TMA dearraying tool. Colour deconvolution was applied to separate stains, followed by tissue detection (pixel threshold; resolution: low (7.96µm/px); channel: average channels; gaussian prefilter; smoothing sigma: 2.0; tissue threshold: 235). Cells were detected within the annotated tissue (requested pixel size: 0.5; nucleus background radius: 15.0; median filter radius: 0.0; sigma: 2.5; minimum area: 10.0; maximum area: 300.0; threshold: 0.1, max background intensity 1.0; exclude DAB: false; cell parameters: default; general parameters: default) and smoothed (radius: 25µm). An object classifier (random trees; default settings) was trained by examples of annotated tumour epithelium and stroma. Set cell intensity classification (nucleus: DAB OD mean: 0.18) to differentiate positive and negative tumour epithelium and stroma. Additionally, an H score was also generated by setting a three-tier cell intensity classification (Nucleus: DAB OD mean: 0.10; 0.25; 0.42). 354 patients with matched PDS call and Ki67 + assessment. The total tumour area was also enumerated and assessed based on the digital histology scoring via HALO platform (Indica Labs, Albuquerque, NM, USA) on the Hematoxylin and Eosin (H&E) whole slides images from FOCUS (n = 356), as previously described 53 . Digital Image-based PDSclassifier. We developed a set of deep learning classifiers and analysed their performances using whole slide images (WSIs) from the FOCUS and SPINAL cohorts. After rejecting images of poor quality and images from patients with undefined PDS calls by pathologist review, we used a dataset of 997 WSIs of H&E-stained resection specimens from 520 patients (PDS1: n_slides = 329; PDS2: n_slides = 314; PDS3: n_slides = 354). We conducted experiments under a 5-fold cross-validation protocol: for each fold, we split the data according to a 60-20-20 training-validation-test distribution such that classes and cohorts were stratified. We made sure that the validation and test splits did not overlap across the five folds and that images from the same patient were always in the same split. Tumour regions were manually annotated in all WSIs: we then restricted our experiments to the use of image data from these regions to prevent potential classification bias from non-tumour regions. The tumour regions of these WSIs were tiled into sets of image patches of size 318x318px extracted at magnification 5x (resolution ~ 2µm/px) with 50% overlap. We used a customized 50-layer ResNet as a deep learning architecture to process such input image patches and output a probability density over the PDS classes. Each image patch was labelled using the PDS class of their WSI of origin and our models were trained to maximize the output probability for the target class (minimization of the cross-entropy loss) using mini-batches of size 16. At inference time, we applied the trained models on all the generated tiles from a given WSI and then averaged their predicted probability densities to produce slide-level probability estimates. We then selected the class with the highest relative probability score as the image-based PDS call for this WSI. This approach was based on the weakly supervised learning protocol proposed within the imCMS study 18 . The classification performance of the trained models was systematically assessed using the test patrician for each fold. Mutation and Copy Number Profiles. Mutational and copy number associations between PDS were determined, when available, across at least four different CRC cohorts: FOCUS 34 , GSE39582 2 , SPINAL (Internal S:CORT) and TCGA COREAD 35 . The proportion of driver mutations and their variants KRAS , BRAF and TP53 were examined in detail across these cohorts. The mutational data for TCGA COREAD were retrieved from the GDC via TCGAbiolinks R package (v2.25.2) and analysed using maftools R package (v2.12.0) 54 . Oncoprint was also used to interrogate and visualise the genetic alternations in key driver genes along signalling pathways, including WNT, MAPK, PIK3CA, cell cycle and TGF-β pathways using ComplexHeatmap (v2.12.0) and CRAN circlize (v0.4.15) R packages 55 . For FOCUS and SPINAL, copy number chromosomal arm calls and copy number estimations per gene were available via the S:CORT consortium. For TCGA COREAD, the GISTIC data was accessed via GDC data portal 35 . Survival Analyses. Two different cohorts (GSE39582 and PETACC-3) were explored for clinical and prognostic evaluation. Only PDS samples were considered for the analysis, excluding ‘mixed’ samples. Moreover, patients with missing information on relapse-free status (RFS), relapse-free months or chemotherapy treatment status, and patients with record of less than a month were excluded from all survival analyses in GSE39582. The Cox proportional hazards method was also performed to calculate the hazard ratio and confidence intervals (CI) for statistical group comparisons. All survival analyses and visualisation were carried out with survival (v3.2-13), survminer (v0.4.9) and ggplot 2 (v3.3.6) R packages. Single Cell Data Analyses. The processed count expression matrix for n = 49,155 epithelial cells and the corresponding epithelial metadata were downloaded through the Synapse under the accession code syn26844071, as previously described 6 . The count expression matrix for tumour epithelial cells were normalised using SCTransfrom in Seurat R package (v4.1.1) 56 with method=‘glmGamPoi’ implemented from glmGamPoi R package (v1.8.0). Enrichment scores per cell from signatures (KRAS Signal DN (HALLMARK) 4 , MYC/PRC target modules 21 , neuroactive ligand receptor interaction (KEGG), enteroendocrine cells 44 , proliferative index 16 , PDS-specific gene set (n = 626) were calculated using enrichIt function in the escape R package (v1.6.0) with method=‘ssgsea’ and min.size = 1 . For replication stress, the list of gene sets was used to get the enrichment scores per cell, followed by the sum of enrichment scores per cell. These scores scaled between range of -1 to 1 using rescale function in scales R package (v1.2.0). The CellCycleScoring function in Seurat R package predicted the cell cycle phase per cell into G1, S and G2M. Using MYC and PRC target enrichment scores, cells were defined ‘High’, ‘Mid’ and ‘Low’ by tertile split, where MYC-target-High and PRC-target-High cells were re-evaluated and compared. Additional R packages for single cell analyses include, SeuratDisk (v0.0.0.9020) and SeuratWrappers (v0.3.0). Statistics and Data Visualisation. Statistical analysis conducted in this study has been performed in R using stats (v4.2.1) or ggpubr (v0.4.0) R package for plots, including Student’s t -test, Kruskal-Wallis rank-sum test, Fisher’s exact test, and Pearson’s correlation coefficient test. For copy number by arm analysis, Pearson’ Chi-squared test post-hoc analysis was performed using chisq.posthoc.test R package (v0.1.2) and adjusted p-value with Benjamini-Hochberg using p.adjust function in stats R package. Other R packages that have been utilised for data analysis and data visualisation include, ggtern (v3.3.5), ComplexHeatmap (v2.10.0), circlize (v0.4.15), umap (v0.2.8.0), ggplot2 (v3.3.6), patchwork (v1.1.1), riverplot (v0.10), ggforce (v0.3.3), RColorBrewer (v1.1-2). Data and Code Availability. The PDSclassifier R package (v0.1.0) is available on the Molecular Pathology Lab GitHub ( https://github.com/MolecularPathologyLab/PDSclassifier ). Declarations Financial Support: This work was supported by a CRUK early detection grant (Dunne; A29834), a CRUK International accelerator programme, ACRCelerate, (Dunne, Sansom, Leedham, Lawler, Maughan; A26825), a UK Medical Research Council (MRC) and CRUK co-funded Stratified Medicine Consortium programme grant, S:CORT, (Dunne, Lawler, Maughan; MR/M016587/1), an MRC National Mouse Genetics Network programme (Dunne, Leedham; MC_PC_21042), CRUK Beatson institute funding (Sansom; A21139, A17196 and A31287). Cancer Research UK programme Grant (Leedham; DRCNPG-Jun22\100002), HDR UK Grant (Lawler), AIRC, Associazione Italiana per la Ricerca sul Cancro, Investigator Grants 20697 (Bertotti) and 22802 (Trusolino), Promedica Foundation F-87701-41-01 (Koelzer), a My First AIRC Grant (Isella; ID 19047); AIRC 5x1000 grant 21091 (Trusolino, Bertotti); European Research Council Consolidator Grant 724748 BEAT (Bertotti); H2020 grant agreement no. 754923 COLOSSUS (Trusolino); H2020 INFRAIA grant agreement no. 731105 EDIReX (Bertotti); Fondazione Piemontese per la Ricerca sul Cancro-ONLUS, 5x1000 Ministero della Salute 2016 (Trusolino). General support for the Dunne research group via the QUB Foundation. Author Contributions: SBM: initial data-driven investigation, conceptualisation, discovery and development of classifier, methodology, data analysis, data visualisation, writing-review and editing, RMB: investigation, data analysis, data visualisation, writing-review and editing, ML: digital image-based classifier, histological data analysis, writing-review and editing, SMC: data analysis, data visualisation, writing-review and editing, NCF: histological, data analysis, data visualisation, writing-review and editing, PT: data analysis, data visualisation, writing-review and editing, AC: mouse experiments, mouse model data generation, writing-review and editing, TM: mouse experiments, mouse model data generation, writing-review and editing, AKN: mouse experiments, mouse model data generation, writing-review and editing, KG: mouse data curation, writing-review and editing, RA: data analysis, data visualisation, writing-review and editing, SM: data analysis, data visualisation, writing-review and editing, EM: multiplex analysis, writing-review and editing, HD: multiplex analysis, writing-review and editing, EG: xenograft data generation, data curation, data analysis, writing-review and editing, MV: xenograft data generation, data curation, data analysis, writing-review and editing, ER: data analysis, data visualisation, writing-review and editing, KR: data generation, sequencing, writing-review and editing, SS: writing-review and editing, AM: writing-review and editing, CB: data analysis, writing-review and editing, EH: data analysis, writing-review and editing, NS: writing-review and editing, HH: writing-review and editing, BA: writing-review and editing, ED: data analysis, writing-review and editing, AB: data curation, software, writing-review and editing , SR: data generation, resource, writing-review and editing, CI: writing-review and editing, CM: writing-review and editing, AB: resources, supervision, writing-review and editing, LT: resources, supervision, writing-review and editing, ML: resources, pathological analysis, supervision, writing-review and editing, EK: conceptualisation, supervision, writing-review and editing, ST: resources, supervision, writing-review and editing, TM: resources, supervision, writing-review and editing, ML: resources, supervision, writing-review and editing, SJL: resources, supervision, writing-review and editing, VK: digital image-based classifier, histological data analysis, resources, supervision, writing-review and editing, OJS: resources, supervision, writing-review and editing, PDD: initial data-driven investigation, conceptualisation, resources, supervision, writing-original draft, writing-review and editing. References Budinska, E. et al. Gene expression patterns unveil a new level of molecular heterogeneity in colorectal cancer. J. Pathol. 231, 63–76 (2013). Marisa, L. et al. Gene Expression Classification of Colon Cancer into Molecular Subtypes: Characterization, Validation, and Prognostic Value. PLOS Med. 10, e1001453 (2013). Isella, C. et al. Selective analysis of cancer-cell intrinsic transcriptional traits defines novel clinically relevant subtypes of colorectal cancer. Nat. Commun. 8, 15107 (2017). Liberzon, A. et al. The Molecular Signatures Database Hallmark Gene Set Collection. Cell Syst. 1, 417–425 (2015). Guinney, J. et al. The consensus molecular subtypes of colorectal cancer. Nat. Med. 21, 1350–1356 (2015). Joanito, I. et al. Single-cell and bulk transcriptome sequencing identifies two epithelial tumor cell states and refines the consensus molecular classification of colorectal cancer. Nat. Genet. 54, 963–975 (2022). Crick, F. Central Dogma of Molecular Biology. Nature 227, 561–563 (1970). Vogelstein, B. et al. Genetic Alterations during Colorectal-Tumor Development. N. Engl. J. Med. 319, 525–532 (1988). Fearon, E. R. & Vogelstein, B. A genetic model for colorectal tumorigenesis. Cell 61, 759–767 (1990). Barras, D. et al. BRAF V600E Mutant Colorectal Cancer Subtypes Based on Gene Expression. Clin. Cancer Res. 23, 104–115 (2017). Middleton, G. et al. BRAF-Mutant Transcriptional Subtypes Predict Outcome of Combined BRAF, MEK, and EGFR Blockade with Dabrafenib, Trametinib, and Panitumumab in Patients with Colorectal Cancer. Clin. Cancer Res. 26, 2466–2476 (2020). Papatheodorou, I., Oellrich, A. & Smedley, D. Linking gene expression to phenotypes via pathway information. J. Biomed. Semant. 6, 17 (2015). Garcia-Alonso, L., Holland, C. H., Ibrahim, M. M., Turei, D. & Saez-Rodriguez, J. Benchmark and integration of resources for the estimation of human transcription factor activities. Genome Res. 29, 1363–1375 (2019). Garcia-Alonso, L. et al. Transcription Factor Activities Enhance Markers of Drug Sensitivity in Cancer. Cancer Res. 78, 769–780 (2018). Van Cutsem, E. et al. Randomized Phase III Trial Comparing Biweekly Infusional Fluorouracil/Leucovorin Alone or With Irinotecan in the Adjuvant Treatment of Stage III Colon Cancer: PETACC-3. J. Clin. Oncol. 27, 3117–3125 (2009). Ramaker, R. C. et al. RNA sequencing-based cell proliferation analysis across 19 cancers identifies a subset of proliferation-informative cancers with a common survival signature. Oncotarget 8, 38668–38681 (2017). Lafarge, M. W. & Koelzer, V. H. Towards computationally efficient prediction of molecular signatures from routine histology images. Lancet Digit. Health 3, e752–e753 (2021). Sirinukunwattana, K. et al. Image-based consensus molecular subtype (imCMS) classification of colorectal cancer using deep learning. Gut 70, 544–554 (2021). De Sousa E Melo, F. et al. Poor-prognosis colon cancer is defined by a molecularly distinct subtype and develops from serrated precursor lesions. Nat. Med. 19, 614–618 (2013). Gil Vazquez, E. et al. Dynamic and adaptive cancer stem cell population admixture in colorectal neoplasia. Cell Stem Cell 29, 1213–1228.e8 (2022). Kim, J. et al. A Myc Network Accounts for Similarities between Embryonic Stem and Cancer Cell Transcription Programs. Cell 143, 313–324 (2010). Habowski, A. N. et al. Transcriptomic and proteomic signatures of stemness and differentiation in the colon crypt. Commun. Biol. 3, 453 (2020). Jain, A. & Tuteja, G. TissueEnrich: Tissue-specific gene enrichment analysis. Bioinformatics 35, 1966–1967 (2019). Wang, H. et al. Single-Cell Analyses Reveal Mechanisms of Cancer Stem Cell Maintenance and Epithelial-Mesenchymal Transition in Recurrent Bladder Cancer. Clin. Cancer Res. Off. J. Am. Assoc. Cancer Res. 27, 6265–6278 (2021). Karras, P. et al. A cellular hierarchy in melanoma uncouples growth and metastasis. Nature 610, 190–198 (2022). Han, M. et al. FOXA2 drives lineage plasticity and KIT pathway activation in neuroendocrine prostate cancer. Cancer Cell 40, 1306–1323.e8 (2022). Loukas, I. et al. Selective advantage of epigenetically disrupted cancer cells via phenotypic inertia. Cancer Cell (2022) doi: 10.1016/j.ccell.2022.10.002 . Wang, L., Lankhorst, L. & Bernards, R. Exploiting senescence for the treatment of cancer. Nat. Rev. Cancer 22, 340–355 (2022). Tuveson, D. A. Fighting the Sixth Decade of the Cancer War with Better Cancer Models. Cancer Discov. 11, 801–804 (2021). Jones, S. et al. Comparative lesion sequencing provides insights into tumor evolution. Proc. Natl. Acad. Sci. 105, 4283–4288 (2008). Hidalgo, M. et al. Patient-Derived Xenograft Models: An Emerging Platform for Translational Cancer Research. Cancer Discov. 4, 998–1013 (2014). Qin, X. et al. Cell-type-specific signaling networks in heterocellular organoids. Nat. Methods 17, 335–342 (2020). Lawler, M., Kaplan, R., Wilson, R. H., Maughan, T., & on behalf of the S-CORT Consortium. Changing the Paradigm—Multistage Multiarm Randomized Trials and Stratified Cancer Medicine. The Oncologist 20, 849–851 (2015). Malla, S. B. et al. In-depth Clinical and Biological Exploration of DNA Damage Immune Response as a Biomarker for Oxaliplatin Use in Colorectal Cancer. Clin. Cancer Res. 27, 288–300 (2021). Muzny, D. M. et al. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012). Collisson, E. A. et al. Comprehensive molecular profiling of lung adenocarcinoma. Nature 511, 543–550 (2014). Raphael, B. J. et al. Integrated Genomic Characterization of Pancreatic Ductal Adenocarcinoma. Cancer Cell 32, 185–203.e13 (2017). Colaprico, A. et al. TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Res. 44, e71 (2016). Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008). Amirkhah, R. et al. MmCMS: Mouse models’ Consensus Molecular Subtypes of colorectal cancer. 2022.06.17.496539 Preprint at https://doi.org/10.1101/2022.06.17.496539 (2022). Korotkevich, G. et al. Fast gene set enrichment analysis. 060012 Preprint at https://doi.org/10.1101/060012 (2021). Hänzelmann, S., Castelo, R. & Guinney, J. GSVA: gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics 14, 7 (2013). Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015). Franzén, O., Gan, L.-M. & Björkegren, J. L. M. PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data. Database 2019, baz046 (2019). Durinck, S., Spellman, P. T., Birney, E. & Huber, W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat. Protoc. 4, 1184–1191 (2009). Dreyer, S. B. et al. Targeting DNA Damage Response and Replication Stress in Pancreatic Cancer. Gastroenterology 160, 362–377.e13 (2021). Alvarez, M. J. et al. Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nat. Genet. 48, 838–847 (2016). Yoshihara, K. et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat. Commun. 4, 2612 (2013). Becht, E. et al. Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biol. 17, 218 (2016). Aran, D., Hu, Z. & Butte, A. J. xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. 18, 220 (2017). Eide, P. W., Bruun, J., Lothe, R. A. & Sveen, A. CMScaller: an R package for consensus molecular subtyping of colorectal cancer pre-clinical models. Sci. Rep. 7, 16618 (2017). Bankhead, P. et al. QuPath: Open source software for digital pathology image analysis. Sci. Rep. 7, 16878 (2017). Fisher, N. C. et al. Biological Misinterpretation of Transcriptional Signatures in Tumor Samples can Unknowingly Undermine Mechanistic Understanding and Faithful Alignment with Preclinical Data. Clin. Cancer Res. OF1–OF14 (2022) doi: 10.1158/1078-0432.CCR-22-1102 . Mayakonda, A., Lin, D.-C., Assenov, Y., Plass, C. & Koeffler, H. P. Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res. 28, 1747–1756 (2018). Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–2849 (2016). Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495–502 (2015). Additional Declarations The authors declare no competing interests. Supplementary Files PDSSuppFiguresRSq30.11.2022.pptx Supplementary Figures PDSSupplementaryTablesRSq30.11.2022.xlsx Supplementary Tables Cite Share Download PDF Status: Published Journal Publication published 12 Feb, 2024 Read the published version in Nature Genetics → Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-3891488","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":268820035,"identity":"011700b4-fab5-47df-ac67-168f4bc64bd8","order_by":0,"name":"Sudhir B Malla","email":"","orcid":"","institution":"Queen’s University Belfast","correspondingAuthor":false,"prefix":"","firstName":"Sudhir","middleName":"B","lastName":"Malla","suffix":""},{"id":268820036,"identity":"4741e23c-0341-4770-becb-3df6107c6631","order_by":1,"name":"Ryan M Byrne","email":"","orcid":"","institution":"Queen’s University Belfast","correspondingAuthor":false,"prefix":"","firstName":"Ryan","middleName":"M","lastName":"Byrne","suffix":""},{"id":268820037,"identity":"c9767403-05aa-4e02-a68a-d8038e1685c6","order_by":2,"name":"Maxime Lafarge","email":"","orcid":"","institution":"Department of Pathology and Molecular Pathology, University and University Hospital of Zürich, Switzerland","correspondingAuthor":false,"prefix":"","firstName":"Maxime","middleName":"","lastName":"Lafarge","suffix":""},{"id":268820038,"identity":"063c4fad-142b-4317-9b8f-b57269e37d2b","order_by":3,"name":"Shania M Corry","email":"","orcid":"","institution":"Queen’s University Belfast","correspondingAuthor":false,"prefix":"","firstName":"Shania","middleName":"M","lastName":"Corry","suffix":""},{"id":268821615,"identity":"bf05ff46-2b82-4b21-999d-f5a06b3c0be8","order_by":4,"name":"Natalie C Fisher","email":"","orcid":"","institution":"Queen’s University Belfast","correspondingAuthor":false,"prefix":"","firstName":"Natalie","middleName":"C","lastName":"Fisher","suffix":""},{"id":268821616,"identity":"26f0c7c8-6e56-45e6-8395-2b852fb71130","order_by":5,"name":"Petros Tsantoulis","email":"","orcid":"","institution":"Faculty of Medicine, Université de Genève","correspondingAuthor":false,"prefix":"","firstName":"Petros","middleName":"","lastName":"Tsantoulis","suffix":""},{"id":268821617,"identity":"33514f9e-311e-4788-a629-20ef3c342a92","order_by":6,"name":"Andrew Campbell","email":"","orcid":"","institution":"Cancer Research UK Beatson Institute, Glasgow","correspondingAuthor":false,"prefix":"","firstName":"Andrew","middleName":"","lastName":"Campbell","suffix":""},{"id":268821618,"identity":"22709754-298e-42a6-8647-cac8ae1f47de","order_by":7,"name":"Tamsin Lannagan","email":"","orcid":"","institution":"Cancer Research UK Beatson Institute, Glasgow","correspondingAuthor":false,"prefix":"","firstName":"Tamsin","middleName":"","lastName":"Lannagan","suffix":""},{"id":268821619,"identity":"1f53abc3-cd86-48e9-84dc-2d590cb8b414","order_by":8,"name":"Arafath K Najumudeen","email":"","orcid":"","institution":"Cancer Research UK Beatson Institute, Glasgow","correspondingAuthor":false,"prefix":"","firstName":"Arafath","middleName":"K","lastName":"Najumudeen","suffix":""},{"id":268821620,"identity":"b043d27d-92b0-4363-886a-c6d40f249677","order_by":9,"name":"Kathryn Gilroy","email":"","orcid":"","institution":"Cancer Research UK Beatson Institute, Glasgow","correspondingAuthor":false,"prefix":"","firstName":"Kathryn","middleName":"","lastName":"Gilroy","suffix":""},{"id":268821621,"identity":"662e7f4b-b22e-449b-8ba3-8f0d588c8902","order_by":10,"name":"Raheleh Amirkhah","email":"","orcid":"","institution":"Queen’s University Belfast","correspondingAuthor":false,"prefix":"","firstName":"Raheleh","middleName":"","lastName":"Amirkhah","suffix":""},{"id":268821622,"identity":"1a63d44d-9fba-4056-834c-3b1d987c76cd","order_by":11,"name":"Sarah Maguire","email":"","orcid":"","institution":"Queen’s University Belfast","correspondingAuthor":false,"prefix":"","firstName":"Sarah","middleName":"","lastName":"Maguire","suffix":""},{"id":268821623,"identity":"72266a1f-df94-4610-924f-7e7ccd9021c6","order_by":12,"name":"Eoghan Mulholland","email":"","orcid":"","institution":"University of Oxford","correspondingAuthor":false,"prefix":"","firstName":"Eoghan","middleName":"","lastName":"Mulholland","suffix":""},{"id":268821624,"identity":"95497ff9-f7cd-4023-99cb-d090ee5bd94b","order_by":13,"name":"Hayley L Belnoue-Davis","email":"","orcid":"","institution":"University of Oxford","correspondingAuthor":false,"prefix":"","firstName":"Hayley","middleName":"L","lastName":"Belnoue-Davis","suffix":""},{"id":268830057,"identity":"fe2983cc-c3e2-459b-a7af-8d6c8a6c2400","order_by":14,"name":"Elena Grassi","email":"","orcid":"","institution":"University of Torino","correspondingAuthor":false,"prefix":"","firstName":"Elena","middleName":"","lastName":"Grassi","suffix":""},{"id":268830058,"identity":"ee9beaf7-90df-440e-b4bb-1eb7a6851c11","order_by":15,"name":"Marco Viviani","email":"","orcid":"","institution":"University of Torino","correspondingAuthor":false,"prefix":"","firstName":"Marco","middleName":"","lastName":"Viviani","suffix":""},{"id":268830059,"identity":"f7184b89-d883-41fb-a3d7-6e91d4dbb84b","order_by":16,"name":"Emily Rogan","email":"","orcid":"","institution":"Queen’s University Belfast","correspondingAuthor":false,"prefix":"","firstName":"Emily","middleName":"","lastName":"Rogan","suffix":""},{"id":268830060,"identity":"04620f1b-4b46-43cf-a586-23b4c7dd6833","order_by":17,"name":"Keara Redmond","email":"","orcid":"","institution":"Queen’s University Belfast","correspondingAuthor":false,"prefix":"","firstName":"Keara","middleName":"","lastName":"Redmond","suffix":""},{"id":268830061,"identity":"455ad963-b867-4106-93ed-1fa6da3e5361","order_by":18,"name":"Svetlana Sakhnevych","email":"","orcid":"","institution":"Queen’s University Belfast","correspondingAuthor":false,"prefix":"","firstName":"Svetlana","middleName":"","lastName":"Sakhnevych","suffix":""},{"id":268830062,"identity":"83abd85f-4cb7-4eb2-b5ed-ea493b0a3e62","order_by":19,"name":"Aoife McCooey","email":"","orcid":"","institution":"Queen’s University Belfast","correspondingAuthor":false,"prefix":"","firstName":"Aoife","middleName":"","lastName":"McCooey","suffix":""},{"id":268830063,"identity":"3022fe41-fedb-4610-8047-95adcc220263","order_by":20,"name":"Courtney Bull","email":"","orcid":"","institution":"Queen’s University Belfast","correspondingAuthor":false,"prefix":"","firstName":"Courtney","middleName":"","lastName":"Bull","suffix":""},{"id":268830064,"identity":"cd90765d-0169-4879-937f-801a178de6c0","order_by":21,"name":"Emily Hoey","email":"","orcid":"","institution":"Queen’s University Belfast","correspondingAuthor":false,"prefix":"","firstName":"Emily","middleName":"","lastName":"Hoey","suffix":""},{"id":268830065,"identity":"d183e0b2-e6dc-4d06-9282-7dd60f37d96f","order_by":22,"name":"Nicoleta Sinevici","email":"","orcid":"","institution":"Queen’s University Belfast","correspondingAuthor":false,"prefix":"","firstName":"Nicoleta","middleName":"","lastName":"Sinevici","suffix":""},{"id":268830066,"identity":"920d779f-be4b-4d5d-b110-6859f951add3","order_by":23,"name":"Holly Hall","email":"","orcid":"","institution":"Cancer Research UK Beatson Institute, Glasgow","correspondingAuthor":false,"prefix":"","firstName":"Holly","middleName":"","lastName":"Hall","suffix":""},{"id":268830067,"identity":"1e23c665-8d00-46ba-9918-e6fa377dab1b","order_by":24,"name":"Baharak Ahmaderaghi","email":"","orcid":"","institution":"Queen’s University Belfast","correspondingAuthor":false,"prefix":"","firstName":"Baharak","middleName":"","lastName":"Ahmaderaghi","suffix":""},{"id":268830068,"identity":"dbbb2a3c-941f-448e-87d9-0e8a462efc82","order_by":25,"name":"Enric Domingo","email":"","orcid":"","institution":"University of Oxford","correspondingAuthor":false,"prefix":"","firstName":"Enric","middleName":"","lastName":"Domingo","suffix":""},{"id":268830069,"identity":"b028e3c5-0e07-4d71-b214-76cdb893c34c","order_by":26,"name":"Andrew Blake","email":"","orcid":"","institution":"University of Oxford","correspondingAuthor":false,"prefix":"","firstName":"Andrew","middleName":"","lastName":"Blake","suffix":""},{"id":268830070,"identity":"6a12c03f-2bf1-427b-b004-ae47035570fa","order_by":27,"name":"Susan Richman","email":"","orcid":"","institution":"University of Leeds","correspondingAuthor":false,"prefix":"","firstName":"Susan","middleName":"","lastName":"Richman","suffix":""},{"id":268830071,"identity":"3d79db6b-7bce-4b13-8240-5aa119010e68","order_by":28,"name":"Claudio Isella","email":"","orcid":"","institution":"University of Torino","correspondingAuthor":false,"prefix":"","firstName":"Claudio","middleName":"","lastName":"Isella","suffix":""},{"id":268830072,"identity":"3714403b-b303-4cf1-b9ad-e3ba859c194d","order_by":29,"name":"Crispin Miller","email":"","orcid":"","institution":"Cancer Research UK Beatson Institute, Glasgow","correspondingAuthor":false,"prefix":"","firstName":"Crispin","middleName":"","lastName":"Miller","suffix":""},{"id":268830073,"identity":"5f59d53e-efad-4678-afe1-1c9f5f8f5b47","order_by":30,"name":"Andrea Bertotti","email":"","orcid":"","institution":"University of Torino","correspondingAuthor":false,"prefix":"","firstName":"Andrea","middleName":"","lastName":"Bertotti","suffix":""},{"id":268830074,"identity":"64e4f05f-3634-425e-b73f-52a3955182eb","order_by":31,"name":"Livio Trusolino","email":"","orcid":"","institution":"University of Torino","correspondingAuthor":false,"prefix":"","firstName":"Livio","middleName":"","lastName":"Trusolino","suffix":""},{"id":268831625,"identity":"8af239ab-6ec8-4244-bb6f-6e0812c209ee","order_by":32,"name":"Maurice Loughrey","email":"","orcid":"","institution":"Queen’s University Belfast","correspondingAuthor":false,"prefix":"","firstName":"Maurice","middleName":"","lastName":"Loughrey","suffix":""},{"id":268831626,"identity":"d79466f1-1fc6-4f0a-a5e7-b924b13251b3","order_by":33,"name":"Emma Kerr","email":"","orcid":"","institution":"Queen’s University Belfast","correspondingAuthor":false,"prefix":"","firstName":"Emma","middleName":"","lastName":"Kerr","suffix":""},{"id":268831627,"identity":"26772b60-1b68-4871-ab34-3c3b90bebd74","order_by":34,"name":"Sabine Tejpar","email":"","orcid":"","institution":"Katholieke Universiteit Leuven","correspondingAuthor":false,"prefix":"","firstName":"Sabine","middleName":"","lastName":"Tejpar","suffix":""},{"id":268831628,"identity":"8446217f-9fa2-499f-a4ab-481b0f7de51b","order_by":35,"name":"Tim Maughan","email":"","orcid":"","institution":"University of Oxford","correspondingAuthor":false,"prefix":"","firstName":"Tim","middleName":"","lastName":"Maughan","suffix":""},{"id":268831629,"identity":"da51d385-1dc9-4479-9276-c503d58e9b35","order_by":36,"name":"Mark Lawler","email":"","orcid":"","institution":"Queen’s University Belfast","correspondingAuthor":false,"prefix":"","firstName":"Mark","middleName":"","lastName":"Lawler","suffix":""},{"id":268831630,"identity":"6ca80bec-2029-4828-8119-1c4f796faa0c","order_by":37,"name":"Simon J Leedham","email":"","orcid":"","institution":"University of Oxford","correspondingAuthor":false,"prefix":"","firstName":"Simon","middleName":"J","lastName":"Leedham","suffix":""},{"id":268831631,"identity":"984cf930-6220-4bc4-9193-e3838024c0a6","order_by":38,"name":"Viktor H Koelzer","email":"","orcid":"","institution":"Department of Pathology and Molecular Pathology, University and University Hospital of Zürich","correspondingAuthor":false,"prefix":"","firstName":"Viktor","middleName":"H","lastName":"Koelzer","suffix":""},{"id":268831632,"identity":"8068c389-3740-45f7-b6dd-c16d56fcf528","order_by":39,"name":"Owen J Sansom","email":"","orcid":"","institution":"Cancer Research UK Beatson Institute, Glasgow","correspondingAuthor":false,"prefix":"","firstName":"Owen","middleName":"J","lastName":"Sansom","suffix":""},{"id":268831633,"identity":"e6133796-3e7a-4b82-a100-421ad578fdae","order_by":40,"name":"Philip D Dunne","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAu0lEQVRIiWNgGAWjYHACAxAhx8DAA+FKEKvFmHQtiQ1EazFnYN4mXbjDJr3vRu4Bhh81DIkzGwhosWxgK5OeeSYtd+aNvATGnmMMibMJuuoAj5k0b9vh3A03cgwYeBsYEucRqeV/ugFQC+NfErQcSABpYQbZQtBhls1sxda8bcmGM8+8Szgsc0zCmKD3zdmbN97mbbOT5zuee/Dhmxob2RkHCDmMGcY6AEZERKQBnEXI8FEwCkbBKBi5AABTxzs8LlzYMwAAAABJRU5ErkJggg==","orcid":"","institution":"The Patrick G Johnston Centre for Cancer Research, Queen’s University Belfast","correspondingAuthor":true,"prefix":"","firstName":"Philip","middleName":"D","lastName":"Dunne","suffix":""}],"badges":[],"createdAt":"2024-01-23 15:44:58","currentVersionCode":1,"declarations":{"humanSubjects":false,"vertebrateSubjects":true,"conflictsOfInterestStatement":false,"humanSubjectEthicalGuidelines":false,"humanSubjectConsent":false,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":true},"doi":"10.21203/rs.3.rs-3891488/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-3891488/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1038/s41588-024-01654-5","type":"published","date":"2024-02-13T00:00:00+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":50060099,"identity":"448b90f4-80fc-4fc9-927d-5551c716c971","added_by":"auto","created_at":"2024-01-23 19:30:10","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":584316,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003ePathway Derived Subtypes of CRC defines three biologically distinct groups. A) \u003c/strong\u003eSchematic summaries the unsupervised class discovery workflow from \u003cem\u003eKRAS\u003c/em\u003emut CRC discovery set (n=165) from the FOCUS cohort that defines Pathway Derived Subtypes (PDS), highlighting overall distribution in the piechart.\u003cstrong\u003e B)\u003c/strong\u003e Barplot displays proportions of \u003cem\u003eKRAS\u003c/em\u003emut variants across PDS.\u003cstrong\u003e C)\u003c/strong\u003e Oncoprint with key driver mutation genes from WNT, MAPK, PIK3CA, cell cycle and TGF-β pathway in the \u003cem\u003eKRAS\u003c/em\u003emut discovery set. \u003cstrong\u003eD) \u003c/strong\u003eHeatmap depicts the PDS-specific single-sample gene set (n=626) scores across FOCUS discovery set (n=165) with PDS and CMS annotation on the top. \u003cstrong\u003eE)\u003c/strong\u003e UMAP from the single-sample gene set scores on the discovery set with PDS (\u003cem\u003eleft\u003c/em\u003e) and CMS (\u003cem\u003eright\u003c/em\u003e) annotations, \u003cem\u003etop\u003c/em\u003e: all samples from the discovery set, bottom: PDS1/PDS3 CMS2/CMS3 samples only.\u003cstrong\u003e F) \u003c/strong\u003eSankey plot shows the number of samples between PDS1/PDS2 and CMS2/CMS3.\u003cstrong\u003eG) \u003c/strong\u003eHeatmap depicting single sample ‘Hallmark’ gene set score across PDS samples in the discovery set with PDS and CMS indicated on the top annotation bar. \u003cstrong\u003eH)\u003c/strong\u003e Schematic summaries the workflow of PDS classifier development from the discovery set.\u003cstrong\u003e I)\u003c/strong\u003e Ternary plot displays PDS prediction probabilities from the PDS classification model in the discovery set (n=165). The red line denotes the default PDS prediction threshold at 0.6 and the dashed black line represents PDS prediction threshold at 0.5 and 0.7. \u003cstrong\u003eJ)\u003c/strong\u003e Sankey plot highlighting PDS calls comparison from discovery and classifier in the discovery set.\u003c/p\u003e","description":"","filename":"Screenshot20240123at2.24.03PM.png","url":"https://assets-eu.researchsquare.com/files/rs-3891488/v1/588a4f0e2a05269c3db66776.png"},{"id":50060386,"identity":"beff75d6-cb2b-4582-bc19-b822a6e50498","added_by":"auto","created_at":"2024-01-23 19:38:10","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":605416,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003ePDS represents distinct tumour phenotype and prognostic association\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eA) \u003c/strong\u003eApplication of PDSclassifier to the entire FOCUS cohort (n=360). \u003cstrong\u003eB)\u003c/strong\u003eTernary plot (\u003cem\u003eleft\u003c/em\u003e) shows the PDS prediction probability and PDS calls with samples below default threshold of 0.6 shown as ‘Mixed’. Distribution of samples in the FOCUS cohort shown in the piechart (\u003cem\u003eright\u003c/em\u003e). \u003cstrong\u003eC)\u003c/strong\u003eHeatmaps show the PDS subtype-specific gene set scores across the FOCUS cohort samples (n=314, ‘mixed’ excluded), with top annotation bar indicating PDS, CMS and KRAS mutation status, ordered based on PDS (\u003cem\u003eleft\u003c/em\u003e) and KRAS mutation status (\u003cem\u003eright\u003c/em\u003e). \u003cstrong\u003eD)\u003c/strong\u003e Sankey plot shows the number of samples between PDS1/PDS2 and CMS2/CMS3 following PDS classification of the entire FOCUS cohort. \u003cstrong\u003eE)\u003c/strong\u003e UMAP from the single-sample gene set scores on all FOCUS samples (‘mixed’ excluded) with PDS (\u003cem\u003eleft\u003c/em\u003e) and CMS (\u003cem\u003eright\u003c/em\u003e) annotations, \u003cem\u003etop\u003c/em\u003e: all samples from the discovery set, bottom: PDS1/PDS3 CMS2/CMS3 samples only. \u003cstrong\u003eF)\u003c/strong\u003e Oncoprint with key driver mutation genes from WNT, MAPK, PIK3CA, cell cycle and TGF-β pathway in the\u003cem\u003e \u003c/em\u003eFOCUS cohort. Significant mutational difference is denoted with * in red; \u003cem\u003eStatistics:\u003c/em\u003eFisher’s exact test. \u003cstrong\u003eG) \u003c/strong\u003eCopy number gain/loss per gene from FOCUS cohort presented in a heatmap. \u003cem\u003eStatistics\u003c/em\u003e: Fisher’s exact test between PDS1 and PDS3, adjusted \u003cem\u003ep\u003c/em\u003e-value annotated at the bottom bar of the heatmap. \u003cstrong\u003eH)\u003c/strong\u003eHeatmaps display single-sample scores of significantly upregulated Hallmark gene set common between FOCUS (\u003cem\u003eleft\u003c/em\u003e) and GSE39582 (\u003cem\u003eright\u003c/em\u003e) with the top annotation bar presenting the PDS prediction probabilities and PDS calls. \u003cstrong\u003eI)\u003c/strong\u003e Radar plots displaying significantly upregulated Hallmark gene set per PDS with increasing normalised enrichment scores (from centre) in the FOCUS (\u003cem\u003etop\u003c/em\u003e) and GSE39582 (\u003cem\u003ebottom\u003c/em\u003e). \u003cstrong\u003eJ)\u003c/strong\u003e Heatmap represents the transcriptional factor activity generated via DoROthEA in association to PDS in the FOCUS (\u003cem\u003eleft\u003c/em\u003e) and GSE39582 (\u003cem\u003eright\u003c/em\u003e). TF repression or activation denoted by red and green, respectively. \u003cstrong\u003eK)\u003c/strong\u003e Kaplan-Meier plots of the relapse-free survival (RFS) of patients classified by PDS in GSE39582 (\u003cem\u003etop\u003c/em\u003e) and PETACC-3 (\u003cem\u003ebottom\u003c/em\u003e) in all samples (\u003cem\u003eleft\u003c/em\u003e; ‘mixed’ excluded) and CMS2/PDS1, CMS2/PDS1 vs all CMS4 samples only (\u003cem\u003eright\u003c/em\u003e). Univariate Cox proportional hazard analysis in PDS shown in the table for GSE39582 and PETACC-3.\u003c/p\u003e","description":"","filename":"Screenshot20240123at2.24.52PM.png","url":"https://assets-eu.researchsquare.com/files/rs-3891488/v1/588f8ddaea4da3a1b9533933.png"},{"id":50060106,"identity":"861514a3-740d-45d3-9186-2adf752bf9e4","added_by":"auto","created_at":"2024-01-23 19:30:10","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":341372,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003ePDS transcriptional biology can be replicated in pan-cancer\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eA) \u003c/strong\u003eESTIMATE Stromal (\u003cem\u003eleft\u003c/em\u003e) and Immune (\u003cem\u003eright\u003c/em\u003e) scores examined across PDS in the FOCUS. \u003cstrong\u003eB)\u003c/strong\u003e Violon plot shows the tumour area (mm\u003csup\u003e2\u003c/sup\u003e) determined from H\u0026amp;E digital histological scores using HALO. \u003cstrong\u003eC)\u003c/strong\u003e Transcriptional measure of proliferation and replication stress examined across PDS shown as violin plots in the FOCUS (\u003cem\u003etop\u003c/em\u003e) and GSE39582 (\u003cem\u003ebottom\u003c/em\u003e). \u003cstrong\u003eD) \u003c/strong\u003eViolin plot displaying % of tumour Ki67\u003csup\u003e+\u003c/sup\u003e per samples across PDS in the FOCUS. \u003cstrong\u003eE)\u003c/strong\u003e Representative Ki67\u003csup\u003e+\u003c/sup\u003e immunohistochemistry (IHC) images of PDS samples.\u003c/p\u003e","description":"","filename":"Screenshot20240123at2.25.14PM.png","url":"https://assets-eu.researchsquare.com/files/rs-3891488/v1/f3fc23db38ad534e006c41ce.png"},{"id":50060101,"identity":"abef37bc-3a38-44b7-840d-2ba26f6b9bba","added_by":"auto","created_at":"2024-01-23 19:30:10","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":717039,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eDigital pathology classification of PDS concludes heterogeneous histological PDS3 features.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eA)\u003c/strong\u003e Schematic of the development of PDS digital image-based classification on the whole slides H\u0026amp;E images. \u003cstrong\u003eB)\u003c/strong\u003e Plots represent area under ROC (AUROC) for PDS prediction based on the digital histology images. \u003cstrong\u003eC)\u003c/strong\u003eConfusionMatrix indicates true transcriptional PDS calls (\u003cem\u003erows\u003c/em\u003e) vs the imPDS prediction rate (\u003cem\u003ecolumns\u003c/em\u003e).\u003cstrong\u003e D)\u003c/strong\u003e Representation of PDS1 and PDS2 as predicted to imPDS1 and imPDS2, where the original H\u0026amp;E whole slide with marked tumour regions is taken, and tile-level confidence probability per imPDS are mapped, and imPDS call is made on the whole slide based on the total tile-level imPDS. This also highlights the tumour heterogeneity with subtype-specific feature also found in across PDS whole slides, albeit at varying level.\u003c/p\u003e","description":"","filename":"Screenshot20240123at2.25.30PM.png","url":"https://assets-eu.researchsquare.com/files/rs-3891488/v1/7ac3f5b292c0df5241f8633b.png"},{"id":50060387,"identity":"b3f016a1-68ce-49bb-8704-b4095c69876e","added_by":"auto","created_at":"2024-01-23 19:38:10","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":902095,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eStem and precursor signatures associated with PDS.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eA)\u003c/strong\u003e Heatmaps depict single sample scores per sample for the crypt-base columnar (CBC), regenerative stem cell (RSC) signatures, tubular and serrated precursor polyp signatures in the FOCUS (\u003cem\u003eleft\u003c/em\u003e) and SPINAL (\u003cem\u003eright\u003c/em\u003e), with the top annotation bar denoting PDS and the Intestinal Stem Cell Index. \u003cstrong\u003eB)\u003c/strong\u003eScatterplot showing correlation between CBC and RSC signature scores with selected samples (labelled) taken from SPINAL cohort for \u003cstrong\u003eC)\u003c/strong\u003e FISH images for stained with LGR5, ANXA1 and DAPI across PDS samples. \u003cstrong\u003eD)\u003c/strong\u003e Schematic represents the S:CORT polyp cohort classified to PDS (\u003cem\u003etop\u003c/em\u003e) with confusion matrix heatmap (\u003cem\u003ebottom\u003c/em\u003e) indicating the number of samples called to each PDS. TVA = tubulovillous adenomas, SSL = sessile serrated lesions, TSA = traditional sessile adenomas.\u003c/p\u003e","description":"","filename":"Screenshot20240123at2.25.45PM.png","url":"https://assets-eu.researchsquare.com/files/rs-3891488/v1/26e45fc0cdc3449602783e71.png"},{"id":50060388,"identity":"77acf3bf-f25f-4402-aa06-c3e37c31ae53","added_by":"auto","created_at":"2024-01-23 19:38:10","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":705616,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eBiological insights into PDS relates to differentiated-like features.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eA)\u003c/strong\u003e 961 unique genes found from PDS3-specific gene sets were input to Enrichr and STRING analysis. \u003cstrong\u003eB)\u003c/strong\u003e Barplot (\u003cem\u003etop\u003c/em\u003e) displays the association of three key genes to the list of PDS3 genes, and STRING protein-protein network formation of these three genes (\u003cem\u003ebottom\u003c/em\u003e). \u003cstrong\u003eC)\u003c/strong\u003eViolin plots show the gene expression measure across PDS for EZH1 (\u003cem\u003eleft\u003c/em\u003e), SUZ12 (\u003cem\u003ecentre\u003c/em\u003e) and REST (\u003cem\u003eright\u003c/em\u003e) in the FOCUS cohort. \u003cstrong\u003eD)\u003c/strong\u003eEnrichment plot from gene set enrichment analysis (GSEA) highlights PRC target enriched in PDS3 compared to PDS1 and PDS2 (\u003cem\u003eleft\u003c/em\u003e) and violin plot of PRC target single-sample scores further consolidates the significant association of PRC targets to PDS3. \u003cstrong\u003eE)\u003c/strong\u003e Enrichment plot from GSEA highlights MYC target negatively enriched in PDS3 compared to PDS1 and PDS2 (\u003cem\u003eleft\u003c/em\u003e) and violin plot of MYC target single-sample scores showing low level of association to PDS3 (\u003cem\u003eright\u003c/em\u003e). \u003cstrong\u003eF)\u003c/strong\u003e Scatterplot depicts the inverse correlation between MYC targets and PRC targets single-sample scores with PDS gradient indicating association of PDS1 to MYC targets and PDS3 to PRC targets. \u003cstrong\u003eG)\u003c/strong\u003eSchematic of FACS sorted mouse intestinal crypt epithelium transcriptional dataset (GSE143915). \u003cstrong\u003eH)\u003c/strong\u003e Boxplot shows levels of PRC targets (\u003cem\u003eleft\u003c/em\u003e) and Myc targets (\u003cem\u003eright\u003c/em\u003e) single sample scores across multiple different mouse intestinal crypt populations.\u003cstrong\u003e I)\u003c/strong\u003eThe inverse correlation of Myc targets and PRC targets is recapitulated in a stem-to-differentiated cell populations along the intestinal crypt of the mouse. \u003cstrong\u003eJ)\u003c/strong\u003e Boxplot shows level of proliferation (\u003cem\u003eleft\u003c/em\u003e) and replication stress (\u003cem\u003eright\u003c/em\u003e) transcriptionally measured across the mouse intestinal crypt populations. \u003cstrong\u003eK)\u003c/strong\u003e Heatmap displays the gene expression measure of the enteroendocrine cells (EEC), neuroactive ligand receptor interaction, PRC targets and MYC targets signatures across PDS in the FOCUS (\u003cem\u003eleft\u003c/em\u003e) and SPINAL (\u003cem\u003eright\u003c/em\u003e) cohorts. Top annotation bar denotes the PDS prediction probabilities and PDS calls (‘mixed’ excluded). \u003cstrong\u003eL)\u003c/strong\u003e Violin plots show the comparison of EEC single-sample scores (\u003cem\u003eleft\u003c/em\u003e) and neurons xCell scores (\u003cem\u003eright\u003c/em\u003e) across PDS in FOCUS. \u003cstrong\u003eM)\u003c/strong\u003e TissueEnrich analysis relates the PDS3 associated gene signature more closely with the enrichment in cerebral cortex show in barplot; red dashed line denotes \u003cem\u003ep\u003c/em\u003e\u0026lt; 0.05. \u003cstrong\u003eN)\u003c/strong\u003e Whole slide image from a PDS3 sample in the SPINAL cohort stained with synaptophysin (SYN), where SYN-stained neuroendocrine cells within tumour glands is displayed.\u003c/p\u003e","description":"","filename":"Screenshot20240123at2.26.11PM.png","url":"https://assets-eu.researchsquare.com/files/rs-3891488/v1/5da3e7df5e4e741263fdb82d.png"},{"id":50060104,"identity":"653f7b79-6227-49f7-b75c-03298972d23c","added_by":"auto","created_at":"2024-01-23 19:30:10","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":418953,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eGenetically engineered mouse models and patient-derived xenografts could not capture PDS3 biology as represented in human\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eA)\u003c/strong\u003e Six genotypic genetically engineered mouse models (GEMMs) were sequenced and PDS classification was applied. \u003cstrong\u003eB)\u003c/strong\u003e Heatmap displays the single-sample scores for the Hallmark gene sets in GEMMs across PDS, including ‘mixed’ samples. Top annotation bar indicates PDS calls, mouse CMS (MmCMS), and genotypes. \u003cstrong\u003eC)\u003c/strong\u003e Heatmap shows transcriptional factor (TF) activity from DoRothEA. TF activity is represented with green, while repression with red. \u003cstrong\u003eD)\u003c/strong\u003eViolin plots show proliferative index (\u003cem\u003etop\u003c/em\u003e) and replication stress (\u003cem\u003ebottom\u003c/em\u003e) across PDS in GEMMs. \u003cstrong\u003eE)\u003c/strong\u003e Boxplots depicts Myc targets (\u003cem\u003etop\u003c/em\u003e) and PRC targets (\u003cem\u003ebottom\u003c/em\u003e) single-sample scores across PDS in GEMMs. \u003cstrong\u003eF)\u003c/strong\u003eScatterplot highlights the inverse correlation between Myc targets and PRC targets single-sample scores recapitulated in GEMMs, with PDS annotation.\u003c/p\u003e","description":"","filename":"Screenshot20240123at2.26.27PM.png","url":"https://assets-eu.researchsquare.com/files/rs-3891488/v1/df405c4f37b2dca9ec105f74.png"},{"id":50060103,"identity":"d3a89361-8d54-44f8-98be-10b7a3f4f74f","added_by":"auto","created_at":"2024-01-23 19:30:10","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":458586,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003ePresence of MYC targets and PRC targets biological axis at single-cell level\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eA)\u003c/strong\u003e Single cell epithelial cells CRC datasets from n=63 patients over 5 cohorts (left); UMAP shows the epithelial cells clustering based on patients. \u003cstrong\u003eB)\u003c/strong\u003eUMAP plots show the cells expressing MYC targets (\u003cem\u003eleft\u003c/em\u003e), PRC targets (\u003cem\u003ecentre\u003c/em\u003e) and overlap of the two (\u003cem\u003eright\u003c/em\u003e). Colour legend key: red = higher MYC target scores, green = higher PRC target scores. \u003cstrong\u003eC)\u003c/strong\u003e Scatterplot of cells highlighting inverse correlation between MYC targets and PRC targets. Colours represents; red = higher for MYC targets, green = higher for PRC targets, yellow = higher for both, grey = lower for both. Threshold were determined based on the \u003cem\u003eFeaturePlot\u003c/em\u003e function from \u003cem\u003eSeurat\u003c/em\u003e R package. \u003cstrong\u003eD)\u003c/strong\u003eUMAP on PDS gene set scores with cell annotated with MYC targets (\u003cem\u003eleft\u003c/em\u003e), PRC targets (\u003cem\u003ecentre\u003c/em\u003e) and overlap of the two (\u003cem\u003eright\u003c/em\u003e). \u003cstrong\u003eE)\u003c/strong\u003eUMAP with annotated cell cycle phases per cells. \u003cstrong\u003eF)\u003c/strong\u003e Barplot shows the proportion and number of cells at different cell cycle phase of G1, S or G2M. \u003cstrong\u003eG)\u003c/strong\u003eUMAP plots annotated with cells indicating proliferative index, replicative stress, KRAS SINGLA DN and neuroactive ligand receptor interaction. \u003cstrong\u003eH)\u003c/strong\u003eBoxplots show proliferative index, replicative stress, KRAS SIGNAL DN and neuroactive ligand receptor interaction between MYC targets-High and PRC targets-High cells (top tertiles). \u003cstrong\u003eI)\u003c/strong\u003e UMAP annotated with cells indicating association with EEC signature (top), and the association of EEC to MYC target-High and PRC target-High cells. \u003cstrong\u003eJ)\u003c/strong\u003e UMAP annotated with iCMS calls from Joanito \u003cem\u003eet al.,\u003c/em\u003e (\u003cem\u003eleft\u003c/em\u003e) and barplot with proportion and number of cells called as iCMS (\u003cem\u003eright\u003c/em\u003e). \u003cstrong\u003eK)\u003c/strong\u003e Schematic summaries the value of PDS in terms of PDS1 and PDS3 highlighting subset of tumours that represent CMS2/CMS3 with high association stem/MYC targets vs differentiation-like/PRC targets features, relating to the poor prognosis in the latter.\u003c/p\u003e","description":"","filename":"Screenshot20240123at2.27.01PM.png","url":"https://assets-eu.researchsquare.com/files/rs-3891488/v1/e0ee49bea02ebc0d231e0e38.png"},{"id":52105545,"identity":"068def41-8fc3-432a-ae43-5c062db0ac72","added_by":"auto","created_at":"2024-03-06 19:28:27","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":5128263,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-3891488/v1/2663e1c2-ae82-4d6e-b447-68b1654de60a.pdf"},{"id":50060108,"identity":"9b756b2d-34ba-44d0-b989-6c121bed3ea4","added_by":"auto","created_at":"2024-01-23 19:30:13","extension":"pptx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":35548883,"visible":true,"origin":"","legend":"\u003cp\u003eSupplementary Figures\u003c/p\u003e","description":"","filename":"PDSSuppFiguresRSq30.11.2022.pptx","url":"https://assets-eu.researchsquare.com/files/rs-3891488/v1/c15b622977face85f1c281f9.pptx"},{"id":50060098,"identity":"02bfd99b-f7f1-4645-89e2-9d8e126187ae","added_by":"auto","created_at":"2024-01-23 19:30:10","extension":"xlsx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":28873,"visible":true,"origin":"","legend":"\u003cp\u003eSupplementary Tables\u003c/p\u003e","description":"","filename":"PDSSupplementaryTablesRSq30.11.2022.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-3891488/v1/91a265bda5fcae5d8b6170ed.xlsx"}],"financialInterests":"The authors declare no competing interests.","formattedTitle":"\u003cp\u003e\u003cstrong\u003ePathway level subtyping identifies a slow-cycling and transcriptionally lethargic biological phenotype associated with poor clinical outcomes in colon cancer independent of genetics\u003c/strong\u003e\u003c/p\u003e","fulltext":[{"header":"Introduction","content":"\u003cp\u003eA goal of molecular subtyping in cancer is to identify biomarkers that stratify tumours according to biological and clinical phenotypes, providing improved understanding of signalling cascades that underpin tumour development and treatment response. Numerous studies have leveraged transcriptional data in the form of individual gene-level expression values to identify tumour subtypes\u003csup\u003e\u003cspan additionalcitationids=\"CR2\" citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u003c/sup\u003e, followed by downstream characterisation using collections of gene ontology pathway-level signatures that represent biologically important phenotypes\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e. The value of this approach is exemplified by the Consensus Molecular Subtypes (CMS)\u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e and ColoRectal Intrinsic Subtypes (CRIS)\u003csup\u003e\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u003c/sup\u003e of colorectal cancer (CRC). A recent study proposed a refinement of CMS classification through use of individual gene-level data derived from single cell RNA sequencing (scRNAseq), revealing two further intrinsic subtypes, iCMS2 and iCMS3, associated with different prognostic outcomes within the previously-defined CMS4 stromal-rich tumour type\u003csup\u003e\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e. While this study has provided improved resolution to this stromal-rich group, remarkably it did not identify any heterogeneity or provide additional resolution within the largest subtype of epithelial-rich tumours, CMS2.\u003c/p\u003e \u003cp\u003eGene-level approaches to molecular classification align to the model of biological signalling defined by the central dogma, whereby a functional hierarchical order along DNA-RNA-Protein is well-established\u003csup\u003e\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u003c/sup\u003e, and are dominated by associations with genetic alterations underpinning the Vogelstein paradigm\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e,\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u003c/sup\u003e. Using this gene-level approach, focussed specifically on \u003cem\u003eBRAF\u003c/em\u003e V300E mutant (\u003cem\u003eBRAF\u003c/em\u003emut) CRC tumours, Barras and colleagues identified heterogeneity within this genotype, enabling development of two unique biological subtypes within \u003cem\u003eBRAF\u003c/em\u003emut tumours\u003csup\u003e\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u003c/sup\u003e. The clinical predictive value of this \u003cem\u003eBRAF\u003c/em\u003emut-specific approach in selecting patients for targeted therapies was subsequently validated using tumour samples from an independent phase II clinical study\u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e. Although subtype discovery using gene-level data represents the most commonly-deployed approach, pathway-level data has been shown to provide a closer link with molecular mechanisms and clinical phenotypes\u003csup\u003e\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e. Therefore, we hypothesise that by using pathway-level data, initially in \u003cem\u003eKRAS\u003c/em\u003e mutant (\u003cem\u003eKRAS\u003c/em\u003emut) CRC tumours, we can identify a more comprehensive view of biological signalling related to disease phenotypes within this important genotype.\u003c/p\u003e \u003cp\u003eUsing transcriptional data from a series of colorectal tumours, we identify, validate, and characterise a set of novel pathway derived subtypes (PDS), based on mechanistic signalling cascades. Following class discovery performed exclusively in \u003cem\u003eKRAS\u003c/em\u003emut CRC, remarkably this classifier provides new insights into tumour biology regardless of mutational status, across several independent human and mouse tumour cohorts. This approach reveals a previously unseen continuum of intrinsic biology within epithelial-rich CMS2 tumours, which display neural-like traits and can be defined along a polarised gradient of epithelial cell states, including cell cycle, transcriptional activity and stress response. In summary, this PDS classification system identifies tumours defined by near-identical biological traits independent of genetic events, reinforcing the fundamental value of PDS in classifying tumours based on cancer-relevant biological phenotypes.\u003c/p\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003ePathway-level class discovery in KRASmut tumours define unique biological subtypes\u003c/h2\u003e \u003cp\u003eWe first generated a matrix of pathway-level single sample scores (n\u0026thinsp;=\u0026thinsp;1783 per tumour), from the Molecular Signature Database (MSigDB), using a subset of the FOCUS clinical trial cohort (n\u0026thinsp;=\u0026thinsp;360 tumours; SCORT cohort) to create an initial framework for mapping of the tumour activation status across n\u0026thinsp;=\u0026thinsp;640,000\u0026thinsp;+\u0026thinsp;combinations of biological processes, including BIOCARTA, PID, KEGG and REACTOME. Following this pathway-level conversion, \u003cem\u003eKRAS\u003c/em\u003emut samples (n\u0026thinsp;=\u0026thinsp;165) were used for unsupervised class discovery, which through \u003cem\u003ek\u003c/em\u003e-means clustering revealed three distinct pathway-derived subtypes, PDS1, 2 and 3, with distribution of PDS1\u0026thinsp;=\u0026thinsp;27%, PDS2\u0026thinsp;=\u0026thinsp;38% and PDS3\u0026thinsp;=\u0026thinsp;35% (Fig.\u0026nbsp;1A, Supplementary Fig.\u0026nbsp;1A,1B and 1C, Supplementary Table\u0026nbsp;1A). Assessment of mutations according to PDS revealed no distinct mutational type for \u003cem\u003eKRAS\u003c/em\u003e itself (Fig.\u0026nbsp;1B) or a number of other driver mutations (Fig.\u0026nbsp;1C). Visualisation of the n\u0026thinsp;=\u0026thinsp;626 gene sets most significantly associated with PDS revealed how transcriptionally distinct these groups are (Fig.\u0026nbsp;1D; Supplementary Table\u0026nbsp;1B). These analyses also indicated that CMS2 tumours, and to a lesser extent CMS3 tumours, were split across PDS1 and PDS3, whereas the CMS1/4 inflammatory/stromal tumours were merged within PDS2 (Fig.\u0026nbsp;1E, 1F, Supplementary Fig.\u0026nbsp;1D).\u003c/p\u003e \u003cp\u003eWe next set out to determine the key biological pathways most associated with PDS, using the Hallmarks collection\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e. Consistent with CMS1 and CMS4 associations, PDS2 tumours were enriched for signalling related to inflammatory/immune pathways, such as interferon-α and interferon-γ response, as well as stromal-related epithelial-to-mesenchymal (EMT) and TGF-β activation (Fig.\u0026nbsp;1G). PDS1 tumours were significantly upregulated in cell cycle-related pathways, including MYC targets, E2F targets, G2M checkpoint, whereas there was near universal repression of cancer-associated hallmarks signalling in PDS3 (Fig.\u0026nbsp;1G). Overall, these three distinct transcriptional classes confirm the presence and extent of signalling heterogeneity within \u003cem\u003eKRAS\u003c/em\u003emut CRC. Furthermore, given the potential therapeutic value associated with some of the signalling cascades employed, these findings suggest that PDS classification provides a novel basis for forward and reverse translation studies.\u003c/p\u003e \u003cp\u003eTo ensure that PDS classification can be performed on additional datasets, we explored different classification algorithms, where the PDS-specific n\u0026thinsp;=\u0026thinsp;626 gene set signatures (Fig.\u0026nbsp;1D) were used as feature selection for training and developing the classification model (Fig.\u0026nbsp;1H). Given the best overall accuracy displayed by the support vector machine via Radial Basis Function (svmRBF) algorithm (Supplementary Fig.\u0026nbsp;1E and 1F), the PDS classifier was based on this model. During the development of the PDS classification model, the prediction probability scoring system was introduced as a way to enumerate intratumoural subtype heterogeneity, where we defined 0.6 as the default threshold for PDS classification, with tumours that do not reach this threshold termed a \u0026lsquo;mixed\u0026rsquo; sample (Fig.\u0026nbsp;1I; Supplementary Fig.\u0026nbsp;1G and 1H). To ensure this classifier can be easily utilised by the wider research community, an R-based classification package was developed; \u003cem\u003ePDSclassifier\u003c/em\u003e R package (see \u0026lsquo;Data and Code Availability\u0026rsquo;), which accurately classified tumours into the same clusters identified during class discovery.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003ePathway-derived subtypes identify tumours with conserved signalling regardless of mutational status\u003c/h2\u003e \u003cp\u003eWhile the PDS classifier was developed exclusively within \u003cem\u003eKRAS\u003c/em\u003emut colorectal tumours, we next tested its ability to classify samples into biologically similar groups regardless of \u003cem\u003eKRAS\u003c/em\u003e mutational status within the entire FOCUS cohort (n\u0026thinsp;=\u0026thinsp;360). When samples of all mutational backgrounds were included, it classified 87% of samples as PDS1\u0026thinsp;=\u0026thinsp;26%, PDS2\u0026thinsp;=\u0026thinsp;31%, PDS3\u0026thinsp;=\u0026thinsp;30%; and the rest as \u0026lsquo;Mixed\u0026rsquo; (13%) (Fig.\u0026nbsp;2A and 2B). Remarkably, the PDSclassifier stratified tumour samples into robust groups that were entirely independent of \u003cem\u003eKRAS\u003c/em\u003e mutational status (Fig.\u0026nbsp;2C). Furthermore, the same PDS-CMS associations remained evident; while PDS2 were predominantly CMS4/CMS1, CMS2/CMS3 were again distributed across PDS1 and PDS3 (Fig.\u0026nbsp;2D and 2E; Supplementary Fig.\u0026nbsp;2A). We next assessed the clinical and molecular landscape of these tumours, which revealed no significant differences for clinical factors (gender, sidedness) between PDS (Supplementary Fig.\u0026nbsp;2B). Assessment of known driver mutations in each PDS indicated that, in line with CMS associations, there was significant enrichment of \u003cem\u003eBRAF\u003c/em\u003e mutations, and fewer \u003cem\u003eAPC\u003c/em\u003e mutations, in PDS2 compared to PDS1/3 (Fig.\u0026nbsp;2F, Supplementary Fig.\u0026nbsp;2C). Importantly, there were no differences in the rates of mutations across key genes within the WNT, MAPK, PIK3CA, cell cycle or TGF-β pathways assessed (Fig.\u0026nbsp;2F, Supplementary 2C). This trend was also observed when copy number estimates were assessed (Fig.\u0026nbsp;2G, Supplementary Fig.\u0026nbsp;2D and 2E). Overall, these findings indicate that despite being developed within \u003cem\u003eKRAS\u003c/em\u003emut tumours, our novel pathway-derived subtypes identify biology independent of \u003cem\u003eKRAS\u003c/em\u003e mutational status and are applicable to all CRC tumours, regardless of genetics.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003eTranscriptional landscape across pathway-derived subtypes\u003c/h2\u003e \u003cp\u003eUsing both the Hallmarks and DoRothEA\u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e algorithms, we performed the same characterisation in the entire FOCUS cohort (n\u0026thinsp;=\u0026thinsp;360), alongside colon cancer (CC) samples from two additional independent non-randomised cohorts, including the publicly available cohort used in the development of CMS (GSE39582; n\u0026thinsp;=\u0026thinsp;566) and a new independent CRC cohort profiled within the S:CORT programme (n\u0026thinsp;=\u0026thinsp;258; SPINAL S:CORT cohort), with exclusion of \u0026lsquo;mixed\u0026rsquo; PDS samples. As observed within \u003cem\u003eKRAS\u003c/em\u003emut tumours, there is a clear transcriptional distinction between each subtype, with the most prominent biological signalling cascades representing cell cycle-related activity in PDS1 and stromal/inflammatory signalling in PDS2, but only signal elevated in PDS3 being KRAS signalling down, which itself represents an indication of signalling repression (Fig.\u0026nbsp;2H, 2I; Supplementary Fig.\u0026nbsp;3A and 3B). Remarkably, these findings were again consistent across PDS groups, regardless of \u003cem\u003eKRAS\u003c/em\u003e status. We next investigated if these unique transcriptional landscapes were underpinned by distinct transcription factor (TF) activity, using DoRothEA database\u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e, which revealed that consistent with our pathway analysis results, cell-cycle related TFs, such as E2F2, were significantly active in PDS1 (Fig.\u0026nbsp;2J; Supplementary Fig.\u0026nbsp;3C). PDS2 tumours were significantly activated for n\u0026thinsp;=\u0026thinsp;32 TFs, mainly related to stromal and inflammatory phenotypes, including SMAD3, STAT3, IRF1, ERG. While repressed for biological pathways within the Hallmarks collection in general, PDS3 exhibited activation of TFs relating to a diverse set of hormonal and developmental biologies, including FOXA2 (Fig.\u0026nbsp;2J; Supplementary Fig.\u0026nbsp;3C).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003eClinical and prognostic relevance of PDS classification\u003c/h2\u003e \u003cp\u003eTo test the clinical relevance of PDS, we assessed relapse-free survival (RFS) rates according to PDS using clinical data from the non-randomised GSE39582 cohort. These analyses revealed that PDS2 and PDS3 displayed poor prognosis compared to PDS1 (Fig.\u0026nbsp;2K) regardless of \u003cem\u003eKRAS\u003c/em\u003e mutational status (Supplementary Fig.\u0026nbsp;2F). Importantly, these findings were independently validated in the randomised PETACC-3 adjuvant clinical trial\u003csup\u003e\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u003c/sup\u003e, which further reinforced the poor prognosis of PDS3 compared to PDS1 and PDS2 (Fig.\u0026nbsp;2K). Given that the majority of PDS1 and PDS3 tumours were classified as CMS2, we next sub-stratified these tumours accordingly, which confirmed the clinical value of PDS classification in detecting novel aggressive subset of tumours, with CMS2-PDS1 having a significantly better outcome compared to CMS2-PDS3 tumours (Supplementary Fig.\u0026nbsp;2G). Remarkably, these analyses revealed that PDS3 tumours have as poor an outcome as the previously identified stroma-rich CMS4 tumour subtype (Fig.\u0026nbsp;2K; Supplementary Fig.\u0026nbsp;2G).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003eCancer-related signalling associated with PDS tumours\u003c/h2\u003e \u003cp\u003eIn line with the Hallmarks and CMS associations, we observe increased immune and stromal populations in PDS2 compared to PDS1 and increased epithelial content in PDS1 compared to PDS2, with PDS3 tumours being intermediate for both with the ESTIMATE method (Fig.\u0026nbsp;3A; Supplementary Fig.\u0026nbsp;3D and 3E). Furthermore, quantitative cellular image analyses using H\u0026amp;E slides indicate that, in line with their associations with CMS2/3 classification, PDS1 and PDS3 tumours have similar tumour area characteristics, whereas PDS2 tumours contain reduced tumour (epithelium) area (Fig.\u0026nbsp;3B). We also assessed for the transcriptional-based proliferative index scores\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u003c/sup\u003e and replication stress signature according to PDS. Intriguingly, given the poor prognostic outcomes for patients with PDS3 tumours, these assessments revealed significantly increased proliferation and high replication stress in PDS1 compared to PDS2 and PDS3 in a stepwise fashion (Fig.\u0026nbsp;3C). These findings were validated using Ki67\u003csup\u003e+\u003c/sup\u003e immunohistochemistry (IHC) in the FOCUS cohort, which in combination with the transcriptional analyses, confirm that PDS3 classification identifies a group of transcriptionally inert and slow cycling tumours (Fig.\u0026nbsp;3D and 3E).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eImage-based H\u0026amp;E classifier for rapid PDS tumour classification\u003c/h2\u003e \u003cp\u003eWhile transcriptional classifiers provide valuable information for translational studies, their use for patient stratification can be undermined by the costs and rapid-turnaround timeframes required in clinical diagnostics. This led us to develop an image-based PDS machine learning classifier to detect PDS-associated morphological patterns in H\u0026amp;E digital slides, which itself was motivated by prior studies that successfully demonstrated the ability for such models to automatically capture associations between morphological patterns and molecular signatures across several cancer types\u003csup\u003e\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u003c/sup\u003e, including the image-based CMS (imCMS) classifier\u003csup\u003e\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e. Here, we trained a convolutional neural network to predict PDS classes from input H\u0026amp;E digital slides, based on a methodology similar to imCMS, using PDS-labelled data from the FOCUS and SPINAL cohorts (Fig.\u0026nbsp;4A). We trained several models and assessed their performances under a 5-fold cross-validation protocol (Fig.\u0026nbsp;4B) with the aggregate confusion matrix of the classification results of 5 trained models on 5 different test partitions shown in (Fig.\u0026nbsp;4C). These results suggest that images from PDS1 and PDS2 tumours contain visual features that our models used to discriminate between these two classes (PDS1-vs-PDS2-3: AUROC\u0026thinsp;=\u0026thinsp;.740\u0026thinsp;\u0026plusmn;\u0026thinsp;.019; F1-score\u0026thinsp;=\u0026thinsp;.574 and PDS2-vs-PDS1-3: AUROC\u0026thinsp;=\u0026thinsp;.810\u0026thinsp;\u0026plusmn;\u0026thinsp;.033; F1-score\u0026thinsp;=\u0026thinsp;.618). However, our trained models were not able to identify PDS3-specific features as the learned features that were discriminative for PDS1 and PDS2 were also prominent in PDS3-labeled images, as shown by the poor cross-validation performances for this class (PDS3-vs-PDS1-2: AUROC\u0026thinsp;=\u0026thinsp;.557\u0026thinsp;\u0026plusmn;\u0026thinsp;.026; F1-score\u0026thinsp;=\u0026thinsp;0.338). This result suggests that PDS3 has an intermediate morphology, which itself is aligned with the intermediate description of PDS3 via ESTIMATE and MCP-counter, making our models erroneously classify PDS3 images as either PDS1 or PDS2. Visual inspection of the regions that contributed the most to correct classification of PDS1 and PDS2, and to the misclassification of PDS3, confirmed the presence of PDS1-specific and PDS2-specific morphology in PDS3 images (Fig.\u0026nbsp;4D; Supplementary Fig.\u0026nbsp;4).\u003c/p\u003e \u003cp\u003e \u003cb\u003eDistinct stem-like and precursor traits associated with PDS tumour classification.\u003c/b\u003e \u003c/p\u003e \u003cp\u003eGiven that our AI-guided imPDS classifier was unable to detect robust morphological patterns associated with PDS3 tumours, in a further attempt to characterise the features associated with PDS-specific biology, we next examined a collection of established stem and precursor signatures in our bulk tumour transcriptional data according to PDS. These analyses revealed an elevation for tubular histology\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e and canonical crypt-base columnar (CBC) stem cell signatures\u003csup\u003e\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u003c/sup\u003e in PDS1, alongside elevated serrated histology\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e and regenerative stem cell (RSC) signatures \u003csup\u003e\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u003c/sup\u003e in PDS2 tumours (Fig.\u0026nbsp;5A; Supplementary 5A-E). PDS3, however, did not show any clear association with either stem or polyp signatures. To validate these stem cell associations, we selected a subset of cases (n\u0026thinsp;=\u0026thinsp;20) from the SPINAL cohort for multiplex ISH (Fig.\u0026nbsp;5B-C), which confirmed elevation of epithelial LGR5 (red) in PDS1 tumours, elevation of epithelial ANXA1 (surrogate marker for Sca1; yellow) in PDS2 tumours and an absence of staining for both markers in PDS3 tumours (Fig.\u0026nbsp;5B-C). To confirm the associations between PDS and precursor signatures, we next applied the PDS classifier to a cohort of transcriptionally profiled polyp samples (S:CORT polyp cohort), pathologically defined as tubulovillous adenomas (TVA), sessile serrated lesions (SSL) and traditional serrated adenomas (TSA), which further confirmed the TVA association with PDS1, SSL association with PDS2 and lack of precursor association with PDS3 (Fig.\u0026nbsp;5D). Overall, these results indicate the presence of well characterised molecular and biological features associated with both PDS1 and PDS2 but provided limited insight into the biology underpinning PDS3.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003ePDS1 and PDS3 tumours align along an intrinsic pluripotent-to-differentiated axis\u003c/h2\u003e \u003cp\u003eTo identify traits underpinning PDS3 tumour biology, we interrogated the gene sets used for PDS classification, to identify genes specifically associated with PDS3 (n\u0026thinsp;=\u0026thinsp;961), followed by assessment of potential biological interactions with Enrichr and STRING (Fig.\u0026nbsp;6A). There was a significant association between PDS3 biological signalling and components of the polycomb repressor complex (PRC); particularly EZH2, SUZ12 and REST (Fig.\u0026nbsp;6B, Supplementary Fig.\u0026nbsp;6A), which displayed a gradient of high to low gene expression across PDS1-2-3 (Fig.\u0026nbsp;6C). When components of PRC are activated/elevated, they repress a range of downstream targets linked to cellular differentiation\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e (Supplementary Fig.\u0026nbsp;6B). Using these previously defined PRC targets in line with an overall lower level of PRC genes in PDS3, we reveal a de-repression (and therefore significant elevation) of PRC targets in PDS3 compared to PDS1 and PDS2, using both pairwise and single sample GSEA in FOCUS (Fig.\u0026nbsp;6D). Given the relationship between PRC and MYC, we assessed a previously defined gene set of MYC targets that when elevated maintain a pluripotent state\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e, which revealed an inverse relationship between MYC target and PRC targets according to PDS classification (Fig.\u0026nbsp;6E). Remarkably, when assessed together across our bulk tumour data, there was a near perfect linear negative correlation between individual tumour scores along a MYC-PRC axis, with PDS1 tumours displaying high-MYC/low-PRC target expression and PDS3 tumour displaying the inverse low-MYC/high-PRC target expression (Fig.\u0026nbsp;6F). These findings were replicated in the SPINAL cohort (Supplementary Fig.\u0026nbsp;6C-D).\u003c/p\u003e \u003cp\u003eThese data suggest that, in combination with our data indicating lower levels of cell cycle, transcriptional activity, canonical and regenerative stem cells (Fig.\u0026nbsp;5), PDS3 tumours may be more differentiated when compared to more stem-rich and proliferative PDS1 and PDS2 tumours. To test this, we next analysed transcriptional data from an independent cohort of mouse colonic epithelial cells across six differentiation states (GSE143915\u003csup\u003e22\u003c/sup\u003e), which revealed a significant association between Myc-PRC target gene expression levels and differentiation status; with stem cells displaying high-Myc/low-PRC target gene expression and enteroendocrine cells (EEC), displaying high-PRC/low-Myc target gene expression (Fig.\u0026nbsp;6G-H, Supplementary Fig.\u0026nbsp;6E). Specific assessment of gene associated within the PRC and MYC targets, alongside genes associated with neuroactive ligands and enteroendocrine biology confirm the upregulation of these biological and differentiation modules in human PDS3 tumours (Fig.\u0026nbsp;6K). Although the tumours employed throughout our study are pathologically-confirmed adenocarcinomas and not neuroendocrine tumours, we do see elevation of separate EEC and neuron cell transcriptional signatures in PDS3 tumours in our bulk tumour data (Fig.\u0026nbsp;6L). An alignment between PDS3 tumours and a neural-like identity was also observed using the \u003cem\u003eTissueEnrich\u003c/em\u003e package\u003csup\u003e\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e\u003c/sup\u003e, which found an enrichment for cerebral cortex tissue-identity within the gene sets that classify PDS3 (Fig.\u0026nbsp;6M). Finally, using whole face tissue sections from our SPINAL cohort, we observed that protein expression for the differentiated EEC marker, such as chromogranin A (CRGA) is restricted to surrounding normal glands in PDS1 tumours, however these markers are expressed in small focal regions of neoplastic glands and/or surrounding neural tissue in PDS3 tumours (Fig.\u0026nbsp;6N);\u003c/p\u003e \u003cp\u003e \u003cb\u003eDual-species disease positioning of PDS classification in genetically engineered mouse models and patient-derived xenografts.\u003c/b\u003e \u003c/p\u003e \u003cp\u003eAs dual-species options are available for the pathway collections used to develop the PDS classifier, we utilised bulk tumour transcriptional data from n\u0026thinsp;=\u0026thinsp;51 GEMMs across n\u0026thinsp;=\u0026thinsp;6 genotypes \u0026ndash; \u003cem\u003eApc\u003c/em\u003e\u003csup\u003efl/+\u003c/sup\u003e (A); \u003cem\u003eApc\u003c/em\u003e\u003csup\u003efl/+\u003c/sup\u003e \u003cem\u003eKras\u003c/em\u003e\u003csup\u003eG12/+\u003c/sup\u003e (AK); \u003cem\u003eBraf\u003c/em\u003e\u003csup\u003eV600E/+\u003c/sup\u003e \u003cem\u003eTrp53\u003c/em\u003e\u003csup\u003efl/fl\u003c/sup\u003e (BP) and \u003cem\u003eBraf\u003c/em\u003e\u003csup\u003eV600E/+\u003c/sup\u003e \u003cem\u003eTrp53\u003c/em\u003e\u003csup\u003efl/fl\u003c/sup\u003e Notch1\u003csup\u003eTg/+\u003c/sup\u003e (BPN), \u003cem\u003eKras\u003c/em\u003e\u003csup\u003eG12D/+\u003c/sup\u003e \u003cem\u003eTrp53\u003c/em\u003e\u003csup\u003efl/fl\u003c/sup\u003e \u003cem\u003eNotch1\u003c/em\u003e\u003csup\u003eTg/+\u003c/sup\u003e (KPN); \u003cem\u003eKras\u003c/em\u003e\u003csup\u003eG12D/+\u003c/sup\u003e \u003cem\u003eTrp53\u003c/em\u003e\u003csup\u003efl/fl\u003c/sup\u003e (KP), to assess how well mouse models align to human PDS classification and characterisation (Fig.\u0026nbsp;7A). Following PDS classification, A and AK models were exclusively PDS1, while KP, KPN, and BPN models were divided into PDS2 and PDS3 (n\u0026thinsp;=\u0026thinsp;1 BPN as PDS1), with BP models aligning with PDS3 (Fig.\u0026nbsp;7A; Supplementary Fig.\u0026nbsp;7A). There was a clear alignment of the biological hallmarks and CMS classifications associated with PDS1 and PDS2 between human and mouse tumours; with enrichment for CMS2 alongside MYC and cell cycle hallmarks in PDS1, while PDS2 GEMMs tumours were classified as CMS1/4 and had elevated inflammatory/stromal signalling (Fig.\u0026nbsp;7B). However, in contrast to human biology associated with PDS3, GEMM tumours classified as PDS3 were not associated with CMS2 or the transcriptional repression for hallmarks signalling observed during the characterisation of human tumours, displaying signalling similar to GEMMs classified as having a mixed subtype (Fig.\u0026nbsp;7B). An alignment between TF activation states in human and mouse PDS1 and PDS2 tumours was confirmed; however these analyses again indicated that the mouse models used did not accurately represent PDS3 biology at a regulon level (Fig.\u0026nbsp;7C). Assessment of proliferation index and replication stress according to both PDS classification and genotype demonstrated elevation of both phenotypes in PDS1, but no further suppression was observed in PDS3 compared to PDS2 (Fig.\u0026nbsp;7D; Supplementary Fig.\u0026nbsp;7B). Finally, while elevation for the Myc-PRC targets signalling continuum is maintained in PDS1 GEMMs, there was again little distinction between PDS2 and PDS3 (Fig.\u0026nbsp;7E-F). These findings suggest that tumours from the PDS3 mouse models do not faithfully replicate the phenotypic biology underpinning human PDS3 tumours, indicating a critical limitation for pre-clinical development and testing of PDS3-specific therapeutic options.\u003c/p\u003e \u003cp\u003eOverall, these data indicate the availability of appropriate GEMM models for human PDS1 and PDS2 evaluation, however the genotypes and models used here were inconsistent with human PDS3 biology.\u003c/p\u003e \u003cp\u003e \u003cb\u003eAssessment of bulk tumour-derived PDS subtype-specific phenotypes in CRC epithelial single cell RNA data.\u003c/b\u003e \u003c/p\u003e \u003cp\u003eLeveraging single cell RNA sequencing data (scRNAseq) from n\u0026thinsp;=\u0026thinsp;49,155 epithelial cells derived from n\u0026thinsp;=\u0026thinsp;63 CRC tumours\u003csup\u003e\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e, we next set out to test how well our bulk data analyses translate into this more granular data type. In line with PDS analyses in bulk tumours and epithelial-specific intestinal lineages, these analyses revealed that neoplastic cells maintain a polarised state along a MYC-PRC targets axis, when viewed according to individual patient clusters (Fig.\u0026nbsp;8A-C, Supplementary Fig.\u0026nbsp;8A), or when clustered according to the ontology-level classification method used for PDS development (Fig.\u0026nbsp;8D). Cell cycle annotation revealed that individual PDS1-like neoplastic cells that display higher MYC target signalling were more likely to be in S or G2M phase, whereas PDS3-like cells elevated for PRC target expression were more likely to be in G1 phase (Fig.\u0026nbsp;8E). These results are more pronounced when assessed in cells at either end of the MYC-PRC continuum, which are most representative of PDS1 and PDS3 biology (Fig.\u0026nbsp;8F, Supplementary Fig.\u0026nbsp;8B-D). Additional characterisation of these clusters further confirms faithful alignment between bulk and scRNAseq, as the relations are maintained between phenotypes of elevated proliferation and replication stress in MYC target-high PDS1-like cells, alongside elevation of the KRAS repression signature (KRAS signal DN) and neuroactive ligand receptor interactions in PRC target-high PDS3-like cells (Fig.\u0026nbsp;8G-H, Supplementary Fig.\u0026nbsp;8E). Additionally, while the associations between enteroendocrine cell signalling and PDS biology observed in bulk RNA data and IHC assessments were again evident in single tumour epithelial cells, it was clear that not all PDS3-like cells were EECs (Fig.\u0026nbsp;8I, Supplementary Fig.\u0026nbsp;8F). Finally, assessment of iCMS labels for tumour epithelial cells, originally derived within this scRNAseq cohort, confirmed findings from our bulk tumour analyses, with a non-exclusive yet significant association between PDS1 and PDS3 biological classification and iCMS (Fig.\u0026nbsp;8J, Supplementary Fig.\u0026nbsp;8G).\u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eNumerous studies, including those employing CMS and CRIS signatures, have used individual gene-level data for molecular subtyping class discovery, followed by correlation of identified groups with existing phenotype data\u003csup\u003e\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e,\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e. However, in this study we leverage pathway-level data as an input for class discovery, using pre-defined sets of experimentally validated signatures that mechanistically underpin important cancer-relevant functions, to provide a more direct link between cancer phenotypes in a way that transcends mutational status and current transcriptional subtyping approaches. This approach identified three pathway-derived subtypes, PDS1-3, merging the inflammatory/stromal tumours together as PDS2, while at the same time sub-stratifying epithelial-rich tumours into two distinct pathway-derived subtypes, PDS1 and PDS3. Biological and molecular characterisation reveal that PDS3 lesions display significantly lower cell cycle, stress response and proliferation rates, alongside increased cellular differentiation compared to PDS1/2 tumours. Remarkably, given the repression for many cancer-related biological phenotypic features that dominate PDS3 tumour biology, these lesions display the worst relapse-free survival rates in multiple adjuvant cohorts, including the randomised phase III PETACC-3 clinical trial\u003csup\u003e\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u003c/sup\u003e. These data reveal a novel subtype of highly aggressive tumours, PDS3, that display relapse rates equivalent to those in the stroma-rich CMS4 subtype, previously associated with having the worst outcomes in this setting (summarised in Fig.\u0026nbsp;8K).\u003c/p\u003e \u003cp\u003eData presented here are consistent with a role for EZH2 in maintaining an EMT regulatory network and stem-rich state through H3K27-mediated repression, similar to that recently described in bladder cancer\u003csup\u003e\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e\u003c/sup\u003e. However, this does not explain the poor outcomes associated with PDS3 tumours in the adjuvant setting; a point that will certainly require further investigation. It is still possible that the repressed and differentiated patterns that dominate PDS3 bulk tumour profiles are concealing small subsets of aggressive and/or chemotherapy-resistant subclones, similar to mesenchymal subpopulations recently described in melanoma\u003csup\u003e\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u003c/sup\u003e. A recent study identified how FOXA2 orchestrates lineage plasticity driving an adeno-to-neuroendocrine lineage transition in prostate cancer, all of which align with the neural-like traits and activation of FOXA2 we observe in in PDS3 tumours\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e. Importantly, unlike in the prostate setting, the IHC and morphology data presented in our current study only indicate the presence of focal reactivity for neuroendocrine markers and would not support a classification of PDS3 colon tumours as cases of adenocarcinoma with partial neuroendocrine differentiation and certainly not colonic neuroendocrine tumour (NETs). Furthermore, the poor outcomes associated with PDS3 align with a recently reported role for PRC components in determining how well tumours, and tumour cells, respond, adapt and survive during sustained environmental stress; where cells deficient in epigenetic regulators, particularly EZH2, have a superior fitness advantage over EZH2-proficient tumours\u003csup\u003e\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e\u003c/sup\u003e. These stress-resistant subpopulations were characterised by low cell cycling, “transcriptional numbness” and overall phenotypic inertia, indicating there was a non-genetic basis for development of this fitness advantage, which is remarkably similar to the characteristics of PDS3 tumours presented here. Intriguingly, the presence of senescence-like physiology may itself expose a therapeutic opportunity in PDS3 tumours\u003csup\u003e\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e. Finally, as our study has focussed on molecular information, there may be a series of unaccounted for epidemiological or microbial/viral factors that underpin both the biological traits of the PDS classes and their associated clinical outcomes.\u003c/p\u003e \u003cp\u003eA recent single cell RNAseq study in tumour epithelial cells used gene-level data for class discovery, resulting in sub-stratification of stroma-rich CMS4 tumours into the distinct prognostic groups based on intrinsic biology; iCMS2 and iCMS3. However this study did not identify different subtypes within epithelial-rich CMS2 tumours\u003csup\u003e\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e. In data presented here, we clearly demonstrate that by using pathway-level data for class discovery in bulk data, rather than a gene-level method in single cell neoplastic lineages, CMS2 tumours can be robustly and reproducibly sub-stratified into clearly distinct prognostic groups driven by opposing biological signalling. Importantly, these traits are evident across multiple bulk transcriptomic cohorts or when applied to the epithelial scRNAseq data used for development of the iCMS classifier. The biological pathway activation approach that underpins PDS classification will also provide a novel basis for forward and reverse translation studies, which are essential to enable tumour subtyping to move beyond providing prognostic information alone, and to deliver real clinical impact through improved prediction of chemotherapeutic or targeted therapeutic responses. Although the CMS, iCMS, CRIS and PDS approaches may be seen as competing, classification in parallel can reveal important granularity in both epithelial-rich and stroma-rich tumours, which is critical for the development of new treatment approaches.\u003c/p\u003e \u003cp\u003eIn data presented here, we demonstrate how well represented human PDS biology is within a series of large pre-clinical models, revealing that PDS1 and PDS2 align well with a range of available GEMMs. Importantly, GEMM tumours classified as PDS3 failed to faithfully align to the biological signalling that characterise this subtype in human tumours, which, until addressed, will limit more comprehensive mechanistic studies. The lack of accurate PDS3 models is intriguing but perhaps not surprising, based on how current systems are developed; given the relatively short lifespan of mice, there is a bias towards the development of fast growing lesions that thrive following an “all at once” induction of driver genes\u003csup\u003e\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e\u003c/sup\u003e. This scenario does not replicate the sequential accumulation of genetic alterations and environmental changes seen in human tumour development, which can occur over a 15 + year period\u003csup\u003e\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e\u003c/sup\u003e. As such, if PDS3 biology is to be modelled in a way that more faithfully aligns to human tumours, methodologies underpinning model development require urgent attention. Modelling PDS may require the use of sequential and/or temporally inducible alleles, in combination with non-genetic and environmental factors that mimic precursor and tumour developmental latencies, or indeed variations in cell-of-origin during tumour induction.\u003c/p\u003e \u003cp\u003eGenetic models of cancer development have defined much of our understanding of tumour biology\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e; however it is increasingly clear that substantial phenotypic and clinical heterogeneity persist within genetically identical tumours. The PDS classifier was developed in \u003cem\u003eKRAS\u003c/em\u003emut colorectal tumours, yet we demonstrate how it can be used to stratify tumour samples in a genotype-agnostic manner across numerous independent cohorts of colorectal, lung and pancreatic cancer. Using a co-culture system, Tape and colleagues\u003csup\u003e\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e\u003c/sup\u003e demonstrated how extrinsic factors can trigger intrinsic signalling in non-mutated organoids that mimic signalling states induced by mutations in \u003cem\u003eKRAS\u003c/em\u003e +/- loss of \u003cem\u003eTP53\u003c/em\u003e. In line with this reductionist model, data presented here provide evidence that many cancer-relevant signalling cascades can be activated/repressed independent of mutational status, meaning that genotype alone cannot explain the signalling and phenotypes observed. As such, our study provides an additional informative link between gene expression data and clinical/biological phenotypes. Mapping the ongoing interplay between intrinsic and extrinsic signalling within each individual tumour may explain how activation patterns for each pathway are reached, and detangling of such nuanced networks of inter-compartmental crosstalk may ultimately result in the identification of unique tumour-specific explanations, or perhaps more likely, multiple individual cellular-specific findings.\u003c/p\u003e \u003cp\u003eAdditional studies are now clearly needed to mechanistically explain the development, evolution and progression patterns underpinning PDS3 in particular. Until such time that spatially-resolved multi-omics information can be converted into clinical tools, the classifier developed within this study, alongside existing classifiers such as CRIS, CMS and iCMS, provide the basis for stratification of tumours into discrete biological groups that can be used as a template for mechanistic and translational studies.\u003c/p\u003e "},{"header":"Methods and Materials","content":"\u003cp\u003e \u003cb\u003eDatasets.\u003c/b\u003e The majority of the datasets utilised in this study were accessed through the S:CORT programme\u003csup\u003e\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e\u003c/sup\u003e including FOCUS (GSE156915\u003csup\u003e34\u003c/sup\u003e), SPINAL (Internal S:CORT) and Polyp Dataset (Internal S:CORT), where microarray expression profiles, mutation, clinical, immunohistochemistry (IHC), tissue blocks and tumour microarrays (TMAs) were available. FOCUS: MRC-funded randomised trial cohort consisting of 360 formalin-fixed paraffin-embedded primary tumour samples for metastatic CRC. SPINAL: 258 FFPE samples from CRC patients, mixed stages. Other publicly available datasets were accessed from Gene Expression Omnibus (GEO) with accession number: GSE39582\u003csup\u003e2\u003c/sup\u003e, GSE76402\u003csup\u003e3\u003c/sup\u003e and GSE143915\u003csup\u003e22\u003c/sup\u003e. The validation of clinical association was carried out PETACC-3 cohort\u003csup\u003e\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\u003cp\u003e \u003cb\u003eGene Expression – Microarray and RNA-Sequencing.\u003c/b\u003e In all microarray datasets, the probes-to-genes were collapsed using the \u003cem\u003ecollapseRows\u003c/em\u003e function in \u003cem\u003eWGCNA\u003c/em\u003e R package (v1.70-3), where the probe with the highest average value per gene was selected\u003csup\u003e\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e\u003c/sup\u003e. For TCGA COREAD RNA-Seq data, \u003cem\u003eTCGAbiolinks\u003c/em\u003e R package (v2.16.1) was used to downloaded HT-Seq counts\u003csup\u003e\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e\u003c/sup\u003e. Using \u003cem\u003evarFilter\u003c/em\u003e in \u003cem\u003egenefilter\u003c/em\u003e R package (v1.70.0), low variance genes (var.cutoff = 0.25) were excluded. The quantile normalisation and Log2 transformation was applied to the count matrix. STAR counts for TCGA LUAD and TCGA PAAD were also downloaded using the \u003cem\u003eTCGAbiolinks\u003c/em\u003e R package (v2.25.2). The publicly available microarray gene expression profiles (GEP) utilised in this study were downloaded from Gene Expression Omnibus (GEO), including, GSE39582, FOCUS, GSE39582, SPINAL, PETACC-3, FOxTROT, TCGA (COAD, LUAD, PAAD), Polyp Validation, GEMMs, GSE143915.\u003c/p\u003e\u003cp\u003e\u003cb\u003eGenetically engineered mouse models (GEMMs).\u003c/b\u003e All animal experiments were performed according to a UK Home Office licence (Project License 70/8646) and were reviewed by the animal welfare and ethical board of the University of Glasgow. Mice of both genders aged for about 6 to 12 weeks and were systematically induced with a single injection of 2mg tamoxifen (Sigma-Aldrich, T5648) using intraperitoneal injection and sampled at clinical endpoint, which was defined as weight loss and/or hunching and/or cachexia. All experiments were performed on mice with C57BL/6 background. RNA was extracted using either a RNeasy mini kit (Qiagen) or TRIzol reagent (Thermo Fisher Scientific) and its concentrations were assessed using a NanoDrop 200c spectrophotometer (ThermoScientific). RNA quality was evaluated using an Agilent 220 Tapestation, screentape and sequenced using an Illumina TruSeq RNA sample prep kit, then run on an Illumina NextSeq using the High Output 75 cycles kit (2 x 36 cycles, paired end reads, single index). The quality of raw sequence was assessed using the FastQC algorithm (v0.11.8). Sequences were trimmed to remove adaptors and low-quality base calls, defined as those with a Phred score of \u0026lt; 20, using the Trim Galore tool (v0.6.4). Thereafter the trimmed sequences were aligned to the mouse genome build GRCm38.98 using HISAT2 (v2.1.0), then FeatureCounts (v1.6.4) was used to determine raw counts per gene. Two mouse cohorts, that were used in our previous study\u003csup\u003e\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e\u003c/sup\u003e, namely small cohort (n = 18) and large cohort (n = 39) were amalgamated to create a larger cohort of GEMM. APN and AP models were excluded, as the batch they were sequenced in was deeply confounded by genotype, resulting in a collection of n = 51 tumour samples, including 6 genotypes: \u003cem\u003eApc\u003c/em\u003e\u003csup\u003efl/+\u003c/sup\u003e (A); \u003cem\u003eApc\u003c/em\u003e\u003csup\u003efl/+\u003c/sup\u003e\u003cem\u003eKras\u003c/em\u003e\u003csup\u003eG12/+\u003c/sup\u003e (AK); \u003cem\u003eBraf\u003c/em\u003e\u003csup\u003eV600E/+\u003c/sup\u003e\u003cem\u003eTrp53\u003c/em\u003e\u003csup\u003efl/fl\u003c/sup\u003e (BP) and \u003cem\u003eBraf\u003c/em\u003e\u003csup\u003eV600E/+\u003c/sup\u003e\u003cem\u003eTrp53\u003c/em\u003e\u003csup\u003efl/fl\u003c/sup\u003e Notch1\u003csup\u003eTg/+\u003c/sup\u003e (BPN), \u003cem\u003eKras\u003c/em\u003e\u003csup\u003eG12D/+\u003c/sup\u003e\u003cem\u003eTrp53\u003c/em\u003e\u003csup\u003efl/fl\u003c/sup\u003e\u003cem\u003eNotch1\u003c/em\u003e\u003csup\u003eTg/+\u003c/sup\u003e (KPN); \u003cem\u003eKras\u003c/em\u003e\u003csup\u003eG12D/+\u003c/sup\u003e\u003cem\u003eTrp53\u003c/em\u003e\u003csup\u003efl/fl\u003c/sup\u003e (KP). After batch correction using \u003cem\u003eComBat\u003c/em\u003e_seq function in \u003cem\u003esva\u003c/em\u003e R package (v3.40.0), \u003cem\u003evst\u003c/em\u003e function in \u003cem\u003eDESeq2\u003c/em\u003e R package (v1.32.0) was applied to normalize the read counts.\u003c/p\u003e\u003cp\u003e \u003cb\u003eUnsupervised Discovery of Pathway Derived Subtype (PDS).\u003c/b\u003e Four well-curated publicly available gene set collections: 1) BIOCARTA (gene set: n = 289), 2) KEGG (n = 186), 3) PID (n = 196), and 4) REACTOME (n = 1499), were accessed from the Molecular Gene Signature Database (MSigDB) via \u003cem\u003emsigdbr\u003c/em\u003e R package (v7.0.1) and were utilised to generate a matrix of single sample ontology scores from gene expressions of the FOCUS cohort using \u003cem\u003eGSVA\u003c/em\u003e R package (v1.26.0). The parameters \u003cem\u003emin.sz = 10\u003c/em\u003e, \u003cem\u003emethod=‘ssgsea’\u003c/em\u003e and \u003cem\u003essgsea.norm = T\u003c/em\u003e was set, that resulted with ontology scores for 1783 gene sets. A subset of 165 \u003cem\u003eKRAS\u003c/em\u003emut primary tumour CRC samples were selected from the single sample score matrix (excluding \u003cem\u003eKRAS\u003c/em\u003e wild-types, \u003cem\u003eBRAF\u003c/em\u003e and \u003cem\u003eNRAS\u003c/em\u003e mutants) for unsupervised class discovery phase. The score matrix was subjected to dimensionality reduction \u003cem\u003et\u003c/em\u003e-SNE analysis, and the two continuous variables (Dim1 and Dim2) were obtained using \u003cem\u003eRtsne\u003c/em\u003e R package (v0.15). The variables were scaled before applying unsupervised \u003cem\u003ek\u003c/em\u003e-means clustering. The silhouette width and elbow methods determined \u003cem\u003ek\u003c/em\u003e = 3 as an optimal number of clusters (\u003cem\u003ecluster\u003c/em\u003e R package; v2.1.2, and \u003cem\u003efactorextra\u003c/em\u003e R package; v1.0.7), and bootstrap resampling method from \u003cem\u003efpc\u003c/em\u003e R package (v2.2.3) coincided \u003cem\u003ek\u003c/em\u003e = 3 as the highly stable number for clustering. Following the unsupervised \u003cem\u003ek\u003c/em\u003e-means clustering, three groups of clusters were named as Pathway Derived Subtypes (PDS): PDS1, PDS2 and PDS3.\u003c/p\u003e\u003cp\u003e \u003cb\u003eDevelopment and application of PDS Classification System.\u003c/b\u003e Utilising the discovery subset (n = 165) where PDS classes were defined previously, the discovery set was randomly divided into a training set (n = 125) and a test set (n = 40) based on the bootstrap resampling method using \u003cem\u003ecaret\u003c/em\u003e R package (v6.0-90). Three different classification algorithms were tested, including the Nearest Shrunken Centroid (NSC; or pam – Prediction Analysis of Microarrays), Lasso and Elastic-Net Regularised Generalised Linear Model (glmnet) and Support Vector Machine via Radial Basis Function (svmRBF), implemented in \u003cem\u003ecaret\u003c/em\u003e R package (v6.0-90). As a feature selection step to reduce the number of gene sets and draw out only subgroup specific gene sets, the scores above average specific to each subgroup were selected and further highly correlated gene sets (\u0026gt; 0.9) were excluded, resulting in 626 gene sets in total. To make it feasible for users, the test run was performed on gene expression matrix and the ontology score conversion steps were implemented within the classification model. The gene expression matrix of the test samples was first converted to the single sample scores with \u003cem\u003egsva\u003c/em\u003e function along with the parameters \u003cem\u003essgsea.norm = F\u003c/em\u003e that generated unscaled scores. The unscaled scores were scaled using the min-max scaling method employed by the \u003cem\u003egsva\u003c/em\u003e function, where the minimum value of -8554.114, and maximum value of 12374.819, determined during the class discovery phase, was used. Once the score matrix of the test samples was generated, it was batch-corrected against the training set as the reference batch using the \u003cem\u003eComBat\u003c/em\u003e function from \u003cem\u003esva\u003c/em\u003e R package (v3.42.0).\u003c/p\u003e\u003cp\u003eBefore running the classification algorithms, leave-one out cross-validation (LOOCV) was employed in the training set to minimise overfitting of classification. The classification algorithms were trained on the training set and the corresponding hyperparameters were adjusted to finalise the models. Out of the three classification models, the svmRBF algorithm displayed a high classification performance on the test data. Thus, the svmRBF was selected to develop the PDS classification system. The \u003cem\u003ePDSclassifier\u003c/em\u003e R package has been developed for the PDS classification model that is available to share (see ‘Data and Code Availability’). Subsequently, the \u003cem\u003ePDSclassifier\u003c/em\u003e was employed to classify samples from other transcriptomic validation datasets.\u003c/p\u003e\u003cp\u003e \u003cb\u003eGene Set Enrichment Analysis.\u003c/b\u003e To define subtype-specific biological association, gene set enrichment analysis (GSEA) was employed using \u003cem\u003efgsea\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e\u003c/sup\u003e R package (v1.21.0) with \u003cem\u003eeps = 0\u003c/em\u003e and \u003cem\u003enPerSimple = 10000\u003c/em\u003e. For \u003cem\u003efgsea\u003c/em\u003e, a ranked gene list was first obtained for each comparison in each dataset via \u003cem\u003elimma\u003c/em\u003e R package (v3.50.3). The comparison between subtypes were made in a grouped pairwise manner. Statistical significance was measured with Benjamini-Hochberg (BH) False Discovery Rate (FDR) \u0026lt; 0.05 and normalised enrichment score (NES) indicates the upregulation (positive value) and down-regulation (negative value). Additionally, single sample enrichment scores were generated via \u003cem\u003eGSVA\u003c/em\u003e R package (v1.42.0) with ‘\u003cem\u003essgsea’\u003c/em\u003e method\u003csup\u003e\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\u003cp\u003e \u003cb\u003eGene Signatures.\u003c/b\u003e The majority of the gene set signatures, including the ‘Hallmarks’ gene set collection, was accessed from the Molecular Signature Database (MSigDB) using \u003cem\u003emsigdbr\u003c/em\u003e R package (v7.4.1) for both human and mouse species\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e. Other gene set signatures include, (i) stem-related crypt-base columnar (CBC) and regenerative stem cells (RSC)\u003csup\u003e\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u003c/sup\u003e (ii) precursor polyp-related tubular and serrated signatures obtained from differential gene expression analysis between tubular and serrated adenomas from GSE45270\u003csup\u003e19\u003c/sup\u003e using \u003cem\u003elimma\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e\u003c/sup\u003e R package (v3.46.0), (iii) enteroendocrine cell (EEC) cell markers from PanglaoDB database\u003csup\u003e\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e\u003c/sup\u003e and, (iv) MYC and PRC target modules\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e, used for the enrichment analyses. These gene signatures were also applied to mouse data where directly applicable via MSigDB or, the gene signatures were converted to the mouse orthologues via the \u003cem\u003eensembl\u003c/em\u003e using \u003cem\u003ebiomaRt\u003c/em\u003e R package (v2.50.3)\u003csup\u003e\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\u003cp\u003e \u003cb\u003eProliferative Index, Replication Stress, and Intestinal Stem Cell Index.\u003c/b\u003e Replication stress was calculated from a collection of transcriptional signatures associated with cell-cycle and DNA repair extracted from the MSigDB for both species, where GSVA scores were generated using \u003cem\u003e‘gsva’\u003c/em\u003e method from \u003cem\u003eGSVA\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e\u003c/sup\u003e R package (v1.42.0), followed by the total sum of the GSVA scores per sample across gene sets, as previously described\u003csup\u003e\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e\u003c/sup\u003e. The transcriptomic measure of proliferation was measured with \u003cem\u003eProliferativeIndex\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u003c/sup\u003e R package (v1.0.1) which calculates proliferative index (PI) from a list of proliferative cell nuclear antigen (PCNA)-associated genes. For mouse, the PCNA-associated gene signature was converted to mouse orthologues as mentioned previously before applying the signature to calculate the PI score. Intestinal Stem Cell Index (ISCindex) provides a continuum score from gene expression dataset that represents stem cell phenotype with the extreme ends of scoring scale to be either strongly conventional crypt-base columnar stem cell (CBC) or a regenerative stem cell (RSC) phenotype. The \u003cem\u003eISCindex\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u003c/sup\u003e R package (v0.0.0.9) was downloaded and used as previously described.\u003c/p\u003e\u003cp\u003e \u003cb\u003eTranscription Factor Activity Analysis.\u003c/b\u003e For the quantification of transcription factor (TF) activity from gene expression profiles, the collection of well-curated TFs and their targets in DoRothEA\u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e database was accessed where the TF-target interaction with high confidence A and B were selected for the analyses on both human and mouse using \u003cem\u003edorothea\u003c/em\u003e R package (v1.6.0). The statistically significant difference on TF activity between subtypes with p-value of \u0026lt; 0.05 was determined using ‘\u003cem\u003erowTtest’\u003c/em\u003e function from \u003cem\u003eviper\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e\u003c/sup\u003e R package (v1.28.0). A list uniquely activated TFs per subtype across cohorts were identified and visually presented in a heatmap.\u003c/p\u003e\u003cp\u003e \u003cb\u003eCell Lineage Analysis.\u003c/b\u003e The ESTIMATE was applied to produce immune and stromal fractions as well as tumour purity scores using \u003cem\u003eestimate\u003c/em\u003e R package (v1.0.13)\u003csup\u003e\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e\u003c/sup\u003e. Microenvironment Cell Populations-counter (MCP-counter) was used to quantify the cellular abundance in a heterogenous tissue from transcriptomic data that generates infiltration estimate scores for two stromal and eight immune cell populations\u003csup\u003e\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e\u003c/sup\u003e. MCP-counter analysis was performed using \u003cem\u003eMCPcounter\u003c/em\u003e R package (v1.2.0). Moreover, to explore the wider immune and stromal contexture for finer granularity, xCell was applied to calculate infiltration enrichment scores for 64 different immune and stromal cell populations using \u003cem\u003exCell\u003c/em\u003e R package (v1.1.0)\u003csup\u003e\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\u003cp\u003e \u003cb\u003eCRC Molecular Subtyping.\u003c/b\u003e Tumour samples in each cohort were classified into CMS using the random forest (RF) method from \u003cem\u003eCMSclassifier\u003c/em\u003e (v1.0.0) at default threshold with exception to the FOCUS and SPINAL cohorts where the CMS posterior probability threshold levels were reduced to 0.4 due to large number of non-consensus ‘unknown’ samples\u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e. CMS on mouse data were called using ‘Option C’ classification in \u003cem\u003eMmCMS\u003c/em\u003e R package (v0.1.0)\u003csup\u003e\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e\u003c/sup\u003e. CRIS classification was made using \u003cem\u003eCRISclassifier\u003c/em\u003e (v1.0.0) at default setting\u003csup\u003e\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u003c/sup\u003e. Unclassified samples were determined using the recommended Benjamini-Hochberg (BH) False Discovery Rate (FDR) of \u0026gt; 0.2. For iCMS classification, the iCMS gene signatures were extracted to create the iCMS template which was subsequently used with the nearest template prediction method embedded in the \u003cem\u003eCMScaller\u003c/em\u003e (v2.0.1)\u003csup\u003e\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e\u003c/sup\u003e, as previously described\u003csup\u003e\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e. Samples above 0.05 FDR were classed as ‘unknown’.\u003c/p\u003e\u003cp\u003e\u003cb\u003eImmunohistochemistry (IHC) and Digital Histology Scoring.\u003c/b\u003e Tissue microarrays (TMAs) were scanned at x20 magnification and imported into QuPath\u003csup\u003e\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e\u003c/sup\u003e (v0.2.3). The suitability of individual Ki67\u003csup\u003e+\u003c/sup\u003e IHC-stained cores for inclusion was determined by manual visual assessment of the scanned images, after application of the QuPath TMA dearraying tool. Colour deconvolution was applied to separate stains, followed by tissue detection (pixel threshold; resolution: low (7.96µm/px); channel: average channels; gaussian prefilter; smoothing sigma: 2.0; tissue threshold: 235). Cells were detected within the annotated tissue (requested pixel size: 0.5; nucleus background radius: 15.0; median filter radius: 0.0; sigma: 2.5; minimum area: 10.0; maximum area: 300.0; threshold: 0.1, max background intensity 1.0; exclude DAB: false; cell parameters: default; general parameters: default) and smoothed (radius: 25µm). An object classifier (random trees; default settings) was trained by examples of annotated tumour epithelium and stroma. Set cell intensity classification (nucleus: DAB OD mean: 0.18) to differentiate positive and negative tumour epithelium and stroma. Additionally, an H score was also generated by setting a three-tier cell intensity classification (Nucleus: DAB OD mean: 0.10; 0.25; 0.42). 354 patients with matched PDS call and Ki67\u003csup\u003e+\u003c/sup\u003e assessment. The total tumour area was also enumerated and assessed based on the digital histology scoring via HALO platform (Indica Labs, Albuquerque, NM, USA) on the Hematoxylin and Eosin (H\u0026amp;E) whole slides images from FOCUS (n = 356), as previously described\u003csup\u003e\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\u003cp\u003e \u003cb\u003eDigital Image-based PDSclassifier.\u003c/b\u003e We developed a set of deep learning classifiers and analysed their performances using whole slide images (WSIs) from the FOCUS and SPINAL cohorts. After rejecting images of poor quality and images from patients with undefined PDS calls by pathologist review, we used a dataset of 997 WSIs of H\u0026amp;E-stained resection specimens from 520 patients (PDS1: n_slides = 329; PDS2: n_slides = 314; PDS3: n_slides = 354). We conducted experiments under a 5-fold cross-validation protocol: for each fold, we split the data according to a 60-20-20 training-validation-test distribution such that classes and cohorts were stratified. We made sure that the validation and test splits did not overlap across the five folds and that images from the same patient were always in the same split. Tumour regions were manually annotated in all WSIs: we then restricted our experiments to the use of image data from these regions to prevent potential classification bias from non-tumour regions. The tumour regions of these WSIs were tiled into sets of image patches of size 318x318px extracted at magnification 5x (resolution ~ 2µm/px) with 50% overlap. We used a customized 50-layer ResNet as a deep learning architecture to process such input image patches and output a probability density over the PDS classes. Each image patch was labelled using the PDS class of their WSI of origin and our models were trained to maximize the output probability for the target class (minimization of the cross-entropy loss) using mini-batches of size 16. At inference time, we applied the trained models on all the generated tiles from a given WSI and then averaged their predicted probability densities to produce slide-level probability estimates. We then selected the class with the highest relative probability score as the image-based PDS call for this WSI. This approach was based on the weakly supervised learning protocol proposed within the imCMS study\u003csup\u003e\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e. The classification performance of the trained models was systematically assessed using the test patrician for each fold.\u003c/p\u003e\u003cp\u003e \u003cb\u003eMutation and Copy Number Profiles.\u003c/b\u003e Mutational and copy number associations between PDS were determined, when available, across at least four different CRC cohorts: FOCUS\u003csup\u003e\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e\u003c/sup\u003e, GSE39582\u003csup\u003e2\u003c/sup\u003e, SPINAL (Internal S:CORT) and TCGA COREAD\u003csup\u003e\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e\u003c/sup\u003e. The proportion of driver mutations and their variants \u003cem\u003eKRAS\u003c/em\u003e, \u003cem\u003eBRAF\u003c/em\u003e and \u003cem\u003eTP53\u003c/em\u003e were examined in detail across these cohorts. The mutational data for TCGA COREAD were retrieved from the GDC via \u003cem\u003eTCGAbiolinks\u003c/em\u003e R package (v2.25.2) and analysed using \u003cem\u003emaftools\u003c/em\u003e R package (v2.12.0)\u003csup\u003e\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e\u003c/sup\u003e. Oncoprint was also used to interrogate and visualise the genetic alternations in key driver genes along signalling pathways, including WNT, MAPK, PIK3CA, cell cycle and TGF-β pathways using \u003cem\u003eComplexHeatmap\u003c/em\u003e (v2.12.0) and CRAN \u003cem\u003ecirclize\u003c/em\u003e (v0.4.15) R packages\u003csup\u003e\u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e\u003c/sup\u003e. For FOCUS and SPINAL, copy number chromosomal arm calls and copy number estimations per gene were available via the S:CORT consortium. For TCGA COREAD, the GISTIC data was accessed via GDC data portal\u003csup\u003e\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\u003cp\u003e \u003cb\u003eSurvival Analyses.\u003c/b\u003e Two different cohorts (GSE39582 and PETACC-3) were explored for clinical and prognostic evaluation. Only PDS samples were considered for the analysis, excluding ‘mixed’ samples. Moreover, patients with missing information on relapse-free status (RFS), relapse-free months or chemotherapy treatment status, and patients with record of less than a month were excluded from all survival analyses in GSE39582. The Cox proportional hazards method was also performed to calculate the hazard ratio and confidence intervals (CI) for statistical group comparisons. All survival analyses and visualisation were carried out with \u003cem\u003esurvival\u003c/em\u003e (v3.2-13), \u003cem\u003esurvminer\u003c/em\u003e (v0.4.9) and \u003cem\u003eggplot\u003c/em\u003e2 (v3.3.6) R packages.\u003c/p\u003e\u003cp\u003e \u003cb\u003eSingle Cell Data Analyses.\u003c/b\u003e The processed count expression matrix for n = 49,155 epithelial cells and the corresponding epithelial metadata were downloaded through the Synapse under the accession code syn26844071, as previously described\u003csup\u003e\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e. The count expression matrix for tumour epithelial cells were normalised using \u003cem\u003eSCTransfrom\u003c/em\u003e in \u003cem\u003eSeurat\u003c/em\u003e R package (v4.1.1)\u003csup\u003e\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e\u003c/sup\u003e with \u003cem\u003emethod=‘glmGamPoi’\u003c/em\u003e implemented from \u003cem\u003eglmGamPoi\u003c/em\u003e R package (v1.8.0). Enrichment scores per cell from signatures (KRAS Signal DN (HALLMARK)\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e, MYC/PRC target modules\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e, neuroactive ligand receptor interaction (KEGG), enteroendocrine cells\u003csup\u003e\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e\u003c/sup\u003e, proliferative index\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u003c/sup\u003e, PDS-specific gene set (n = 626) were calculated using \u003cem\u003eenrichIt\u003c/em\u003e function in the \u003cem\u003eescape\u003c/em\u003e R package (v1.6.0) with \u003cem\u003emethod=‘ssgsea’\u003c/em\u003e and \u003cem\u003emin.size = 1\u003c/em\u003e. For replication stress, the list of gene sets was used to get the enrichment scores per cell, followed by the sum of enrichment scores per cell. These scores scaled between range of -1 to 1 using \u003cem\u003erescale\u003c/em\u003e function in \u003cem\u003escales\u003c/em\u003e R package (v1.2.0). The \u003cem\u003eCellCycleScoring\u003c/em\u003e function in \u003cem\u003eSeurat\u003c/em\u003e R package predicted the cell cycle phase per cell into G1, S and G2M. Using MYC and PRC target enrichment scores, cells were defined ‘High’, ‘Mid’ and ‘Low’ by tertile split, where MYC-target-High and PRC-target-High cells were re-evaluated and compared. Additional R packages for single cell analyses include, \u003cem\u003eSeuratDisk\u003c/em\u003e (v0.0.0.9020) and \u003cem\u003eSeuratWrappers\u003c/em\u003e (v0.3.0).\u003c/p\u003e\u003cp\u003e \u003cb\u003eStatistics and Data Visualisation.\u003c/b\u003e \u003c/p\u003e\u003cp\u003eStatistical analysis conducted in this study has been performed in R using \u003cem\u003estats\u003c/em\u003e (v4.2.1) or \u003cem\u003eggpubr\u003c/em\u003e (v0.4.0) R package for plots, including Student’s \u003cem\u003et\u003c/em\u003e-test, Kruskal-Wallis rank-sum test, Fisher’s exact test, and Pearson’s correlation coefficient test. For copy number by arm analysis, Pearson’ Chi-squared test post-hoc analysis was performed using \u003cem\u003echisq.posthoc.test\u003c/em\u003e R package (v0.1.2) and adjusted p-value with Benjamini-Hochberg using \u003cem\u003ep.adjust\u003c/em\u003e function in \u003cem\u003estats\u003c/em\u003e R package.\u003c/p\u003e\u003cp\u003eOther R packages that have been utilised for data analysis and data visualisation include, \u003cem\u003eggtern\u003c/em\u003e (v3.3.5), \u003cem\u003eComplexHeatmap\u003c/em\u003e (v2.10.0), \u003cem\u003ecirclize\u003c/em\u003e (v0.4.15), \u003cem\u003eumap\u003c/em\u003e (v0.2.8.0), \u003cem\u003eggplot2\u003c/em\u003e (v3.3.6), patchwork (v1.1.1), riverplot (v0.10), ggforce (v0.3.3), RColorBrewer (v1.1-2).\u003c/p\u003e\u003cp\u003e\u003cstrong\u003e\u003cem\u003eData and Code Availability.\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe \u003cem\u003ePDSclassifier\u003c/em\u003e R package (v0.1.0) is available on the Molecular Pathology Lab GitHub (\u003ca href=\"https://github.com/MolecularPathologyLab/PDSclassifier\"\u003ehttps://github.com/MolecularPathologyLab/PDSclassifier\u003c/a\u003e).\u003cstrong\u003e\u003cem\u003e\u0026nbsp;\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eFinancial Support:\u0026nbsp;\u003c/strong\u003eThis work was supported by a CRUK early detection grant (Dunne; A29834), a CRUK International accelerator programme, ACRCelerate, (Dunne, Sansom, Leedham, Lawler, Maughan; A26825), a UK Medical Research Council (MRC) and CRUK co-funded Stratified Medicine Consortium programme grant, S:CORT, (Dunne, Lawler, Maughan; MR/M016587/1), an MRC National Mouse Genetics Network programme (Dunne, Leedham; MC_PC_21042), CRUK Beatson institute funding (Sansom; A21139, A17196 and A31287). Cancer Research UK programme Grant (Leedham; DRCNPG-Jun22\\100002), HDR UK Grant (Lawler), AIRC, Associazione Italiana per la Ricerca sul Cancro, Investigator Grants 20697 (Bertotti) and 22802 (Trusolino), Promedica Foundation F-87701-41-01 (Koelzer), a My First AIRC Grant (Isella; ID 19047); AIRC 5x1000 grant 21091 (Trusolino, Bertotti); European Research Council Consolidator Grant 724748 BEAT (Bertotti); H2020 grant agreement no. 754923 COLOSSUS (Trusolino); H2020 INFRAIA grant agreement no. 731105 EDIReX (Bertotti); Fondazione Piemontese per la Ricerca sul Cancro-ONLUS, 5x1000 Ministero della Salute 2016 (Trusolino). General support for the Dunne research group via the QUB Foundation.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contributions:\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSBM:\u003c/strong\u003e initial data-driven investigation, conceptualisation, discovery and development of classifier, methodology, data analysis, data visualisation, writing-review and editing, \u003cstrong\u003eRMB:\u003c/strong\u003e investigation, data analysis, data visualisation, writing-review and editing,\u003cstrong\u003e\u0026nbsp;ML:\u003c/strong\u003e digital image-based classifier, histological data analysis, writing-review and editing, \u003cstrong\u003eSMC:\u003c/strong\u003e data analysis, data visualisation, writing-review and editing, \u003cstrong\u003eNCF:\u0026nbsp;\u003c/strong\u003ehistological, data analysis, data visualisation, writing-review and editing, \u003cstrong\u003ePT:\u003c/strong\u003e data analysis, data visualisation, writing-review and editing, \u003cstrong\u003eAC:\u003c/strong\u003e mouse experiments, mouse model data generation, writing-review and editing, \u003cstrong\u003eTM:\u003c/strong\u003e mouse experiments, mouse model data generation, writing-review and editing, \u003cstrong\u003eAKN:\u003c/strong\u003e mouse experiments, mouse model data generation, writing-review and editing, \u003cstrong\u003eKG:\u003c/strong\u003e mouse data curation, writing-review and editing, \u003cstrong\u003eRA:\u003c/strong\u003e data analysis, data visualisation, writing-review and editing, \u003cstrong\u003eSM:\u003c/strong\u003e data analysis, data visualisation, writing-review and editing,\u003cstrong\u003e\u0026nbsp;EM:\u003c/strong\u003e multiplex analysis, writing-review and editing, \u003cstrong\u003eHD:\u003c/strong\u003e multiplex analysis, writing-review and editing, \u003cstrong\u003eEG:\u003c/strong\u003e xenograft data generation, data curation, data analysis, writing-review and editing, \u003cstrong\u003eMV:\u0026nbsp;\u003c/strong\u003exenograft data generation, data curation, data analysis, writing-review and editing, \u003cstrong\u003eER:\u003c/strong\u003e data analysis, data visualisation, writing-review and editing,\u003cstrong\u003e\u0026nbsp;KR:\u003c/strong\u003e data generation, sequencing, writing-review and editing, \u003cstrong\u003eSS:\u003c/strong\u003e writing-review and editing, \u003cstrong\u003eAM:\u003c/strong\u003e writing-review and editing, \u003cstrong\u003eCB:\u0026nbsp;\u003c/strong\u003edata analysis, writing-review and editing, \u003cstrong\u003eEH:\u003c/strong\u003e data analysis, writing-review and editing, \u003cstrong\u003eNS:\u003c/strong\u003e writing-review and editing, \u003cstrong\u003eHH:\u003c/strong\u003e writing-review and editing, \u003cstrong\u003eBA:\u003c/strong\u003e writing-review and editing, \u003cstrong\u003eED:\u003c/strong\u003e data analysis, writing-review and editing, \u003cstrong\u003eAB:\u003c/strong\u003e data curation, software, writing-review and editing , \u003cstrong\u003eSR:\u003c/strong\u003e data generation, resource, writing-review and editing, \u003cstrong\u003eCI:\u003c/strong\u003e writing-review and editing, \u003cstrong\u003eCM:\u003c/strong\u003e writing-review and editing, \u003cstrong\u003eAB:\u003c/strong\u003e resources, supervision, writing-review and editing, \u003cstrong\u003eLT:\u003c/strong\u003e resources, supervision, writing-review and editing, \u003cstrong\u003eML:\u003c/strong\u003e resources, pathological analysis, supervision, writing-review and editing, \u003cstrong\u003eEK:\u003c/strong\u003e conceptualisation, supervision, writing-review and editing, \u003cstrong\u003eST:\u003c/strong\u003e resources, supervision, writing-review and editing, \u003cstrong\u003eTM:\u0026nbsp;\u003c/strong\u003eresources, supervision, writing-review and editing, \u003cstrong\u003eML:\u003c/strong\u003e resources, supervision, writing-review and editing, \u003cstrong\u003eSJL:\u003c/strong\u003e resources, supervision, writing-review and editing, \u003cstrong\u003eVK:\u003c/strong\u003e digital image-based classifier, histological data analysis, resources, supervision, writing-review and editing, \u003cstrong\u003eOJS:\u003c/strong\u003e resources, supervision, writing-review and editing, \u003cstrong\u003ePDD:\u003c/strong\u003e initial data-driven investigation, conceptualisation, resources, supervision, writing-original draft, writing-review and editing.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eBudinska, E. \u003cem\u003eet al.\u003c/em\u003e Gene expression patterns unveil a new level of molecular heterogeneity in colorectal cancer. J. Pathol. 231, 63\u0026ndash;76 (2013).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMarisa, L. \u003cem\u003eet al.\u003c/em\u003e Gene Expression Classification of Colon Cancer into Molecular Subtypes: Characterization, Validation, and Prognostic Value. PLOS Med. 10, e1001453 (2013).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eIsella, C. \u003cem\u003eet al.\u003c/em\u003e Selective analysis of cancer-cell intrinsic transcriptional traits defines novel clinically relevant subtypes of colorectal cancer. Nat. Commun. 8, 15107 (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiberzon, A. \u003cem\u003eet al.\u003c/em\u003e The Molecular Signatures Database Hallmark Gene Set Collection. Cell Syst. 1, 417\u0026ndash;425 (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGuinney, J. \u003cem\u003eet al.\u003c/em\u003e The consensus molecular subtypes of colorectal cancer. Nat. Med. 21, 1350\u0026ndash;1356 (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJoanito, I. \u003cem\u003eet al.\u003c/em\u003e Single-cell and bulk transcriptome sequencing identifies two epithelial tumor cell states and refines the consensus molecular classification of colorectal cancer. Nat. Genet. 54, 963\u0026ndash;975 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCrick, F. Central Dogma of Molecular Biology. Nature 227, 561\u0026ndash;563 (1970).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVogelstein, B. \u003cem\u003eet al.\u003c/em\u003e Genetic Alterations during Colorectal-Tumor Development. N. Engl. J. Med. 319, 525\u0026ndash;532 (1988).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFearon, E. R. \u0026amp; Vogelstein, B. A genetic model for colorectal tumorigenesis. Cell 61, 759\u0026ndash;767 (1990).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBarras, D. \u003cem\u003eet al.\u003c/em\u003e BRAF V600E Mutant Colorectal Cancer Subtypes Based on Gene Expression. Clin. Cancer Res. 23, 104\u0026ndash;115 (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMiddleton, G. \u003cem\u003eet al.\u003c/em\u003e BRAF-Mutant Transcriptional Subtypes Predict Outcome of Combined BRAF, MEK, and EGFR Blockade with Dabrafenib, Trametinib, and Panitumumab in Patients with Colorectal Cancer. Clin. Cancer Res. 26, 2466\u0026ndash;2476 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePapatheodorou, I., Oellrich, A. \u0026amp; Smedley, D. Linking gene expression to phenotypes via pathway information. J. Biomed. Semant. 6, 17 (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGarcia-Alonso, L., Holland, C. H., Ibrahim, M. M., Turei, D. \u0026amp; Saez-Rodriguez, J. Benchmark and integration of resources for the estimation of human transcription factor activities. Genome Res. 29, 1363\u0026ndash;1375 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGarcia-Alonso, L. \u003cem\u003eet al.\u003c/em\u003e Transcription Factor Activities Enhance Markers of Drug Sensitivity in Cancer. Cancer Res. 78, 769\u0026ndash;780 (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVan Cutsem, E. \u003cem\u003eet al.\u003c/em\u003e Randomized Phase III Trial Comparing Biweekly Infusional Fluorouracil/Leucovorin Alone or With Irinotecan in the Adjuvant Treatment of Stage III Colon Cancer: PETACC-3. J. Clin. Oncol. 27, 3117\u0026ndash;3125 (2009).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRamaker, R. C. \u003cem\u003eet al.\u003c/em\u003e RNA sequencing-based cell proliferation analysis across 19 cancers identifies a subset of proliferation-informative cancers with a common survival signature. Oncotarget 8, 38668\u0026ndash;38681 (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLafarge, M. W. \u0026amp; Koelzer, V. H. Towards computationally efficient prediction of molecular signatures from routine histology images. Lancet Digit. Health 3, e752\u0026ndash;e753 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSirinukunwattana, K. \u003cem\u003eet al.\u003c/em\u003e Image-based consensus molecular subtype (imCMS) classification of colorectal cancer using deep learning. Gut 70, 544\u0026ndash;554 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDe Sousa E Melo, F. \u003cem\u003eet al.\u003c/em\u003e Poor-prognosis colon cancer is defined by a molecularly distinct subtype and develops from serrated precursor lesions. Nat. Med. 19, 614\u0026ndash;618 (2013).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGil Vazquez, E. \u003cem\u003eet al.\u003c/em\u003e Dynamic and adaptive cancer stem cell population admixture in colorectal neoplasia. Cell Stem Cell 29, 1213\u0026ndash;1228.e8 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKim, J. \u003cem\u003eet al.\u003c/em\u003e A Myc Network Accounts for Similarities between Embryonic Stem and Cancer Cell Transcription Programs. Cell 143, 313\u0026ndash;324 (2010).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHabowski, A. N. \u003cem\u003eet al.\u003c/em\u003e Transcriptomic and proteomic signatures of stemness and differentiation in the colon crypt. Commun. Biol. 3, 453 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJain, A. \u0026amp; Tuteja, G. TissueEnrich: Tissue-specific gene enrichment analysis. Bioinformatics 35, 1966\u0026ndash;1967 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang, H. \u003cem\u003eet al.\u003c/em\u003e Single-Cell Analyses Reveal Mechanisms of Cancer Stem Cell Maintenance and Epithelial-Mesenchymal Transition in Recurrent Bladder Cancer. Clin. Cancer Res. Off. J. Am. Assoc. Cancer Res. 27, 6265\u0026ndash;6278 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKarras, P. \u003cem\u003eet al.\u003c/em\u003e A cellular hierarchy in melanoma uncouples growth and metastasis. Nature 610, 190\u0026ndash;198 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHan, M. \u003cem\u003eet al.\u003c/em\u003e FOXA2 drives lineage plasticity and KIT pathway activation in neuroendocrine prostate cancer. Cancer Cell 40, 1306\u0026ndash;1323.e8 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLoukas, I. \u003cem\u003eet al.\u003c/em\u003e Selective advantage of epigenetically disrupted cancer cells via phenotypic inertia. Cancer Cell (2022) doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.ccell.2022.10.002\u003c/span\u003e\u003cspan address=\"10.1016/j.ccell.2022.10.002\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang, L., Lankhorst, L. \u0026amp; Bernards, R. Exploiting senescence for the treatment of cancer. Nat. Rev. Cancer 22, 340\u0026ndash;355 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTuveson, D. A. Fighting the Sixth Decade of the Cancer War with Better Cancer Models. Cancer Discov. 11, 801\u0026ndash;804 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJones, S. \u003cem\u003eet al.\u003c/em\u003e Comparative lesion sequencing provides insights into tumor evolution. \u003cem\u003eProc. Natl. Acad. Sci.\u003c/em\u003e 105, 4283\u0026ndash;4288 (2008).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHidalgo, M. \u003cem\u003eet al.\u003c/em\u003e Patient-Derived Xenograft Models: An Emerging Platform for Translational Cancer Research. Cancer Discov. 4, 998\u0026ndash;1013 (2014).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eQin, X. \u003cem\u003eet al.\u003c/em\u003e Cell-type-specific signaling networks in heterocellular organoids. Nat. Methods 17, 335\u0026ndash;342 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLawler, M., Kaplan, R., Wilson, R. H., Maughan, T., \u0026amp; on behalf of the S-CORT Consortium. Changing the Paradigm\u0026mdash;Multistage Multiarm Randomized Trials and Stratified Cancer Medicine. The Oncologist 20, 849\u0026ndash;851 (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMalla, S. B. \u003cem\u003eet al.\u003c/em\u003e In-depth Clinical and Biological Exploration of DNA Damage Immune Response as a Biomarker for Oxaliplatin Use in Colorectal Cancer. Clin. Cancer Res. 27, 288\u0026ndash;300 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMuzny, D. M. \u003cem\u003eet al.\u003c/em\u003e Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330\u0026ndash;337 (2012).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCollisson, E. A. \u003cem\u003eet al.\u003c/em\u003e Comprehensive molecular profiling of lung adenocarcinoma. Nature 511, 543\u0026ndash;550 (2014).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRaphael, B. J. \u003cem\u003eet al.\u003c/em\u003e Integrated Genomic Characterization of Pancreatic Ductal Adenocarcinoma. Cancer Cell 32, 185\u0026ndash;203.e13 (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eColaprico, A. \u003cem\u003eet al.\u003c/em\u003e TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Res. 44, e71 (2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLangfelder, P. \u0026amp; Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAmirkhah, R. \u003cem\u003eet al.\u003c/em\u003e MmCMS: Mouse models\u0026rsquo; Consensus Molecular Subtypes of colorectal cancer. 2022.06.17.496539 Preprint at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1101/2022.06.17.496539\u003c/span\u003e\u003cspan address=\"10.1101/2022.06.17.496539\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKorotkevich, G. \u003cem\u003eet al.\u003c/em\u003e Fast gene set enrichment analysis. 060012 Preprint at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1101/060012\u003c/span\u003e\u003cspan address=\"10.1101/060012\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eH\u0026auml;nzelmann, S., Castelo, R. \u0026amp; Guinney, J. GSVA: gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics 14, 7 (2013).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRitchie, M. E. \u003cem\u003eet al.\u003c/em\u003e limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFranz\u0026eacute;n, O., Gan, L.-M. \u0026amp; Bj\u0026ouml;rkegren, J. L. M. PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data. \u003cem\u003eDatabase\u003c/em\u003e 2019, baz046 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDurinck, S., Spellman, P. T., Birney, E. \u0026amp; Huber, W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat. Protoc. 4, 1184\u0026ndash;1191 (2009).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDreyer, S. B. \u003cem\u003eet al.\u003c/em\u003e Targeting DNA Damage Response and Replication Stress in Pancreatic Cancer. Gastroenterology 160, 362\u0026ndash;377.e13 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAlvarez, M. J. \u003cem\u003eet al.\u003c/em\u003e Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nat. Genet. 48, 838\u0026ndash;847 (2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYoshihara, K. \u003cem\u003eet al.\u003c/em\u003e Inferring tumour purity and stromal and immune cell admixture from expression data. Nat. Commun. 4, 2612 (2013).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBecht, E. \u003cem\u003eet al.\u003c/em\u003e Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biol. 17, 218 (2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAran, D., Hu, Z. \u0026amp; Butte, A. J. xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. 18, 220 (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEide, P. W., Bruun, J., Lothe, R. A. \u0026amp; Sveen, A. CMScaller: an R package for consensus molecular subtyping of colorectal cancer pre-clinical models. Sci. Rep. 7, 16618 (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBankhead, P. \u003cem\u003eet al.\u003c/em\u003e QuPath: Open source software for digital pathology image analysis. Sci. Rep. 7, 16878 (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFisher, N. C. \u003cem\u003eet al.\u003c/em\u003e Biological Misinterpretation of Transcriptional Signatures in Tumor Samples can Unknowingly Undermine Mechanistic Understanding and Faithful Alignment with Preclinical Data. Clin. Cancer Res. OF1\u0026ndash;OF14 (2022) doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1158/1078-0432.CCR-22-1102\u003c/span\u003e\u003cspan address=\"10.1158/1078-0432.CCR-22-1102\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMayakonda, A., Lin, D.-C., Assenov, Y., Plass, C. \u0026amp; Koeffler, H. P. Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res. 28, 1747\u0026ndash;1756 (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGu, Z., Eils, R. \u0026amp; Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847\u0026ndash;2849 (2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSatija, R., Farrell, J. A., Gennert, D., Schier, A. F. \u0026amp; Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495\u0026ndash;502 (2015).\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"Queen's University Belfast","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Molecular Classification, Colorectal Cancer, Biological Phenotype","lastPublishedDoi":"10.21203/rs.3.rs-3891488/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-3891488/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eMolecular stratification, across many tumour types, has used gene-level transcriptional data to identify subtypes associated with distinct genotypes and biological traits, as exemplified by the consensus molecular subtypes (CMS), and more recently the intrinsic CMS (iCMS), in colorectal cancer. In an attempt to develop molecular subtypes that more closely align to cancer-relevant phenotypic traits in \u003cem\u003eKRAS\u003c/em\u003e mutant tumours, here we present an approach that uses gene ontology and biological activation state information, rather than gene-level data, for the initial stages of class discovery. In doing so, we define three unique pathway-derived subtypes (PDS); where PDS1 tumours are highly proliferative and display good prognosis, PDS2 tumours are stroma/immune-rich with intermediate prognosis. The final subtype, PDS3, represent a previously overlooked subset of tumours within CMS2, which display a \u0026lsquo;lethargic\u0026rsquo; biological phenotype with neural-like traits and the worst prognosis. Remarkably, these biological and clinical features remain consistent across tumour samples independent of \u003cem\u003eKRAS\u003c/em\u003e mutational status, supporting the use of PDS for defining cancer-relevant phenotypes regardless of genetics.\u003c/p\u003e","manuscriptTitle":"Pathway level subtyping identifies a slow-cycling and transcriptionally lethargic biological phenotype associated with poor clinical outcomes in colon cancer independent of genetics","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-01-23 19:30:05","doi":"10.21203/rs.3.rs-3891488/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"e72e144b-0bbf-41f1-a5d1-cfe7dd60b539","owner":[],"postedDate":"January 23rd, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[{"id":28323123,"name":"Cancer Biology"}],"tags":[],"updatedAt":"2024-03-06T19:28:21+00:00","versionOfRecord":{"articleIdentity":"rs-3891488","link":"https://doi.org/10.1038/s41588-024-01654-5","journal":{"identity":"nature-genetics","isVorOnly":false,"title":"Nature Genetics"},"publishedOn":"2024-02-13 00:00:00","publishedOnDateReadable":"February 13th, 2024"},"versionCreatedAt":"2024-01-23 19:30:05","video":"","vorDoi":"10.1038/s41588-024-01654-5","vorDoiUrl":"https://doi.org/10.1038/s41588-024-01654-5","workflowStages":[]},"version":"v1","identity":"rs-3891488","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-3891488","identity":"rs-3891488","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

Text is read by the "Ask this paper" AI Q&A widget below. Extraction quality varies by source — PMC NXML preserves structure cleanly, OA-HTML may include some navigation residue, and OA-PDF can have broken hyphenation. The publisher copy (via DOI) is the canonical version.

My notes (saved in your browser only)

⚙ Ask this paper AI returns verbatim quotes from the full text · source: preprint-html ⓘ

Answers must be backed by verbatim quotes from this paper's full text. Hallucinated quotes are dropped automatically; if no verbatim passage answers the question, we say so. How this works

Citation neighborhood (no data yet)

We don't have any in-corpus citations linked to this paper yet. This is a recent paper (2024) — citers typically take a year or two to land, and the OpenAlex reference graph may still be filling in.

Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00