Single-cell RNA Sequencing Defines Prognostic Subtypes and Identifies AIF1L as a Therapeutic Target in Colorectal Cancer

Li Gao, Wang Dingxue, Chen Guo

Research Square preprint. This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-7271791/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. The journal version was published on 26 Dec 2025 in BMC Cancer.
Version 1 posted 12

Abstract
Colorectal cancer (CRC) progression and therapy resistance are driven by heterogeneous tumor cell populations and microenvironmental interactions. However, a comprehensive single-cell atlas across patients that captures this heterogeneity and its clinical implications has been lacking. Such an atlas could reveal rare tumor subpopulations that underpin disease aggressiveness and offer new prognostic biomarkers or therapeutic targets.
We integrated two high-quality single-cell RNA sequencing datasets from 29 CRC patients (approximately 70,000 cells) to construct a cellular atlas encompassing immune, stromal and epithelial compartments. Malignant epithelial cells were distinguished by inferCNV-based copy number alteration analysis and reclustered, yielding seven transcriptionally distinct malignant subpopulations. One malignant epithelial cluster, marked by high expression of the long non-coding RNA ELFN1-AS1, exhibited the highest stemness signature. From this stem-like cluster, we derived a four-gene prognostic model (RPL21, GAL, ELFN1-AS1, AIF1L). In the TCGA-COAD cohort, this model stratified patients into high-risk and low-risk groups, with the high-risk group experiencing significantly worse survival. High-risk tumors were enriched for metabolic and translational pathways and displayed distinct immune and genomic features. AIF1L was identified as a hub gene within the signature, and its knockdown in CRC cells enhanced migration and invasion, functionally validating its role in tumor progression. Our study provides a high-resolution single-cell atlas of CRC and identifies a previously unrecognized stem-like tumor cell subpopulation with prognostic significance. These findings highlight novel prognostic biomarkers and suggest potential therapeutic targets (such as AIF1L) that could inform patient stratification and the development of targeted therapies for CRC.

Keywords: Colorectal cancer; Single-cell RNA sequencing; Tumor microenvironment; AIF1L

1 Introduction
Colorectal cancer (CRC) is among the most prevalent and lethal malignancies worldwide, accounting for roughly one in ten cancer diagnoses and ranking as the second leading cause of cancer-related mortality [1, 2]. CRC claimed an estimated 881,000 lives globally in 2018 [3].
Standard treatment modalities for CRC include surgical resection of localized tumors, systemic chemotherapy (often fluoropyrimidine-based) for advanced disease, and newer immunotherapies for select patients. Surgery remains the cornerstone of curative treatment in early-stage CRC, and adjuvant chemotherapy can improve survival in resected stage III and high-risk stage II cases [4]. However, outcomes for advanced and metastatic CRC remain poor: the five-year survival rate in metastatic CRC is only on the order of 10–15% [5]. Many patients eventually experience disease recurrence or develop resistance to chemotherapy, and tumor relapse and treatment resistance are primary drivers of CRC's high mortality [6]. Even immunotherapy, which has revolutionized care in other malignancies, benefits only a small subset of CRC patients (chiefly those with mismatch repair-deficient, microsatellite instability-high tumors), because most CRCs are immunologically cold or refractory [7]. These challenges (metastasis with poor prognosis, frequent recurrence and therapy resistance) highlight the need for deeper biological insight to guide new therapeutic strategies.

Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful approach to dissect cellular heterogeneity in cancers. Unlike bulk sequencing, which averages expression across cells and thereby masks cell-to-cell variability, scRNA-seq profiles gene expression at single-cell resolution, enabling the identification of distinct cell types and states within the tumor microenvironment (TME) [8]. This technology has been rapidly adopted in cancer research to characterize diverse cell populations and their interactions, providing unprecedented detail on tumor composition and evolution [9]. In CRC, recent scRNA-seq studies have begun to catalog the complex ecosystem of malignant, immune and stromal cells in both primary and metastatic tumors.
For example, early single-cell analyses revealed discrete subsets of cancer-associated fibroblasts in CRC [10], while others identified heterogeneous malignant cell populations with unique transcriptional programs. Notably, some studies have linked single-cell features to clinical outcomes, identifying a "stemness" gene signature in tumor cells that correlates with higher relapse rates [11], and small-scale integrations of single-cell data with bulk transcriptomes have shown promise for prognostic stratification [12]. However, most CRC single-cell studies to date have been limited in scale and scope: they typically profile only tens of tumors or fewer [10] and often focus on descriptive tumor atlases without integrating longitudinal patient outcomes or therapeutic responses. Moreover, while computational analyses have nominated putative driver genes of CRC progression (for instance, LYZ, LCN2, CEACAM5 and FOXQ1 were recently implicated in the progression of premalignant lesions) [7], these candidates have generally not been functionally validated. In summary, although scRNA-seq has demonstrated the ability to unearth CRC's intratumoral heterogeneity and suggest new biomarkers, most studies have not leveraged its full potential to identify clinically significant cellular subsets or drivers with confirmed roles in disease aggressiveness.

Here we address these gaps with an integrative single-cell analysis across multiple CRC datasets that resolves the fine-scale landscape of epithelial tumor cell heterogeneity. Because stem-like cancer cells are thought to fuel tumor progression, metastasis and relapse in CRC, we specifically interrogate malignant subpopulations with stem-cell-like characteristics. Through data integration, we define and characterize a stem-like tumor cell subset that is associated with poor patient outcomes and may contribute to therapy resistance.
We further develop a prognostic model based on single-cell-derived gene signatures, enabling risk stratification of CRC patients and linking tumor cell heterogeneity to survival. Finally, we functionally validate a candidate driver gene from the stem-like malignant population, demonstrating its role in CRC cell migration and invasion. Together, these findings provide a new framework for understanding CRC at single-cell resolution and identify candidate biomarkers. In the following sections, we describe the results of this integrated analysis and the insights it yields into CRC heterogeneity and progression.

2 Methods
2.1 CRC Data Acquisition
The scRNA-seq datasets analyzed in this study were retrieved from the Gene Expression Omnibus (GEO, NCBI, https://www.ncbi.nlm.nih.gov/geo/) under accession numbers GSE132465 and GSE188711. These datasets include omental samples from six CRC patients. Additionally, bulk RNA-seq data and corresponding clinical annotations were obtained from the TCGA CRC dataset (https://portal.gdc.cancer.gov/), comprising 521 eligible samples, which were used to develop a prognostic signature.

2.2 Data Quality Control and Dimensionality Reduction
The raw scRNA-seq data were preprocessed using Scanpy (version 1.9.1, Python 3.8) [13]. Potential doublets were identified and excluded using Scrublet (version 3.0) with default settings [14]. Low-quality cells were further filtered using the following thresholds: (1) 300–8,000 detected genes, (2) 500–50,000 total UMI counts and (3) a mitochondrial gene fraction below 20%. After filtering, counts were normalized with sc.pp.normalize_total and log-transformed with sc.pp.log1p. Highly variable genes were then identified with sc.pp.highly_variable_genes using flavor="seurat_v3".
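The filtering thresholds and normalization steps above can be illustrated with a minimal numpy sketch on a dense cells × genes count matrix. It is not the authors' Scanpy code: the helper names are hypothetical, the thresholds are treated as inclusive ranges by assumption, and, mirroring sc.pp.normalize_total when no target_sum is given, each cell is scaled to the median library size before log1p.

```python
import numpy as np

def qc_mask(counts, is_mito, min_genes=300, max_genes=8000,
            min_umi=500, max_umi=50_000, max_mito_pct=20.0):
    """Boolean mask of cells passing the Section 2.2 thresholds (hypothetical helper)."""
    n_genes = (counts > 0).sum(axis=1)            # detected genes per cell
    n_umi = counts.sum(axis=1)                    # total UMI counts per cell
    mito_pct = 100.0 * counts[:, is_mito].sum(axis=1) / np.maximum(n_umi, 1)
    return ((n_genes >= min_genes) & (n_genes <= max_genes)
            & (n_umi >= min_umi) & (n_umi <= max_umi)
            & (mito_pct < max_mito_pct))

def normalize_log1p(counts, target_sum=None):
    """Scale each cell to target_sum (default: median library size), then log1p."""
    lib = counts.sum(axis=1, keepdims=True)
    if target_sum is None:
        target_sum = np.median(lib)
    return np.log1p(counts / lib * target_sum)

# Toy example: 3 cells x 400 genes, the first 10 genes flagged as mitochondrial.
counts = np.full((3, 400), 2.0)
counts[1, 100:] = 0.0       # cell 1: only 100 detected genes -> fails the gene filter
is_mito = np.zeros(400, dtype=bool)
is_mito[:10] = True
counts[2, :10] = 100.0      # cell 2: ~56% mitochondrial UMIs -> fails the mito filter
mask = qc_mask(counts, is_mito)
logn = normalize_log1p(counts[mask])
print(mask.tolist())        # [True, False, False]
```

On real data the same logic runs on sparse matrices inside Scanpy; the dense arrays here only keep the example self-contained.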
Batch effects across samples were corrected using Harmony (version 0.0.10) with theta set to 2 [15]. Clustering was performed with the Leiden algorithm at a resolution of 0.1, and UMAP was used for visualization with n_neighbors = 15 and min_dist = 0.5.

2.3 Malignant Epithelial Cell Identification Using inferCNV
To distinguish malignant from non-malignant epithelial cells in CRC tissues, we applied infercnvpy, a Python package inspired by the inferCNV method (https://github.com/broadinstitute/inferCNV), to estimate copy number variations (CNVs) across cell types. Immune cells were designated as the reference baseline, and epithelial cells with elevated CNV scores were classified as malignant. The malignant cells identified in this way were re-clustered using the Leiden algorithm.

2.4 Heterogeneity of CRC Subpopulations
Differentially expressed genes among CRC cell subpopulations were identified with sc.tl.rank_genes_groups. GO Biological Process enrichment analyses were then conducted with the GSEApy package, and gene set enrichment analysis (GSEA) was performed with the KEGG_2021_Human gene set library to assess functional differences across these subpopulations.

2.5 Subcluster Stemness Analysis and Trajectory Analysis
CytoTRACE 2 was used to estimate the differentiation state of each CRC subcluster, yielding CytoTRACE scores that indicate relative developmental potential [16]. Regulatory transcription factors (TFs) within each subpopulation were investigated with pySCENIC [17]. GRNBoost was first used to predict candidate TF–target gene interactions, followed by DNA motif enrichment analysis to retain direct TF binding targets. AUCell was then used to quantify regulon activity per cell, and the top five TFs by activity score were selected.

2.6 Cell–Cell Communication Analysis
Intercellular communication among the identified subpopulations was evaluated with CellPhoneDB (version 1.6.1), focusing on ligand–receptor pairs and signaling pathways [18].
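The CNV-based malignancy call of Section 2.3, together with the median cutoff applied in Section 3.2, can be sketched as follows. This is a deliberately simplified stand-in, not infercnvpy's implementation: it assumes a log-normalized cells × genes matrix whose genes are already sorted by genomic position on a single chromosome, and it uses a plain moving average instead of the per-chromosome smoothing and denoising of inferCNV-style tools. The helper names are hypothetical.

```python
import numpy as np

def cnv_scores(expr, is_reference, window=5):
    """Per-cell CNV score (hypothetical helper).

    expr: log-normalized cells x genes matrix, genes ordered by genomic position.
    is_reference: boolean mask of reference cells assumed to be diploid.
    """
    # Log-ratio of every cell against the mean profile of the reference cells.
    rel = expr - expr[is_reference].mean(axis=0)
    # Moving average along the gene axis: regional gains/losses spanning several
    # neighbouring genes survive, isolated single-gene noise is damped.
    kernel = np.ones(window) / window
    smoothed = np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, rel)
    # Score = mean absolute smoothed deviation across the genome.
    return np.abs(smoothed).mean(axis=1)

def call_malignant(scores):
    """Cells scoring above the median are labelled malignant."""
    return scores > np.median(scores)

# Toy example: 4 reference cells and 2 cells carrying a gain over genes 10-24.
expr = np.zeros((6, 40))
expr[4:, 10:25] += 1.0
is_reference = np.array([True] * 4 + [False] * 2)
scores = cnv_scores(expr, is_reference)
malignant = call_malignant(scores)
print(malignant.tolist())  # [False, False, False, False, True, True]
```

A median cutoff labels roughly half of all cells as malignant by construction, so in practice it is usually checked against the bimodal shape of the score distribution.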
2.7 Construction and Validation of a Novel Prognostic Risk Model
Using univariate Cox regression (P < 0.05), we identified overall survival (OS)-related genes among the top 500 genes of ELFN1-AS1⁺ malignant epithelial cells. Prognostic markers were then refined by least absolute shrinkage and selection operator (LASSO) Cox regression to reduce overfitting and improve predictive accuracy. Each patient's risk score was calculated as the sum of each gene's expression level multiplied by its regression coefficient: risk score = Σ (coefficient_gene × expression_gene). Patients were divided into high- and low-risk groups at the median risk score. Kaplan–Meier survival curves and receiver operating characteristic (ROC) analyses were used to assess the prognostic performance of the model.

2.8 Immune Infiltration Analysis and Functional Enrichment Analysis
The immune cell infiltration landscape of the high- and low-risk groups was characterized with the CIBERSORT algorithm to relate immune cell profiles to prognostic biomarkers [19]. TIDE (Tumor Immune Dysfunction and Exclusion) scores were computed for both groups, and differences in immune checkpoint gene expression were assessed with Wilcoxon tests. Differential expression analysis between risk groups was conducted with DESeq2. Functional enrichment analyses, including KEGG pathway and GO Biological Process analyses, were performed with the clusterProfiler R package (version 4.6.2) [20].

2.9 Gene Mutation Analysis
Somatic mutation profiles of CRC samples were obtained from the TCGA database. Tumor mutational burden (TMB) scores were computed with the maftools package [15]. The correlation between TMB and risk score was evaluated with Spearman's rank correlation. To investigate the prognostic implications of TMB, samples were divided into high- and low-TMB groups at the median TMB value, and Kaplan–Meier survival curves were constructed [21].
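The risk score of Section 2.7 is a linear predictor followed by a median split. The sketch below plugs in the four LASSO coefficients reported in Section 3.4; the patient expression values, and the scale they are on, are hypothetical because the text does not state how TCGA expression was transformed, and assigning ties at the median to the low-risk group is likewise an assumption.

```python
import statistics

# LASSO coefficients reported in Section 3.4 (Fig. 5A).
COEFS = {"RPL21": 0.00209, "GAL": 0.03439, "ELFN1-AS1": 0.03530, "AIF1L": 0.14608}

def risk_score(expr):
    """Sum of coefficient x expression over the four signature genes."""
    return sum(coef * expr[gene] for gene, coef in COEFS.items())

def split_at_median(scores):
    """Label scores above the cohort median 'high', the rest 'low' (ties -> 'low')."""
    med = statistics.median(scores)
    return ["high" if s > med else "low" for s in scores]

# Hypothetical expression values (scale assumed, e.g. log2-transformed) for four patients.
patients = [
    {"RPL21": 10.0, "GAL": 2.0, "ELFN1-AS1": 1.0, "AIF1L": 3.0},
    {"RPL21": 10.0, "GAL": 0.5, "ELFN1-AS1": 0.2, "AIF1L": 1.0},
    {"RPL21": 9.0, "GAL": 4.0, "ELFN1-AS1": 2.0, "AIF1L": 5.0},
    {"RPL21": 11.0, "GAL": 1.0, "ELFN1-AS1": 0.5, "AIF1L": 0.5},
]
scores = [risk_score(p) for p in patients]
groups = split_at_median(scores)
print(groups)  # ['high', 'low', 'high', 'low']
```

Because AIF1L carries by far the largest coefficient, its expression dominates the score unless the genes are on very different scales.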
2.10 Drug Sensitivity Assessment
IC50 values of chemotherapeutic drugs in each risk group were estimated with the pRRophetic package (version 0.5) [22].

2.11 Cell Lines and Culture
RKO cells obtained from the Type Culture Collection of the Chinese Academy of Sciences were cultured in RPMI 1640 medium (Gibco, USA) supplemented with 10% fetal bovine serum and 1% penicillin/streptomycin at 37°C in a humidified atmosphere (95% humidity) with 5% CO₂.

2.12 Construction and Transfection of shRNA
For stable knockdown of AIF1L, shRNA constructs targeting the sequences designated siRNA1 (5′-CGAGCTGATAAACAACATGTT-3′) and siRNA2 (5′-GCTCCTAGAGAACACCCATTT-3′) were synthesized, cloned into the pLKO.1-TRC vector and confirmed by Sanger sequencing. Lentiviruses were produced by co-transfecting these constructs with the packaging vectors psPAX2 and pMD2.G into 293T cells using Lipofectamine 3000 (Invitrogen, USA). Lentiviral supernatants were harvested 72 hours post-transfection, filtered (0.45 µm) and used to infect RKO cells in logarithmic growth phase in the presence of Polybrene (4 µg/mL). Puromycin (8 µg/mL) selection was started after 48 hours and maintained for approximately one week. Knockdown efficiency was validated by RT-qPCR using AIF1L-specific primers (forward: 5′-CCGAGACTTTGTGAACATGAT-3′; reverse: 5′-CCACCTGGAGATGAAGAAGAT-3′).

2.13 Wound Healing Assay
Stable cells were grown to ~95% confluence in 6-well plates. Scratches were made with sterile 200 µL pipette tips, and the wells were rinsed with PBS and replenished with fresh medium. Images were acquired at 0 and 48 hours to assess wound closure.

2.14 Transwell Assay
Transwell chambers (8 µm pores; Corning, USA) were used to evaluate migration, and chambers pre-coated with Matrigel (BD Biosciences, 356234) were used to evaluate invasion. Cells (2.0 × 10⁵ cells/mL) in serum-free medium were seeded into the upper chambers, and the lower chambers contained medium with 10% FBS. After 48 hours, cells that had traversed the membrane were fixed, stained with crystal violet and counted under a microscope.
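Section 2.12 does not state how RT-qPCR knockdown efficiency was quantified. A common choice is the comparative 2^−ΔΔCt method, sketched below under that assumption; it presumes roughly 100% amplification efficiency for both genes, and the reference (housekeeping) gene and Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of the target gene versus control cells by the 2^-ddCt method."""
    d_ct_sample = ct_target - ct_ref              # normalize to a reference gene (assumed, e.g. GAPDH)
    d_ct_control = ct_target_ctrl - ct_ref_ctrl   # same normalization in control-shRNA cells
    return 2.0 ** -(d_ct_sample - d_ct_control)

# Hypothetical Ct values: AIF1L amplifies 2 cycles later in knockdown cells.
fold = relative_expression(26.0, 18.0, 24.0, 18.0)
print(f"relative AIF1L expression: {fold:.2f}, knockdown: {100 * (1 - fold):.0f}%")
# relative AIF1L expression: 0.25, knockdown: 75%
```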
2.15 Statistical Analysis
Statistical analyses were performed in R (version 4.4.1). Wilcoxon tests and Pearson correlation analyses were used, with significance indicated as follows: *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001; "ns", not significant.

3 Results
3.1 Single-cell landscape of CRC
To construct a high-resolution cellular atlas of CRC, we integrated two publicly available scRNA-seq datasets comprising 39 samples (Supplementary Table S1). The overall study workflow is summarized in Fig. 1. After stringent quality control (gene count > 300; mitochondrial transcript fraction < 15%) and doublet removal, 23,7875 high-quality single cells remained. Unsupervised clustering with the Leiden algorithm (resolution = 0.2) and UMAP visualization identified 11 distinct clusters (Fig. 2A). Based on canonical marker genes and literature annotations, these clusters were assigned to immune (T cells (cluster 0), CD8⁺ T cells (cluster 1), macrophages (cluster 2), B cells (cluster 3), plasma cells (cluster 6) and plasmacytoid dendritic cells (cluster 9)), stromal (fibroblasts (cluster 5), endothelial cells (cluster 7), mast cells (cluster 8) and Schwann cells (cluster 10)) and epithelial (cluster 4) compartments (Fig. 2B); representative markers are shown in Fig. 2E. Comparing adjacent normal and tumor tissues revealed significant shifts in cellular composition (Fig. 2C, D): normal mucosa contained higher proportions of CD8⁺ T cells and fibroblasts, whereas tumor samples were enriched for total T cells and macrophages. These changes suggest that functionally specialized T cell and macrophage subsets in CRC lesions may influence immune surveillance and tumor progression.

3.2 Malignant cell identification via single-cell CNA analysis
To distinguish malignant epithelial cells from their non-malignant counterparts, we applied infercnvpy (v0.4.0) to infer CNVs from the scRNA-seq data, using immune and stromal populations as diploid references.
The resulting genome-wide CNV heatmap across all clusters (immune, stromal and epithelial) is shown in Fig. 3A (red, gain; blue, loss). We then computed a per-cell CNV score and projected these values onto the UMAP embedding (Fig. 3B), which revealed a bimodal distribution with a median score of ~0.02. Using this median as the threshold, cells with higher scores were provisionally labeled "malignant" and those with lower scores "non-malignant" (Fig. 3C, left). Applying the same cutoff to the epithelial compartment (Fig. 3C, right) identified 2,100 malignant epithelial cells for downstream analysis. To assess the biological relevance of this CNV-based classification, we performed GO enrichment analysis of genes upregulated in the malignant epithelial subpopulations. This analysis highlighted processes related to apoptosis regulation, cell adhesion and intercellular signaling (Fig. 3D), although terms such as "cardiac muscle cell adhesion" likely reflect the generality of pathway annotations rather than CRC-specific biology. GSEA comparing malignant with non-malignant epithelial cells further showed significant enrichment of epithelial-to-mesenchymal transition (EMT), TNF-α signaling, coagulation, apoptosis, cholesterol homeostasis and complement pathways (Fig. 3E). Collectively, these findings support the conclusion that CNV-inferred malignant epithelial cells exhibit active pro-tumorigenic programs and illustrate the utility of single-cell CNV profiling for resolving cancer cell heterogeneity within the CRC microenvironment.

3.3 Heterogeneity of malignant epithelial subpopulations
To dissect functional diversity within the malignant epithelial compartment, we reclustered the infercnvpy-defined malignant epithelial cells using the Leiden algorithm, yielding seven discrete subpopulations. Based on their top marker genes, we designated these malignant epithelial (mEpi) clusters CEACAM5⁺ mEpi, ELFN1-AS1⁺ mEpi, HMGB1⁺ mEpi, IFITM3⁺ mEpi, HSPA1B⁺ mEpi, PHGR1⁺ mEpi and SPINK4⁺ mEpi (Fig. 4A).
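Naming each reclustered subpopulation after its top marker gene can be sketched with a simple one-versus-rest ranking. For illustration, genes are ranked by the difference in mean log expression between a cluster and all other cells, a crude log fold change rather than the test statistics computed by sc.tl.rank_genes_groups; the helper name and toy data are hypothetical, and the ASCII "+" stands in for the superscript plus used in the text.

```python
import numpy as np

def name_clusters(expr, labels, genes, suffix="+ mEpi"):
    """Map each cluster label to '<top marker><suffix>' (hypothetical helper).

    expr: log-normalized cells x genes matrix; labels: cluster label per cell.
    """
    names = {}
    for c in np.unique(labels):
        in_c = labels == c
        # Difference of mean log expression approximates a log fold change.
        diff = expr[in_c].mean(axis=0) - expr[~in_c].mean(axis=0)
        names[int(c)] = f"{genes[int(np.argmax(diff))]}{suffix}"
    return names

genes = np.array(["CEACAM5", "ELFN1-AS1", "SPINK4"])
expr = np.array([[3.0, 0.0, 0.0], [3.0, 0.2, 0.0],
                 [0.0, 2.0, 0.0], [0.1, 2.0, 0.0],
                 [0.0, 0.0, 4.0], [0.0, 0.0, 4.0]])
labels = np.array([0, 0, 1, 1, 2, 2])
names = name_clusters(expr, labels, genes)
print(names)
# {0: 'CEACAM5+ mEpi', 1: 'ELFN1-AS1+ mEpi', 2: 'SPINK4+ mEpi'}
```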
In UMAP space, each subset occupies a distinct region, suggesting differences in differentiation status or functional programs. We next applied CytoTRACE 2 to infer each subpopulation's developmental potential, where higher scores denote a more progenitor-like (less differentiated) state. Overlaying CytoTRACE scores on the UMAP (Fig. 4B–C) revealed that the ELFN1-AS1⁺ mEpi subset has the highest stemness, suggesting that it may serve as a tumor-initiating or metastasis-competent pool (Fig. 4D). Given that ELFN1-AS1 is a long noncoding RNA implicated in cell plasticity, this cluster is a compelling focus for future mechanistic studies. Finally, to elucidate the regulatory networks underpinning each subpopulation, we performed single-cell regulatory network inference with pySCENIC. As shown in Fig. 3E, each mEpi subset is characterized by a distinct TF signature: CEACAM5⁺ mEpi is enriched for POU1F1 and DLX5, suggesting proliferative and transcriptional remodeling capacity; HMGB1⁺ mEpi for the cell-cycle regulator MYBL2; IFITM3⁺ mEpi for ZNF family factors; and HSPA1B⁺ mEpi for POU3F1, with the remaining subsets showing their own TF profiles. Notably, PLAG1 emerges as a top TF in the ELFN1-AS1⁺ mEpi population, hinting at possible cooperation with ELFN1-AS1 in maintaining stem-like features. Together, these analyses reveal a multilayered landscape of malignant epithelial heterogeneity in CRC and nominate ELFN1-AS1⁺ mEpi as a priority target for functional validation.

3.4 Prognostic model based on the ELFN1-AS1⁺ mEpi signature
To evaluate the clinical relevance of the ELFN1-AS1⁺ mEpi subpopulation, we ranked its top 500 most highly expressed genes and tested their association with overall survival in the TCGA-COAD cohort.
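The univariate Cox screen can be illustrated with a from-scratch fit of the Cox partial likelihood for a single gene, using Newton–Raphson iterations and the Breslow convention for tied times. This is a teaching sketch, not the software the authors used (the text does not name it); the toy survival data are hypothetical and were chosen so that the maximum-likelihood estimate is finite.

```python
import math

def _score_and_information(beta, x, time, event):
    """First derivative and observed information of the Cox log partial likelihood."""
    score = information = 0.0
    for xi, ti, di in zip(x, time, event):
        if not di:                      # censored subjects contribute no event term
            continue
        risk = [xj for xj, tj in zip(x, time) if tj >= ti]   # risk set at ti
        w = [math.exp(beta * xj) for xj in risk]
        s0 = sum(w)
        m1 = sum(wj * xj for wj, xj in zip(w, risk)) / s0        # weighted mean of x
        m2 = sum(wj * xj * xj for wj, xj in zip(w, risk)) / s0   # weighted mean of x^2
        score += xi - m1
        information += m2 - m1 * m1
    return score, information

def univariate_cox(x, time, event, tol=1e-10, max_iter=50):
    """Return (beta, hazard ratio, two-sided Wald p-value) for one covariate."""
    beta = 0.0
    for _ in range(max_iter):
        score, information = _score_and_information(beta, x, time, event)
        step = score / information       # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    _, information = _score_and_information(beta, x, time, event)
    z = beta * math.sqrt(information)    # beta divided by its standard error
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, math.exp(beta), p

# Hypothetical cohort: expression of one gene, follow-up time, death indicator.
x = [2.0, 1.5, 1.8, 0.5, 1.0, 0.2, 0.8, 0.3]
time = [1, 2, 3, 4, 5, 6, 7, 8]
event = [1, 1, 0, 1, 1, 0, 1, 1]
beta, hr, p = univariate_cox(x, time, event)
print(f"beta={beta:.3f}  HR={hr:.2f}  Wald p={p:.3f}")
```

In a screen, this fit is repeated for each candidate gene and genes with P < 0.05 are passed to LASSO; established implementations also offer the Efron approximation for ties.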
First, univariate Cox regression narrowed the candidate list, and subsequent LASSO Cox analysis, with the penalty parameter chosen at λ_min, retained four genes (RPL21, GAL, ELFN1-AS1 and AIF1L) (LASSO coefficients: 0.00209, 0.03439, 0.03530 and 0.14608, respectively; Fig. 5A). All four coefficients are positive, so higher expression of each gene increases the risk score. We then constructed a risk score as the sum of each gene's expression multiplied by its coefficient, and TCGA-COAD patients were dichotomized at the median risk score into low- and high-risk groups. Kaplan–Meier analysis showed that patients in the low-risk group had significantly longer overall survival than those in the high-risk group (P < 0.001; Fig. 5B). Time-dependent ROC analysis indicated good predictive performance, with AUCs of 0.73, 0.70 and 0.76 at 1, 3 and 5 years, respectively (Fig. 5C). In summary, using ELFN1-AS1⁺ mEpi-derived candidate genes and clinical outcomes from TCGA-COAD, we established a concise four-gene prognostic model that effectively stratifies CRC patients by survival risk. These findings highlight RPL21, GAL, ELFN1-AS1 and AIF1L as potential biomarkers and motivate future mechanistic studies of their roles in tumor progression and the immune TME.

3.5 Molecular Functions and Immune Landscape Distinctions Between High- and Low-Risk Groups
To elucidate transcriptomic differences between high- and low-risk CRC patients, we performed GSEA with GO Biological Process and KEGG pathway gene sets (Fig. 5D–E). In the high-risk group, GO enrichment highlighted “mitochondrial respiratory chain complex assembly,” “NADH dehydrogenase complex assembly” and “ribosomal large subunit biogenesis,” reflecting upregulated bioenergetic processes and ribosome biogenesis. KEGG analysis similarly revealed significant enrichment of “oxidative phosphorylation” and “alkaloid biosynthesis,” consistent with enhanced mitochondrial function and metabolic remodeling.
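GSEA results such as those above rest on a weighted running-sum enrichment score (ES). A minimal numpy sketch of the ES alone follows, assuming genes are pre-ranked by a differential statistic (for example, high- versus low-risk) and using weight p = 1; permutation-based significance and the normalized enrichment score are omitted, and the toy ranking is hypothetical.

```python
import numpy as np

def enrichment_score(ranked_genes, ranked_stats, gene_set, p=1.0):
    """Weighted running-sum enrichment score for one gene set.

    ranked_genes/ranked_stats: genes sorted from most up- to most down-regulated.
    Returns (ES, running_sum); ES is the running sum's maximum deviation from zero.
    """
    in_set = np.isin(ranked_genes, list(gene_set))
    # Hits step up in proportion to |statistic|^p, normalized to sum to 1.
    hits = np.where(in_set, np.abs(ranked_stats) ** p, 0.0)
    hits = hits / hits.sum()
    # Misses step down uniformly, also normalized to sum to 1.
    misses = np.where(in_set, 0.0, 1.0)
    misses = misses / misses.sum()
    running = np.cumsum(hits - misses)
    es = running[np.argmax(np.abs(running))]
    return es, running

genes = np.array([f"G{i}" for i in range(1, 11)])
stats = np.linspace(2.0, -2.0, 10)   # already sorted in decreasing order
es_top, _ = enrichment_score(genes, stats, {"G1", "G2", "G3"})
es_bottom, _ = enrichment_score(genes, stats, {"G8", "G9", "G10"})
print(round(es_top, 3), round(es_bottom, 3))  # 1.0 -1.0
```

A positive ES means the set concentrates at the top of the ranking (here, genes up in the first group); a negative ES means it concentrates at the bottom.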
Complementarily, GSVA confirmed these trends at the sample level (Fig. 5F): high-risk tumors showed elevated enrichment scores for “DNA polymerization,” “purine metabolism,” “ribosome” and even neurodegeneration-related sets such as “Huntington’s disease” (KEGG neurodegeneration gene sets contain many mitochondrial respiratory chain genes), indicating broad increases in DNA synthesis and protein translation machinery. Conversely, low-risk tumors were preferentially enriched for pathways including “type II diabetes mellitus” and “phosphatidylinositol signaling system,” suggesting distinct metabolic wiring and signaling states. Together, these results imply that high-risk CRCs are characterized by hyperactive energy metabolism and translational capacity, which may contribute to their more aggressive clinical behavior.

We applied CIBERSORT to compare immune cell abundances between high- and low-risk groups (Fig. 6A). High-risk tumors exhibited significantly higher proportions of activated memory CD4⁺ T cells (P < 0.05) and M0 macrophages, whereas low-risk samples were enriched for resting memory CD4⁺ T cells. This suggests that high-risk CRC harbors a more inflamed yet potentially dysregulated microenvironment, which may contribute to immune exhaustion and poorer outcomes. Next, we compared the mutational landscapes of the two risk groups. In high-risk patients (Fig. 6B), canonical drivers such as APC and TP53, as well as the large, frequently mutated gene TTN, showed higher mutation frequencies, whereas low-risk tumors more frequently carried KRAS and PIK3CA alterations (Fig. 6C). These distinct mutational profiles imply divergent tumor evolution and may guide risk-adapted therapeutic decisions. Finally, using oncoPredict, we estimated each group's response to a panel of anticancer agents. Compounds such as Z-764467149_1000 and Afatinib_1022 had lower predicted IC₅₀ values in the high-risk group, indicating higher predicted sensitivity, whereas BMS-754807_2171 had lower predicted IC₅₀ values in low-risk tumors (Fig. 6D).
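Two-group comparisons such as the predicted IC₅₀ differences above are commonly made with the Wilcoxon rank-sum test, which Section 2.15 lists among the tests used. A stdlib-only sketch of the rank-sum (Mann–Whitney U) statistic with a two-sided normal-approximation p-value follows; it omits tie and continuity corrections, the IC₅₀ values are hypothetical, and for samples this small an exact test would be preferable.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def rank_sum_test(a, b):
    """Mann-Whitney U for sample a, with a two-sided normal-approximation p-value."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))

high_risk = [1.0, 1.2, 0.8, 1.1, 0.9]   # hypothetical predicted IC50 values
low_risk = [2.0, 1.9, 2.2, 1.7, 2.1]
u, p = rank_sum_test(high_risk, low_risk)
print(u, round(p, 4))  # U = 0.0: every high-risk value ranks below every low-risk value
```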
Integrating risk stratification with pharmacogenomic profiling highlights candidate drugs for personalized treatment. Together, these multi-layered analyses indicate that high-risk CRC is marked by hyperactive energy metabolism, a pro-inflammatory yet dysfunctional immune milieu, a distinct mutational profile and unique predicted drug vulnerabilities, insights that may inform precision prognostication and tailored therapeutic strategies.

3.6 Hub gene identification by machine learning
To pinpoint the most influential prognostic markers, we applied a random forest (RF) classifier to the four-gene signature. Feature importance scores (mean decrease in Gini impurity) showed that GAL and AIF1L both reached an importance of approximately 0.10 (Fig. 7A). We then compared GAL and AIF1L expression between tumor and adjacent normal tissues using TCGA-COAD and GTEx data. GAL was significantly upregulated in tumors, and patients with lower GAL expression had better overall survival (Fig. 7B). In contrast, AIF1L showed higher baseline expression in normal colon, yet it carried the largest LASSO coefficient in our prognostic model. This apparent discrepancy (high AIF1L in normal tissue yet a strong adverse prognostic weight) suggests a complex role for AIF1L in CRC biology and warrants further functional investigation.

3.7 In vitro validation of AIF1L function in CRC cells
To determine the biological impact of AIF1L, we generated stable AIF1L knockdown in RKO cells using two independent shRNAs. In wound-healing assays, AIF1L-depleted cells closed scratch gaps significantly faster than control cells, indicating enhanced migratory capacity (Fig. 8A). Similarly, Transwell invasion assays showed a marked increase in the number of invasive cells upon AIF1L silencing (Fig. 8B). These results indicate that loss of AIF1L promotes CRC cell motility and invasion, supporting its functional relevance in tumor progression and highlighting it as a candidate for mechanistic follow-up.
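The text does not specify how wound closure or Transwell counts were quantified. Assuming wound areas were measured on the 0 h and 48 h images (for example in ImageJ) and invaded cells were counted in several fields per membrane, the usual summary metrics are sketched below; all numbers and helper names are hypothetical.

```python
from statistics import fmean

def wound_closure_pct(area_0h, area_t):
    """Percentage of the initial scratch area closed at time t."""
    return 100.0 * (area_0h - area_t) / area_0h

def relative_invasion(counts_knockdown, counts_control):
    """Mean invaded cells per field in knockdown wells relative to control."""
    return fmean(counts_knockdown) / fmean(counts_control)

# Hypothetical measurements (scratch areas in pixels, cell counts per field).
ctrl = wound_closure_pct(1000.0, 600.0)
kd = wound_closure_pct(1000.0, 300.0)
ratio = relative_invasion([150, 170, 160], [80, 75, 85])
print(f"closure: control {ctrl:.0f}%, knockdown {kd:.0f}%; invasion ratio {ratio:.1f}")
# closure: control 40%, knockdown 70%; invasion ratio 2.0
```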
Discussion
Our single-cell transcriptomic analysis provides an integrative view of CRC heterogeneity, linking cellular subpopulations to clinical outcomes and therapeutic vulnerabilities. We assembled a compendium of ~70,000 cells from 29 CRC tumors, enabling high-resolution delineation of the TME and the malignant compartment. Using inferred CNVs, we distinguished malignant epithelial cells from non-malignant cells and then uncovered seven transcriptionally distinct malignant subpopulations. Notably, one subpopulation marked by the long noncoding RNA ELFN1-AS1 exhibited the highest stemness scores, suggesting a progenitor-like, potentially tumor-initiating cell pool. From this subset, we derived a parsimonious four-gene prognostic signature (RPL21, GAL, ELFN1-AS1, AIF1L) that stratified patients into high- and low-risk groups with significantly different survival. Furthermore, we identified AIF1L as a potential hub gene and validated its functional role in vitro, showing that AIF1L knockdown enhances CRC cell migration and invasion. Collectively, these findings underscore the power of single-cell approaches to reveal intratumoral diversity and to pinpoint novel prognostic biomarkers and therapeutic targets.

Our results corroborate and extend emerging insights from recent single-cell studies of CRC. Previous scRNA-seq analyses have highlighted the complex cellular makeup of CRC tumors, including diverse immune infiltrates, stromal elements and malignant cell states, that traditional bulk profiling cannot resolve [23]. For example, Xiao et al. combined single-cell and spatial transcriptomics in CRC and similarly identified seven malignant epithelial subtypes with distinct gene programs (e.g., characterized by markers such as CAV1, FOS/JUN and ZEB2) [23]. Our identification of seven malignant subpopulations aligns with these reports, supporting the view that multiple co-existing tumor cell states contribute to CRC progression.
Importantly, we provide additional context by integrating single-cell data with patient survival outcomes. While prior studies have defined transcriptional subtypes or cancer stem cell (CSC) populations in CRC [11], our work directly links a stem-like malignant subcluster (ELFN1-AS1-high) to a prognostic gene signature. This approach builds on earlier prognostic models based on bulk gene expression [24] but adds a distinct dimension by rooting the signature in a specific cell population of clinical interest. Additionally, the prominence of ELFN1-AS1 in our stem-like cluster is supported by accumulating evidence that this lncRNA is an oncogenic driver in gastrointestinal cancers. ELFN1-AS1 has been reported to promote CRC cell proliferation, migration and even chemoresistance [25, 26], consistent with our proposal that the ELFN1-AS1⁺ subpopulation represents an aggressive cell state. Taken together with these reports, our findings suggest that intratumoral heterogeneity, and particularly the presence of stem-like, lncRNA-driven malignant cells, may underlie disease progression and therapy resistance in CRC.

The discovery of a four-gene risk signature rooted in a single-cell-defined cluster has practical implications for patient stratification. Our prognostic model (incorporating RPL21, GAL, ELFN1-AS1 and AIF1L) showed good performance (5-year AUC = 0.76) in distinguishing high- from low-risk patients. Notably, these genes would not intuitively be grouped together without data-driven identification; their combined predictive value highlights the advantage of mining single-cell data for novel gene combinations. Clinically, this four-gene panel could supplement existing prognostic systems and guide treatment intensity if validated in independent and prospective cohorts. Biologically, analysis of the high-risk group shed light on the features of aggressive tumors.
We observed enrichment of pathways related to oxidative phosphorylation, ribosome biogenesis and DNA replication in high-risk tumors, indicating a hyperactive metabolic and protein synthesis program. Such metabolic reprogramming is a known hallmark of cancer progression [24], and its prominence in the poor-outcome group suggests that these tumors have elevated bioenergetic and biosynthetic demands that may confer faster growth or therapy resistance. In contrast, low-risk tumors showed upregulation of pathways such as insulin signaling and adipogenesis, hinting at fundamentally different metabolic states. These differences echo the concept that CRC can follow divergent evolutionary trajectories: for instance, one subtype may become more proliferative and metabolically "hungry," while another remains more quiescent. We also found that high-risk tumors harbored a more "inflamed" immune microenvironment, with higher fractions of activated memory CD4⁺ T cells and undifferentiated (M0) macrophages, whereas low-risk tumors had more resting memory T cells. Paradoxically, an inflamed TME in high-risk tumors might reflect an ineffective anti-tumor immune response, potentially due to immune exhaustion or suppression, which could contribute to worse outcomes. This notion aligns with recent work showing that aggressive CRCs can co-opt immune escape mechanisms despite substantial immune cell infiltration [23]. Moreover, the distinct mutational landscapes we observed (e.g., higher APC and TP53 mutation rates in high-risk tumors versus more frequent KRAS/PIK3CA mutations in low-risk tumors) suggest that our risk groups may correspond to different molecular subtypes of CRC. High-risk patients might belong to a more genomically unstable, mesenchymal-biased subtype (often associated with poor prognosis), whereas low-risk patients might represent tumors with a different oncogenic profile.
These molecular differences could inform tailored therapies. For instance, high-risk tumors with TP53 alterations and heightened oxidative phosphorylation might benefit from metabolic inhibitors or drugs targeting the p53 pathway. Low-risk, KRAS-mutant tumors, in contrast, are generally resistant to anti-EGFR antibodies and might instead be candidates for MEK or PI3K pathway inhibitors. Our data-driven approach also nominated AIF1L (Allograft Inflammatory Factor 1-Like) as a potential tumor suppressor in CRC. Interestingly, AIF1L had the largest positive coefficient in our Cox model (higher expression associated with higher risk), yet functionally it appears to restrain cancer cell motility: knocking down AIF1L in CRC cells substantially increased their migratory and invasive capabilities. This finding is consistent with studies in breast cancer, where AIF1L is downregulated in tumors and low AIF1L levels correlate with worse survival [27]. Mechanistically, AIF1L localizes to actin cytoskeletal structures, and its overexpression can suppress cell spreading and protrusive activity by down-regulating focal adhesion kinase (FAK) and RhoA signaling [27]. Loss of AIF1L may therefore remove a brake on cytoskeletal dynamics, promoting a more migratory, invasive phenotype consistent with what we observed in CRC cells. That AIF1L is more highly expressed in normal colon tissue than in tumors (as also noted in breast tissue) suggests that it may act as a differentiation or structural maintenance factor that is silenced during tumor progression. The seemingly contradictory association of AIF1L with poor prognosis in our model might be explained by context-dependent expression or by compensation in aggressive tumors. For instance, surviving high-grade tumor cells might upregulate AIF1L to modulate an overly motile phenotype, or AIF1L could be co-expressed with other risk genes in certain cell states.
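The sign convention discussed above follows from the standard Cox proportional hazards formulation, shown here for a four-gene model. This is the textbook model; details of the study's LASSO-penalized fit, such as the penalty strength, are not reproduced. Here $x_j$ is the expression of gene $j$, $\mathbf{x}_{-j}$ holds the other three genes fixed, and the linear predictor equals the risk score defined in the Methods:

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\Big(\sum_{j=1}^{4} \beta_j x_j\Big),
\qquad
\text{Risk score} = \sum_{j=1}^{4} \beta_j x_j,
\qquad
\mathrm{HR}_j = \frac{h(t \mid x_j + 1,\ \mathbf{x}_{-j})}{h(t \mid x_j,\ \mathbf{x}_{-j})} = e^{\beta_j}.
```

A positive $\beta_j$ therefore gives $\mathrm{HR}_j > 1$, meaning higher hazard per unit increase in expression. Because each $\beta_j$ is estimated conditional on the other genes in the model, its sign need not match the gene's marginal association with survival. This is one possible reason why a multivariable coefficient and single-gene observations can appear to disagree.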
In any case, AIF1L emerges as a noteworthy candidate for further investigation. Its exact role in CRC progression (tumor-suppressive vs. contextually pro-tumor) and its druggability (perhaps via pathways such as FAK/RhoA) merit deeper exploration. In summary, our study demonstrates a multi-scale analytical framework, spanning single-cell dissection of tumor ecosystems, population-level prognostic modeling, and experimental validation, that can yield both fundamental and translational insights. We show that CRC tumors are mosaics of phenotypically diverse cells, among which a stem-like, ELFN1-AS1-high subset is associated with aggressive disease and may contribute to malignant progression and relapse. By capturing the signals of this subset in a simple gene signature, we can stratify patients by risk and identify candidate molecular vulnerabilities (such as AIF1L-mediated pathways). These findings open avenues for more precise prognostication. For instance, monitoring the abundance or activity of the ELFN1-AS1⁺ subpopulation (through its gene signature or biomarkers) could help identify patients at higher risk of recurrence who might benefit from intensified adjuvant therapy. Therapeutically, if further studies confirm that AIF1L restrains invasion, strategies to boost AIF1L activity or mimic its effects could be explored to impede metastasis in high-risk CRC. Given AIF1L's downstream targets, FAK/RhoA inhibition is one such option. Alternatively, targeting the metabolic vulnerabilities of high-risk tumors is another rational direction. As single-cell technologies continue to advance, integrating such data with clinical and experimental studies should increasingly enable “precision oncology” approaches. These approaches would identify not just which mutations a tumor carries but which cell states it contains, and tailor interventions accordingly. Our work contributes to this vision by mapping CRC heterogeneity and translating it into candidates for prognostic and therapeutic development.
Abbreviations
CRC: colorectal cancer; CNVs: copy number variations; GSEA: gene set enrichment analysis; TMB: tumor mutational burden; TME: tumor microenvironment
Declarations
Consent for publication: Not applicable.
Availability of data and materials: The data used and analyzed in the current study are available from the corresponding author upon reasonable request.
Funding: This study was supported by the Traditional Chinese Medicine and Ethnic Medicine Science & Technology Research Project of the Guizhou Provincial Administration of Traditional Chinese Medicine (QZYY-2024-003).
Authors' contributions: L.G. and W.D. conceived and designed the study. L.G. performed single-cell RNA-seq data processing, clustering, inferCNV analysis, subpopulation annotation, and prognostic model construction. W.D. conducted in vitro experiments, including shRNA knockdown, wound healing, and Transwell assays. C.G. supervised the project, secured funding, and provided critical input on study design and data interpretation. L.G. drafted the manuscript; W.D. and C.G. critically revised the manuscript for important intellectual content. All authors reviewed and approved the final version of the manuscript.
Acknowledgements: Not applicable.
Ethics approval and consent to participate: Not applicable.
Competing interests: The authors declare no competing interests.
References
1. Bray F, Laversanne M, Sung H, Ferlay J, Siegel RL, Soerjomataram I, Jemal A: Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer Journal for Clinicians 2024, 74(3):229-263.
2. Siegel RL, Kratzer TB, Giaquinto AN, Sung H, Jemal A: Cancer statistics, 2025. CA: A Cancer Journal for Clinicians 2025, 75(1):10-45.
3. Rawla P, Sunkara T, Barsouk A: Epidemiology of colorectal cancer: incidence, mortality, survival, and risk factors. Przegląd Gastroenterologiczny 2019, 14(2):89-103.
4. Schmoll H-J, Tabernero J, Maroun J, de Braud F, Price T, Van Cutsem E, Hill M, Hoersch S, Rittweger K, Haller DG: Capecitabine Plus Oxaliplatin Compared With Fluorouracil/Folinic Acid As Adjuvant Therapy for Stage III Colon Cancer: Final Results of the NO16968 Randomized Controlled Phase III Trial. Journal of Clinical Oncology 2015, 33(32):3733-+.
5. Shin AE, Giancotti FG, Rustgi AK: Metastatic colorectal cancer: mechanisms and emerging therapeutics. Trends in Pharmacological Sciences 2023, 44(4):222-236.
6. Chen L, Yang F, Chen S, Tai J: Mechanisms on chemotherapy resistance of colorectal cancer stem cells and research progress of reverse transformation: A mini-review. Frontiers in Medicine 2022, 9.
7. Wang J, Zhang Y, Chen X, Sheng Q, Yang J, Zhu Y, Wang Y, Yan F, Fang J: Single-Cell Transcriptomics Reveals Cellular Heterogeneity and Drivers in Serrated Pathway-Driven Colorectal Cancer Progression. International Journal of Molecular Sciences 2024, 25(20).
8. Wen R, Zhou L, Peng Z, Fan H, Zhang T, Jia H, Gao X, Hao L, Lou Z, Cao F et al.: Single-cell sequencing technology in colorectal cancer: a new technology to disclose the tumor heterogeneity and target precise treatment. Frontiers in Immunology 2023, 14.
9. Kothalawala WJ, Bartak BK, Nagy ZB, Zsigrai S, Szigeti KA, Valcz G, Takacs I, Kalmar A, Molnar B: A Detailed Overview About the Single-Cell Analyses of Solid Tumors Focusing on Colorectal Cancer. Pathology & Oncology Research 2022, 28.
10. Joanito I, Wirapati P, Zhao N, Nawaz Z, Yeo G, Lee F, Eng CLP, Macalinao DC, Kahraman M, Srinivasan H et al.: Single-cell and bulk transcriptome sequencing identifies two epithelial tumor cell states and refines the consensus molecular classification of colorectal cancer. Nature Genetics 2022, 54(7):963-+.
11. Lin K, Chowdhury S, Zeineddine MA, Zeineddine FA, Hornstein NJ, Villarreal OE, Maru DM, Haymaker CL, Vauthey J-N, Chang GJ et al.: Identification of Colorectal Cancer Cell Stemness from Single-Cell RNA Sequencing. Molecular Cancer Research 2024, 22(4):337-346.
12. Cai L, Guo X, Zhang Y, Xie H, Liu Y, Zhou J, Feng H, Zheng J, Li Y: Integrated analysis of single-cell and bulk RNA-sequencing to predict prognosis and therapeutic response for colorectal cancer. Scientific Reports 2025, 15(1).
13. Wolf FA, Angerer P, Theis FJ: SCANPY: large-scale single-cell gene expression data analysis. Genome Biology 2018, 19.
14. González-Silva L, Quevedo L, Varela I: Tumor Functional Heterogeneity Unraveled by scRNA-seq Technologies. Trends in Cancer 2020, 6(1):13-19.
15. Korsunsky I, Millard N, Fan J, Slowikowski K, Zhang F, Wei K, Baglaenko Y, Brenner M, Loh PR, Raychaudhuri S: Fast, sensitive and accurate integration of single-cell data with Harmony. Nature Methods 2019, 16(12):1289-+.
16. Kang M, Almagro Armenteros JJ, Gulati GS, Gleyzer R, Avagyan S, Brown EL, Zhang W, Usmani A, Earland N, Wu Z, Zou J, Fields RC, Chen DY, Chaudhuri AA, Newman AM: Mapping single-cell developmental potential in health and disease with interpretable deep learning. bioRxiv 2024.
17. Aibar S, González-Blas CB, Moerman T, Huynh-Thu VA, Imrichova H, Hulselmans G, Rambow F, Marine JC, Geurts P, Aerts J et al.: SCENIC: single-cell regulatory network inference and clustering. Nature Methods 2017, 14(11):1083-+.
18. Efremova M, Vento-Tormo M, Teichmann SA, Vento-Tormo R: CellPhoneDB: inferring cell-cell communication from combined expression of multi-subunit ligand-receptor complexes. Nature Protocols 2020, 15(4):1484-1506.
19. Newman AM, Liu CL, Green MR, Gentles AJ, Feng WG, Xu Y, Hoang CD, Diehn M, Alizadeh AA: Robust enumeration of cell subsets from tissue expression profiles. Nature Methods 2015, 12(5):453-+.
20. Wu TZ, Hu EQ, Xu SB, Chen MJ, Guo PF, Dai ZH, Feng TZ, Zhou L, Tang WL, Zhan L et al.: clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation 2021, 2(3).
21. Mayakonda A, Lin DC, Assenov Y, Plass C, Koeffler HP: Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Research 2018, 28(11):1747-1756.
22. Geeleher P, Cox N, Huang RS: pRRophetic: An R Package for Prediction of Clinical Chemotherapeutic Response from Tumor Gene Expression Levels. PLoS One 2014, 9(9).
23. Xiao J, Yu X, Meng F, Zhang Y, Zhou W, Ren Y, Li J, Sun Y, Sun H, Chen G et al.: Integrating spatial and single-cell transcriptomics reveals tumor heterogeneity and intercellular networks in colorectal cancer. Cell Death & Disease 2024, 15(5).
24. Guan J, Min S, Xia Y, Guo Z, Zhou X: Identifying colorectal cancer subtypes and establishing a prognostic model using metabolic plasticity and ferroptosis genes. Scientific Reports 2024, 14(1).
25. Li Y, Gan Y, Liu J, Li J, Zhou Z, Tian R, Sun R, Liu J, Xiao Q, Li Y et al.: Downregulation of MEIS1 mediated by ELFN1-AS1/EZH2/DNMT3a axis promotes tumorigenesis and oxaliplatin resistance in colorectal cancer. Signal Transduction and Targeted Therapy 2022, 7(1).
26. Li C, Hong S, Hu H, Liu T, Yan G, Sun D: MYC-Induced Upregulation of lncRNA ELFN1-AS1 Contributes to Tumor Growth in Colorectal Cancer via Epigenetically Silencing TPM1. Molecular Cancer Research 2022, 20(11):1697-1708.
27. Liu P, Li W, Hu Y, Jiang Y: Absence of AIF1L contributes to cell migration and a poor prognosis of breast cancer. OncoTargets and Therapy 2018, 11:5485-5498.
Supplementary Files: SupplementaryTableS1.xlsx
Journal editorial history: Editorial decision: Revision requested 24 Sep, 2025; Reviews received at journal 23 Sep, 2025; Reviews received at journal 16 Sep, 2025; Reviewers agreed at journal 13 Sep, 2025; Reviewers agreed at journal 12 Sep, 2025; Reviews received at journal 09 Sep, 2025; Reviewers agreed at journal 18 Aug, 2025; Reviewers invited by journal 08 Aug, 2025; Editor assigned by journal 08 Aug, 2025; Editor invited by journal 06 Aug, 2025; Submission checks completed at journal 06 Aug, 2025; First submitted to journal 06 Aug, 2025.
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-7271791","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":498231106,"identity":"bd9dd8dd-9085-46d5-87e0-6ff67d339f4d","order_by":0,"name":"Li Gao","email":"","orcid":"","institution":"First Affiliated Hospital of Guizhou University of Traditional Chinese Medicine","correspondingAuthor":false,"prefix":"","firstName":"Li","middleName":"","lastName":"Gao","suffix":""},{"id":498231107,"identity":"d163b433-4d0b-494e-83af-ec3728924f62","order_by":1,"name":"Wang Dingxue","email":"","orcid":"","institution":"First Affiliated Hospital of Guizhou University of Traditional Chinese Medicine","correspondingAuthor":false,"prefix":"","firstName":"Wang","middleName":"","lastName":"Dingxue","suffix":""},{"id":498231108,"identity":"d37d8574-4f7d-48c0-b6e1-a003d64ba6b0","order_by":2,"name":"Chen Guo","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA4klEQVRIiWNgGAWjYDACCWTOBwMbO9K0MM4oSEsmTQszz4dDjA2EdPDPbj728GubDYO8/xkzaRuDA8wM7IePbsBryZ1j6caybWkMhgeAWnIM7vAx8KSl3cCnxUAix0xasu0wg2FjD0jLM2YGCR4zAlryv0G0NPOYSVsYHGZsIKwlh03yI1CLPBtQCwMxWiRupAFVnktjMOBhK7bsMUhLZiPkF/4Zyc8kf5QBQ6z/8MYbP/7Y2PGzHz6GVwsIMPMwMNRvOMBhAo4jNkLKQYDxB5CQb2B//IEY1aNgFIyCUTDyAABiIkR3+BxxXQAAAABJRU5ErkJggg==","orcid":"","institution":"University of Traditional Chinese Medicine","correspondingAuthor":true,"prefix":"","firstName":"Chen","middleName":"","lastName":"Guo","suffix":""}],"badges":[],"createdAt":"2025-08-01 
13:38:30","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-7271791/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7271791/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1186/s12885-025-15241-2","type":"published","date":"2025-12-26T15:57:49+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":89230571,"identity":"bb97dba0-f5e8-46bf-9122-704a2531b3ad","added_by":"auto","created_at":"2025-08-17 14:15:06","extension":"jpeg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":239754,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eComputational and experimental workflow of the study.\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"floatimage1.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7271791/v1/cf56a8590b085fc9b8364a6d.jpeg"},{"id":89230577,"identity":"a136adb4-9a34-4fe8-a310-a5e937f935df","added_by":"auto","created_at":"2025-08-17 14:15:06","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":433324,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eThe cell landscape of colorectal cancer.\u003c/strong\u003e (A) UMAP visualization of the 11 clusters identified in the CRC single-cell dataset. (B) UMAP visualization of the annotated cell types (immune, stromal, and epithelial). (C) Stacked bar plot showing the proportion of each major cell type in normal vs. tumor tissues. (D) Stacked bar plot showing the proportion of each cluster across different samples. 
(E) Dot plot illustrating the percentage of cells expressing specific marker genes (dot size) and their mean expression levels (color scale) in each cluster.\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-7271791/v1/48e4de831b5fad74b8b23ae1.png"},{"id":89233009,"identity":"add769bf-a46b-4cd5-b98b-2f58ed02ebde","added_by":"auto","created_at":"2025-08-17 14:31:06","extension":"jpeg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":985126,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eMalignant epithelial cell identification using single-cell CNV.\u003c/strong\u003e (A) Heatmap of CNVs inferred by infercnvpy, with the horizontal axis representing cell groups (including immune, stromal, and epithelial subpopulations) and the vertical axis denoting genomic positions; red indicates CNV gains, and blue indicates CNV losses. (B) UMAP visualization of the CNV score for each cell. (C) UMAP plots illustrating malignant (orange) versus non-malignant (blue) cells at the global level (left) and specifically highlighting epithelial cells (right). (D) GO term analysis of the genes significantly enriched in malignant epithelial cells (\u003cem\u003eP\u003c/em\u003e \u0026lt; 0.05). (E) GSEA comparing malignant versus non-malignant epithelial cells.\u003c/p\u003e","description":"","filename":"floatimage3.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7271791/v1/1754441d842dc68de33fba30.jpeg"},{"id":89231753,"identity":"2f3423e8-fea0-466e-b149-fd6c2b75a6cc","added_by":"auto","created_at":"2025-08-17 14:23:06","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":476621,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eMalignant epithelial subcluster analysis. \u003c/strong\u003e(A) UMAP embedding of the seven malignant epithelial subpopulations identified by Leiden clustering. 
(B) CytoTRACE2-derived clustering of all malignant epithelial cells, colored by subcluster identity (without displaying individual differentiation scores). (C) UMAP heatmap of relative differentiation potential from CytoTRACE2: purple denotes more differentiated states, orange denotes higher stemness (less differentiation). (D) Boxplots comparing CytoTRACE2 scores across the seven malignant subclusters, illustrating differences in inferred developmental potential. (E) Single-cell regulatory network inference by pySCENIC: the top five transcription factors enriched in each subcluster are highlighted in red.\u003c/p\u003e","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-7271791/v1/89af7762ac3965a3aec539cb.png"},{"id":89230593,"identity":"df3b7615-14a1-4903-9f88-0202d97cfa08","added_by":"auto","created_at":"2025-08-17 14:15:06","extension":"jpeg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":705340,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eConstruction and evaluation of the four-gene prognostic model.\u003c/strong\u003e (A) LASSO-Cox analysis: coefficient paths (left) and cross-validation plot (right) identify RPL21, GAL, ELFN1-AS1 and AIF1L.\u003c/p\u003e\n\u003cp\u003e(B) Kaplan–Meier curves for high- versus low-risk TCGA-COAD patients (\u003cem\u003eP \u003c/em\u003e\u0026lt; 0.001). (C) Time-dependent ROC curves showing AUCs at 1, 3 and 5 years. (D) GSEA of GO Biological Processes enriched in the high-risk group. (E) GSEA of KEGG pathways enriched in the high-risk group. 
(F) GSVA barplot of key pathways upregulated in high-risk (blue) versus low-risk (green) groups.\u003c/p\u003e","description":"","filename":"floatimage5.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7271791/v1/17b2b946ae12b1f7cc661371.jpeg"},{"id":89230585,"identity":"e95e134e-a14c-49e8-9fda-28918d734329","added_by":"auto","created_at":"2025-08-17 14:15:06","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":458547,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eImmune infiltration, mutation profiles and drug sensitivity by risk group. \u003c/strong\u003e(A) Boxplots of CIBERSORT immune cell fractions in high- vs. low-risk patients. High-risk tumors show more CD4⁺ memory-activated T cells and M0 macrophages; low-risk tumors have more CD4⁺ memory-resting T cells (*\u003cem\u003eP\u003c/em\u003e \u0026lt; 0.05; ns, not significant). (B) Mutation oncoplot for high-risk group, showing the top 15 most frequently mutated genes and per-sample mutation burden. (C) Mutation oncoplot for low-risk group, same format as in (B). 
(D) Boxplots of predicted IC50 values for five representative drugs from oncoPredict; lower IC50 indicates higher sensitivity (*\u003cem\u003eP \u003c/em\u003e\u0026lt; 0.05; **\u003cem\u003eP\u003c/em\u003e \u0026lt; 0.01; ****\u003cem\u003eP\u003c/em\u003e \u0026lt; 0.0001).\u003c/p\u003e","description":"","filename":"floatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-7271791/v1/ef48e7ef06fae99dfc1b9105.png"},{"id":89230578,"identity":"b41809c0-f2c9-4e0d-b407-dfee7ce08fa3","added_by":"auto","created_at":"2025-08-17 14:15:06","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":152729,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eIdentification and characterization of hub prognostic genes in the four-gene signature.\u003c/strong\u003e (A) Random forest error rate across 1,000 trees (Left). Right: Variable importance of each gene, with GAL showing the highest contribution. (B) GAL expression is significantly higher in CRC tissues than in normal tissues (Left). High GAL expression is associated with worse overall survival (p = 0.0029) (Right). (C) Left: AIF1L expression is lower in tumors than in normal tissues (Left). No significant survival difference between high and low AIF1L expression groups (p = 0.78) (Right).\u003c/p\u003e","description":"","filename":"floatimage7.png","url":"https://assets-eu.researchsquare.com/files/rs-7271791/v1/1f8cffbe742431350ba49b1e.png"},{"id":89230588,"identity":"ed55eeef-5a6c-4bac-b1e6-0ae7f7d53c18","added_by":"auto","created_at":"2025-08-17 14:15:06","extension":"jpeg","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":573997,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eIn vitro validation of AIF1L knockdown effects on RKO cell migration and invasion. 
\u003c/strong\u003e(A) Representative wound healing images of RKO cells at 0-, 24-, and 48-hours following transduction with control, negative control (NC), or AIF1L-targeting shRNA (shRNA-1 or shRNA-2). Quantification of wound closure (%) is shown in the lower panels at 24 h and 48 h (n = 3). (B) Transwell invasion assay of RKO cells transduced with indicated constructs. Representative images of invaded cells (left) and quantification of the number of invasive cells per field (right) are shown (n = 3).\u003c/p\u003e","description":"","filename":"floatimage8.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7271791/v1/8a6a1324fb47a15f3854b1a8.jpeg"},{"id":99172306,"identity":"63cf5263-40cc-48ae-9589-0bc4e078732c","added_by":"auto","created_at":"2025-12-29 16:07:36","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":5956210,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7271791/v1/a79b5526-35ea-477f-a527-8e0f6c52e121.pdf"},{"id":89231750,"identity":"1cff1b37-6873-4b43-a555-38ed70c676c4","added_by":"auto","created_at":"2025-08-17 14:23:06","extension":"xlsx","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":10723,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryTableS1.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-7271791/v1/0fd5f50a7636dd2009529798.xlsx"}],"financialInterests":"No competing interests reported.","formattedTitle":"Single-cell RNA Sequencing Defines Prognostic Subtypes and Identifies AIF1L as a Therapeutic Target in Colorectal Cancer","fulltext":[{"header":"1 Introduction","content":"\u003cp\u003eColorectal cancer (CRC) is among the most prevalent and lethal malignancies worldwide, accounting for roughly one in ten cancer diagnoses and being the second leading cause of cancer-related mortality[\u003cspan citationid=\"CR1\" 
class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. CRC claimed an estimated 881,000 lives globally[\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. Standard treatment modalities for CRC include surgical resection of localized tumors, systemic chemotherapy (often fluoropyrimidine-based) for advanced disease, and newer immunotherapies for select patients. Surgery remains the cornerstone of cure in early-stage CRC, and adjuvant chemotherapy can improve survival in resected stage III and high-risk stage II cases[\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. However, outcomes for advanced and metastatic CRC remain poor: the five-year survival rate in metastatic CRC is only on the order of 10\u0026ndash;15%[\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. Many patients eventually experience disease recurrence or develop resistance to chemotherapy, underscoring major unresolved clinical issues. Indeed, tumor relapse and treatment resistance are primary drivers of CRC\u0026rsquo;s high mortality[\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. Even immunotherapy, which has revolutionized care in other malignancies, benefits only a small subset of CRC patients, as most CRCs are immunologically cold or refractory [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. These challenges \u0026ndash; metastasis with poor prognosis, frequent recurrence, and therapy resistance \u0026ndash; highlight the need for deeper biological insights to guide new therapeutic strategies.\u003c/p\u003e\u003cp\u003eSingle-cell RNA sequencing (scRNA-seq) has emerged as a powerful approach to dissect cellular heterogeneity in cancers. 
Unlike bulk sequencing, which masks the variability between individual cells, scRNA-seq profiles gene expression at single-cell resolution, enabling the identification of distinct cell types and states within the tumor microenvironment (TME[\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. This technology has been rapidly adopted in cancer research to characterize diverse cell populations and their interactions, providing unprecedented detail on tumor composition and evolution[\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. In CRC, recent scRNA-seq studies have begun to catalog the complex ecosystem of malignant, immune, and stromal cells in both primary and metastatic tumors. For example, early single-cell analyses revealed discrete subsets of cancer-associated fibroblasts in CRC[\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e], while others identified heterogeneous malignant cell populations with unique transcriptional programs. Notably, some studies have linked single-cell features to clinical outcomes, identifying a \u0026ldquo;stemness\u0026rdquo; gene signature in tumor cells that correlates with higher relapse rates[\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e] and small scale integrations of single-cell data with bulk transcriptomes have shown promise for prognostic stratification [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]. However, the majority of CRC single-cell studies to date have been limited in scale and scope: typically profiling only tens of tumors or fewer [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e], often focusing on descriptive tumor atlases without integrating longitudinal patient outcomes or therapeutic responses. 
Moreover, while computational analyses have nominated putative driver genes of CRC progression (for instance, LYZ, LCN2, CEACAM5, and FOXQ1 were recently implicated in premalignant lesion advancement) [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e], these candidates have generally not been functionally validated. In summary, although scRNA-seq has demonstrated the ability to unearth CRC\u0026rsquo;s intratumoral heterogeneity and suggest new biomarkers, most studies have not leveraged its full potential to identify clinically significant cellular subsets or drivers with confirmed roles in disease aggressiveness.\u003c/p\u003e\u003cp\u003eHere we address these gaps by deploying an integrative single-cell analysis across multiple CRC datasets to resolve the fine-scale landscape of epithelial tumor cell heterogeneity. Recognizing that stem-like cancer cells are thought to fuel tumor progression, metastasis and relapse in CRC, we specifically interrogate malignant subpopulations with stem-cell-like characteristics. Through large-scale data integration, we define and characterize a key stem-like tumor cell subset that appears to underlie therapy resistance and poor patient outcomes. We further develop a prognostic model based on single-cell-derived gene signatures, enabling risk stratification of CRC patients and linking tumor cell heterogeneity to survival. Finally, we functionally validate a novel candidate driver gene from the stem-like malignant population, demonstrating its role in CRC growth and therapeutic response. Together, these findings provide a new framework for understanding CRC at single-cell resolution and identify actionable biomarkers. 
In the following sections, we describe the results of this integrated analysis and the insights it yields into CRC heterogeneity and progression.\u003c/p\u003e"},{"header":"2 Methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\u003ch2\u003e2.1 CRC Data Acquisition\u003c/h2\u003e\u003cp\u003eThe scRNA-seq datasets analyzed in this research were retrieved from the GEO repository (Gene Expression Omnibus, NCBI, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.ncbi.nlm.nih.gov/geo/\u003c/span\u003e\u003cspan address=\"https://www.ncbi.nlm.nih.gov/geo/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) under the accession numbers GSE132465 and GSE188711. These datasets include omental samples from six CRC patients. Additionally, bulk RNA-seq data and corresponding clinical annotations were acquired from the TCGA CRC dataset (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://portal.gdc.cancer.gov/\u003c/span\u003e\u003cspan address=\"https://portal.gdc.cancer.gov/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e), encompassing a total of 521 eligible samples, which were employed to develop a prognostic signature.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e\u003ch2\u003e2.2 Data Quality Control and Dimensionality Reduction\u003c/h2\u003e\u003cp\u003eThe raw scRNA-seq data were preprocessed using Scanpy (version 1.9.1, Python 3.8)[\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. Potential doublets were identified and excluded utilizing Scrublet (version 3.0) with default settings [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. 
Low-quality cells were further filtered according to the following thresholds: (1) gene counts ranging from 300 to 8000, (2) total UMI counts between 500 and 50,000, and (3) mitochondrial gene percentages lower than 20%. After filtering, sc.pp.normalize_total was applied for normalization, followed by log-transformation using sc.pp.log1p. Highly variable genes were subsequently identified using sc.pp.highly_variable_genes with the flavor parameter set to seurat_v3. Batch effects across samples were corrected using Harmony (version 0.0.10) with theta set to 2 [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. Clustering was conducted via the Leiden algorithm at a resolution of 0.1, and visualization employed UMAP with parameters set at n_neighbors\u0026thinsp;=\u0026thinsp;15 and min_dist\u0026thinsp;=\u0026thinsp;0.5.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e\u003ch2\u003e2.3 Malignant Epithelial cell identification using InferCNV\u003c/h2\u003e\u003cp\u003eTo differentiate malignant from non-malignant epithelial cells in CRC tissues, we applied InferCNVpy (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/broadinstitute/inferCNV\u003c/span\u003e\u003cspan address=\"https://github.com/broadinstitute/inferCNV\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) to assess copy number variations (CNVs) across cell types. Immune cells were designated as the reference baseline, and epithelial cells displaying elevated CNV scores were classified as malignant. Malignant cells identified by this method were re-clustered using the Leiden algorithm.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec6\" class=\"Section2\"\u003e\u003ch2\u003e2.4 Heterogeneity of CRC subpopulation\u003c/h2\u003e\u003cp\u003eDifferentially expressed genes among CRC cell subpopulations were identified via sc.tl.rank_genes_groups. 
GO Biological Process enrichment analysis was then performed with the GSEApy package, and gene set enrichment analysis (GSEA) against the KEGG_2021_Human gene set library was used to compare functional programs across these subpopulations.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e\u003ch2\u003e2.5 Subcluster Stemness Analysis and Trajectory Analysis\u003c/h2\u003e\u003cp\u003eCytoTRACE 2 was used to score the differentiation state (developmental potential) of each CRC subcluster [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]. Regulatory transcription factors (TFs) within each subpopulation were then investigated with pySCENIC [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. GRNBoost first predicted candidate TF\u0026ndash;target gene interactions. DNA motif enrichment analysis then retained targets with evidence of direct TF binding, and AUCell quantified regulon activity in each cell. The five TFs with the highest activity scores were selected.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\u003ch2\u003e2.6 Cell\u0026ndash;Cell Communication Analysis\u003c/h2\u003e\u003cp\u003eIntercellular communication among the identified subpopulations was evaluated with CellPhoneDB (version 1.6.1), focusing on receptor\u0026ndash;ligand pairs and signaling pathways [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e].\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e\u003ch2\u003e2.7 Construction and Validation of a Novel Prognostic Risk Model\u003c/h2\u003e\u003cp\u003eUsing univariate Cox regression (P\u0026thinsp;\u0026lt;\u0026thinsp;0.05), we identified overall survival (OS)-related genes among the top 500 genes of ELFN1-AS1⁺ malignant epithelial cells. 
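Univariate screening fits one Cox proportional-hazards model per gene and keeps the genes with P < 0.05. The dependency-free sketch below fits a single-covariate Cox model by Newton–Raphson on the Breslow partial likelihood and reports a Wald P value. In practice a survival package such as R's survival::coxph would be used, and its default tie handling (Efron) differs from the Breslow approximation used here. The function name and toy data are illustrative assumptions.

```python
import math


def univariate_cox(times, events, x, max_iter=50, tol=1e-9):
    """Fit a one-covariate Cox model; return (beta, hazard ratio, two-sided Wald P).

    Ties use the Breslow approximation: every subject whose follow-up time is
    >= t belongs to the risk set at event time t.
    """
    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for t_i, d_i, x_i in zip(times, events, x):
            if not d_i:
                continue  # censored subjects contribute only through risk sets
            risk = [(math.exp(beta * x_j), x_j) for t_j, x_j in zip(times, x) if t_j >= t_i]
            s0 = sum(w for w, _ in risk)
            s1 = sum(w * xj for w, xj in risk)
            s2 = sum(w * xj * xj for w, xj in risk)
            score += x_i - s1 / s0             # gradient of the log partial likelihood
            info += s2 / s0 - (s1 / s0) ** 2   # observed information (minus the Hessian)
        if info <= 0:
            raise ValueError("covariate does not vary within any risk set")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    z = beta * math.sqrt(info)                 # Wald z = beta / SE, with SE = 1 / sqrt(info)
    p = math.erfc(abs(z) / math.sqrt(2))       # two-sided normal P value
    return beta, math.exp(beta), p


# Hypothetical follow-up (months), death indicators and expression of two genes.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
genes = {"GENE_A": [5, 6, 3, 4, 1, 2], "GENE_B": [2, 5, 1, 6, 3, 4]}
for name, expr in genes.items():
    beta, hr, p = univariate_cox(times, events, expr)
    print(name, round(hr, 2), round(p, 3), "keep" if p < 0.05 else "drop")
```

A hazard ratio above 1 means that higher expression is associated with shorter survival. The genes that pass the P threshold would then be passed on to the LASSO step.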
The candidate genes were then refined with least absolute shrinkage and selection operator (LASSO) Cox regression, which shrinks the coefficients of uninformative genes to zero and thereby reduces overfitting. For each patient, a risk score was calculated as the sum of each gene\u0026rsquo;s expression level multiplied by its regression coefficient: Risk score\u0026thinsp;=\u0026thinsp;Σ\u003csub\u003ei\u003c/sub\u003e (Coef\u003csub\u003ei\u003c/sub\u003e\u0026thinsp;\u0026times;\u0026thinsp;Expr\u003csub\u003ei\u003c/sub\u003e). Here Coef\u003csub\u003ei\u003c/sub\u003e and Expr\u003csub\u003ei\u003c/sub\u003e are the coefficient and expression level of gene \u003cem\u003ei\u003c/em\u003e. Patients were divided into high- and low-risk groups at the median risk score. The prognostic performance of the model was evaluated with Kaplan\u0026ndash;Meier survival curves and receiver operating characteristic (ROC) analysis.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e\u003ch2\u003e2.8 Immune Infiltration Analysis and Functional Enrichment Analysis\u003c/h2\u003e\u003cp\u003eImmune cell infiltration in the high- and low-risk groups was characterized with the CIBERSORT algorithm to relate immune cell profiles to the prognostic signature [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]. TIDE (Tumor Immune Dysfunction and Exclusion) scores were computed for both groups, and differences in immune checkpoint gene expression were assessed with Wilcoxon tests.\u003c/p\u003e\u003cp\u003eDifferential expression analysis between risk groups was performed with DESeq2. KEGG pathway and GO Biological Process enrichment analyses were performed with the clusterProfiler R package (version 4.6.2) [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e].\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\u003ch2\u003e2.9 Gene Mutation Analysis\u003c/h2\u003e\u003cp\u003eSomatic mutation profiles of CRC samples were obtained from the TCGA database. 
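Kaplan–Meier curves are used both for the risk groups in Section 2.7 and for the TMB groups in this section. The product-limit estimator behind these curves can be sketched as follows. This is a minimal illustration: the log-rank test normally used to compare two curves is not shown, and the function name and toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    Returns a list of (time, S(t)) pairs at each distinct time with >= 1 event.
    `events` holds 1 for an observed death and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group subjects sharing time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # deaths and censorings at t both leave the risk set
    return curve


# Hypothetical follow-up times (months) and death indicators for one group.
print(kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
```

Running the estimator separately on the high- and low-group subsets gives the two curves that are plotted and compared.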
Tumor mutational burden (TMB) was calculated with the maftools package [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e], and its correlation with the risk score was evaluated with Spearman\u0026rsquo;s rank correlation. To assess the prognostic implications of TMB, samples were divided into high- and low-TMB groups at the median TMB, and Kaplan\u0026ndash;Meier survival curves were constructed [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e].\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e\u003ch2\u003e2.10 Drug Sensitivity Assessment\u003c/h2\u003e\u003cp\u003eThe half-maximal inhibitory concentrations (IC\u003csub\u003e50\u003c/sub\u003e) of chemotherapeutic drugs in each risk group were estimated with the pRRophetic package (version 0.5) [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e].\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\u003ch2\u003e2.11 Cell Lines and Culture\u003c/h2\u003e\u003cp\u003eRKO cells, obtained from the Type Culture Collection of the Chinese Academy of Sciences, were cultured in RPMI 1640 medium (Gibco, USA) supplemented with 10% fetal bovine serum and 1% penicillin/streptomycin at 37\u0026deg;C with 5% CO\u003csub\u003e2\u003c/sub\u003e and 95% humidity.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\u003ch2\u003e2.12 Construction and Transfection of shRNA\u003c/h2\u003e\u003cp\u003eFor stable knockdown of AIF1L, shRNA constructs against the target sequences siRNA1 (5\u0026prime;-CGAGCTGATAAACAACATGTT-3\u0026prime;) and siRNA2 (5\u0026prime;-GCTCCTAGAGAACACCCATTT-3\u0026prime;) were synthesized and cloned into the pLKO.1-TRC vector, and the inserts were verified by Sanger sequencing. Lentiviruses were produced by co-transfecting each construct with the packaging plasmids psPAX2 and pMD2.G into 293T cells using Lipofectamine 3000 (Invitrogen, USA). 
Lentiviral supernatants were harvested 72 hours after transfection, passed through a 0.45 \u0026micro;m filter, and used to infect RKO cells in the logarithmic growth phase in the presence of Polybrene (4 \u0026micro;g/mL). Selection with puromycin (8 \u0026micro;g/mL) was started 48 hours after infection and continued for approximately one week. Knockdown efficiency was verified by RT-qPCR with AIF1L-specific primers (forward: 5\u0026prime;-CCGAGACTTTGTGAACATGAT-3\u0026prime;, reverse: 5\u0026prime;-CCACCTGGAGATGAAGAAGAT-3\u0026prime;).\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e\u003ch2\u003e2.13 Wound Healing Assay\u003c/h2\u003e\u003cp\u003eStable cells were grown to ~\u0026thinsp;95% confluence in 6-well plates. Scratches were made with sterile 200 \u0026micro;L pipette tips, after which the wells were rinsed with PBS and replenished with fresh medium. Images were acquired at 0 and 48 hours to assess wound closure.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e\u003ch2\u003e2.14 Transwell Assay\u003c/h2\u003e\u003cp\u003eMigration and invasion were evaluated with Transwell chambers (8 \u0026micro;m pores; Corning, USA); for invasion assays, the chambers were pre-coated with Matrigel (BD Biosciences, 356234). Cells (2.0\u0026thinsp;\u0026times;\u0026thinsp;10\u003csup\u003e5\u003c/sup\u003e/mL) in serum-free medium were seeded into the upper chambers, and the lower chambers contained medium with 10% FBS. After 48 hours, cells that had traversed the membrane were fixed, stained with crystal violet, and counted under a microscope.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec17\" class=\"Section2\"\u003e\u003ch2\u003e2.15 Statistical Analysis\u003c/h2\u003e\u003cp\u003eR (version 4.4.1) was used for statistical analysis. 
Wilcoxon tests and Pearson correlation analyses were performed, and significance was denoted as follows: *\u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05, **\u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.01, ***\u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, ****\u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.0001; \"ns\", not significant.\u003c/p\u003e\u003c/div\u003e"},{"header":"3 Results","content":"\u003cp\u003e\u003cstrong\u003e3.1 Single-cell landscape of CRC\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo construct a high-resolution cellular atlas of CRC, we integrated two publicly available scRNA-seq datasets comprising 39 samples (Supplementary Table S1). The overall study workflow is summarized in Fig. 1. After stringent quality control (gene count \u0026gt; 300; mitochondrial transcript fraction \u0026lt; 15%) and doublet removal, 23,7875 high-quality single cells remained. Unsupervised clustering with the Leiden algorithm (resolution = 0.2) and UMAP visualization identified 11 distinct clusters (Fig. 2A). Based on canonical marker genes and literature annotations, these clusters were assigned to the immune compartment (T cells, cluster 0; CD8⁺ T cells, cluster 1; macrophages, cluster 2; B cells, cluster 3; plasma cells, cluster 6; mast cells, cluster 8; plasmacytoid dendritic cells, cluster 9), the stromal compartment (fibroblasts, cluster 5; endothelial cells, cluster 7; Schwann cells, cluster 10) and the epithelial compartment (epithelial cells, cluster 4) (Fig. 2B). Representative markers are shown in Fig. 2E.\u003c/p\u003e\n\u003cp\u003eComparison of adjacent normal and tumor tissues revealed significant shifts in cellular composition (Fig. 2C, D): normal mucosa contained higher proportions of CD8⁺ T cells and fibroblasts, whereas tumor samples were enriched for total T cells and macrophages. 
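The compositional comparison rests on per-group cell-type fractions. A minimal sketch, assuming annotated cells are available as (group, cell type) pairs. The grouping key could be the tissue of origin or, preferably for statistical testing, the individual sample, because cells from one patient are not independent. The test behind the reported significance is not specified here, and the labels below are hypothetical.

```python
from collections import Counter, defaultdict


def composition(cells):
    """Map each group to {cell_type: fraction of that group's cells}."""
    counts = defaultdict(Counter)
    for group, cell_type in cells:
        counts[group][cell_type] += 1
    return {
        group: {ct: n / sum(c.values()) for ct, n in c.items()}
        for group, c in counts.items()
    }


cells = [
    ("normal", "CD8+ T"), ("normal", "Fibroblast"), ("normal", "CD8+ T"),
    ("tumor", "T"), ("tumor", "Macrophage"), ("tumor", "T"), ("tumor", "Macrophage"),
]
for group, fractions in composition(cells).items():
    print(group, {ct: round(f, 2) for ct, f in fractions.items()})
```

When samples are used as the grouping key, the per-sample fractions can then be compared between normal and tumor tissue with a rank-based test.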
These shifts suggest that functionally specialized T cell and macrophage subsets are present in CRC lesions and may influence immune surveillance and tumor progression.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e3.2 Malignant cell identification via single-cell CNV analysis\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo distinguish malignant epithelial cells from their non-malignant counterparts, we applied infercnvpy (v0.4.0) to infer CNVs from the scRNA-seq data, using immune and stromal populations as diploid references. The resulting genome-wide CNV heatmap across all clusters (immune, stromal and epithelial) is shown in Fig. 3A (red, gain; blue, loss). We then computed a per-cell CNV score and projected it onto the UMAP embedding (Fig. 3B). The scores showed a bimodal distribution with a median of ~0.02. Using this median as the threshold, cells with higher scores were provisionally labeled \u0026ldquo;malignant\u0026rdquo; and those below it \u0026ldquo;non-malignant\u0026rdquo; (Fig. 3C, left). Applying the same cutoff to the epithelial compartment (Fig. 3C, right) identified 2,100 malignant epithelial cells for downstream analysis.\u003c/p\u003e\n\u003cp\u003eTo assess the biological relevance of the CNV-based classification, we performed GO enrichment analysis on genes upregulated in the malignant epithelial subpopulations. This analysis highlighted processes related to apoptosis regulation, cell adhesion and intercellular signaling (Fig. 3D). Terms such as \u0026ldquo;cardiac muscle cell adhesion\u0026rdquo; likely reflect genes shared across GO annotations rather than CRC-specific biology. GSEA comparing malignant with non-malignant epithelial cells further showed significant enrichment of epithelial-to-mesenchymal transition (EMT), TNF-\u0026alpha; signaling, coagulation, apoptosis, cholesterol homeostasis and complement pathways (Fig. 3E). 
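The per-cell CNV score and median cutoff can be sketched as follows. The exact score definition is not given here. As an assumption, each cell is scored by the mean absolute value of its smoothed, reference-centred CNV profile across genomic windows, where 0 means no deviation from the diploid reference. The cell names and values are hypothetical. infercnvpy ships its own scoring utilities, which may aggregate differently, for example per cluster rather than per cell.

```python
import statistics


def cnv_score(profile):
    """Mean absolute inferred CNV signal across genomic windows for one cell."""
    return sum(abs(v) for v in profile) / len(profile)


def call_malignant(profiles, threshold=None):
    """Label cells whose CNV score exceeds `threshold` (default: median score)."""
    scores = {cell: cnv_score(p) for cell, p in profiles.items()}
    if threshold is None:
        threshold = statistics.median(scores.values())
    labels = {cell: "malignant" if s > threshold else "non-malignant"
              for cell, s in scores.items()}
    return labels, threshold


# Hypothetical smoothed CNV values (gain > 0, loss < 0) in three genomic windows.
profiles = {
    "epi_1": [0.00, 0.01, -0.01],
    "epi_2": [0.05, -0.04, 0.06],
    "epi_3": [0.02, 0.00, -0.01],
    "epi_4": [0.08, 0.07, -0.09],
}
labels, cutoff = call_malignant(profiles)
print(round(cutoff, 3), labels)
```

Passing a fixed threshold (for instance threshold=0.02) reproduces the single-cutoff rule instead of recomputing the median for each subset.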
Collectively, these findings indicate that CNV-inferred malignant epithelial cells exhibit active pro-tumorigenic programs and illustrate the utility of single-cell CNV profiling for resolving cancer cell heterogeneity within the CRC microenvironment.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e3.3 Heterogeneity of malignant epithelial subpopulations\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo dissect functional diversity within the malignant epithelial compartment, we reclustered the infercnvpy-defined malignant epithelial cells with the Leiden algorithm, yielding seven discrete subpopulations. Based on their top marker genes, we designated these malignant epithelial (mEpi) clusters CEACAM5⁺ mEpi, ELFN1-AS1⁺ mEpi, HMGB1⁺ mEpi, IFITM3⁺ mEpi, HSPA1B⁺ mEpi, PHGR1⁺ mEpi and SPINK4⁺ mEpi (Fig. 4A). In UMAP space, each subset occupies a distinct region, suggesting differences in differentiation status or functional programs.\u003c/p\u003e\n\u003cp\u003eWe next applied CytoTRACE 2 to infer the developmental potential of each subpopulation; higher scores denote a more progenitor-like (less differentiated) state. Overlaying CytoTRACE scores on the UMAP (Fig. 4B\u0026ndash;C) showed that the ELFN1-AS1⁺ mEpi subset had the highest stemness (Fig. 4D). This suggests that it may serve as a tumor-initiating or metastasis-competent pool. Because ELFN1-AS1 is a long noncoding RNA that has been reported to promote CRC cell proliferation and migration [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e, \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e], this cluster is a compelling focus for future mechanistic studies.\u003c/p\u003e\n\u003cp\u003eFinally, to characterize the regulatory networks underlying each subpopulation, we performed single-cell regulatory network inference with pySCENIC. As shown in Fig. 3E, each mEpi subset has a distinct TF signature. CEACAM5⁺ mEpi is enriched for POU1F1 and DLX5, which may indicate transcriptional remodeling; HMGB1⁺ mEpi for the cell-cycle regulator MYBL2; IFITM3⁺ mEpi for ZNF family factors; and HSPA1B⁺ mEpi for POU3F1. The remaining subsets show their own characteristic regulons. 
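AUCell scores a regulon in each cell by ranking that cell's genes by expression and measuring how early the regulon's target genes appear in the ranking. The sketch below is a simplified, dependency-free approximation of this idea and is not pySCENIC's implementation. It computes the area under the recovery curve over the top max_rank genes and divides it by the largest area achievable for a gene set of that size. pySCENIC restricts the curve to a tunable top fraction of each ranking. The function name, gene names and values here are hypothetical.

```python
def aucell_like(cell_expr, gene_set, max_rank):
    """Simplified AUCell-style regulon score for one cell, in [0, 1].

    cell_expr maps gene -> expression. Genes are ranked from highest to lowest
    expression (ties keep insertion order); the recovery curve counts how many
    gene-set members appear within the top k genes, for k = 1..max_rank.
    """
    targets = set(gene_set) & set(cell_expr)
    if not targets:
        return 0.0
    max_rank = min(max_rank, len(cell_expr))
    ranked = sorted(cell_expr, key=cell_expr.get, reverse=True)[:max_rank]
    hits = area = 0
    for gene in ranked:
        hits += gene in targets  # bool counts as 0 or 1
        area += hits
    best = sum(min(k, len(targets)) for k in range(1, max_rank + 1))
    return area / best


cell = {"GENE1": 9.0, "GENE2": 8.0, "GENE3": 7.0, "GENE4": 1.0, "GENE5": 0.0}
regulon = {"GENE1", "GENE3"}  # hypothetical target genes of one TF
print(aucell_like(cell, regulon, max_rank=3))  # 0.8
```

Averaging such scores over the cells of each subpopulation and keeping the highest-scoring regulons corresponds to the "top TFs per subset" summary above.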
Notably, PLAG1 emerges as a top TF in the ELFN1-AS1⁺ mEpi population, raising the possibility that it cooperates with ELFN1-AS1 to maintain stem-like features. Together, these analyses reveal a multilayered landscape of malignant epithelial heterogeneity in CRC and nominate ELFN1-AS1⁺ mEpi as a priority for functional validation.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e3.4 Prognostic model based on the ELFN1-AS1⁺ mEpi signature\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo evaluate the clinical relevance of the ELFN1-AS1⁺ mEpi subpopulation, we selected its 500 most highly expressed genes and tested their association with overall survival in the TCGA-COAD cohort. Univariate Cox regression first narrowed the candidate list. LASSO Cox regression (penalty parameter set at \u0026lambda;\u003csub\u003emin\u003c/sub\u003e) then retained four genes, RPL21, GAL, ELFN1-AS1 and AIF1L, with nonzero coefficients of 0.00209, 0.03439, 0.03530 and 0.14608, respectively (Fig. 5A). We then constructed a risk score as the sum of each gene\u0026rsquo;s expression multiplied by its coefficient, and TCGA-COAD patients were dichotomized at the median risk score into low- and high-risk groups. Kaplan\u0026ndash;Meier analysis showed that patients in the low-risk group had significantly longer overall survival than those in the high-risk group (P \u0026lt; 0.001; Fig. 5B). Time-dependent ROC analysis yielded AUCs of 0.73, 0.70 and 0.76 at 1, 3 and 5 years, respectively (Fig. 5C), indicating moderate-to-good predictive performance.\u003c/p\u003e\n\u003cp\u003eIn summary, by combining ELFN1-AS1⁺ mEpi\u0026ndash;derived candidate genes with clinical outcomes from TCGA-COAD, we established a concise four-gene prognostic model that separates CRC patients into groups with significantly different overall survival. 
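The risk score and median split can be made concrete with the four LASSO coefficients reported above. This sketch assumes expression values on the same scale used to fit the model, which is not stated here (e.g., log-transformed normalized TCGA expression). The patient IDs and expression values are hypothetical, and patients exactly at the median are assigned to the low-risk group by assumption.

```python
import statistics

# LASSO coefficients for the four signature genes (Section 3.4, Fig. 5A).
COEFFICIENTS = {"RPL21": 0.00209, "GAL": 0.03439, "ELFN1-AS1": 0.03530, "AIF1L": 0.14608}


def risk_score(expression):
    """Risk score = sum over signature genes of (coefficient x expression)."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def stratify(patients):
    """Dichotomize patients at the median risk score (ties go to 'low')."""
    scores = {pid: risk_score(expr) for pid, expr in patients.items()}
    cutoff = statistics.median(scores.values())
    groups = {pid: "high" if s > cutoff else "low" for pid, s in scores.items()}
    return groups, cutoff


patients = {
    "P1": dict.fromkeys(COEFFICIENTS, 1.0),
    "P2": dict.fromkeys(COEFFICIENTS, 2.0),
    "P3": dict.fromkeys(COEFFICIENTS, 0.0),
    "P4": {"RPL21": 0.0, "GAL": 0.0, "ELFN1-AS1": 0.0, "AIF1L": 3.0},
}
groups, cutoff = stratify(patients)
print(groups)  # {'P1': 'low', 'P2': 'high', 'P3': 'low', 'P4': 'high'}
```

AIF1L's coefficient is the largest, so in this toy example P4 lands in the high-risk group on the strength of AIF1L alone.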
These findings highlight RPL21, GAL, ELFN1-AS1 and AIF1L as potential biomarkers and motivate future mechanistic studies of their roles in tumor progression and the immune TME.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e3.5 Molecular Functions and Immune Landscape Distinctions Between High- and Low-Risk Groups\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo characterize transcriptomic differences between high- and low-risk CRC patients, we performed GSEA with GO Biological Process and KEGG pathway gene sets (Fig. 5D\u0026ndash;E). In the high-risk group, enriched GO terms included \u0026ldquo;mitochondrial respiratory chain complex assembly,\u0026rdquo; \u0026ldquo;NADH dehydrogenase complex assembly\u0026rdquo; and \u0026ldquo;ribosomal large subunit biogenesis.\u0026rdquo; These terms reflect upregulated bioenergetic processes and ribosome biogenesis. KEGG analysis similarly showed significant enrichment of \u0026ldquo;oxidative phosphorylation\u0026rdquo; and \u0026ldquo;alkaloid biosynthesis,\u0026rdquo; consistent with enhanced mitochondrial activity and metabolic remodeling.\u003c/p\u003e\n\u003cp\u003eGSVA corroborated these trends at the sample level (Fig. 5F). High-risk tumors showed higher enrichment scores for \u0026ldquo;DNA polymerization,\u0026rdquo; \u0026ldquo;purine metabolism\u0026rdquo; and \u0026ldquo;ribosome,\u0026rdquo; and even for neurodegeneration-related sets such as \u0026ldquo;Huntington\u0026rsquo;s disease\u0026rdquo; (a KEGG pathway that includes many mitochondrial respiratory chain genes). This pattern is consistent with broad increases in DNA synthesis and protein translation machinery. Conversely, low-risk tumors were preferentially enriched for pathways including \u0026ldquo;type II diabetes mellitus\u0026rdquo; and \u0026ldquo;phosphatidylinositol signaling system,\u0026rdquo; suggesting distinct metabolic and signaling states. 
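GSEA summarizes each gene set with a running-sum enrichment score over genes ranked by a between-group statistic. The sketch below implements the classic weighted Kolmogorov–Smirnov-like score with weight exponent p = 1. It omits permutation testing, normalization (NES) and GSVA's per-sample variant, and the ranking statistic, gene names and values are hypothetical.

```python
def enrichment_score(ranked_genes, stats, gene_set, p=1.0):
    """Weighted running-sum enrichment score (GSEA-style).

    ranked_genes: genes sorted by decreasing statistic (e.g. high- vs low-risk).
    stats: gene -> ranking statistic. Hits step up in proportion to |stat|**p,
    misses step down by 1/(number of misses); the signed maximum deviation of
    the running sum from zero is returned.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_hit = sum(in_set)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranking only partially")
    hit_norm = sum(abs(stats[g]) ** p for g, h in zip(ranked_genes, in_set) if h)
    if hit_norm == 0:
        raise ValueError("all gene-set members have a zero statistic")
    running = best = 0.0
    for g, h in zip(ranked_genes, in_set):
        running += abs(stats[g]) ** p / hit_norm if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


stats = {"A": 3.0, "B": 2.0, "C": 1.0, "D": -1.0, "E": -2.0, "F": -3.0}
ranked = sorted(stats, key=stats.get, reverse=True)
print(enrichment_score(ranked, stats, {"A", "B"}))  # positive: set sits at the top
print(enrichment_score(ranked, stats, {"E", "F"}))  # negative: set sits at the bottom
```

A positive score corresponds to enrichment in the high-risk direction of the ranking, and a negative score to enrichment in the low-risk direction.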
Together, these results suggest that high-risk CRCs are characterized by heightened energy metabolism and translational capacity, which may contribute to their more aggressive clinical behavior.\u003c/p\u003e\n\u003cp\u003eWe applied CIBERSORT to compare immune cell abundances between high- and low-risk groups (Fig. 6A). High-risk tumors had significantly higher proportions of activated memory CD4⁺ T cells (P \u0026lt; 0.05) and M0 macrophages, whereas low-risk samples were enriched for resting memory CD4⁺ T cells. This suggests that high-risk CRC harbors a more inflamed yet potentially dysregulated microenvironment, which may contribute to immune exhaustion and poorer outcomes.\u003c/p\u003e\n\u003cp\u003eNext, we compared the mutational landscapes of the two risk groups. In high-risk patients (Fig. 6B), the canonical drivers APC and TP53, as well as TTN, showed higher mutation frequencies, whereas low-risk tumors more frequently carried KRAS and PIK3CA alterations (Fig. 6C). These distinct mutational profiles may reflect divergent tumor evolution and could help guide risk-adapted therapeutic decisions.\u003c/p\u003e\n\u003cp\u003eFinally, we used oncoPredict to estimate each group\u0026rsquo;s response to a panel of anticancer agents. Compounds such as Z-764467149_1000 and Afatinib_1022 had lower predicted IC₅₀ values in the high-risk group, indicating higher predicted sensitivity. In contrast, BMS-754807_2171 had lower predicted IC₅₀ values in low-risk tumors (Fig. 6D). 
Integrating risk stratification with predicted drug sensitivity thus nominates candidate drugs for personalized treatment.\u003c/p\u003e\n\u003cp\u003eTogether, these multi-layered analyses indicate that high-risk CRC is marked by heightened energy metabolism, a pro-inflammatory yet potentially dysfunctional immune milieu, a distinct mutational profile and specific predicted drug vulnerabilities. These insights may inform precision prognostication and tailored therapeutic strategies.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e3.6 Hub gene identification by machine learning\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo pinpoint the most influential prognostic markers, we applied a random forest (RF) classifier to the four-gene signature. Feature importance scores (mean decrease in Gini impurity) were approximately 0.10 for both GAL and AIF1L (Fig. 7A). We then compared GAL and AIF1L expression between tumor and normal tissues using TCGA-COAD and GTEx data. GAL was significantly upregulated in tumors, and patients with lower GAL expression had better overall survival (Fig. 7B). In contrast, AIF1L expression was higher in normal colon, yet AIF1L carried the largest LASSO coefficient in our prognostic model. This apparent discrepancy (high AIF1L expression in normal tissue yet the strongest adverse prognostic weight) suggests a complex role for AIF1L in CRC biology and warrants further functional investigation.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e3.7 In vitro validation of AIF1L function in CRC cells\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo determine the biological role of AIF1L, we generated RKO cells with stable AIF1L knockdown using two independent shRNAs. In wound-healing assays, AIF1L-depleted cells showed significantly greater wound closure than control cells, indicating enhanced migratory capacity (Fig. 8A). 
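Wound closure is commonly quantified as the percentage of the initial scratch area that has closed by the later time point. The metric used for Fig. 8A is not stated, so this sketch assumes that convention. The areas (e.g., pixel counts measured in image-analysis software), the replicate values and the condition labels are hypothetical.

```python
from statistics import mean


def wound_closure_percent(area_0h, area_48h):
    """Percentage of the 0-hour scratch area that has closed by 48 hours."""
    if area_0h <= 0:
        raise ValueError("initial wound area must be positive")
    return (area_0h - area_48h) / area_0h * 100.0


# Hypothetical scratch areas (pixels) for triplicate wells per condition.
wells = {
    "shControl": [(80000, 50000), (80000, 48000), (80000, 52000)],
    "shAIF1L_1": [(80000, 30000), (80000, 28000), (80000, 32000)],
}
for condition, pairs in wells.items():
    closures = [wound_closure_percent(a0, a48) for a0, a48 in pairs]
    print(condition, round(mean(closures), 1))
```

Normalizing each well to its own 0-hour area corrects for differences in scratch width between wells before conditions are compared.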
Similarly, Transwell invasion assays showed a marked increase in the number of invading cells after AIF1L silencing (Fig. 8B). These results indicate that loss of AIF1L promotes CRC cell motility and invasion in vitro, supporting its functional relevance in tumor progression and highlighting it as a candidate for mechanistic follow-up.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eOur single-cell transcriptomic analysis provides an integrative view of CRC heterogeneity, linking cellular subpopulations to clinical outcomes and therapeutic vulnerabilities. We assembled a compendium of ~\u0026thinsp;70,000 cells from 29 CRC tumors, enabling high-resolution delineation of the TME and the malignant compartment. Using inferred CNVs, we distinguished malignant from non-malignant epithelial cells and then uncovered seven transcriptionally distinct malignant subpopulations. Notably, one subpopulation marked by the long noncoding RNA ELFN1-AS1 exhibited the highest stemness scores, suggesting a progenitor-like, potentially tumor-initiating cell pool. From this subset, we derived a parsimonious four-gene prognostic signature (RPL21, GAL, ELFN1-AS1, AIF1L) that stratified patients into high- and low-risk groups with significantly different survival. Furthermore, we identified \u003cb\u003eAIF1L\u003c/b\u003e as a potential hub gene and examined its function in vitro, showing that AIF1L knockdown enhances CRC cell migration and invasion. Collectively, these findings underscore the power of single-cell approaches to reveal intratumoral diversity and to pinpoint novel prognostic biomarkers and therapeutic targets.\u003c/p\u003e\u003cp\u003eOur results corroborate and extend emerging insights from recent single-cell studies of CRC. 
Previous scRNA-seq analyses have highlighted the complex cellular makeup of CRC tumors, including diverse immune infiltrates, stromal elements and malignant cell states, which bulk profiling cannot resolve [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e]. For example, Xiao et al. combined single-cell and spatial transcriptomics in CRC and likewise identified seven malignant epithelial subtypes with distinct gene programs (e.g., marked by CAV1, FOS/JUN or ZEB2) [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e]. Our identification of seven malignant subpopulations is consistent with these reports and supports the view that multiple coexisting tumor cell states contribute to CRC progression. Importantly, we extend this work by linking single-cell data to patient survival outcomes. While prior studies have defined transcriptional subtypes or cancer stem cell (CSC) populations in CRC [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e], our work directly links a stem-like malignant subcluster (ELFN1-AS1-high) to a prognostic gene signature. This approach builds on earlier prognostic models based on bulk gene expression [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e] but adds a new dimension by rooting the signature in a specific cell population of clinical interest. The prominence of ELFN1-AS1 in our stem-like cluster is also supported by accumulating evidence that this long noncoding RNA (lncRNA) acts as an oncogenic driver in gastrointestinal cancers. ELFN1-AS1 has been reported to promote CRC cell proliferation, migration and even chemoresistance [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e, \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e], consistent with our proposal that the ELFN1-AS1⁺ subpopulation represents an aggressive cell state. 
Taken together with these studies, our findings suggest that intratumoral heterogeneity, particularly the presence of stem-like malignant cells with high expression of lncRNAs such as ELFN1-AS1, may underlie disease progression and therapy resistance in CRC.\u003c/p\u003e\u003cp\u003eThe derivation of a four-gene risk signature from a single-cell-defined cluster has practical implications for patient stratification. Our prognostic model (RPL21, GAL, ELFN1-AS1 and AIF1L) showed good discrimination between high- and low-risk patients in the TCGA-COAD cohort (5-year AUC\u0026thinsp;=\u0026thinsp;0.76). Notably, these genes would not intuitively be grouped together without data-driven selection; their combined predictive value highlights the advantage of mining single-cell data for novel gene combinations. Clinically, if validated in prospective cohorts, this four-gene panel could supplement existing prognostic systems and help guide treatment intensity. Biologically, analysis of the high-risk group shed light on features of aggressive tumors. Pathways related to oxidative phosphorylation, ribosome biogenesis and DNA replication were enriched in high-risk tumors, indicating a heightened metabolic and protein synthesis program. Such metabolic reprogramming is a recognized hallmark of cancer progression [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]. Its prominence in the poor-outcome group suggests that these tumors have elevated bioenergetic and biosynthetic demands that may support faster growth or therapy resistance. In contrast, low-risk tumors were enriched for pathways such as type II diabetes mellitus and phosphatidylinositol signaling, hinting at fundamentally different metabolic states. These differences echo the concept that CRC can follow divergent evolutionary trajectories; for instance, one subtype may become more proliferative and metabolically \u0026ldquo;hungry,\u0026rdquo; whereas another remains more quiescent. 
We also found that high-risk tumors harbored a more \u0026ldquo;inflamed\u0026rdquo; immune microenvironment, with higher fractions of activated memory CD4\u003csup\u003e+\u003c/sup\u003e T cells and undifferentiated (M0) macrophages, whereas low-risk tumors had more resting memory CD4\u003csup\u003e+\u003c/sup\u003e T cells. Paradoxically, the inflamed TME of high-risk tumors might reflect an ineffective antitumor immune response, potentially due to immune exhaustion or suppression, which could contribute to worse outcomes. This notion aligns with recent work showing that aggressive CRCs can engage immune escape mechanisms despite substantial immune cell infiltration [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e]. Moreover, the distinct mutational landscapes we observed (e.g., higher APC and TP53 mutation rates in high-risk tumors versus more frequent KRAS/PIK3CA mutations in low-risk tumors) suggest that our risk groups may correspond to different molecular subtypes of CRC. High-risk patients might belong to a more genomically unstable, mesenchymal-biased subtype (often associated with poor prognosis), whereas low-risk patients might represent tumors with a different oncogenic profile. These insights could inform tailored therapies. For instance, high-risk tumors with p53 loss and heightened oxidative phosphorylation might benefit from metabolic inhibitors or drugs targeting the p53 pathway. Low-risk, KRAS-mutant tumors are generally resistant to anti-EGFR antibodies and might instead be candidates for MEK or PI3K pathway inhibitors.\u003c/p\u003e\u003cp\u003eOur data-driven approach also nominated AIF1L (allograft inflammatory factor 1-like) as a potential tumor suppressor in CRC. Interestingly, AIF1L had the largest positive coefficient in our LASSO Cox model (higher expression associated with higher risk), yet functionally it appears to restrain cancer cell motility: knocking down AIF1L in CRC cells substantially increased their migratory and invasive capacity. 
This finding is consistent with studies in breast cancer, where AIF1L is downregulated in tumors and low AIF1L levels correlate with worse survival [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e]. Mechanistically, AIF1L localizes to actin cytoskeletal structures, and its overexpression can suppress cell spreading and protrusive activity by downregulating focal adhesion kinase (FAK) and RhoA signaling [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e]. Loss of AIF1L may therefore remove a brake on cytoskeletal dynamics and promote a more migratory, invasive phenotype, consistent with what we observed in CRC cells. The higher expression of AIF1L in normal colon than in tumors (as also noted in breast tissue) suggests that it may act as a differentiation or structural maintenance factor that is silenced during tumor progression. The seemingly contradictory association of AIF1L with poor prognosis in our model might be explained by context-dependent expression or by compensation in aggressive tumors. For instance, surviving high-grade tumor cells might upregulate AIF1L to temper an overly motile phenotype, or AIF1L could be co-expressed with other risk genes in certain cell states. In any case, AIF1L is a noteworthy candidate for further investigation. Its exact role in CRC progression (tumor-suppressive vs. contextually pro-tumorigenic) and its druggability (perhaps via the FAK/RhoA pathway) merit deeper exploration.\u003c/p\u003e\u003cp\u003eIn summary, our study demonstrates a multiscale analytical framework, spanning single-cell dissection of tumor ecosystems, population-level prognostic modeling and experimental validation, that can yield both fundamental and translational insights. Our data indicate that CRC tumors are mosaics of phenotypically diverse cells, among which a stem-like, ELFN1-AS1-high subset may drive malignant progression. 
By capturing the signals of this subset in a simple gene signature, we can stratify patients by risk and identify candidate molecular vulnerabilities (such as AIF1L-mediated pathways). These findings open avenues for more precise prognostication. For instance, monitoring the abundance or activity of the ELFN1-AS1⁺ subpopulation (through its gene signature or biomarkers) could help identify patients at higher risk of recurrence who might benefit from intensified adjuvant therapy. Therapeutically, if further studies confirm that AIF1L restrains invasion, strategies to boost AIF1L activity or mimic its effects (perhaps via FAK/RhoA inhibition, given its downstream targets) could be explored to impede metastasis in high-risk CRC. Targeting the metabolic vulnerabilities of high-risk tumors is another rational direction. As single-cell technologies continue to advance, we anticipate that integrating such data with clinical and experimental studies will increasingly enable \u0026ldquo;precision oncology\u0026rdquo; approaches. Such approaches would identify not only which mutations a tumor carries but also which cell states it contains, and tailor interventions accordingly. 
Our work contributes to this vision by mapping CRC heterogeneity and translating it into candidates for prognostic and therapeutic development.\u003c/p\u003e"},{"header":"Abbreviations","content":"\u003cp\u003e\u003cstrong\u003eCRC\u003c/strong\u003e: colorectal cancer\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCNVs\u003c/strong\u003e: copy number variations\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eGSEA\u003c/strong\u003e: gene set enrichment analysis\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTMB\u003c/strong\u003e: tumor mutational burden\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTME\u003c/strong\u003e: tumor microenvironment\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAvailability of data and materials\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe data used and analyzed in the current study are available from the corresponding author upon reasonable request.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study was supported by the Traditional Chinese Medicine and Ethnic Medicine Science \u0026amp; Technology Research Project of the Guizhou Provincial Administration of Traditional Chinese Medicine (QZYY-2024-003).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthors\u0026apos; contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eL.G. and W.D. conceived and designed the study. L.G. performed single-cell RNA-seq data processing, clustering, inferCNV analysis, subpopulation annotation, and prognostic model construction. W.D. conducted in vitro experiments, including shRNA knockdown, wound healing, and Transwell assays. C.G. supervised the project, secured funding, and provided critical input on study design and data interpretation. L.G. drafted the manuscript; W.D. and C.G. 
critically revised the manuscript for important intellectual content. All authors reviewed and approved the final version of the manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eBray F, Laversanne M, Sung H, Ferlay J, Siegel RL, Soerjomataram I, Jemal A: \u003cstrong\u003eGlobal cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries\u003c/strong\u003e. \u003cem\u003eCa-a Cancer Journal for Clinicians \u003c/em\u003e2024, \u003cstrong\u003e74\u003c/strong\u003e(3):229-263.\u003c/li\u003e\n\u003cli\u003eSiegel RL, Kratzer TB, Giaquinto AN, Sung H, Jemal A: \u003cstrong\u003eCancer statistics, 2025\u003c/strong\u003e. \u003cem\u003eCa-a Cancer Journal for Clinicians \u003c/em\u003e2025, \u003cstrong\u003e75\u003c/strong\u003e(1):10-45.\u003c/li\u003e\n\u003cli\u003eRawla P, Sunkara T, Barsouk A: \u003cstrong\u003eEpidemiology of colorectal cancer: incidence, mortality, survival, and risk factors\u003c/strong\u003e. 
\u003cem\u003ePrzeglad gastroenterologiczny \u003c/em\u003e2019, \u003cstrong\u003e14\u003c/strong\u003e(2):89-103.\u003c/li\u003e\n\u003cli\u003eSchmoll H-J, Tabernero J, Maroun J, de Braud F, Price T, Van Cutsem E, Hill M, Hoersch S, Rittweger K, Haller DG: \u003cstrong\u003eCapecitabine Plus Oxaliplatin Compared With Fluorouracil/Folinic Acid As Adjuvant Therapy for Stage III Colon Cancer: Final Results of the NO16968 Randomized Controlled Phase III Trial\u003c/strong\u003e. \u003cem\u003eJournal of Clinical Oncology \u003c/em\u003e2015, \u003cstrong\u003e33\u003c/strong\u003e(32):3733-+.\u003c/li\u003e\n\u003cli\u003eShin AE, Giancotti FG, Rustgi AK: \u003cstrong\u003eMetastatic colorectal cancer: mechanisms and emerging therapeutics\u003c/strong\u003e. \u003cem\u003eTrends in Pharmacological Sciences \u003c/em\u003e2023, \u003cstrong\u003e44\u003c/strong\u003e(4):222-236.\u003c/li\u003e\n\u003cli\u003eChen L, Yang F, Chen S, Tai J: \u003cstrong\u003eMechanisms on chemotherapy resistance of colorectal cancer stem cells and research progress of reverse transformation: A mini-review\u003c/strong\u003e. \u003cem\u003eFrontiers in Medicine \u003c/em\u003e2022, \u003cstrong\u003e9\u003c/strong\u003e.\u003c/li\u003e\n\u003cli\u003eWang J, Zhang Y, Chen X, Sheng Q, Yang J, Zhu Y, Wang Y, Yan F, Fang J: \u003cstrong\u003eSingle-Cell Transcriptomics Reveals Cellular Heterogeneity and Drivers in Serrated Pathway-Driven Colorectal Cancer Progression\u003c/strong\u003e. \u003cem\u003eInternational Journal of Molecular Sciences \u003c/em\u003e2024, \u003cstrong\u003e25\u003c/strong\u003e(20).\u003c/li\u003e\n\u003cli\u003eWen R, Zhou L, Peng Z, Fan H, Zhang T, Jia H, Gao X, Hao L, Lou Z, Cao F\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eSingle-cell sequencing technology in colorectal cancer: a new technology to disclose the tumor heterogeneity and target precise treatment\u003c/strong\u003e. 
\u003cem\u003eFrontiers in Immunology \u003c/em\u003e2023, \u003cstrong\u003e14\u003c/strong\u003e.\u003c/li\u003e\n\u003cli\u003eKothalawala WJ, Bartak BK, Nagy ZB, Zsigrai S, Szigeti KA, Valcz G, Takacs I, Kalmar A, Molnar B: \u003cstrong\u003eA Detailed Overview About the Single-Cell Analyses of Solid Tumors Focusing on Colorectal Cancer\u003c/strong\u003e. \u003cem\u003ePathology \u0026amp; Oncology Research \u003c/em\u003e2022, \u003cstrong\u003e28\u003c/strong\u003e.\u003c/li\u003e\n\u003cli\u003eJoanito I, Wirapati P, Zhao N, Nawaz Z, Yeo G, Lee F, Eng CLP, Macalinao DC, Kahraman M, Srinivasan H\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eSingle-cell and bulk transcriptome sequencing identifies two epithelial tumor cell states and refines the consensus molecular classification of colorectal cancer\u003c/strong\u003e. \u003cem\u003eNature Genetics \u003c/em\u003e2022, \u003cstrong\u003e54\u003c/strong\u003e(7):963-+.\u003c/li\u003e\n\u003cli\u003eLin K, Chowdhury S, Zeineddine MA, Zeineddine FA, Hornstein NJ, Villarreal OE, Maru DM, Haymaker CL, Vauthey J-N, Chang GJ\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eIdentification of Colorectal Cancer Cell Stemness from Single-Cell RNA Sequencing\u003c/strong\u003e. \u003cem\u003eMolecular Cancer Research \u003c/em\u003e2024, \u003cstrong\u003e22\u003c/strong\u003e(4):337-346.\u003c/li\u003e\n\u003cli\u003eCai L, Guo X, Zhang Y, Xie H, Liu Y, Zhou J, Feng H, Zheng J, Li Y: \u003cstrong\u003eIntegrated analysis of single-cell and bulk RNA-sequencing to predict prognosis and therapeutic response for colorectal cancer\u003c/strong\u003e. \u003cem\u003eScientific Reports \u003c/em\u003e2025, \u003cstrong\u003e15\u003c/strong\u003e(1).\u003c/li\u003e\n\u003cli\u003eWolf FA, Angerer P, Theis FJ: \u003cstrong\u003eSCANPY: large-scale single-cell gene expression data analysis\u003c/strong\u003e. 
\u003cem\u003eGenome Biology \u003c/em\u003e2018, \u003cstrong\u003e19\u003c/strong\u003e.\u003c/li\u003e\n\u003cli\u003eGonz\u0026aacute;lez-Silva L, Quevedo L, Varela I: \u003cstrong\u003eTumor Functional Heterogeneity Unraveled by scRNA-seq Technologies\u003c/strong\u003e. \u003cem\u003eTrends in Cancer \u003c/em\u003e2020, \u003cstrong\u003e6\u003c/strong\u003e(1):13-19.\u003c/li\u003e\n\u003cli\u003eKorsunsky I, Millard N, Fan J, Slowikowski K, Zhang F, Wei K, Baglaenko Y, Brenner M, Loh PR, Raychaudhuri S: \u003cstrong\u003eFast, sensitive and accurate integration of single-cell data with Harmony\u003c/strong\u003e. \u003cem\u003eNature Methods \u003c/em\u003e2019, \u003cstrong\u003e16\u003c/strong\u003e(12):1289-+.\u003c/li\u003e\n\u003cli\u003eMinji Kang JJAA, Gunsagar S. Gulati, Rachel Gleyzer, Susanna Avagyan, Erin L. Brown, Wubing Zhang, Abul Usmani, Noah Earland, Zhenqin Wu, James Zou, Ryan C. Fields, David Y. Chen, Aadel A. Chaudhuri, Aaron M. Newman: \u003cstrong\u003eMapping single-cell developmental potential in health and disease with interpretable deep learning\u003c/strong\u003e. \u003cem\u003ebiorxiv \u003c/em\u003e2019.\u003c/li\u003e\n\u003cli\u003eAibar S, Gonz\u0026aacute;lez-Blas CB, Moerman T, Van AHT, Imrichova H, Hulselmans G, Rambow F, Marine JC, Geurts P, Aerts J\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eSCENIC: single-cell regulatory network inference and clustering\u003c/strong\u003e. \u003cem\u003eNature Methods \u003c/em\u003e2017, \u003cstrong\u003e14\u003c/strong\u003e(11):1083-+.\u003c/li\u003e\n\u003cli\u003eEfremova M, Vento-Tormo M, Teichmann SA, Vento-Tormo R: \u003cstrong\u003eCellPhoneDB: inferring cell-cell communication from combined expression of multi-subunit ligand-receptor complexes\u003c/strong\u003e. 
\u003cem\u003eNature Protocols \u003c/em\u003e2020, \u003cstrong\u003e15\u003c/strong\u003e(4):1484-1506.\u003c/li\u003e\n\u003cli\u003eNewman AM, Liu CL, Green MR, Gentles AJ, Feng WG, Xu Y, Hoang CD, Diehn M, Alizadeh AA: \u003cstrong\u003eRobust enumeration of cell subsets from tissue expression profiles\u003c/strong\u003e. \u003cem\u003eNature Methods \u003c/em\u003e2015, \u003cstrong\u003e12\u003c/strong\u003e(5):453-+.\u003c/li\u003e\n\u003cli\u003eWu TZ, Hu EQ, Xu SB, Chen MJ, Guo PF, Dai ZH, Feng TZ, Zhou L, Tang WL, Zhan L\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eclusterProfiler 4.0: A universal enrichment tool for interpreting omics data\u003c/strong\u003e. \u003cem\u003eInnovation \u003c/em\u003e2021, \u003cstrong\u003e2\u003c/strong\u003e(3).\u003c/li\u003e\n\u003cli\u003eMayakonda A, Lin DC, Assenov Y, Plass C, Koeffler HP: \u003cstrong\u003eMaftools: efficient and comprehensive analysis of somatic variants in cancer\u003c/strong\u003e. \u003cem\u003eGenome Research \u003c/em\u003e2018, \u003cstrong\u003e28\u003c/strong\u003e(11):1747-1756.\u003c/li\u003e\n\u003cli\u003eGeeleher P, Cox N, Huang RS: \u003cstrong\u003epRRophetic: An R Package for Prediction of Clinical Chemotherapeutic Response from Tumor Gene Expression Levels\u003c/strong\u003e. \u003cem\u003ePlos One \u003c/em\u003e2014, \u003cstrong\u003e9\u003c/strong\u003e(9).\u003c/li\u003e\n\u003cli\u003eXiao J, Yu X, Meng F, Zhang Y, Zhou W, Ren Y, Li J, Sun Y, Sun H, Chen G\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eIntegrating spatial and single-cell transcriptomics reveals tumor heterogeneity and intercellular networks in colorectal cancer\u003c/strong\u003e. 
\u003cem\u003eCell Death \u0026amp; Disease \u003c/em\u003e2024, \u003cstrong\u003e15\u003c/strong\u003e(5).\u003c/li\u003e\n\u003cli\u003eGuan J, Min S, Xia Y, Guo Z, Zhou X: \u003cstrong\u003eIdentifying colorectal cancer subtypes and establishing a prognostic model using metabolic plasticity and ferroptosis genes\u003c/strong\u003e. \u003cem\u003eScientific Reports \u003c/em\u003e2024, \u003cstrong\u003e14\u003c/strong\u003e(1).\u003c/li\u003e\n\u003cli\u003eLi Y, Gan Y, Liu J, Li J, Zhou Z, Tian R, Sun R, Liu J, Xiao Q, Li Y\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eDownregulation of MEIS1 mediated by ELFN1-AS1/EZH2/DNMT3a axis promotes tumorigenesis and oxaliplatin resistance in colorectal cancer\u003c/strong\u003e. \u003cem\u003eSignal Transduction and Targeted Therapy \u003c/em\u003e2022, \u003cstrong\u003e7\u003c/strong\u003e(1).\u003c/li\u003e\n\u003cli\u003eLi C, Hong S, Hu H, Liu T, Yan G, Sun D: \u003cstrong\u003eMYC-Induced Upregulation of Lncrna \u0026lt;i\u0026gt;ELFN1-AS1\u0026lt;/i\u0026gt; Contributes to Tumor Growth in Colorectal Cancer via Epigenetically Silencing TPM1\u003c/strong\u003e. \u003cem\u003eMolecular Cancer Research \u003c/em\u003e2022, \u003cstrong\u003e20\u003c/strong\u003e(11):1697-1708.\u003c/li\u003e\n\u003cli\u003eLiu P, Li W, Hu Y, Jiang Y: \u003cstrong\u003eAbsence of AIF1L contributes to cell migration and a poor prognosis of breast cancer\u003c/strong\u003e. 
\u003cem\u003eOncotargets and Therapy \u003c/em\u003e2018, \u003cstrong\u003e11\u003c/strong\u003e:5485-5498.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
Version of record: published in BMC Cancer on December 26th, 2025 (https://doi.org/10.1186/s12885-025-15241-2).