Establishing and validating a prognostic model for colorectal cancer patients by integrating data from single-cell RNA sequencing and bulk RNA sequencing

doi:10.21203/rs.3.rs-3787497/v1

2024 · doi:10.21203/rs.3.rs-3787497/v1

preprint OA: closed CC-BY-4.0



Research Article

Qiujin Huang, Dengwei You, Zhi Huang, Zhaohui Yin, Xuya Zhao

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-3787497/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted. Version 1 posted.

Abstract

Background
Colorectal adenocarcinoma (COAD) is a major global cause of mortality. While conventional RNA sequencing (RNA-seq) has been used to study its prognostic indicators, it lacks precision in identifying cellular alterations. This study aimed to develop a predictive framework for COAD by integrating scRNA-seq with conventional RNA-seq.
Methods
This study acquired bulk RNA sequencing data from The Cancer Genome Atlas (TCGA) database and single-cell RNA sequencing data on colorectal adenocarcinoma (COAD) from the Gene Expression Omnibus (GEO) database. t-SNE was used for dimensionality reduction and cluster identification. Weighted Gene Correlation Network Analysis (WGCNA) was used to identify crucial modules and differentially expressed genes (DEGs). Cox regression analysis was used to construct the prognostic model, and mutation profiles and immune status were compared across risk groups.

Results
Integration of scRNA-seq data from four samples revealed 15 distinct clusters covering 8 cell types. Differential analysis identified important cell types, including B cells (Naïve and Plasma cells), Endothelial cells, Epithelial cells, Monocytes, Natural Killer (NK) cells, Smooth muscle cells, and T cells (CD8+). Subsequently, a prognostic model was built from 28 candidate differentially expressed genes, four of which were retained in the final model; higher risk scores were associated with poorer survival outcomes and increased APC mutation rates. The risk groups also differed in their prognostic and immune characteristics.

Conclusion
By integrating 10x scRNA-seq and bulk RNA-seq data, we established a prognostic framework for colorectal adenocarcinoma (COAD). We also identified two distinct groups, each displaying different prognoses and immune features.

Keywords: scRNA-seq, prognosis, prognostic model, colorectal adenocarcinoma

MATERIALS AND METHODS

Raw Data Acquisition
Using 10x scRNA-seq data, we examined two COAD samples (T1 and T2) and two corresponding normal tissue samples (N1 and N2) to analyze colon adenocarcinoma. These data, sourced from the GSE161277 series, comprised 3190, 4438, 2882, and 4097 cells, respectively[5].
Additionally, we acquired bulk RNA sequencing data, mutation profiles, and clinicopathological details for COAD from the TCGA (The Cancer Genome Atlas) database.

scRNA-Seq Data Processing and Analysis
First, the 10x scRNA-seq data were converted into a Seurat object using the 'Seurat' package in R. Second, quality control was performed on the raw counts; to eliminate poor-quality cells, the proportion of mitochondrial or ribosomal genes was calculated for each cell. Third, after quality control, the top 2000 most variable genes were identified with the "FindVariableFeatures" function. Fourth, principal component analysis (PCA) was conducted on the selected 2000 genes[6], followed by t-SNE for dimensionality reduction and cluster identification. Fifth, to identify important marker genes in the various clusters, we used the "FindAllMarkers" function with a log2 fold change (log2FC) threshold of 0.35 and a minimum percentage (min.pct) of 0.15. Sixth, the "SingleR" package in R was used to annotate the clusters and distinguish the various cell types. After annotation, a differential analysis was conducted using the same parameters.

We conducted functional enrichment analysis on the key cell types identified using the "ReactomeGSA" package in R. The "analyze_sc_clusters" function was used for the enrichment, followed by the "pathways" function to extract the results. The 'monocle' package in R was used for cell trajectory and pseudo-time analysis, with the 'DDRTree' technique for dimensionality reduction. We then used the "BEAM" statistical method to assess gene contributions during cell development, focusing on the top 200 genes for visualization.
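The marker-selection thresholds described above (log2FC of 0.35, min.pct of 0.15) can be illustrated with a small filter. This is only a sketch of the thresholding logic in plain Python, not the Seurat implementation: the `MarkerStat` record, the gene names, and the numbers are hypothetical, and the sketch assumes that only positive markers are kept and that a gene counts as detected when it reaches the minimum fraction in either group of cells.

```python
from dataclasses import dataclass


@dataclass
class MarkerStat:
    """Hypothetical per-gene summary for one cluster versus all other cells."""
    gene: str
    cluster: int
    log2fc: float   # log2 fold change, cluster versus the remaining cells
    pct_in: float   # fraction of cells in the cluster expressing the gene
    pct_out: float  # fraction of cells outside the cluster expressing the gene


def filter_markers(stats, log2fc_threshold=0.35, min_pct=0.15):
    """Keep genes that pass both thresholds quoted in the text."""
    kept = []
    for s in stats:
        # Assumption: detection in either group is enough to pass min_pct.
        detected = max(s.pct_in, s.pct_out) >= min_pct
        # Assumption: only up-regulated (positive) markers are retained.
        if detected and s.log2fc >= log2fc_threshold:
            kept.append(s)
    return kept


# Invented values, for illustration only.
example = [
    MarkerStat("GENE_A", 0, 1.20, 0.80, 0.10),  # passes both thresholds
    MarkerStat("GENE_B", 0, 0.20, 0.90, 0.85),  # fold change too small
    MarkerStat("GENE_C", 1, 0.50, 0.05, 0.02),  # detected in too few cells
    MarkerStat("GENE_D", 1, 0.35, 0.15, 0.00),  # exactly at both thresholds
]
kept = filter_markers(example)
print([s.gene for s in kept])
```

Both cutoffs are inclusive in this sketch; whether a gene sitting exactly on a threshold is kept is a design choice rather than something the text specifies.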
To analyze cell-cell communication and visualize the resulting networks, we used the R packages 'CellChat' and 'patchwork'.

Differentially Expressed Genes Identification and Functional Enrichment Analysis
We performed differential expression analysis with the R package 'limma' to identify genes that were differentially expressed in the TCGA cohort. The filtering criteria were an absolute log2 fold change (log2FC) greater than 1.2 and a false discovery rate (FDR) below 0.05. A volcano plot was then created to display the distribution of the DEGs. Afterward, we used the R package 'clusterProfiler' to perform Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) analyses, in order to identify the pathways and biological processes most significantly enriched among the DEGs.

Weighted Gene Correlation Network Analysis
The "WGCNA" package in R was used to identify key hub genes among the differentially expressed genes (DEGs) using the Weighted Gene Correlation Network Analysis (WGCNA) method. WGCNA combines two tasks: clustering genes by expression and correlating the resulting clusters with phenotypic traits. The analysis comprises four stages: calculating gene correlation coefficients, identifying gene modules, constructing a co-expression network, and correlating the modules with specific traits. During network construction, we chose a soft thresholding power of 12. After clustering, the modules were visualized in a dendrogram. To identify the DEGs most important in the development of colorectal adenocarcinoma (COAD), a module-trait heatmap was created using correlation coefficients and p-values. Finally, we focused on the genes shared between the marker genes and the DEGs identified by WGCNA for in-depth analysis.
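To make the role of the soft thresholding power concrete, the sketch below raises absolute Pearson correlations to the power 12, which strongly suppresses weak correlations while leaving near-perfect ones almost unchanged. It assumes an unsigned network (the text does not say whether a signed or unsigned network was used), it is plain Python rather than the WGCNA package itself, and the gene names and expression values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: treat as uncorrelated in this sketch
    return cov / math.sqrt(var_x * var_y)


def soft_threshold_adjacency(expr, power=12):
    """Unsigned adjacency a_ij = |cor(i, j)| ** power for every gene pair."""
    genes = list(expr)
    adjacency = {}
    for g in genes:
        for h in genes:
            if g != h:
                adjacency[(g, h)] = abs(pearson(expr[g], expr[h])) ** power
    return adjacency


# Invented expression profiles across five samples.
expr = {
    "G1": [1, 2, 3, 4, 5],
    "G2": [2, 4, 6, 8, 10],  # perfectly correlated with G1
    "G3": [5, 1, 4, 2, 3],   # weakly anti-correlated with G1
}
adj = soft_threshold_adjacency(expr, power=12)
print(adj[("G1", "G2")], adj[("G1", "G3")])
```

In this toy data the G1/G3 correlation of -0.3 shrinks to roughly 5e-7 after thresholding, which is why only tightly co-expressed genes end up grouped into modules.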
Construction of the prognostic risk model
The risk score for each patient was determined using the following equation:

$$\text{risk score} = \sum_{i} \beta_i \times \text{Exp}_i$$

Here, $\text{Exp}_i$ denotes the expression level of gene $i$, and $\beta_i$ represents the corresponding coefficient derived from multivariate Cox regression analysis[7]. Patients were divided into high-risk and low-risk groups using the median risk score as the threshold. Survival curves were generated using the Kaplan-Meier method for prognostic analysis, and the log-rank test was used to assess the statistical significance of the observed differences.

Comparison of clinical significance and mutation patterns in high-risk and low-risk populations
We then investigated the correlation between the risk score and the clinicopathological characteristics of patients within the TCGA cohort. Cox regression analysis was conducted using the "survcomp" package in R to assess whether the risk score serves as an independent prognostic indicator for colorectal adenocarcinoma (COAD) patients. The "forestplot" package in R was used to generate forest plots depicting the findings from both the univariate and multivariate Cox regression analyses. Furthermore, to examine the distinct genetic mutation patterns in the high- and low-risk groups, we used the 'oncoplot' function in the 'maftools' R package to create waterfall plots.

Comparison of immune cell infiltration and immune function status between high- and low-risk groups
The CIBERSORT technique was employed to assess the levels of immune cell infiltration and the activity of immune-related pathways. The Wilcoxon rank-sum test was used to evaluate the statistical differences between the high-risk and low-risk groups. Furthermore, the R 'ESTIMATE' package was employed to evaluate the level of immune infiltration in colorectal cancer patients in the high- and low-risk cohorts.
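The Wilcoxon rank-sum comparison between risk groups mentioned above can be sketched as follows. This is a minimal illustration using the normal approximation without tie or continuity correction, so its p-values may differ slightly from those of a statistics package; the function names and the infiltration fractions are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_sum_test(group_a, group_b):
    """Return (U statistic for group_a, z score, two-sided p-value)."""
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n1])
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u_a, z, p_value


# Invented infiltration fractions for one immune cell type.
high_risk = [0.10, 0.12, 0.08, 0.15, 0.09]
low_risk = [0.20, 0.25, 0.18, 0.22, 0.30]
u_stat, z_score, p_val = rank_sum_test(high_risk, low_risk)
print(u_stat, z_score, p_val)
```

Because the test works on ranks, it makes no normality assumption about the infiltration estimates, which is the usual reason for preferring it over a t-test in this setting.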
For visualization, the "ggplot2" package in R was employed.

INTRODUCTION
Colorectal cancer (CRC), a leading malignancy globally, originates from the epithelial cells of the colon or rectum. Benign polyps often develop into invasive cancers, and the development of CRC is influenced by a combination of genetic, environmental, and lifestyle factors[1]. Recognized risk factors include family history, dietary patterns, age, and specific hereditary conditions such as Lynch syndrome and familial adenomatous polyposis[2]. The pathogenesis of CRC typically follows a progression from normal mucosa to adenomatous polyps, leading eventually to carcinoma. This progression is often paralleled by a series of genetic and epigenetic alterations, including mutations in the APC, KRAS, and TP53 genes. The understanding of these molecular pathways has been instrumental in the development of targeted therapies and personalized treatment approaches.

Diagnostically, CRC screening is crucial and includes methods such as fecal occult blood tests, colonoscopy, and imaging techniques. Early detection greatly improves prognosis, emphasizing the importance of regular screening, especially in high-risk groups. Therapeutically, CRC treatment is stage-dependent and can involve a combination of surgical resection, chemotherapy, radiotherapy, and, increasingly, targeted biological therapies and immunotherapies. Surgical removal of the tumor is the primary treatment modality for localized disease. In contrast, advanced stages often require a multimodal approach, including systemic therapies. The molecular understanding of CRC has progressed significantly in recent years, resulting in the creation of targeted medications. These drugs, including EGFR and VEGF inhibitors, have demonstrated potential in improving outcomes, especially in metastatic disease.
Immunotherapy, especially checkpoint inhibitors, has also emerged as an effective treatment option in certain subtypes of CRC, notably those with high microsatellite instability[3].

The advent of single-cell analysis combined with transcriptomics represents a groundbreaking advancement in cancer research, offering unprecedented insights into the complex biology of tumors. This integrative approach provides a detailed landscape of tumor heterogeneity, enabling researchers to dissect the intricate cellular composition of tumors and the dynamic interactions within the tumor microenvironment. Single-cell transcriptomics allows for the characterization of individual cells within a tumor, revealing variations in gene expression that are obscured in bulk analyses. This granularity is crucial in cancer, where heterogeneity among cancer cells can drive disease progression, metastasis, and treatment resistance. By profiling individual cells, researchers can identify distinct subpopulations, including rare cancer stem cells or immune cells, which play pivotal roles in cancer biology but are often undetected in bulk analyses. Furthermore, single-cell analysis helps to decipher the intricacies of the tumor microenvironment, which encompasses diverse cell populations including immune cells, fibroblasts, and endothelial cells. Understanding the interactions among these cells and the cancer cells is essential for comprehending tumor progression and response to therapies. For instance, the immune landscape within a tumor can predict response to immunotherapies, and single-cell approaches can identify immunosuppressive mechanisms employed by tumors[4].

The future of colorectal cancer (CRC) research is poised for transformation through the application of single-cell analysis. This innovative technique promises to unravel the complex heterogeneity of CRC, providing a more nuanced understanding of the disease at the cellular level.
By dissecting the intricate interactions between diverse cell types within the tumor microenvironment, single-cell analysis will offer insights into the mechanisms driving CRC progression, metastasis, and treatment resistance.

RESULTS

scRNA-Seq and Cell Typing of Normal and Colon Adenocarcinoma Samples
The single-cell data were derived from two patients with colorectal cancer and two normal individuals, totaling 14,607 cells sourced from the GSE161277 dataset. The quality-control results for these cells are detailed in Figure S1. After principal component analysis (PCA) and t-SNE analysis, the cells were classified into 15 separate clusters. The "SingleR" package was then used to annotate these clusters. The identified cell subgroups consisted of B cells (Naïve and Plasma cells), Endothelial cells, Epithelial cells, Monocytes, Natural Killer (NK) cells, Smooth muscle cells, and T cells (CD8+). Functional enrichment analysis using ReactomeGSA[8] revealed that the identified cell types are primarily engaged in processes such as the binding of Ficolins to repetitive carbohydrate structures on the surface of target cells, COX reactions, and serotonin metabolism. We performed cell trajectory and pseudo-time analysis for the three significantly identified cell types using the 'monocle' package (Fig. 1). Additionally, the probability of cell-cell communication was calculated to examine the cell-cell communication network[9]. The network was further inferred at the level of specific pathways and ligand-receptor interactions. Our analysis highlighted the pivotal role of the COLLAGEN signaling pathway in the cell-cell communication network (Fig. 2).

Detection of Genes with Altered Expression in Bulk RNA-Seq Data
Differential expression analysis yielded 8,800 differentially expressed genes (DEGs), with 5,012 up-regulated and 3,788 down-regulated (Fig. 3).
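The up/down split reported above follows from applying the two cutoffs stated in the Methods (absolute log2FC above 1.2, FDR below 0.05) and then reading the sign of the fold change. The sketch below shows that logic with a Benjamini-Hochberg adjustment standing in for the FDR column. It is an assumption that the FDR was computed this way, the code is not the limma workflow itself, and the gene names and values are invented.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def classify_degs(genes, log2fc, pvalues, fc_cut=1.2, fdr_cut=0.05):
    """Split genes passing both cutoffs into up- and down-regulated lists."""
    fdr = benjamini_hochberg(pvalues)
    up, down = [], []
    for gene, fc, q in zip(genes, log2fc, fdr):
        if q < fdr_cut and abs(fc) > fc_cut:
            (up if fc > 0 else down).append(gene)
    return up, down


# Invented values, for illustration only.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
log2fc = [2.5, -1.8, 0.4, 1.3, -3.0]
pvalues = [0.001, 0.004, 0.03, 0.20, 0.0005]
up_genes, down_genes = classify_degs(genes, log2fc, pvalues)
print(up_genes, down_genes)
```

Here GENE_C fails the fold-change cutoff and GENE_D fails the FDR cutoff, so neither is counted as a DEG despite one of its two statistics looking favorable.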
Gene Ontology (GO) analysis revealed a notable enrichment of these DEGs in biological processes such as ribonucleoprotein complex formation and ribosome biogenesis. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis showed enrichment mainly in pathways associated with the cell cycle and Human T-cell leukemia virus 1 infection. Weighted Gene Correlation Network Analysis (WGCNA) was then performed to identify DEGs linked to the development and progression of colorectal adenocarcinoma (COAD). During construction of the co-expression network, a soft thresholding power of 12 was used. The pink module emerged as significantly correlated with COAD development, as evidenced by its correlation coefficient and p-value. Finally, a group of 28 genes shared between the marker genes and the WGCNA module genes was selected to create an expression matrix for further investigation (Fig. 4).

Prognostic Model Construction
Initially, we used univariate Cox regression analysis to identify 12 genes with potential prognostic value for colorectal adenocarcinoma (COAD) within the TCGA cohort. Subsequently, through LASSO regression analysis, we refined this list to four key DEGs for our final risk model[10]. CD177, CLCA4, CLDN23, and SMPDL3A were identified as independent prognostic DEGs using multivariate Cox analysis. The risk score formula used was: risk score = (-0.176 × CD177) + (-0.142 × CLCA4) + (-0.592 × CLDN23) + (-0.804 × SMPDL3A), where each gene symbol denotes the expression level of that gene. The patients were then categorized into high- and low-risk groups using the median risk score as the cutoff. Survival analysis showed that patients in the high-risk group had lower overall survival (OS) than those in the low-risk group (Fig. 5).
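As a worked illustration of the formula above, the following sketch computes the risk score from the four reported coefficients and splits patients at the median. The coefficients are taken from the text; the patient identifiers and expression values are invented, and assigning a score exactly equal to the median to the low-risk group is an assumption, since the text does not say how ties are handled.

```python
from statistics import median

# Coefficients as reported in the text (multivariate Cox regression).
COEFFICIENTS = {
    "CD177": -0.176,
    "CLCA4": -0.142,
    "CLDN23": -0.592,
    "SMPDL3A": -0.804,
}


def risk_score(expression):
    """Sum of coefficient times expression level over the four model genes."""
    return sum(COEFFICIENTS[gene] * expression[gene] for gene in COEFFICIENTS)


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the median score."""
    cutoff = median(scores.values())
    # Assumption: a score equal to the median is placed in the low-risk group.
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical patients with invented expression levels.
patients = {
    "P1": {"CD177": 1.0, "CLCA4": 1.0, "CLDN23": 1.0, "SMPDL3A": 1.0},
    "P2": {"CD177": 0.0, "CLCA4": 0.0, "CLDN23": 0.0, "SMPDL3A": 0.0},
    "P3": {"CD177": 2.0, "CLCA4": 0.0, "CLDN23": 0.0, "SMPDL3A": 0.0},
    "P4": {"CD177": 0.0, "CLCA4": 0.0, "CLDN23": 1.0, "SMPDL3A": 1.0},
}
scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
print(scores, groups)
```

Because all four coefficients are negative, higher expression of any of these genes lowers the score, so the high-risk group in this model is the one with lower expression of the four genes.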
Comparison of clinical significance and mutation patterns in high-risk and low-risk populations
We next examined the relationship between risk scores and clinicopathological characteristics (Fig. 6), which revealed an association between elevated risk scores and T stage. Univariate and multivariate Cox regression were then used to evaluate whether risk type could predict prognosis in COAD patients independently of other standard clinicopathological factors. Our findings indicated that risk type acts as an independent prognostic indicator for these patients (Fig. 7). Afterward, a waterfall plot was generated to analyze the genetic mutation patterns in the high- and low-risk groups. In both groups, the genes APC, KRAS, and NEB exhibited the highest mutation frequencies. Notably, the high-risk group exhibited a greater frequency of APC mutations than the low-risk group (Fig. 8).

Comparing the immune function of high-risk and low-risk groups
We used the 'estimate' package in R to evaluate immune infiltration in colorectal cancer patients in the high- and low-risk groups defined by our predictive model. The high-risk group exhibited a significantly lower immune score than the low-risk group. Furthermore, we evaluated the presence of 21 different immune cell types in these patients using CIBERSORT. The high-risk group exhibited reduced levels of four types of immune cells, namely Plasma cells, M2 Macrophages, Activated Dendritic cells, and Resting Mast cells, compared with the low-risk group[11]. In contrast, the high-risk group showed higher levels of Resting NK cells and M0 Macrophages than the low-risk group (Fig. 9).

Discussion
It is crucial to emphasize the significant implications this approach has for both tumor treatment and research.
Firstly, single-cell analysis in COAD allows for an unprecedented resolution in understanding tumor heterogeneity. Colorectal tumors, like many cancers, are not uniform; they consist of a diverse array of cell types, each with unique genetic expressions. By dissecting these variations at a single-cell level, we gain a deeper insight into the molecular mechanisms driving tumor progression and resistance to therapy. This is particularly valuable in identifying subpopulations of cancer cells that might be resistant to current treatments or are more aggressive, thereby guiding more effective therapeutic strategies. Secondly, the integration of differential gene expression analysis enhances our ability to identify key driver genes and pathways in COAD. By correlating specific gene expression patterns with clinical outcomes, such as patient survival or response to treatment, we can pinpoint potential therapeutic targets. For example, discovering overexpressed genes in malignant tumor cells can result in the creation of focused treatments targeting these particular genes. Moreover, the use of single-cell sequencing in COAD can also aid in the identification of new biomarkers for early detection and prognosis. As we better understand the gene expression profiles of early-stage tumors and their evolution into more advanced stages, we can develop diagnostic tools that detect COAD at an earlier, more treatable stage. Additionally, the analysis of the tumor microenvironment at a single-cell level provides insights into the complex interactions between cancer cells and immune cells. Understanding these interactions is key to developing and optimizing immunotherapies for COAD. For instance, identifying immune evasion mechanisms employed by tumor cells can guide the development of immune checkpoint inhibitors or other immunomodulatory treatments. 
In summary, the integration of differential gene expression with single-cell analysis in COAD is a powerful approach that promises to revolutionize our understanding of the disease. It holds the potential to identify novel therapeutic targets, enhance the efficacy of existing treatments, contribute to the development of personalized medicine, and ultimately improve patient outcomes in colorectal cancer. Prognostic models play a crucial role in predicting patient outcomes and guiding clinical decision-making. By integrating various clinical and molecular data, including genetic, epigenetic, and expression profiles, these models can predict disease progression, patient survival, and response to therapy with greater accuracy. This is especially critical in cancers where the heterogeneity of the tumor and its microenvironment significantly influence the disease course and treatment response. With personalized prognostic models, clinicians can tailor treatment plans to individual patients, potentially improving outcomes and minimizing unnecessary treatments. In terms of immune infiltration, its study has revolutionized the field of oncology. The tumor microenvironment, comprising various immune cells, influences tumor progression, metastasis, and response to therapy. By analyzing the types and states of immune cells within tumors, researchers can gain insights into the tumor's behavior and its interaction with the host's immune system. This understanding is crucial for the development of effective immunotherapies, such as checkpoint inhibitors, which have shown remarkable success in treating certain types of cancer. Furthermore, the study of immune infiltration helps in identifying new therapeutic targets and understanding mechanisms of drug resistance. For example, tumors that evolve mechanisms to evade immune detection might be resistant to certain therapies, prompting the need for alternative strategies. 
In conclusion, the development of sophisticated prognostic models and the in-depth study of immune infiltration are indispensable in the realm of tumor research. These approaches facilitate a more nuanced understanding of cancer biology, aid in the creation of more effective treatment strategies, and are integral to the future of personalized oncology.

Conclusion
By merging information from 10× scRNA-seq and bulk RNA-seq, we constructed a predictive framework to forecast the progression of colorectal adenocarcinoma (COAD). We identified two distinct patient cohorts, each displaying different prognoses and immune characteristics. Notably, a higher risk score, indicative of worse survival outcomes, was also linked to a higher incidence of APC mutations. This model has the potential to serve as a biomarker for stratifying the risk of COAD patients and forecasting their response to treatment. Future prospective studies are essential to confirm and expand upon these initial findings.

Declarations

Data availability
The datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Funding
This work was supported by the Science and Technology Fund Project of the Guizhou Provincial Health Commission (No. gzwkj2021-179).

Ethical approval
Not applicable.

Consent to participate
Not applicable.

Consent for publication
Not applicable.

Conflict of interest
The authors declare no conflicts of interest.

Author information
Qiujin Huang: Department of anus and intestine surgery, Guizhou Medical University, Guiyang 550002, P.R. China
Dengwei You: Department of Interventional Radiology, the Affiliated Hospital of Guizhou Medical University, Guiyang 550002, P.R. China
Zhi Huang: Department of Interventional Radiology, the Affiliated Hospital of Guizhou Medical University, Guiyang 550002, P.R. China
Zhaohui Yin: Department of anus and intestine surgery, the GuiZhou Moutai Hospital, Zunyi 5190100, P.R. China
Xuya Zhao (corresponding author): Department of Interventional Radiology, Affiliated Cancer Hospital of Guizhou Medical University, Guiyang 550002, P.R. China

Contributions
QJH designed the study, implemented the code, analyzed the results, and wrote the paper. DWY and ZH implemented the code and analyzed the results. ZHY wrote the paper. XYZ designed the study, analyzed the results, and wrote the paper. All authors read and approved the final manuscript.

References
1. Li J, Ma X, Chakravarti D, Shalapour S, DePinho RA. Genetic and biological hallmarks of colorectal cancer. Genes Dev. 2021;35(11–12):787–820. doi:10.1101/gad.348226.120
2. Thanikachalam K, Khan G. Colorectal Cancer and Nutrition. Nutrients. 2019;11(1):164. doi:10.3390/nu11010164
3. Zhou J, Ji Q, Li Q. Resistance to anti-EGFR therapies in metastatic colorectal cancer: underlying mechanisms and reversal strategies. J Exp Clin Cancer Res. 2021;40(1):328. doi:10.1186/s13046-021-02130-2
4. Xie YH, Chen YX, Fang JY. Comprehensive review of targeted therapy for colorectal cancer. Signal Transduct Target Ther. 2020;5(1):22. doi:10.1038/s41392-020-0116-z
5. Jiang A, Wang J, Liu N, Zheng X, Li Y, Ma Y, et al. Integration of Single-Cell RNA Sequencing and Bulk RNA Sequencing Data to Establish and Validate a Prognostic Model for Patients With Lung Adenocarcinoma. Front Genet. 2022;13:833797. doi:10.3389/fgene.2022.833797
6. Chi H, Zhao S, Yang J, Gao X, Peng G, Zhang J, et al. T-cell exhaustion signatures characterize the immune landscape and predict HCC prognosis via integrating single-cell RNA-seq and bulk RNA-sequencing. Front Immunol. 2023;14:1137025. doi:10.3389/fimmu.2023.1137025
7. Peiffert D, Tournier-Rangeard L, Gérard JP, Lemanski C, François E, Giovannini M, et al. Induction chemotherapy and dose intensification of the radiation boost in locally advanced anal canal carcinoma: final analysis of the randomized UNICANCER ACCORD 03 trial. J Clin Oncol. 2012;30(16):1941–8. doi:10.1200/JCO.2011.35.4837
8. Zhang J, Peng G, Chi H, Yang J, Xie X, Song G, et al. CD8+ T-cell marker genes reveal different immune subtypes of oral lichen planus by integrating single-cell RNA-seq and bulk RNA-sequencing. BMC Oral Health. 2023;23(1):464. doi:10.1186/s12903-023-03138-0
9. Zhang J, Liu X, Huang Z, Wu C, Zhang F, Han A, et al. T cell-related prognostic risk model and tumor immune environment modulation in lung adenocarcinoma based on single-cell and bulk RNA sequencing. Comput Biol Med. 2023;152:106460. doi:10.1016/j.compbiomed.2022.106460
10. Zhu K, Yan A, Zhou F, Zhao S, Ning J, Yao L, et al. A Pyroptosis-Related Signature Predicts Overall Survival and Immunotherapy Responses in Lung Adenocarcinoma. Front Genet. 2022;13:891301. doi:10.3389/fgene.2022.891301
11. Xiaoqin Z, Zhouqi L, Huan P, Xinyi F, Bin S, Jiming W, et al. Development of a prognostic signature for immune-associated genes in bladder cancer and exploring potential drug findings. Int Urol Nephrol. 2023. doi:10.1007/s11255-023-03796-7

Additional Declarations
No competing interests reported.
Figure legends

Figure 1. (A-D) Annotation of distinct clusters and identification of cell types in colorectal adenocarcinoma (COAD) using 10x Genomics single-cell RNA sequencing (scRNA-seq) data.
Figure 2. (A) Pseudo temporal analysis. (B) Functional enrichment analysis for the identified hub cell types using the "ReactomeGSA" package.
Figure 3. Analysis of intercellular communication networks and identification of differentially expressed genes in the TCGA cohort. (A-B) Cell-cell communication network. (C) DEGs in TCGA cohort. (D-E) GO and KEGG enrichment analysis of the identified DEGs.
Figure 4. Identification of hub DEGs that participate in COAD development through WGCNA.
Figure 5. Prognostic model establishment for patients with COAD.
Figure 6. The association between risk score and prevalent clinicopathological features in colorectal adenocarcinoma (COAD).
Figure 7. Waterfall charts provide a summary of the gene mutation profiles in groups with high and low-risk.
Figure 8. The correlation between risk score and its implications in clinical practice.
9","display":"","copyAsset":false,"role":"figure","size":213293,"visible":true,"origin":"","legend":"\u003cp\u003eEstimation of immune cell infiltration scores in low- and high-risk groups using Cibersort and estimate analysis.\u003c/p\u003e","description":"","filename":"Figure9.png","url":"https://assets-eu.researchsquare.com/files/rs-3787497/v1/1a8456648c5e444e8a8a81fc.png"},{"id":49124503,"identity":"5b0a4214-65f7-43c7-a0b5-20c92cfd49c7","added_by":"auto","created_at":"2024-01-03 14:37:26","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2610333,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-3787497/v1/7ad295be-0409-4156-b966-89d036aa2af7.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Establishing and validating a prognostic model for colorectal cancer patients by integrating data from single-cell RNA sequencing and bulk RNA sequencing","fulltext":[{"header":"MATERIALS AND METHODS","content":"\u003cp\u003e\u003cstrong\u003eRaw Data Acquisition\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eUsing 10x scRNA-seq, we examined two COAD samples (T1 and T2) and two corresponding normal tissue samples (N1 and N2) to analyze colon adenocarcinoma. These data, sourced from the GSE161277 series, comprised 3190, 4438, 2882, and 4097 cells for each respective sample[5]. Additionally, we acquired bulk RNA sequencing data, mutation profiles, and clinicopathological details for COAD from the TCGA (The Cancer Genome Atlas) database.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003escRNA-Seq Data Processing and Analysis\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo begin with, the 10x scRNA-seq data was converted into a Seurat object using the \u0026apos;Seurat\u0026apos; package in R software during the processing of the data. This was followed by a quality control step, where raw counts were assessed. 
To eliminate poor-quality cells, the proportions of mitochondrial and ribosomal gene expression were calculated. After quality control, we used the \u0026quot;FindVariableFeatures\u0026quot; function to identify the 2000 most variable genes. Principal component analysis (PCA) was then performed on these 2000 genes[6], followed by t-SNE for dimensionality reduction and cluster identification. To identify marker genes for each cluster, we used the \u0026quot;FindAllMarkers\u0026quot; function with a log2 fold change (log2FC) threshold of 0.35 and a minimum expression percentage (min.pct) of 0.15. Clusters were then annotated with the \u0026quot;SingleR\u0026quot; package in R to distinguish cell types, and differential analysis was repeated after annotation using the same parameters. We conducted functional enrichment analysis on the key cell types using the \u0026quot;ReactomeGSA\u0026quot; package in R: the \u0026quot;analyze_sc_clusters\u0026quot; function was used to run the enrichment, and the \u0026quot;pathways\u0026quot; function to extract the results. The \u0026apos;monocle\u0026apos; package in R was used for cell trajectory and pseudotime analysis, with the \u0026apos;DDRTree\u0026apos; method for dimensionality reduction. We then used the \u0026quot;BEAM\u0026quot; statistical method to assess gene contributions during cell development, focusing on the top 200 genes for visualization. 
To analyze cell-cell communication and visualize the resulting networks, we used the R packages \u0026apos;CellChat\u0026apos; and \u0026apos;patchwork\u0026apos;.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDifferentially Expressed Gene Identification and Functional Enrichment Analysis\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe performed differential expression analysis with the R package \u0026apos;limma\u0026apos; to identify differentially expressed genes (DEGs) in the TCGA cohort. The filtering criteria were an absolute log2 fold change (log2FC) greater than 1.2 and a false discovery rate (FDR) below 0.05. A volcano plot was then created to display the distribution of the DEGs. Afterward, we used the R package \u0026apos;clusterProfiler\u0026apos; to perform Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) enrichment analyses, in order to identify the pathways and biological processes most significantly enriched among the DEGs.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eWeighted Gene Correlation Network Analysis\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe \u0026quot;WGCNA\u0026quot; package in R was used to identify key hub genes among the DEGs by Weighted Gene Correlation Network Analysis (WGCNA). WGCNA takes a two-pronged approach: clustering genes by expression and correlating the resulting clusters with phenotypic traits. The analysis comprises four stages: calculating gene correlation coefficients, identifying gene modules, constructing a co-expression network, and correlating the modules with specific traits. During network construction, we chose a soft thresholding power of 12. The modules were then visualized in a dendrogram following clustering. 
To identify the most important DEGs in the development of colorectal adenocarcinoma (COAD), a module-trait heatmap was created using correlation coefficients and p-values. Finally, we focused on the genes shared by the marker genes and the DEGs identified in the WGCNA for in-depth analysis.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConstruction of\u0026nbsp;the\u0026nbsp;prognostic risk model\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe risk score for each patient was determined using the following equation:\u003c/p\u003e\n\u003cp\u003eRisk score = \u0026Sigma;\u003csub\u003ei\u003c/sub\u003e (\u0026beta;\u003csub\u003ei\u003c/sub\u003e \u0026times; Exp\u003csub\u003ei\u003c/sub\u003e)\u003cbr\u003e\u003c/p\u003e\n\u003cp\u003eIn this context, Exp\u003csub\u003ei\u003c/sub\u003e denotes the expression level of gene i, and \u0026beta;\u003csub\u003ei\u003c/sub\u003e represents the corresponding coefficient derived from multivariate Cox regression analysis[7]. Patients were divided into high-risk and low-risk groups using the median risk score as the cutoff. Survival curves were generated using the Kaplan-Meier method, and the log-rank test was used to assess the statistical significance of the difference between groups.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eComparison of clinical significance and mutation patterns in high-risk and low-risk populations\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe then investigated the correlation between the risk score and the clinicopathological characteristics of patients in the TCGA cohort. Cox regression analysis was conducted using the \u0026quot;survcomp\u0026quot; package in R to assess whether the risk score is an independent prognostic indicator for COAD patients. Forest plots of the univariate and multivariate Cox regression results were generated with the forestplot package in R. 
Furthermore, to examine the genetic mutation patterns in the high-risk and low-risk groups, we used the \u0026apos;oncoplot\u0026apos; function in the \u0026apos;maftools\u0026apos; R package to create waterfall plots.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eComparison of immune cell infiltration and immune function status between high-risk and low-risk groups\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe CIBERSORT algorithm was used to assess the levels of immune cell infiltration and the activity of immune-related pathways. The Wilcoxon rank-sum test was used to evaluate differences between the high-risk and low-risk groups. In addition, the R package \u0026apos;estimate\u0026apos; was used to evaluate the level of immune infiltration in patients in the high-risk and low-risk groups. The \u0026quot;ggplot2\u0026quot; package in R was used for visualization.\u003c/p\u003e"},{"header":"INTRODUCTION","content":"\u003cp\u003eColorectal cancer (CRC), a leading malignancy globally, originates from the epithelial cells of the colon or rectum. Benign polyps often develop into invasive cancers, and the development of CRC is influenced by a combination of genetic, environmental, and lifestyle factors[\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. Recognized risk factors include family history, diet, age, and specific hereditary conditions such as Lynch syndrome and familial adenomatous polyposis[\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. The pathogenesis of CRC typically follows a progression from normal mucosa to adenomatous polyps and eventually to carcinoma. This progression is often accompanied by a series of genetic and epigenetic alterations, including mutations in the APC, KRAS, and TP53 genes. 
The understanding of these molecular pathways has been instrumental in the development of targeted therapies and personalized treatment approaches.\u003c/p\u003e \u003cp\u003eDiagnostically, CRC screening is crucial and includes methods such as fecal occult blood tests, colonoscopy, and imaging techniques. Early detection greatly improves prognosis, emphasizing the importance of regular screening, especially in high-risk groups.\u003c/p\u003e \u003cp\u003eTherapeutically, CRC treatment is stage-dependent and can involve a combination of surgical resection, chemotherapy, radiotherapy, and increasingly, targeted biological therapies and immunotherapies. Surgical removal of the tumor is the primary treatment modality for localized disease. In contrast, advanced stages often require a multimodal approach, including systemic therapies. The molecular comprehension of CRC has made significant progress lately, resulting in the creation of targeted medications. These drugs, including EGFR and VEGF inhibitors, have demonstrated potential in enhancing outcomes, especially in cases of metastasis. Immunotherapy, especially checkpoint inhibitors, has also emerged as an effective treatment option in certain subtypes of CRC, notably those with high microsatellite instability[\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. The advent of single-cell analysis combined with transcriptomics represents a groundbreaking advancement in cancer research, offering unprecedented insights into the complex biology of tumors. This integrative approach provides a detailed landscape of tumor heterogeneity, enabling researchers to dissect the intricate cellular composition of tumors and the dynamic interactions within the tumor microenvironment.\u003c/p\u003e \u003cp\u003eSingle-cell transcriptomics allows for the characterization of individual cells within a tumor, revealing variations in gene expression that are obscured in bulk analyses. 
This granularity is crucial in cancer, where heterogeneity among cancer cells can drive disease progression, metastasis, and treatment resistance. By profiling individual cells, researchers can identify distinct subpopulations, including rare cancer stem cells or immune cells, which play pivotal roles in cancer biology but are often undetected in bulk analyses. Furthermore, the use of single-cell analysis helps to decipher the intricacies of the tumor microenvironment, encompassing diverse cell populations including immune cells, fibroblasts, and endothelial cells. Understanding the interactions among these cells and the cancer cells is essential for comprehending tumor progression and response to therapies. For instance, the immune landscape within a tumor can predict response to immunotherapies, and single-cell approaches can identify immunosuppressive mechanisms employed by tumors[\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThe future of colorectal cancer (CRC) research is poised for transformation through the application of single-cell analysis. This innovative technique promises to unravel the complex heterogeneity of CRC, providing a more nuanced understanding of the disease at the cellular level. By dissecting the intricate interactions between diverse cell types within the tumor microenvironment, single-cell analysis will offer insights into the mechanisms driving CRC progression, metastasis, and treatment resistance.\u003c/p\u003e"},{"header":"RESULTS","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003escRNA-Seq and Cell Typing of Normal and Colon adenocarcinoma Samples\u003c/h2\u003e \u003cp\u003eThe single-cell data were derived from two patients with colorectal cancer and two normal individuals, totaling 14,607 cells sourced from the GSE161277 dataset. Post-quality control, the results of these cells are detailed in Figure S1. 
After conducting principal component analysis (PCA) and t-SNE analysis, we classified the cells into 15 clusters. The \"SingleR\" package was then used to annotate these clusters. The cell types identified were B cells (Na\u0026iuml;ve and Plasma cells), Endothelial cells, Epithelial cells, Monocytes, Natural Killer (NK) cells, Smooth muscle cells, and T cells (CD8+) (Fig.\u0026nbsp;1). Functional enrichment analysis using ReactomeGSA[\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e] revealed that the identified cell types are primarily engaged in processes such as the binding of Ficolins to repetitive carbohydrate structures on the target cell surface, COX reactions, and serotonin metabolism. We then performed cell trajectory and pseudotime analysis for the three key cell types using the 'monocle' package (Fig.\u0026nbsp;2). Additionally, cell-cell communication probabilities were calculated to infer the cell-cell communication network[\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e], which was further resolved at the level of specific signaling pathways and ligand-receptor interactions. Our analysis highlighted the pivotal role of the COLLAGEN signaling pathway in this network (Fig.\u0026nbsp;3).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003eDetection of Genes with Altered Expression in Bulk RNA-Seq Data\u003c/h2\u003e \u003cp\u003eDifferential expression analysis identified 8,800 differentially expressed genes (DEGs), of which 5,012 were up-regulated and 3,788 down-regulated (Fig.\u0026nbsp;3). Gene Ontology (GO) analysis revealed notable enrichment of these DEGs in biological processes such as the formation of ribonucleoprotein complexes and the development of ribosomes. 
Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis showed enrichment mainly in pathways associated with the cell cycle and human T-cell leukemia virus 1 infection. Weighted Gene Correlation Network Analysis (WGCNA) was then performed to identify DEGs linked to the development and progression of colorectal adenocarcinoma (COAD). A soft thresholding power of 12 was used to construct the co-expression network. The pink module was significantly correlated with COAD development, as shown by its correlation coefficient and p-value. Finally, 28 genes that were both marker genes and members of the WGCNA module were selected to build an expression matrix for further investigation (Fig.\u0026nbsp;4).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003ePrognostic Model Construction\u003c/h2\u003e \u003cp\u003eWe first used univariate Cox regression analysis to identify 12 genes with potential prognostic value for COAD in the TCGA cohort. Through LASSO regression analysis, we then refined this list to four key DEGs for the final risk model[\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. CD177, CLCA4, CLDN23, and SMPDL3A were identified as independent prognostic DEGs by multivariate Cox analysis. The risk score was calculated as: risk score = (-0.176 \u0026times; CD177) + (-0.142 \u0026times; CLCA4) + (-0.592 \u0026times; CLDN23) + (-0.804 \u0026times; SMPDL3A), where each gene symbol denotes the expression level of that gene. Patients were then divided into high-risk and low-risk groups using the median risk score as the cutoff. 
Survival analysis showed that patients in the high-risk group had worse overall survival (OS) than those in the low-risk group (Fig.\u0026nbsp;5).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003eComparison of clinical significance and mutation patterns in high-risk and low-risk populations\u003c/h2\u003e \u003cp\u003eWe next examined the association between risk score and clinicopathological characteristics (Fig.\u0026nbsp;6) and found that higher risk scores were associated with T stage. Univariate and multivariate Cox regression were then used to evaluate whether risk group predicts prognosis in COAD patients independently of other standard clinicopathological factors. Our findings indicated that risk group is indeed an independent prognostic indicator for these patients (Fig.\u0026nbsp;8). Waterfall plots were then generated to analyze the genetic mutation patterns in the high- and low-risk groups. In both groups, APC, KRAS, and NEB had the highest mutation frequencies. Notably, the high-risk group had a higher frequency of APC mutations than the low-risk group (Fig.\u0026nbsp;7).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003eComparing the immune function of high-risk and low-risk groups\u003c/h2\u003e \u003cp\u003eWe used the 'estimate' package in R to evaluate immune infiltration in the colorectal cancer patients assigned to the high- and low-risk groups by our predictive model. The high-risk group had a significantly lower immune score than the low-risk group. Furthermore, we evaluated the presence of 21 different immune cell types in these patients using CIBERSORT. 
The high-risk group had lower levels of four immune cell types, namely Plasma cells, M2 Macrophages, Activated Dendritic cells, and Resting Mast cells, than the low-risk group[\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. In contrast, the high-risk group had higher levels of Resting NK cells and M0 Macrophages than the low-risk group (Fig.\u0026nbsp;9).\u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eThis approach has significant implications for both tumor treatment and research. Firstly, single-cell analysis in COAD allows tumor heterogeneity to be studied at an unprecedented resolution. Colorectal tumors, like many cancers, are not uniform; they consist of a diverse array of cell types, each with a distinct gene expression profile. By dissecting these variations at the single-cell level, we gain deeper insight into the molecular mechanisms driving tumor progression and resistance to therapy. This is particularly valuable for identifying subpopulations of cancer cells that may be more aggressive or resistant to current treatments, thereby guiding more effective therapeutic strategies. Secondly, integrating differential gene expression analysis enhances our ability to identify key driver genes and pathways in COAD. By correlating specific gene expression patterns with clinical outcomes, such as patient survival or response to treatment, we can pinpoint potential therapeutic targets. For example, identifying genes overexpressed in malignant tumor cells could lead to treatments that target those genes. Moreover, single-cell sequencing in COAD can also aid in identifying new biomarkers for early detection and prognosis. 
As we better understand the gene expression profiles of early-stage tumors and their evolution into more advanced stages, we can develop diagnostic tools that detect COAD at an earlier, more treatable stage. Additionally, the analysis of the tumor microenvironment at a single-cell level provides insights into the complex interactions between cancer cells and immune cells. Understanding these interactions is key to developing and optimizing immunotherapies for COAD. For instance, identifying immune evasion mechanisms employed by tumor cells can guide the development of immune checkpoint inhibitors or other immunomodulatory treatments. In summary, the integration of differential gene expression with single-cell analysis in COAD is a powerful approach that promises to revolutionize our understanding of the disease. It holds the potential to identify novel therapeutic targets, enhance the efficacy of existing treatments, contribute to the development of personalized medicine, and ultimately improve patient outcomes in colorectal cancer. Prognostic models play a crucial role in predicting patient outcomes and guiding clinical decision-making. By integrating various clinical and molecular data, including genetic, epigenetic, and expression profiles, these models can predict disease progression, patient survival, and response to therapy with greater accuracy. This is especially critical in cancers where the heterogeneity of the tumor and its microenvironment significantly influence the disease course and treatment response. With personalized prognostic models, clinicians can tailor treatment plans to individual patients, potentially improving outcomes and minimizing unnecessary treatments. In terms of immune infiltration, its study has revolutionized the field of oncology. The tumor microenvironment, comprising various immune cells, influences tumor progression, metastasis, and response to therapy. 
By analyzing the types and states of immune cells within tumors, researchers can gain insights into the tumor's behavior and its interaction with the host's immune system. This understanding is crucial for the development of effective immunotherapies, such as checkpoint inhibitors, which have shown remarkable success in treating certain types of cancer.\u003c/p\u003e \u003cp\u003eFurthermore, the study of immune infiltration helps in identifying new therapeutic targets and understanding mechanisms of drug resistance. For example, tumors that evolve mechanisms to evade immune detection might be resistant to certain therapies, prompting the need for alternative strategies.\u003c/p\u003e \u003cp\u003eIn conclusion, the development of sophisticated prognostic models and the in-depth study of immune infiltration are indispensable in the realm of tumor research. These approaches facilitate a more nuanced understanding of cancer biology, aid in the creation of more effective treatment strategies, and are integral to the future of personalized oncology.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eBy merging information from 10\u0026times; scRNA-seq and bulk RNA-seq, a predictive framework was constructed to forecast the progression of colorectal adenocarcinoma (COAD) in this study. Within this study, we identified two distinct patient cohorts, each displaying different prognoses and immune characteristics. Notably, a higher risk score, indicative of worse survival outcomes, was also linked to a higher incidence of APC mutations. The potential of this model to function as a valuable biomarker for categorizing the risk of COAD patients and forecasting their reaction to treatments is significant. 
Future prospective studies are essential to confirm and expand upon these initial findings.\u003c/p\u003e "},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eData availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe datasets generated during and/or analyzed during the current study are available from the corresponding author upon reasonable request.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis work was supported by science and technology fund project of Guizhou Provincial Health Commission NO. gzwkj2021-179\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthical approval\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent to participate\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConflict of interest\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare no conflicts of interest.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor information\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eDepartment of anus and intestine surgery, Guizhou Medical University, Guiyang 550002, P.R. China;\u003c/p\u003e\n\u003cp\u003eQiujin Huang\u003c/p\u003e\n\u003cp\u003eDepartment of Interventional Radiology, the Affiliated Hospital of Guizhou Medical University, Guiyang 550002, P.R. China\u003c/p\u003e\n\u003cp\u003eDengwei You\u003c/p\u003e\n\u003cp\u003eDepartment of Interventional Radiology, the Affiliated Hospital of Guizhou Medical University, Guiyang 550002, P.R. China\u003c/p\u003e\n\u003cp\u003eZhi Huang\u003c/p\u003e\n\u003cp\u003eDepartment of anus and intestine surgery，the GuiZhou Moutai Hospital, Zunyi 5190100, P.R. 
China\u003c/p\u003e\n\u003cp\u003eZhaohui Yin\u003c/p\u003e\n\u003cp\u003eDepartment of Interventional Radiology, Affiliated Cancer Hospital of Guizhou Medical University, Guiyang 550002, P.R. China\u003c/p\u003e\n\u003cp\u003eXuya Zhao\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eContributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eQJH designed the study, implemented the code, analyzed the results, and wrote the paper. DWY and ZH implemented the code and analyzed the results. ZHY wrote the paper. XYZ designed the study, analyzed the results, and wrote the paper. All authors read and approved the final manuscript.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eLi J, Ma X, Chakravarti D, Shalapour S, DePinho RA. Genetic and biological hallmarks of colorectal cancer. Genes Dev. 2021;35(11\u0026ndash;12):787\u0026ndash;820. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1101/gad.348226.120\u003c/span\u003e\u003cspan address=\"10.1101/gad.348226.120\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eThanikachalam K, Khan G. Colorectal Cancer and Nutrition. Nutrients. 2019;11(1):164. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3390/nu11010164\u003c/span\u003e\u003cspan address=\"10.3390/nu11010164\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhou J, Ji Q, Li Q. Resistance to anti-EGFR therapies in metastatic colorectal cancer: underlying mechanisms and reversal strategies. J Exp Clin Cancer Res. 2021;40(1):328. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s13046-021-02130-2\u003c/span\u003e\u003cspan address=\"10.1186/s13046-021-02130-2\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXie YH, Chen YX, Fang JY. Comprehensive review of targeted therapy for colorectal cancer. Signal Transduct Target Ther. 2020;5(1):22. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41392-020-0116-z\u003c/span\u003e\u003cspan address=\"10.1038/s41392-020-0116-z\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJiang A, Wang J, Liu N, Zheng X, Li Y, Ma Y, et al. Integration of Single-Cell RNA Sequencing and Bulk RNA Sequencing Data to Establish and Validate a Prognostic Model for Patients With Lung Adenocarcinoma. Front Genet. 2022;13:833797. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3389/fgene.2022.833797\u003c/span\u003e\u003cspan address=\"10.3389/fgene.2022.833797\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChi H, Zhao S, Yang J, Gao X, Peng G, Zhang J, et al. T-cell exhaustion signatures characterize the immune landscape and predict HCC prognosis via integrating single-cell RNA-seq and bulk RNA-sequencing. Front Immunol. 2023;14:1137025. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3389/fimmu.2023.1137025\u003c/span\u003e\u003cspan address=\"10.3389/fimmu.2023.1137025\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePeiffert D, Tournier-Rangeard L, G\u0026eacute;rard JP, Lemanski C, Fran\u0026ccedil;ois E, Giovannini M, et al. 
Induction chemotherapy and dose intensification of the radiation boost in locally advanced anal canal carcinoma: final analysis of the randomized UNICANCER ACCORD 03 trial. J Clin Oncol. 2012;30(16):1941\u0026ndash;8. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1200/JCO.2011.35.4837\u003c/span\u003e\u003cspan address=\"10.1200/JCO.2011.35.4837\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang J, Peng G, Chi H, Yang J, Xie X, Song G, et al. CD8\u0026thinsp;+\u0026thinsp;T-cell marker genes reveal different immune subtypes of oral lichen planus by integrating single-cell RNA-seq and bulk RNA-sequencing. BMC Oral Health. 2023;23(1):464. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s12903-023-03138-0\u003c/span\u003e\u003cspan address=\"10.1186/s12903-023-03138-0\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang J, Liu X, Huang Z, Wu C, Zhang F, Han A, et al. T cell-related prognostic risk model and tumor immune environment modulation in lung adenocarcinoma based on single-cell and bulk RNA sequencing. Comput Biol Med. 2023;152:106460. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.compbiomed.2022.106460\u003c/span\u003e\u003cspan address=\"10.1016/j.compbiomed.2022.106460\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhu K, Yan A, Zhou F, Zhao S, Ning J, Yao L, et al. A Pyroptosis-Related Signature Predicts Overall Survival and Immunotherapy Responses in Lung Adenocarcinoma. Front Genet. 2022;13:891301. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3389/fgene.2022.891301\u003c/span\u003e\u003cspan address=\"10.3389/fgene.2022.891301\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXiaoqin Z, Zhouqi L, Huan P, Xinyi F, Bin S, Jiming W, et al. Development of a prognostic signature for immune-associated genes in bladder cancer and exploring potential drug findings. Int Urol Nephrol. 2023. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s11255-023-03796-7\u003c/span\u003e\u003cspan address=\"10.1007/s11255-023-03796-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"scRNA-seq, prognosis, prognostic model, colorectal adenocarcinoma","lastPublishedDoi":"10.21203/rs.3.rs-3787497/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-3787497/v1","license":{"name":"CC BY 
4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003e\u003cstrong\u003eBackground\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eColorectal adenocarcinoma (COAD) is a major global cause of mortality. While conventional RNA sequencing (RNA-seq) has been used to study its prognostic indicators, it lacks precision in identifying cellular alterations. This study aimed to develop a predictive framework for COAD by integrating scRNA-seq with conventional RNA-seq.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMethods\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study acquired primary RNA sequencing data from The Cancer Genome Atlas (TCGA) database and single-cell RNA sequencing data on colorectal adenocarcinoma (COAD) from the Gene Expression Omnibus (GEO) database. The t-SNE method reduced dimensionality and identified clusters. Additionally, Weighted Gene Correlation Network Analysis (WGCNA) identified crucial modules and genes with differential expression (DEGs). Cox regression analysis was utilized to construct the prognostic model and explore mutation profiles and immune statuses across different risk groups.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eResults\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIntegration of scRNA-seq data from four samples revealed 15 distinct clusters covering 8 cell types. Differential analysis identified important cell types, including B cells (Naïve and Plasma cells), Endothelial cells, Epithelial cells, Monocytes, Natural Killer (NK) cells, Smooth muscle cells, and T cells (CD8+). Subsequently, a prognostic model was built using 28 genes showing differential expression, with four DEGs displaying a significant correlation with higher risk scores, poorer survival outcomes, and increased APC mutation rates. 
Various prognostic and immune characteristics were observed within this context.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConclusion\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIntegrating 10x scRNA-seq and bulk RNA-seq data, we established a prognostic framework for colorectal adenocarcinoma (COAD) in this study. Additionally, we identified two distinct groups, each displaying different prognoses and immune features.\u003c/p\u003e","manuscriptTitle":"Establishing and validating a prognostic model for colorectal cancer patients by integrating data from single-cell RNA sequencing and bulk RNA sequencing","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-01-02 20:09:45","doi":"10.21203/rs.3.rs-3787497/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"bc1d21b0-9d06-4c4b-8180-6a312fe83eac","owner":[],"postedDate":"January 2nd, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2024-01-03T14:29:18+00:00","versionOfRecord":[],"versionCreatedAt":"2024-01-02 
20:09:45","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-3787497","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-3787497","identity":"rs-3787497","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}



License: CC-BY-4.0