Age-Stratified Analysis of Therapeutic, Immune, and Glycosylation Gene Expression in Colorectal Cancer Using Machine Learning

Research Square | Article

Hakan Celik, Banu Bansal, Jappreet Singh Gill, Jacqueline Kim Correa, Kristian Herman, Reet Goyal, Veysel Çelik, and Ramkumar Mathur

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6805885/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 13 Dec, 2025 in Scientific Reports. Version 1 posted 25 Jun, 2025.

Abstract

Colorectal cancer (CRC) is a major global health issue, yet current treatment strategies rarely consider patient age differences, leading to variable therapeutic efficacy and clinical outcomes.
Although numerous biomarkers for CRC have been identified, their age-specific expression profiles and biological implications remain poorly understood, limiting the potential for age-tailored interventions. This study aimed to address this gap by identifying age-stratified gene expression patterns using Random Forest–based feature selection on the GSE44076 microarray dataset. We analyzed gene expression profiles from younger (<65 years) and older (≥65 years) CRC patient cohorts, focusing on three functional gene categories relevant to CRC biology (Therapeutic, Immune, and Glycosylation). Using Random Forest-based classification and feature selection, we identified minimal yet highly predictive gene signatures within each functional category. The performance of these signatures was rigorously evaluated via cross-validation and permutation testing, demonstrating robust predictive accuracy. Full models utilizing the top 10 genes from each category achieved exceptionally high cross-validation accuracy ranging from 97.2% to 98.6%. Even minimal models restricted to the top three predictive genes retained substantial classification power (85.2%–100%). Comparative analysis with Gradient Boosting Machines (GBM) and Support Vector Machines (SVM) classifiers affirmed the superiority and interpretability of Random Forest in discerning biologically meaningful gene interactions. Volcano plot analyses reinforced the significance of individual gene expression differences across age groups but highlighted Random Forest's unique ability to identify complex multi-gene interactions, particularly within the Therapeutic and Glycosylation gene categories. Glycosylation genes showed pronounced age-dependent expression changes, suggesting a role for glycosylation modifications in CRC pathogenesis and therapeutic responsiveness. 
Our study validates the hypothesis that carefully selected minimal gene sets can reliably differentiate CRC tissue types across age groups, uncovering age-related biological alterations with potential diagnostic and therapeutic implications. These findings underscore the critical need for further validation in independent patient cohorts and detailed functional studies to translate these age-specific biomarkers into clinical practice, enhancing personalized treatment strategies for colorectal cancer patients.

Subject areas: Biological sciences/Cancer/Gastrointestinal cancer; Biological sciences/Cancer/Tumour biomarkers
Keywords: colorectal cancer, age-stratified analysis, machine learning, therapeutic genes, immune genes, glycosylation

Figures: Figure 1, Figure 2, Figure 3, Figure 4, Figure 5, Figure 6, Figure 7, Figure 8

Additional Declarations
No competing interests reported.
Tables are available in the Supplementary Files section.

Supplementary Files
CeliketalTabels.pdf

Version 1 posted 25 Jun, 2025
Editorial decision: Revision requested, 15 Jul, 2025
Reviews received at journal, 13 Jul, 2025
Reviews received at journal, 05 Jul, 2025
Reviewers agreed at journal, 24 Jun, 2025
Reviewers agreed at journal, 19 Jun, 2025
Reviewers invited by journal, 19 Jun, 2025
Editor invited by journal, 05 Jun, 2025
Editor assigned by journal, 04 Jun, 2025
Submission checks completed at journal, 03 Jun, 2025
First submitted to journal, 02 Jun, 2025
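The abstract describes a workflow of Random Forest gene ranking, a reduced top-k gene model, cross-validation, and permutation testing. The following is a minimal Python sketch of that kind of workflow, not the authors' code. It assumes scikit-learn and NumPy, and it runs on synthetic data rather than GSE44076. The function names, the synthetic generator, and all parameter values (tree count, fold count, number of permutations) are hypothetical choices made for illustration. Placing gene selection inside a pipeline is also an assumption of this sketch; the preprint page does not state how selection and validation were combined.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import StratifiedKFold, permutation_test_score
from sklearn.pipeline import Pipeline


def make_synthetic_expression(n_per_class=30, n_genes=40, n_informative=5, seed=0):
    """Return a toy (samples x genes) matrix and labels for three tissue classes.

    Hypothetical stand-in for one age stratum of a real expression dataset.
    """
    rng = np.random.default_rng(seed)
    y = np.repeat(np.arange(3), n_per_class)  # 0, 1, 2: three tissue types
    X = rng.normal(size=(y.size, n_genes))
    # Only the first n_informative genes carry class signal.
    X[:, :n_informative] += 2.0 * y[:, None]
    return X, y


def top_k_genes(X, y, k, n_estimators=50, seed=0):
    """Rank genes by Random Forest impurity importance; return top-k column indices."""
    rf = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
    rf.fit(X, y)
    return np.argsort(rf.feature_importances_)[::-1][:k]


def evaluate_top_k(X, y, k, n_estimators=50, n_splits=3, n_permutations=20, seed=0):
    """Cross-validated accuracy and permutation p-value for a top-k gene model.

    Gene selection sits inside the pipeline, so it is refit on each training
    fold and never sees the held-out samples.
    """
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=n_estimators, random_state=seed),
        threshold=-np.inf,  # select purely by rank, keeping max_features genes
        max_features=k,
    )
    model = Pipeline([
        ("select", selector),
        ("classify", RandomForestClassifier(n_estimators=n_estimators, random_state=seed)),
    ])
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    score, _, p_value = permutation_test_score(
        model, X, y, cv=cv, scoring="accuracy",
        n_permutations=n_permutations, random_state=seed,
    )
    return score, p_value


if __name__ == "__main__":
    X, y = make_synthetic_expression()
    print("top-3 gene indices:", top_k_genes(X, y, k=3))
    accuracy, p_value = evaluate_top_k(X, y, k=3)
    print(f"CV accuracy = {accuracy:.3f}, permutation p = {p_value:.3f}")
```

A note on the design choice: if the top genes are chosen on the full dataset and the same samples are then used for cross-validation, the reported accuracy can be optimistically biased. Refitting the selector within each training fold, as sketched here, is one common way to guard against that. The smallest p-value a permutation test can report is 1 / (n_permutations + 1), so real analyses typically use far more permutations than this toy example.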
Author affiliations
University of North Dakota: Hakan Celik, Banu Bansal, Jappreet Singh Gill, Jacqueline Kim Correa, Kristian Herman, Reet Goyal, Ramkumar Mathur (corresponding author)
Siirt University: Veysel Çelik

Published version: Scientific Reports, 13 Dec, 2025, https://doi.org/10.1038/s41598-025-31499-9

Figure legends

Figure 1. Flow chart of the overall methodology. After loading and verifying the GSE44076 dataset, we stratified samples by age and filtered for specific functional gene sets (Therapeutic, Immune, Glycosylation). We then transformed the data for machine learning and trained Random Forest models as the primary classifier (with SVM and GBM models run in parallel for comparison), validating performance through cross-validation and permutation testing. Finally, we selected top genes based on RF feature importance and generated volcano plots to visualize age-related differential expression.

Figure 2. Age Distribution and Principal Component Analysis (PCA) of Sample Types in Gene Expression Data. This figure presents the age distribution and PCA analysis of different sample types (Tumor, Normal, and Mucosa) in the dataset. (A) Age Distribution by Sample Type: A histogram showing the distribution of ages across three sample types: Tumor (green), Normal (orange), and Mucosa (blue). Kernel density estimates are overlaid to visualize the probability density. (B) PCA: Condition by Color, Age by Dot Size: A principal component analysis (PCA) plot, where each point represents a sample. Colors indicate sample types (Mucosa: blue, Normal: orange, Tumor: green), and dot size corresponds to age, illustrating clustering patterns. (C) Age Distribution Across Sample Types (Boxplot): A boxplot showing the age distribution for each sample type, highlighting median, interquartile ranges, and outliers. (D) Overall Age Distribution (Boxplot): A boxplot summarizing the age range in the dataset, with whiskers extending to the minimum and maximum ages, and dots representing potential outliers.

Figure 3. Top 10 Differentially Expressed Genes (Tumor) Across Therapeutic, Glycosylation, and Immune Categories by Age Group. This figure presents heatmaps showing the expression levels (Tumor) of the top 10 differentially expressed genes across different functional categories: Therapeutic, Glycosylation, and Immune, stratified by age groups (Young vs. Old). The values are used to highlight and visualize clear differences in gene expression between younger (<65) and older (≥65) colorectal cancer patients, illustrating potential age-related transcriptional variations across these critical biological pathways. (A) Top 10 Differentially Expressed Genes (Therapeutic (Tumor)). (B) Top 10 Differentially Expressed Genes (Glycosylation (Tumor)). (C) Top 10 Differentially Expressed Genes (Immune (Tumor)).

Figure 4. Performance Evaluation and Feature Importance Analysis for Therapeutic Young (<65) and Old (≥65) Classification. Overview of the Random Forest analysis for the Therapeutic gene category in younger (<65) and older (≥65) individuals. (A, B) The bar plot lists the 10 most important features in the full model, while (C, D) the volcano plot shows log₂ fold change vs. −log₁₀ p-value for all genes (with the top 10 in red). (E, F) The learning curve for the top-3 gene subset demonstrates robust convergence between training and validation scores, while (I, J) the permutation distribution for the top-10 gene subset confirms that the observed cross-validation accuracies (CV = 0.986 for young and CV = 0.972 for old) exceed chance levels. Confusion matrices (G for young full top-10 gene, K for young top-3 gene, H for old full top-10 gene, L for old top-3 gene) reveal strong classification performance across tumor, normal, and mucosa samples.

Figure 5. Performance Evaluation and Feature Importance Analysis for Immune Young (<65) and Old (≥65) Classification. Overview of the Random Forest analysis for the Immune gene category in younger (<65) and older (≥65) individuals. (A, B) The bar plot lists the 10 most important features in the full model, while (C, D) the volcano plot shows log₂ fold change vs. −log₁₀ p-value for all genes (with the top 10 in red). (E, F) The learning curve for the top-3 gene subset demonstrates robust convergence between training and validation scores, while (I, J) the permutation distribution for the top-10 gene subset confirms that the observed cross-validation accuracies (CV = 0.986 for young and CV = 0.972 for old) exceed chance levels. Confusion matrices (G for young full top-10 gene, K for young top-3 gene, H for old full top-10 gene, L for old top-3 gene) reveal strong classification performance across tumor, normal, and mucosa samples.

Figure 6. Performance Evaluation and Feature Importance Analysis for Glycosylation Young (<65) and Old (≥65) Classification. Overview of the Random Forest analysis for the Glycosylation gene category in younger (<65) and older (≥65) individuals. (A, B) The bar plot lists the 10 most important features in the full model, while (C, D) the volcano plot shows log₂ fold change vs. −log₁₀ p-value for all genes (with the top 10 in red). (E, F) The learning curve for the top-3 gene subset demonstrates robust convergence between training and validation scores, while (I, J) the permutation distribution for the top-10 gene subset confirms that the observed cross-validation accuracies (CV = 0.986 for young and CV = 0.977 for old) exceed chance levels. Confusion matrices (G for young full top-10 gene, K for young top-3 gene, H for old full top-10 gene, L for old top-3 gene) reveal strong classification performance across tumor, normal, and mucosa samples.

Figure 7. Comparison of Classifier Performance Across Gene Categories and Age Groups. Bar plots display the top 10 predictive genes selected by Random Forest (RF), Gradient Boosting Machine (GBM), and Support Vector Machine (SVM) classifiers based on feature importance rankings in each gene category and age group. Models were trained separately for younger (<65 years) and older (≥65 years) cohorts, using the top 10 genes derived from each classifier. Panels are grouped by classifier and gene category: (A, B) Therapeutic genes in young and old patients (GBM); (C, D) Immune genes in young and old patients (GBM); (E, F) Glycosylation genes in young and old patients (GBM); (G, H) Therapeutic genes in young and old patients (SVM); (I, J) Immune genes in young and old patients (SVM); (K, L) Glycosylation genes in young and old patients (SVM). The x-axis indicates the normalized feature importance score, and the y-axis lists gene symbols. Feature importance values reflect the relative contribution of each gene to the classifier's prediction performance, highlighting variability in gene prioritization across classifiers and age-defined CRC subgroups.

Figure 8. Circular bar plots showing the top enriched pathways for Therapeutic (A), Immune (B), and Glycosylation (C) based on differentially expressed genes from the tumor group only. Each bar represents a pathway, with bar height corresponding to −log₁₀(p) (significance), and pathway names placed around the circle. The gene symbols that overlap with each enriched pathway are shown on the outer ring, rotated for clarity.
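The figure legends describe volcano plots of log₂ fold change against −log₁₀ p-value for age-group comparisons. As a rough illustration of how such coordinates can be computed, here is a short Python sketch using NumPy and SciPy. Several things are assumptions of this sketch and are not stated on the preprint page: that expression values are already on a log₂ scale (so the fold change is a difference of group means), that Welch's t-test is the per-gene test, and the two cutoffs in `flag_genes`. The function names are hypothetical.

```python
import numpy as np
from scipy import stats


def volcano_coordinates(expr_a, expr_b):
    """Per-gene volcano coordinates for group B versus group A.

    expr_a, expr_b: (samples x genes) arrays of log2-scale expression.
    Returns (log2 fold change, -log10 p) arrays, one entry per gene.
    """
    # On log2-scale data, the fold change is a difference of group means.
    log2_fc = expr_b.mean(axis=0) - expr_a.mean(axis=0)
    # Welch's t-test per gene (assumed test; does not assume equal variances).
    _, p_values = stats.ttest_ind(expr_b, expr_a, axis=0, equal_var=False)
    return log2_fc, -np.log10(p_values)


def flag_genes(log2_fc, neg_log10_p, fc_cutoff=1.0, p_cutoff=0.05):
    """Boolean mask of genes passing both (hypothetical) cutoffs."""
    return (np.abs(log2_fc) >= fc_cutoff) & (neg_log10_p >= -np.log10(p_cutoff))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    young = rng.normal(size=(40, 20))  # synthetic: 40 samples, 20 genes
    old = rng.normal(size=(40, 20))
    old[:, 0] += 3.0                   # gene 0 is shifted upward in the "old" group
    fc, nlp = volcano_coordinates(young, old)
    print("flagged gene indices:", np.flatnonzero(flag_genes(fc, nlp)))
```

This sketch uses raw p-values. When many genes are tested at once, a multiple-testing adjustment is usually applied before genes are called significant. A volcano plot also scores each gene on its own, whereas a Random Forest importance ranking can reflect combinations of genes, which is the contrast the abstract draws.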