Uncovering dark biomarkers of glioblastoma survival through regulatory-deviation–based transcriptomic modeling | Research Square
Research Article
Yangyu Ning, Zijun Qu, Zehang Xie, Shujun Wan, Weiya Pei, Xueqin Li, and 4 more
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9640328/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)
Abstract
Glioblastoma (GBM) displays extreme survival heterogeneity that remains poorly resolved by conventional gene-level transcriptomic analyses, which often fail to identify robust survival-associated biomarkers. Here, we introduce mqTrans, a regulatory deviation modeling framework that shifts the analytical focus from absolute gene-level expression abundance to the deviation between observed expression and transcription-factor-inferred regulatory expectations.
Applying mqTrans to two independent GBM cohorts (CGGA and CPTAC-GBM), we systematically defined and identified “dark biomarkers”: genes that show no significant expression changes in traditional differential expression analysis yet exhibit significant and reproducible survival-linked regulatory deviations. We uncovered 19 such dark biomarkers that consistently distinguished short-term from long-term survivors across both cohorts. These biomarkers converged on metabolic pathways (cholesterol homeostasis) and post-transcriptional regulatory programs (miRNA targets) and were enriched in genomically complex loci with overlapping non-coding transcripts, which may explain their invisibility in abundance-based analyses. A LASSO-derived five-gene Cox model based on these dark biomarkers provided more stable and generalizable survival stratification than models built from conventional expression features. Our findings demonstrate that regulatory-deviation modeling can reveal hidden prognostic signals in GBM, offering a complementary regulatory-layer view of the transcriptome that captures perturbations obscured by abundance-based gene-expression quantification.
Keywords: Cancer Biology; Computational Biology; Glioblastoma; Transcriptomics; Regulatory deviation; Dark biomarkers; Prognostic biomarkers; Precision medicine
Introduction
Glioblastoma (GBM) is one of the most common and aggressive primary malignant brain tumors in adults, characterized by diffuse infiltration of cancer cells, vigorous angiogenesis, and marked resistance to conventional drug therapies [1–4].
Even with aggressive multimodal treatment, including maximal safe resection, radiation therapy, and temozolomide chemotherapy, the clinical course of glioblastoma remains highly heterogeneous [2, 5]. Survival times vary widely: some patients progress extremely rapidly, with only a few months between diagnosis and death, whereas a small number of long-term survivors live for several years [3, 6]. This marked prognostic gap, from early mortality to long-term survival, points to molecular determinants of tumor aggressiveness and host response that are not yet fully understood [4, 7]. Elucidating the molecular differences between GBM patients with short and long survival is therefore crucial, both to refine prognostic stratification and to guide the development of personalized treatment strategies [5, 8]. However, current analytical paradigms have not adequately resolved the mechanisms that drive these starkly different clinical outcomes [7].
Three main clinical strategies are currently used to assess GBM prognosis and guide treatment decisions: neuroimaging (e.g., evaluation of tumor burden and progression by contrast-enhanced MRI), histopathological classification and grading according to WHO guidelines, and limited molecular biomarker testing, most notably isocitrate dehydrogenase (IDH) mutation status and O⁶-methylguanine-DNA methyltransferase (MGMT) promoter methylation [9–12]. Although these established methods provide a foundational framework for patient management, they capture only part of the heterogeneity in GBM survival [10, 13].
Imaging and pathological assessments are inherently macroscopic or morphological and often lag behind the molecular evolution of the tumor [11, 14]. Moreover, despite their clinical significance, assays for MGMT methylation or IDH mutation status are typically resource-intensive, time-consuming, and sensitive to experimental conditions [12, 15]. Crucially, these traditional markers have limited ability to resolve the complex layers of post-transcriptional and post-translational regulation that may profoundly influence tumor cell behavior and treatment sensitivity, leaving a substantial portion of survival heterogeneity unexplained [15, 16].
The widespread adoption of high-throughput RNA sequencing (RNA-seq) has propelled computational methods to the forefront of efforts to decipher this unresolved heterogeneity [17, 18]. Standard workflows rely heavily on differential expression analysis to identify differentially expressed genes (DEGs), unsupervised clustering and dimension reduction (e.g., PCA and t-SNE), and survival regression models (e.g., univariate and multivariate Cox proportional hazards regression) [18–20]. Most of these methods, however, rest on a critical implicit assumption: they model biology at the gene level, treating the total abundance of all transcripts produced at a locus as the primary functional variable [20, 21]. This perspective essentially posits that quantitative changes in total mRNA abundance are the main drivers of phenotypic differences. Yet a growing body of evidence suggests that expression abundance alone is insufficient to characterize tumor-associated molecular abnormalities, and that complex transcriptional regulatory relationships and dysregulated networks may play a significant role in tumorigenesis and disease progression [22–24].
By merging transcript isoforms with potentially antagonistic functions into a single expression value, gene-level analyses risk systematic signal loss, obscuring critical regulatory changes that occur even when overall gene expression remains largely unchanged [21, 23, 25].
Here, we propose mqTrans, a regulatory deviation modeling framework built on gene-level expression that captures discrepancies between observed expression and transcription factor (TF)–inferred regulatory expectations. Unlike traditional abundance-centric approaches, mqTrans shifts the analytical focus from the absolute abundance of gene expression to the deviation between observed expression and the expression predicted from upstream transcriptional regulators [24–26]. The framework is designed to detect biological signals in genes whose overall expression does not change significantly but whose transcriptional regulatory state is already perturbed. Building on this regulatory deviation analysis, we systematically defined and investigated a novel class of molecular features termed “dark biomarkers”, operationally defined as genes that do not exhibit significant changes in traditional gene-level differential expression analyses but show significant differences from the mqTrans regulatory deviation perspective [25–27].
In this study, we applied mqTrans to two independent GBM cohorts (CGGA and CPTAC-GBM) and obtained the following key findings. First, we demonstrated that regulatory-deviation modeling with mqTrans can identify survival-related signals that are difficult to detect with traditional gene expression analysis, providing a new analytical perspective on prognostic heterogeneity in GBM.
Second, we defined and identified a class of “dark biomarkers” that are difficult to detect by traditional expression analysis but differ significantly from the mqTrans perspective, suggesting that important survival-associated molecular signals may be hidden at the regulatory network level. Third, we explored the potential regulatory mechanisms underlying these dark biomarkers, providing a basis for understanding their biological origins. Finally, we validated the potential clinical utility of the dark biomarkers for prognostic stratification of patients.
Beyond offering a new analytical perspective, this framework may have important translational implications. By capturing regulatory-deviation signals that are not detectable with conventional gene-expression approaches, mqTrans-derived biomarkers may improve patient stratification and prognostic assessment in glioblastoma. These dark biomarkers may also represent potential therapeutic targets embedded within dysregulated regulatory networks, offering new opportunities for precision medicine. These considerations underscore the need for analytical frameworks that can capture regulatory-layer perturbations beyond gene-expression abundance.
Materials and Methods
2.1 Datasets and Patient Selection
2.1.1 CGGA cohort
RNA-seq expression data and corresponding clinical annotations were obtained from the Chinese Glioma Genome Atlas (CGGA) [28, 29], including the CGGA693 and CGGA325 datasets. Samples were retained if they met the following criteria: (i) primary tumor sample; (ii) histologically annotated glioblastoma, WHO grade IV; and (iii) available overall survival (OS) information. Survival groups were defined to contrast the extremes of the survival spectrum: short survival was defined as OS ≤ 360 days and long survival as OS ≥ 720 days.
Samples with intermediate survival (OS between 361 and 719 days) were excluded from the main analyses. After filtering, the CGGA cohort comprised 149 primary GBM samples and was used as the discovery cohort.
2.1.2 CPTAC-GBM cohort
An independent validation cohort was assembled from the CPTAC-GBM dataset [30]. The same selection criteria were applied (primary GBM status, WHO grade IV annotation, and complete OS information), together with the same survival cutoffs (short survival, OS ≤ 360 days; long survival, OS ≥ 720 days). Samples with intermediate survival were excluded. After filtering, the CPTAC-GBM cohort comprised 63 GBM samples and served as the independent validation cohort.
2.1.3 Expression matrix preprocessing
Gene-level TPM expression matrices from the CGGA and CPTAC-GBM cohorts were used for conventional expression-based downstream analyses. After sample filtering, the CGGA matrix contained 47,886 gene features across 149 samples, and the CPTAC-GBM matrix contained 10,959 gene features across 63 samples. Expression values were transformed as log2(TPM + 1) prior to modeling [16, 31]. To ensure cross-cohort comparability, downstream analyses were restricted to the 10,628 genes shared by the two cohorts. After processing of missing or invalid values, 10,309 genes remained in the CGGA cohort and 9,269 in the CPTAC-GBM cohort. Batch effects were corrected before differential analysis, and available covariates were incorporated into the linear modeling framework [32, 33].
2.2 The mqTrans View of Transcriptomes
2.2.1 Construction of the TF-based prediction model
For each target gene, a transcription factor-based regression model was established to estimate its expected expression from the expression profiles of its candidate upstream regulators.
Candidate transcription factors were obtained from the TRRUST regulatory interaction database [34], and their normalized expression values were used as model predictors. For a given target gene \(j\), expression was modeled as a linear function of its TFs, and the fitted model gives the expected expression:
$$\mathrm{mRNA}_{j}=\beta_{0}+\sum_{k}\beta_{k}\,\mathrm{TF}_{k}+\epsilon, \qquad \widehat{\mathrm{mRNA}}_{j}=\hat{\beta}_{0}+\sum_{k}\hat{\beta}_{k}\,\mathrm{TF}_{k} \tag{1}$$
where \(\mathrm{mRNA}_{j}\) and \(\widehat{\mathrm{mRNA}}_{j}\) denote the observed and predicted expression levels of gene \(j\), \(\mathrm{TF}_{k}\) represents the expression level of the \(k\)-th transcription factor, \(\beta_{k}\) is the regression coefficient (with estimate \(\hat{\beta}_{k}\)), and \(\epsilon\) is the residual term. The fitted model was then applied to each sample to obtain the expected expression value of each target gene. The mqTrans feature was defined as the absolute difference between the observed and predicted expression levels:
$$\mathrm{mqTrans}_{j}=\left|\mathrm{mRNA}_{j}-\widehat{\mathrm{mRNA}}_{j}\right| \tag{2}$$
This value quantifies how far the observed expression of a gene deviates from the level expected from its TF-based regulatory context. The models were pre-trained on transcriptomic data from healthy brain tissue obtained through the UCSC Xena platform [5] (GTEx dataset, n = 1,157, tissue = brain). Healthy brain tissue was chosen because core transcriptional regulatory relationships (e.g., TF→target gene) are expected to be relatively stable in non-malignant tissue. For each target gene, the coefficients of the linear model, with the gene's known TRRUST regulators as predictors, were estimated in the healthy samples by ordinary least squares, and 5-fold cross-validation was used to guard against overfitting [35]. The trained models were then applied to each GBM sample to predict the ‘expected’ expression level of each target gene.
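As a rough illustration of Eqs. (1) and (2), the sketch below fits one target gene's TF model by ordinary least squares on healthy samples and scores tumor samples by their absolute deviation. It assumes NumPy arrays with samples in rows and a single target gene; the function names are hypothetical, TRRUST lookup and GTEx loading are omitted, and the out-of-fold R² check is only one plausible reading of the 5-fold cross-validation step, not necessarily the authors' procedure.

```python
import numpy as np


def fit_tf_model(tf_healthy, target_healthy):
    """OLS fit of target ~ 1 + TF_1 + ... + TF_K on healthy samples.

    tf_healthy: (n_samples, K) expression of the gene's regulators.
    Returns the coefficient vector [b0, b1, ..., bK].
    """
    X = np.column_stack([np.ones(len(tf_healthy)), tf_healthy])
    coef, *_ = np.linalg.lstsq(X, target_healthy, rcond=None)
    return coef


def predict(coef, tf_expr):
    """Expected expression, Eq. (1): b0 + sum_k b_k * TF_k."""
    return coef[0] + np.asarray(tf_expr) @ coef[1:]


def mqtrans(coef, tf_tumor, target_tumor):
    """mqTrans feature, Eq. (2): |observed - TF-predicted expression| per sample."""
    return np.abs(np.asarray(target_tumor) - predict(coef, tf_tumor))


def cv_r2(tf_healthy, target_healthy, n_folds=5, seed=0):
    """Mean out-of-fold R^2 across n_folds (a simple overfitting check)."""
    idx = np.random.default_rng(seed).permutation(len(target_healthy))
    scores = []
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        coef = fit_tf_model(tf_healthy[train], target_healthy[train])
        y = target_healthy[test]
        resid = y - predict(coef, tf_healthy[test])
        scores.append(1.0 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2))
    return float(np.mean(scores))
```

Because the absolute value in Eq. (2) discards the sign, a gene expressed well above or well below its regulatory expectation receives the same mqTrans score.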
This design assumes that regulatory relationships learned from normal tissue provide a stable baseline, so that deviations from this baseline in tumor samples may reflect disease-associated dysregulation. Although transcriptional regulatory relationships may be partially altered in tumors, a substantial proportion of core TF–target interactions are expected to be conserved across physiological and pathological states; deviations from the normal baseline are therefore more likely to reflect disease-associated dysregulation than random variation. Unlike conventional transcriptomic analysis, which mainly evaluates absolute gene expression abundance, mqTrans focuses on regulatory deviation [36, 37]. By representing the discrepancy between observed and TF-predicted expression, it provides a complementary view of transcriptome dysregulation that can capture alterations not evident from expression abundance alone, while maintaining a stable reference for cross-sample comparison. mqTrans features were generated independently for the CGGA and CPTAC-GBM cohorts using the same computational procedure, and the resulting mqTrans matrices were used for downstream differential analysis and for the identification of dark biomarkers.
2.3 Differential Analysis
To compare the mqTrans framework with conventional expression-based analysis, differential analyses were performed in parallel at two analytical levels: gene-level expression abundance and mqTrans-derived regulatory deviation.
Both analyses used the same survival grouping, with samples classified as short-survival or long-survival according to the predefined OS cutoffs, and differential testing was conducted separately in the CGGA and CPTAC-GBM cohorts. For the conventional expression-level analysis, each gene was evaluated independently with an ordinary least squares linear regression model, in which the processed gene expression value was the response variable and the survival group indicator was the primary variable of interest. The model also included the available demographic, clinical, and technical covariates (gender, age, OS days, OS event status, and batch) to account for potential confounding. For the mqTrans-level analysis, differential testing was performed on the mqTrans feature matrix, in which each feature quantifies the magnitude of a gene's regulatory deviation, that is, the discrepancy between its observed expression and the expression predicted from TF-based regulation. Because mqTrans values represent deviation strength rather than expression abundance, differences between the short-survival and long-survival groups were assessed for each feature with a two-sided Mann–Whitney U test [38]. Multiple testing correction was performed separately for each cohort and each analytical level: nominal P values from the gene-level linear models were adjusted with the Bonferroni correction [39], whereas P values from the mqTrans-level Mann–Whitney U tests were adjusted with the Benjamini–Hochberg procedure [40]. Features with adjusted P values below 0.05 were considered statistically significant. Minimal expression change was defined as |log2 fold change| < 0.2.
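The survival grouping and the two parallel tests with their corrections can be sketched as below. This is a minimal illustration, assuming NumPy arrays with samples in rows, a 0/1 group vector (1 = short survival), and numerically encoded covariates (a categorical batch variable would need dummy columns). The helper names are hypothetical; scipy.stats.mannwhitneyu and scipy.stats.t are standard SciPy functions, while the Bonferroni and Benjamini–Hochberg adjustments are written out by hand for transparency.

```python
import numpy as np
from scipy import stats


def assign_survival_group(os_days, short_max=360, long_min=720):
    """1 = short (OS <= 360 d), 0 = long (OS >= 720 d), -1 = intermediate (excluded)."""
    os_days = np.asarray(os_days, dtype=float)
    group = np.full(os_days.shape, -1, dtype=int)
    group[os_days <= short_max] = 1
    group[os_days >= long_min] = 0
    return group


def ols_group_pvalue(y, group, covariates):
    """Two-sided P value for the survival-group coefficient in
    y ~ 1 + group + covariates, fitted by ordinary least squares."""
    X = np.column_stack([np.ones(len(y)), group, covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return 2 * stats.t.sf(abs(beta[1] / se), dof)


def mwu_pvalues(features, group):
    """Two-sided Mann-Whitney U P value per feature column (group 1 vs group 0)."""
    a, b = features[group == 1], features[group == 0]
    return np.array([stats.mannwhitneyu(a[:, j], b[:, j], alternative="two-sided").pvalue
                     for j in range(features.shape[1])])


def bonferroni(p):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    p = np.asarray(p, dtype=float)
    return np.minimum(p * len(p), 1.0)


def benjamini_hochberg(p):
    """Benjamini-Hochberg step-up adjusted P values (monotone, capped at 1)."""
    p = np.asarray(p, dtype=float)
    order = np.argsort(p)
    ranked = p[order] * len(p) / np.arange(1, len(p) + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adj = np.empty_like(p)
    adj[order] = np.minimum(ranked, 1.0)
    return adj
```

In this sketch, samples with group == -1 would be dropped first; each cohort's mqTrans matrix would then be screened with benjamini_hochberg(mwu_pvalues(...)) and the per-gene OLS P values with bonferroni(...), each against 0.05.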
Dark biomarkers were defined as genes that were significant at the mqTrans level in both the CGGA and CPTAC-GBM cohorts but were not significant in conventional gene-level expression analysis in either cohort.
Results and Discussion
3.1 Conventional gene-expression abundance provides limited survival-associated signals in GBM
To evaluate whether conventional gene-level transcriptomic analysis could identify survival-associated molecular signals in glioblastoma, we first performed differential expression analysis between short-survival (OS ≤ 360 days) and long-survival (OS ≥ 720 days) patients in the CGGA and CPTAC-GBM cohorts using normalized gene-level expression profiles. In both cohorts, no genes remained significant after multiple testing correction, indicating that conventional gene-level differential expression analysis failed to identify robust survival-associated biomarkers [30, 41, 42]. To visualize potential expression trends, we examined volcano plots based on nominal P values (Fig. 1A–B). In the CGGA cohort, most genes were symmetrically distributed around zero fold change, with only a few showing weak differential trends. In the CPTAC-GBM cohort, a limited number of genes reached nominal significance, but these signals were sparse and inconsistent, suggesting that survival-associated expression differences at the gene level were generally weak and unstable across datasets. To further assess whether survival groups could be separated at the global transcriptomic level, we performed principal component analysis (PCA) on the top 20 genes ranked by nominal P value in each cohort (Fig. 1C–D). In both datasets, short-survival and long-survival samples overlapped substantially in PCA space, without clear clustering. Thus, gene-expression patterns were insufficient to distinguish survival phenotypes, even when restricted to the top-ranked candidate genes [30, 43].
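The top-20 PCA used for these visualizations can be sketched compactly with NumPy's SVD; the same helper would apply to the mqTrans-based PCA (ranking by adjusted instead of nominal P value). The function name and the per-feature z-scoring are assumptions, since the text does not say whether features were scaled; a feature with zero variance would have to be removed first to avoid division by zero.

```python
import numpy as np


def top_k_pca(X, pvalues, k=20, n_components=2):
    """PCA scores of samples using only the k features with the smallest P values.

    X: (n_samples, n_features). Features are z-scored before the SVD (assumed).
    Returns (scores, explained-variance ratio of the returned components).
    """
    top = np.argsort(pvalues)[:k]
    sub = X[:, top]
    sub = (sub - sub.mean(axis=0)) / sub.std(axis=0, ddof=1)
    U, S, _ = np.linalg.svd(sub, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates on the PCs
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]
```

The returned scores can be plotted and colored by survival group; as the text notes, this is a descriptive view rather than a classifier.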
Together, these findings indicate that conventional gene-level expression profiling has limited power to identify survival-associated molecular signatures in GBM. The lack of reproducible differential expression signals across independent cohorts suggests that prognostic information may be concealed within regulatory perturbations not captured by expression abundance, rather than being attributable solely to limited statistical power, a phenomenon frequently observed in highly heterogeneous tumors such as glioblastoma [44, 45]. We further verified that these trends were robust under alternative survival grouping strategies. These observations motivated the application of the mqTrans framework to detect hidden survival-associated signals from a regulatory deviation perspective.
3.2 mqTrans-based regulatory deviation modeling reveals hidden survival-associated signals
To determine whether regulatory deviation–based transcriptomic modeling could reveal survival-associated signals undetectable at the conventional gene-expression level, we applied the mqTrans framework to the same short-survival and long-survival groups. In contrast to the absence of significant signals in the conventional gene-level analysis, differential analysis of mqTrans features identified multiple significantly altered genes in both the CGGA and CPTAC-GBM cohorts after multiple testing correction (adjusted P < 0.05). Volcano plots showed substantially more significant features, with clearer differences in deviation magnitude between the two survival groups in both cohorts (Fig. 2A–B).
These findings indicate that mqTrans captures a distinct layer of transcriptomic variation by quantifying deviations between observed expression and model-inferred regulatory expectations, shifting the analytical focus from abundance-centric measurements to regulation-centric representations [37, 46, 47]. The framework therefore complements and extends conventional gene-expression–based approaches. To further evaluate the discriminatory ability of mqTrans-derived features, we performed PCA using the top 20 significant mqTrans biomarkers ranked by adjusted P value. In the CGGA cohort, mqTrans features showed an evident trend toward separation between short-survival and long-survival samples, and in the CPTAC-GBM cohort the separation was more pronounced, with the two survival groups forming distinct clusters in PCA space (Fig. 2C–D). PCA was used here as a descriptive visualization tool rather than as an independent classification method. Compared with the substantial overlap observed in the gene-level PCA, mqTrans-derived features showed visibly clearer separation of patient survival phenotypes [47, 48]. Taken together, these results suggest that regulatory deviation modeling reveals a layer of survival-associated molecular information in GBM that is not captured by gene-expression abundance. By quantifying deviations between observed expression and model-inferred regulatory expectations, mqTrans detects regulatory perturbations that may underlie survival heterogeneity even in the absence of overt expression changes. This increased detectability of survival-associated deviation features provided the basis for defining dark biomarkers [37, 46, 49].
3.3 Cross-cohort identification of 19 expression-invisible dark biomarkers
Building on the mqTrans-derived survival-associated signals identified above, we next sought a stable set of features reproducible across cohorts, with the aim of capturing prognostic signals that conventional differential expression analysis cannot detect. Although no genes remained significant after multiple testing correction at the gene-expression level, a subset of genes exhibited weak but variable expression changes, some with nominal significance (unadjusted P < 0.01) and partially consistent fold-change directions across cohorts. Statistical non-significance alone was therefore not a sufficient criterion, and we imposed additional constraints to define “dark biomarkers” rigorously. Specifically, dark biomarkers were defined as genes that (i) showed significant differences in mqTrans features (adjusted P < 0.05) in both cohorts and (ii) exhibited minimal changes in gene-expression abundance, as indicated by small expression effect sizes (|log2 fold change| < 0.2) and no consistent directional trend across cohorts. Conceptually, dark biomarkers are analogous to the hidden or latent regulatory signals described in network-based biomarker studies, which are not directly observable at the level of gene-expression abundance. The focus here is on identifying features that are reproducible across cohorts rather than on further optimizing discriminatory performance. Requiring both criteria in both cohorts reduces the likelihood that the selected genes are artifacts of limited statistical power and favors regulatory perturbations that are independent of expression abundance.
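A minimal sketch of the selection rule follows, assuming per-gene summary statistics have already been computed in each cohort; the dictionary layout and key names are hypothetical. It encodes the mqTrans significance, gene-level non-significance, and |log2 fold change| < 0.2 criteria and intersects the passing genes across cohorts. The "no consistent directional trend" criterion is deliberately not encoded, because the text does not say how it was operationalized.

```python
def select_dark_biomarkers(cohorts, alpha=0.05, max_abs_log2fc=0.2):
    """Genes meeting the dark-biomarker criteria in every cohort.

    cohorts: {cohort_name: {gene: {"mq_adj_p": float,     # mqTrans BH-adjusted P
                                   "expr_adj_p": float,   # gene-level adjusted P
                                   "log2fc": float}}}     # short vs long log2 FC
    """
    per_cohort = []
    for genes in cohorts.values():
        per_cohort.append({
            g for g, r in genes.items()
            if r["mq_adj_p"] < alpha                 # significant regulatory deviation
            and r["expr_adj_p"] >= alpha             # not significant at expression level
            and abs(r["log2fc"]) < max_abs_log2fc    # minimal abundance change
        })
    # keep only genes that pass in every cohort
    return sorted(set.intersection(*per_cohort))
```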
Under this framework, mqTrans-based differential analysis identified 115 candidate biomarkers in the CGGA cohort and 277 in the CPTAC-GBM cohort. Cross-cohort comparison revealed 19 overlapping genes that were consistently detected in both datasets and were therefore defined as dark biomarkers (Fig. 3A). None of these genes reached adjusted significance in the conventional gene-level analysis, indicating that they represent regulatory-deviation signals that abundance-focused approaches do not detect. To evaluate whether these 19 dark biomarkers provided improved discriminative ability, we compared them with the 19 top-ranked genes selected by nominal P value from the raw-expression analysis, assessing clustering performance with the Silhouette score and the Davies–Bouldin index. In both cohorts, the dark biomarker set achieved higher Silhouette scores and lower Davies–Bouldin indices than the top-ranked raw-expression genes (Fig. 3B), indicating a more consistent clustering structure and a more stable cross-cohort feature set. PCA further showed that the 19 dark biomarkers enabled clearer separation between long-survival and short-survival patients in both the CGGA and CPTAC-GBM cohorts (Fig. 3C–D), whereas the top-ranked raw-expression genes showed weaker and less stable separation. These results suggest that mqTrans-derived dark biomarkers provide a feature space that supports stable separation across independent cohorts.
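The two separation metrics can be sketched directly in NumPy, assuming Euclidean distances and that the survival-group labels themselves serve as the cluster assignment (the text does not say whether a clustering algorithm was run first). The function names are hypothetical; scikit-learn's silhouette_score and davies_bouldin_score compute the same quantities. Under these assumptions, the comparison would score the mqTrans values of the 19 dark biomarkers against the expression values of the 19 top-ranked genes on the same samples.

```python
import numpy as np


def silhouette(X, labels):
    """Mean silhouette coefficient (Euclidean); needs at least two clusters."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    uniq = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum() - 1  # exclude the point itself
        if n_same == 0:
            continue  # singleton clusters get s = 0, as in scikit-learn
        a = D[i, same].sum() / n_same  # mean intra-cluster distance
        b = min(D[i, labels == c].mean() for c in uniq if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return float(s.mean())


def davies_bouldin(X, labels):
    """Davies-Bouldin index: mean over clusters of max_j (s_i + s_j) / d(c_i, c_j)."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    uniq = np.unique(labels)
    cents = np.array([X[labels == c].mean(axis=0) for c in uniq])
    spread = np.array([np.linalg.norm(X[labels == c] - cents[k], axis=1).mean()
                       for k, c in enumerate(uniq)])
    worst = [max((spread[i] + spread[j]) / np.linalg.norm(cents[i] - cents[j])
                 for j in range(len(uniq)) if j != i)
             for i in range(len(uniq))]
    return float(np.mean(worst))
```

Higher silhouette and lower Davies–Bouldin values both indicate tighter, better-separated groups.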
Together, these findings point to a class of latent regulatory biomarkers that conventional differential expression analysis does not detect but that capture survival-associated molecular differences in glioblastoma, highlighting the importance of regulatory relationships beyond absolute expression abundance [50, 51]. The consistency of these signals across independent cohorts argues against insufficient statistical power as the primary explanation and is consistent with a regulatory origin. Several of the identified dark biomarkers have previously been implicated in glioblastoma biology. For example, PDPK1 is a key regulator of the PI3K/AKT signaling pathway and has been associated with tumor cell survival and therapeutic resistance [52]. SMARCA2, a chromatin remodeling factor, has been reported to influence transcriptional regulation and tumor progression [53]. SRPK2 plays a role in RNA splicing and has been linked to oncogenic processes in multiple cancers [54]. These genes further support the biological relevance of mqTrans-derived dark biomarkers and suggest that regulatory-deviation signals may capture functionally important drivers of tumor behavior.
Figure legend (fragment): Samples are colored by survival group (blue, long survival; red, short survival). mqTrans-derived dark biomarkers show clearer separation between survival groups than conventional gene-expression features, indicating consistent separation patterns across cohorts.
3.4 Dark biomarkers converge on cholesterol homeostasis and miRNA-mediated regulatory programs
To investigate the biological relevance of the identified dark biomarkers, we performed functional enrichment analyses using curated pathway gene sets (MSigDB Hallmark) and miRNA target databases (miRTarBase), assessing statistical significance with a hypergeometric test followed by Benjamini–Hochberg correction.
Significant terms were selected at an adjusted P value < 0.05, and terms matched by a single gene were excluded to ensure robustness. The dark biomarkers were associated with metabolic regulatory pathways, particularly lipid homeostasis (cholesterol homeostasis), suggesting that survival-associated transcriptional dysregulation may converge on metabolic reprogramming in glioblastoma [55–57]. This is consistent with the known metabolic plasticity of glioblastoma, in which lipid metabolism has been implicated in tumor growth and therapy resistance. At the post-transcriptional level, dark biomarkers were significantly enriched in several miRNA-mediated regulatory modules, including targets of hsa-miR-16-5p and hsa-miR-95-5p [58], suggesting that these genes are embedded within coordinated post-transcriptional regulatory networks rather than acting as independent expression units. No such enrichment was observed in conventional Gene Ontology categories, indicating that the functional convergence of dark biomarkers is not reflected in standard gene-centric annotations but emerges at the level of regulatory interactions [59]. This pattern is consistent with regulatory convergence arising at the network level rather than through individual gene annotations, and further supports the view that mqTrans captures a layer of biological organization beyond absolute gene-expression changes. Collectively, these findings indicate that dark biomarkers converge on metabolic and post-transcriptional regulatory programs linked to glioblastoma progression and survival heterogeneity, consistent with their role as regulatory-network-level biomarkers. Detailed enrichment results are provided in Supplementary Table S4.
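The enrichment procedure described above can be sketched with SciPy's hypergeometric distribution. Assumptions: gene sets are given as a dictionary of term to genes, the background size n_background is a free parameter (the text does not state the gene universe used), and single-gene terms are removed before the Benjamini–Hochberg correction, an ordering the text does not specify. The "overlap/term size" string mirrors the Ratio column of the enrichment table; the function names are hypothetical.

```python
import numpy as np
from scipy.stats import hypergeom


def enrichment_pvalue(n_overlap, term_size, n_query, n_background):
    """One-sided hypergeometric P(X >= n_overlap) for a query/term overlap."""
    # scipy's hypergeom(M, n, N): M = population, n = successes, N = draws
    return hypergeom.sf(n_overlap - 1, n_background, term_size, n_query)


def bh_adjust(p):
    """Benjamini-Hochberg adjusted P values."""
    p = np.asarray(p, dtype=float)
    order = np.argsort(p)
    ranked = p[order] * len(p) / np.arange(1, len(p) + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adj = np.empty_like(p)
    adj[order] = np.minimum(ranked, 1.0)
    return adj


def enrich(query, term_sets, n_background, min_overlap=2, alpha=0.05):
    """Test each term, drop single-gene matches, BH-adjust, keep adjusted P < alpha."""
    query = set(query)
    rows = []
    for term, genes in term_sets.items():
        hits = query & set(genes)
        if len(hits) < min_overlap:
            continue  # exclude single-gene matches
        p = enrichment_pvalue(len(hits), len(genes), len(query), n_background)
        rows.append((term, f"{len(hits)}/{len(genes)}", p))
    if not rows:
        return []
    adj = bh_adjust([r[2] for r in rows])
    return [(t, ratio, p, q) for (t, ratio, p), q in zip(rows, adj) if q < alpha]
```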
Term | Source | Ratio (overlap/term size) | P value | Adjusted P value
Cholesterol Homeostasis | MSigDB_Hallmark_2020 | 2/74 | 2.22E-03 | 1.33E-02
hsa-miR-16-5p | miRTarBase_2017 | 8/1555 | 4.55E-05 | 2.59E-02
mmu-miR-3074-2-3p | miRTarBase_2017 | 2/22 | 1.95E-04 | 4.37E-02
hsa-miR-95-5p | miRTarBase_2017 | 3/128 | 2.30E-04 | 4.37E-02
3.5 Dark biomarker loci overlap with non-coding transcripts, suggesting transcriptomic structural complexity
Having established that dark biomarkers converge on regulatory programs, we next examined whether their invisibility at the gene-expression level could be partially explained by transcriptomic structural complexity, that is, whether dark biomarkers tend to reside in transcriptionally complex genomic regions. Using GENCODE annotation, we identified multiple non-coding transcripts overlapping dark biomarker loci, including antisense, intronic, TEC (to be experimentally confirmed), and other non-coding transcript types [60, 61]. These overlapping elements, which are not limited to long non-coding RNAs but span diverse classes of non-coding transcripts, indicate that many dark biomarkers reside in loci with complex transcriptional architectures. In such regions, RNA-seq reads may originate from multiple partially overlapping transcripts, creating ambiguity in read assignment during conventional gene-level quantification. Inspection of representative loci in genome browser data revealed RNA-seq coverage spanning overlapping transcript boundaries, supporting active transcription of these non-coding elements rather than annotation artifacts. This suggests that transcriptional overlap introduces systematic uncertainty into gene-level expression estimates, consistent with the transcriptional interference and isoform ambiguity reported at complex genomic loci.
In this context, conventional abundance-based analyses may obscure biologically relevant regulatory perturbations arising from transcriptomic complexity, because signal contributions from overlapping transcripts cannot be disentangled. By contrast, mqTrans captures deviations in transcriptional relationships, enabling the detection of regulatory perturbations that arise in such structurally complex regions. Notably, several key dark biomarkers, including PDPK1, SRPK2, and SMARCA2, overlap multiple non-coding transcripts (nine at the PDPK1 locus, six at SRPK2, and two at SMARCA2), consistent with potential transcriptional interference and regulatory complexity at these loci. Together, these findings suggest that dark biomarkers are enriched in transcriptionally complex genomic regions, where overlapping non-coding transcription introduces quantification ambiguity. This offers a plausible mechanistic explanation for why such regulatory signals remain obscured in conventional analyses yet become detectable under a regulatory-deviation framework. More broadly, it highlights a fundamental limitation of gene-centric quantification in resolving transcriptional complexity.
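The GENCODE overlap search described in this section can be sketched as a naive interval scan over gene records parsed from a GENCODE GTF file. This is an illustrative sketch under stated assumptions: it keeps only `gene` features, relies on the `gene_id`, `gene_type`, and `gene_name` attributes found in GENCODE GTF files, and treats any non-protein-coding gene on the same chromosome whose span intersects the query gene (1-based, closed coordinates, either strand) as overlapping. The authors' exact overlap criteria are not stated, and the helper names and records used to exercise it are hypothetical.

```python
import re

# Matches key "value" pairs in the GTF attribute column (column 9)
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf_genes(lines):
    """Collect gene-level records from tab-separated, 9-column GTF lines."""
    genes = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # ignore transcript, exon, and other feature rows
        attrs = dict(ATTR_RE.findall(fields[8]))
        genes.append({
            "chrom": fields[0],
            "start": int(fields[3]),  # GTF coordinates are 1-based, inclusive
            "end": int(fields[4]),
            "strand": fields[6],
            "gene_id": attrs.get("gene_id"),
            "gene_type": attrs.get("gene_type"),
            "gene_name": attrs.get("gene_name"),
        })
    return genes


def overlapping_noncoding(query_ids, genes):
    """For each query gene ID, list non-protein-coding genes whose spans intersect it.

    Returns tuples of (query gene name, overlapping gene ID, strand, gene type, gene name).
    """
    by_id = {g["gene_id"]: g for g in genes}
    hits = []
    for qid in query_ids:
        q = by_id[qid]
        for g in genes:  # naive O(n*m) scan; an interval tree would scale better
            if g is q or g["chrom"] != q["chrom"] or g["gene_type"] == "protein_coding":
                continue
            if g["start"] <= q["end"] and q["start"] <= g["end"]:
                hits.append((q["gene_name"], g["gene_id"], g["strand"],
                             g["gene_type"], g["gene_name"]))
    return hits
```

The output tuples mirror the columns of the overlap table that follows; note that GENCODE gene IDs carry version suffixes (for example `.14`), which would need to be stripped before matching unversioned Ensembl IDs.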
| Dark biomarker (Ensembl ID) | Chromosome | Gene | Overlapping transcript (Ensembl ID) | Strand | Transcript type | Transcript name |
|---|---|---|---|---|---|---|
| ENSG00000137449 | chr4 | CPEB2 | ENSG00000249252 | - | antisense | RP11-665G4.1 |
| ENSG00000177706 | chr7 | FAM20C | ENSG00000240093 | - | antisense | AC093627.12 |
| ENSG00000177706 | chr7 | FAM20C | ENSG00000249852 | - | TEC | AC145676.2 |
| ENSG00000138757 | chr4 | G3BP2 | ENSG00000201644 | - | misc_RNA | Y_RNA |
| ENSG00000125734 | chr19 | GPR108 | ENSG00000283950 | - | miRNA | MIR6791 |
| ENSG00000173110 | chr1 | HSPA6 | ENSG00000273112 | + | processed_transcript | RP11-25K21.6 |
| ENSG00000140992 | chr16 | PDPK1 | ENSG00000261613 | - | antisense | RP11-20I23.13 |
| ENSG00000140992 | chr16 | PDPK1 | ENSG00000261288 | + | sense_intronic | RP11-20I23.11 |
| ENSG00000140992 | chr16 | PDPK1 | ENSG00000269937 | - | antisense | RP11-20I23.8 |
| ENSG00000140992 | chr16 | PDPK1 | ENSG00000261140 | - | antisense | RP11-20I23.6 |
| ENSG00000140992 | chr16 | PDPK1 | ENSG00000260436 | - | antisense | RP11-20I23.7 |
| ENSG00000140992 | chr16 | PDPK1 | ENSG00000279568 | - | TEC | RP11-20I23.5 |
| ENSG00000140992 | chr16 | PDPK1 | ENSG00000280402 | + | TEC | RP11-20I23.10 |
| ENSG00000140992 | chr16 | PDPK1 | ENSG00000279162 | + | TEC | CTD-3126B10.2 |
| ENSG00000140992 | chr16 | PDPK1 | ENSG00000261093 | - | antisense | CTD-3126B10.1 |
| ENSG00000080503 | chr9 | SMARCA2 | ENSG00000236199 | - | antisense | RP11-264I13.2 |
| ENSG00000080503 | chr9 | SMARCA2 | ENSG00000222973 | - | snRNA | RNU2-25P |
| ENSG00000135250 | chr7 | SRPK2 | ENSG00000270764 | + | processed_pseudogene | CTB-152G17.5 |
| ENSG00000135250 | chr7 | SRPK2 | ENSG00000271482 | + | processed_pseudogene | CTB-152G17.4 |
| ENSG00000135250 | chr7 | SRPK2 | ENSG00000213361 | + | processed_pseudogene | RP4-778K6.1 |
| ENSG00000135250 | chr7 | SRPK2 | ENSG00000244490 | + | processed_pseudogene | RWDD4P1 |
| ENSG00000135250 | chr7 | SRPK2 | ENSG00000242154 | + | antisense | RP4-778K6.3 |
| ENSG00000135250 | chr7 | SRPK2 | ENSG00000201179 | + | snRNA | RNU6-1322P |
| ENSG00000062716 | chr17 | VMP1 | ENSG00000267637 | + | sense_intronic | RP11-619I22.1 |
| ENSG00000062716 | chr17 | VMP1 | ENSG00000284190 | + | miRNA | MIR21 |
| ENSG00000167962 | chr16 | ZNF598 | ENSG00000260107 | + | lincRNA | AC005606.15 |

3.6 Reduced mqTrans-derived biomarker models show survival-stratification potential

To evaluate the prognostic value of
mqTrans-derived dark biomarkers, we constructed multigene Cox proportional hazards models and assessed them in both the CGGA and CPTAC-GBM cohorts[ 35 , 62 ]. Using the full set of 19 dark biomarkers, the model achieved significant survival stratification in the CGGA cohort, with clearly separated Kaplan–Meier curves and a significant hazard ratio. However, when applied to the CPTAC-GBM cohort, the model produced extreme hazard ratios and unstable confidence intervals, indicating severe overfitting, likely because of the large number of features relative to the sample size[ 63 , 64 ]. To mitigate overfitting and improve robustness, we performed feature selection with LASSO-penalized Cox regression, choosing the penalty parameter (λ) by 10-fold cross-validation to minimize the partial likelihood deviance. To avoid information leakage, model training and parameter tuning were performed within the training set only. This procedure yielded a reduced set of five dark biomarkers for model construction[ 62 , 65 ]. A risk score was calculated for each patient as a linear combination of the selected features weighted by their Cox regression coefficients, and patients were stratified into high- and low-risk groups at the median risk score. The resulting five-gene model showed stable and consistent survival stratification across both the CGGA and CPTAC-GBM cohorts, with well-separated Kaplan–Meier curves and more plausible hazard ratios[ 64 , 66 ]. In contrast, models built from the 19 top-ranked genes from conventional gene-expression analysis showed weaker and less consistent stratification, particularly in the CPTAC-GBM cohort[ 67 ]. Collectively, these results indicate that mqTrans-derived dark biomarkers yielded more favorable survival-stratification patterns than abundance-based features in the analyzed cohorts.
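The risk-score and median-split procedure, together with the log-rank test used to compare the resulting groups, can be sketched in pure Python. This is a minimal sketch that assumes the Cox coefficients have already been estimated, for example with a LASSO-penalized Cox fit such as `cv.glmnet(x, y, family = "cox", nfolds = 10)` in R. The function names, the tie-handling rule (scores equal to the median go to the low-risk group), and the toy data are illustrative assumptions, not the authors' code.

```python
import math
from statistics import median


def risk_scores(expr, coefs):
    """Linear predictor per patient: sum of coefficient * expression value.

    expr  -- list of dicts mapping gene -> (normalized) expression value
    coefs -- dict mapping gene -> Cox regression coefficient (assumed pre-fitted)
    """
    return [sum(beta * sample[gene] for gene, beta in coefs.items()) for sample in expr]


def median_split(scores):
    """Label each patient 'high' if the score exceeds the median, else 'low'."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


def logrank_test(times, events, groups, group_a="high"):
    """Two-group log-rank test.

    times  -- follow-up times
    events -- 1 (or True) if death was observed, 0 if censored
    groups -- group label per patient
    Returns (chi-square statistic with 1 df, p-value).
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0  # observed minus expected deaths in group_a
    variance = 0.0
    for t in event_times:
        # Patients still at risk at time t, and whether each died exactly at t
        at_risk = [(g, bool(e) and tt == t)
                   for tt, e, g in zip(times, events, groups) if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for g, _ in at_risk if g == group_a)
        d = sum(1 for _, died in at_risk if died)
        d_a = sum(1 for g, died in at_risk if died and g == group_a)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of deaths in group_a at time t
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; both groups must be at risk")
    stat = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p_value
```

For example, with follow-up times 1 to 8, all deaths observed, and the first four patients labelled "high", `logrank_test` returns a statistic of about 7.34. In practice, established survival libraries would be preferred for Kaplan–Meier estimation and hazard-ratio confidence intervals.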
These results underscore that controlling feature dimensionality is critical for stable cross-cohort performance in survival modeling[ 35 , 64 ]. In the survival plots (Figure 4), each panel reports hazard ratios (HR) with 95% confidence intervals and log-rank P values; high-risk and low-risk groups are shown in blue and orange, respectively.

Conclusion

In this study, we developed mqTrans, a regulatory-deviation modeling framework built on gene-level expression data and designed to detect regulatory perturbations that are not captured by conventional abundance-based transcriptomic analyses. By quantifying the deviation between observed gene expression and model-inferred regulatory expectations, mqTrans provides a regulatory-layer view of the transcriptome that emphasizes regulatory dysregulation rather than absolute expression abundance[ 68 – 70 ]. Applying this framework to two independent glioblastoma cohorts, from the Chinese Glioma Genome Atlas (CGGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC-GBM), we identified a set of dark biomarkers that were reproducibly associated with patient survival despite showing no significant differences in conventional gene-level expression analyses[ 69 , 71 ]. These biomarkers showed stronger discriminatory power for survival stratification than abundance-based features, suggesting that clinically relevant prognostic information may reside in regulatory signals overlooked by traditional transcriptomic methods[ 70 , 72 ]. Functional analyses further indicated that these dark biomarkers converge on metabolic (cholesterol homeostasis) and post-transcriptional (miRNA-target) regulatory programs, while structural analyses revealed their enrichment in transcriptionally complex genomic loci, particularly regions overlapping non-coding transcripts[ 69 , 73 ].
These findings provide a plausible mechanistic explanation for why important prognostic signals may remain undetected in standard gene-level analyses and support the view that regulatory-layer perturbation represents an important dimension of glioblastoma heterogeneity[ 73 , 74 ]. They also point to potential translational applications in prognostic assessment and precision medicine: the mqTrans-derived biomarkers retained prognostic relevance across independent cohorts and provided better survival stratification than conventional expression-based models, indicating potential utility for molecular prognostic assessment in glioblastoma[ 69 , 75 ]. More broadly, this work suggests that regulatory-layer transcriptomic modeling can reveal biologically and clinically meaningful signals beyond gene-expression abundance alone[ 70 , 76 ].

Several limitations should be acknowledged. The framework was evaluated retrospectively in public cohorts with limited sample sizes, and the biological functions of the identified dark biomarkers remain to be validated experimentally. Future studies incorporating larger prospective cohorts and functional experiments will be needed to establish the mechanistic roles and translational applicability of these biomarkers. In summary, our findings suggest that regulatory-deviation modeling can uncover survival-associated signals missed by conventional gene-expression analysis, providing a new computational strategy for identifying prognostic biomarkers and improving the understanding of molecular heterogeneity in aggressive cancers.

Declarations

Ethics approval and consent to participate
This study is a secondary analysis of publicly available, de-identified transcriptomic datasets.
The original studies (CGGA and CPTAC-GBM) from which the data were obtained each received ethical approval from their respective institutional review boards, and informed consent was obtained from all participants. No additional ethics approval was required for this study.

Consent for publication
Not applicable. This study contains no individual person's data in any form.

Competing interests
The authors declare that they have no competing interests.

Funding
This work was supported by the Natural Science Foundation of Anhui Province (Grant No. 2408085JX011), the Program for Excellent Scitech Innovation Teams of Universities in Anhui Province (Grant No. 2022AH010074), the Excellent Young Teacher Training Program of the Department of Education of Anhui Province (Grant No. YQZD202403), the Anhui Provincial Department of Education Higher Education Research Project (Grant No. 2025AHGXZK31534), the Open Research Fund of Anhui Province Key Laboratory of Non-coding RNA Basic and Clinical Transformation (Grant No. NcRNA202509), the Major Projects of Natural Science Research of Universities in Anhui Province (Grant No. 2025AHGXZK20259), and the General Projects of Natural Science Research of the Health Commission of Anhui Province (Grant Nos. AHWJ2024Aa20054 and AHWJ2024Aa20251).

Authors' contributions
Y.N. conceived the study, developed the methodology, performed all computational analyses, and wrote the manuscript. K.L. supervised the project and acquired funding. All other authors contributed through discussion, data interpretation, and manuscript review. All authors read and approved the final manuscript.

Acknowledgements
We acknowledge the Chinese Glioma Genome Atlas (CGGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC) for providing publicly available data. We also thank the TRRUST database and the UCSC Xena platform for their open-access resources.
Availability of data and material
The datasets analyzed in this study are available in the following public repositories: CGGA (http://www.cgga.org.cn) and CPTAC-GBM (https://proteomics.cancer.gov/programs/cptac). Additional reference data (GTEx dataset) were obtained from the UCSC Xena platform (https://xena.ucsc.edu). Transcription factor–target interactions were retrieved from the TRRUST database (https://www.grnpedia.org/trrust). GENCODE annotation was obtained from https://www.gencodegenes.org. All analysis code and processed data supporting the findings of this study are available from the corresponding author upon reasonable request.

References
1. Louis DN et al (2021) The 2021 WHO Classification of Tumors of the Central Nervous System: a summary. Neuro Oncol 23(8):1231–1251
2. Tan AC et al (2020) Management of glioblastoma: State of the art and future directions. CA Cancer J Clin 70(4):299–312
3. Wen PY, Kesari S (2008) Malignant gliomas in adults. N Engl J Med 359(5):492–507
4. Ostrom QT et al (2023) CBTRUS Statistical Report: Primary Brain and Other Central Nervous System Tumors Diagnosed in the United States in 2016–2020. Neuro Oncol 25(12 Suppl 2):iv1–iv99
5. Stupp R et al (2005) Radiotherapy plus concomitant and adjuvant temozolomide for glioblastoma. N Engl J Med 352(10):987–996
6. Weller M et al (2021) EANO guidelines on the diagnosis and treatment of diffuse gliomas of adulthood. Nat Rev Clin Oncol 18(3):170–186
7. Brennan CW et al (2013) The somatic genomic landscape of glioblastoma. Cell 155(2):462–477
8. Phillips HS et al (2006) Molecular subclasses of high-grade glioma predict prognosis, delineate a pattern of disease progression, and resemble stages in neurogenesis. Cancer Cell 9(3):157–173
9. Eckel-Passow JE et al (2015) Glioma Groups Based on 1p/19q, IDH, and TERT Promoter Mutations in Tumors. N Engl J Med 372(26):2499–2508
10. Hegi ME et al (2005) MGMT gene silencing and benefit from temozolomide in glioblastoma. N Engl J Med 352(10):997–1003
11. Ceccarelli M et al (2016) Molecular Profiling Reveals Biologically Discrete Subsets and Pathways of Progression in Diffuse Glioma. Cell 164(3):550–563
12. Parsons DW et al (2008) An integrated genomic analysis of human glioblastoma multiforme. Science 321(5897):1807–1812
13. Verhaak RG et al (2010) Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell 17(1):98–110
14. Reifenberger G et al (2017) Advances in the molecular genetics of gliomas - implications for classification and therapy. Nat Rev Clin Oncol 14(7):434–452
15. Cancer Genome Atlas Research Network (2008) Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455(7216):1061–1068
16. Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15(12):550
17. Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26(1):139–140
18. Ringnér M (2008) What is principal component analysis? Nat Biotechnol 26(3):303–304
19. Trapnell C et al (2012) Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7(3):562–578
20. Pertea M et al (2016) Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc 11(9):1650–1667
21. Wang ET et al (2008) Alternative isoform regulation in human tissue transcriptomes. Nature 456(7221):470–476
22. Pan Q et al (2008) Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet 40(12):1413–1415
23. Katz Y et al (2010) Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat Methods 7(12):1009–1015
24. Soneson C, Love MI, Robinson MD (2015) Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 4:1521
25. Robert C, Watson M (2015) Errors in RNA-Seq quantification affect genes of relevance to human disease. Genome Biol 16(1):177
26. Marbach D et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9(8):796–804
27. Alvarez MJ et al (2016) Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nat Genet 48(8):838–847
28. Zhao Z et al (2021) Chinese Glioma Genome Atlas (CGGA): a comprehensive resource with functional genomic data from Chinese glioma patients. Genom Proteom Bioinform 19(1):1–12
29. Zhao Z et al (2017) Comprehensive RNA-seq transcriptomic profiling in the malignant progression of gliomas. Sci Data 4(1):170024
30. Wang LB et al (2021) Proteogenomic and metabolomic characterization of human glioblastoma. Cancer Cell 39(4):509–528.e20
31. Conesa A et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17:13
32. Johnson WE, Li C, Rabinovic A (2007) Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8(1):118–127
33. Leek JT et al (2010) Tackling the widespread and critical impact of batch effects in high-throughput data. Nat Rev Genet 11(10):733–739
34. Han H et al (2018) TRRUST v2: an expanded reference database of human and mouse transcriptional regulatory interactions. Nucleic Acids Res 46(D1):D380–D386
35. Simon N et al (2011) Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent. J Stat Softw 39(5):1–13
36. Duan M et al (2023) Orchestrating information across tissues via a novel multitask GAT framework to improve quantitative gene regulation relation modeling for survival analysis. Brief Bioinform 24(4):bbad238
37. Li K et al (2024) Generating the Transcriptional Regulation View of Transcriptomic Features for Prediction Task and Dark Biomarker Detection on Small Datasets. J Vis Exp (205)
38. Fay MP, Proschan MA (2010) Wilcoxon-Mann-Whitney or t-test? On assumptions for hypothesis tests and multiple interpretations of decision rules. Stat Surv 4:1–39
39. Bland JM, Altman DG (1995) Multiple significance tests: the Bonferroni method. BMJ 310(6973):170
40. Benjamini Y, Hochberg Y (1995) Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J Roy Stat Soc Ser B (Methodol) 57(1):289–300
41. Neftel C et al (2019) An Integrative Model of Cellular States, Plasticity, and Genetics for Glioblastoma. Cell 178(4):835–849.e21
42. Singh S et al (2025) Glioblastoma at the crossroads: current understanding and future therapeutic horizons. Signal Transduct Target Ther 10(1):213
43. Couturier CP et al (2020) Single-cell RNA-seq reveals that glioblastoma recapitulates a normal neurodevelopmental hierarchy. Nat Commun 11(1):3406
44. Yan Y et al (2024) Suppression of ITPKB degradation by Trim25 confers TMZ resistance in glioblastoma through ROS homeostasis. Signal Transduct Target Ther 9(1):58
45. Darmanis S et al (2017) Single-Cell RNA-Seq Analysis of Infiltrating Neoplastic Cells at the Migrating Front of Human Glioblastoma. Cell Rep 21(5):1399–1410
46. Han J et al (2022) Multilayered control of splicing regulatory networks by DAP3 leads to widespread alternative splicing changes in cancer. Nat Commun 13(1):1793
47. Glinos DA et al (2022) Transcriptome variation in human tissues revealed by long-read sequencing. Nature 608(7922):353–359
48. Stuart T et al (2019) Comprehensive Integration of Single-Cell Data. Cell 177(7):1888–1902.e21
49. Vaquero-Garcia J et al (2016) A new view of transcriptome complexity and regulation through the lens of local splicing variations. Elife 5:e11752
50. Zhang Y et al (2022) Identifying network biomarkers of cancer by sample-specific differential network. BMC Bioinformatics 23(1):230
51. Tang S, Yuan K, Chen L (2022) Molecular biomarkers, network biomarkers, and dynamic network biomarkers for diagnosis and prediction of rare diseases. Fundam Res 2(6):894–902
52. Migliozzi S et al (2023) Integrative multi-omics networks identify PKCδ and DNA-PK as master kinases of glioblastoma subtypes and guide targeted cancer therapy. Nat Cancer 4(2):181–202
53. Harling JD, Tinworth CP (2023) A two-faced selectivity solution to target SMARCA2 for cancer therapy. Nat Commun 14(1):515
54. Wu Y et al (2024) LTR retrotransposon-derived LncRNA LINC01446 promotes hepatocellular carcinoma progression and angiogenesis by regulating the SRPK2/SRSF1/VEGF axis. Cancer Lett 598:217088
55. Venneti S, Thompson CB (2017) Metabolic Reprogramming in Brain Tumors. Annu Rev Pathol 12:515–545
56. Miska J, Chandel NS (2023) Targeting fatty acid metabolism in glioblastoma. J Clin Invest 133(1)
57. Kao TJ et al (2023) Dysregulated lipid metabolism in TMZ-resistant glioblastoma: pathways, proteins, metabolites and therapeutic opportunities. Lipids Health Dis 22(1):114
58. Chen M, Medarova Z, Moore A (2021) Role of microRNAs in glioblastoma. Oncotarget 12(17):1707–1723
59. Liberzon A et al (2015) The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst 1(6):417–425
60. Statello L et al (2021) Gene regulation by long non-coding RNAs and its biological functions. Nat Rev Mol Cell Biol 22(2):96–118
61. Ransohoff JD, Wei Y, Khavari PA (2018) The functions and unique features of long intergenic non-coding RNA. Nat Rev Mol Cell Biol 19(3):143–157
62. Tibshirani R (1997) The lasso method for variable selection in the Cox model. Stat Med 16(4):385–395
63. Sauerbrei W, Royston P, Binder H (2007) Selection of important variables and determination of functional form for continuous predictors in multivariable model building. Stat Med 26(30):5512–5528
64. Waldron L et al (2014) Comparative meta-analysis of prognostic gene signatures for late-stage ovarian cancer. J Natl Cancer Inst 106(5)
65. Zhang HH, Lu W (2007) Adaptive Lasso for Cox's proportional hazards model. Biometrika 94(3):691–703
66. Bøvelstad HM et al (2007) Predicting survival from microarray data–a comparative study. Bioinformatics 23(16):2080–2087
67. Geeleher P, Cox NJ, Huang RS (2014) Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines. Genome Biol 15(3):R47
68. Aibar S et al (2017) SCENIC: single-cell regulatory network inference and clustering. Nat Methods 14(11):1083–1086
69. Xu Z et al (2025) Integration of Single-Cell RNA and Bulk RNA Sequencing Reveals Cellular Heterogeneity and Identifies Survival-Associated Regulatory Networks in Glioblastoma. IET Syst Biol 19(1):e70025
70. Zheng Y et al (2023) Spatial cellular architecture predicts prognosis in glioblastoma. Nat Commun 14(1):4122
71. Ahmed YB et al (2024) Identification of Hypoxia Prognostic Signature in Glioblastoma Multiforme Based on Bulk and Single-Cell RNA-Seq. Cancers 16(3):633
72. Ruffle JK et al (2023) Brain tumour genetic network signatures of survival. Brain 146(11):4736–4754
73. Ni B et al (2023) The short isoform of MS4A7 is a novel player in glioblastoma microenvironment, M2 macrophage polarization, and tumor progression. J Neuroinflammation 20(1):80
74. Hoogstrate Y et al (2023) Transcriptome analysis reveals tumor microenvironment changes in glioblastoma. Cancer Cell 41(4):678–692.e7
75. Wan Z et al (2023) Identification of angiogenesis-related genes signature for predicting survival and its regulatory network in glioblastoma. Cancer Med 12(16):17445–17467
76. Oliveira FD et al (2024) Expression of key unfolded protein response genes predicts patient survival and an immunosuppressive microenvironment in glioblastoma. Translational Med Commun 9(1):5
Cite Share Download PDF Status: Posted Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-9640328","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":636077548,"identity":"2a6e5903-f646-4028-8a1b-e869dd3e3143","order_by":0,"name":"Yangyu Ning","email":"","orcid":"","institution":"The First Affiliated Hospital of Wannan Medical College: Yijishan Hospital of Wannan Medical University","correspondingAuthor":false,"prefix":"","firstName":"Yangyu","middleName":"","lastName":"Ning","suffix":""},{"id":636077549,"identity":"108cf7c0-7120-4a69-83f7-6623fb8697a9","order_by":1,"name":"Zijun Qu","email":"","orcid":"","institution":"The First Affiliated Hospital of Wannan Medical College: Yijishan Hospital of Wannan Medical University","correspondingAuthor":false,"prefix":"","firstName":"Zijun","middleName":"","lastName":"Qu","suffix":""},{"id":636077550,"identity":"346d78fb-b1dd-4a08-b574-28e1a39acc7a","order_by":2,"name":"Zehang Xie","email":"","orcid":"","institution":"The 
First Affiliated Hospital of Wannan Medical College: Yijishan Hospital of Wannan Medical University","correspondingAuthor":false,"prefix":"","firstName":"Zehang","middleName":"","lastName":"Xie","suffix":""},{"id":636077551,"identity":"82d48388-d0d5-4e68-90b5-074e0116a77c","order_by":3,"name":"Shujun Wan","email":"","orcid":"","institution":"The First Affiliated Hospital of Wannan Medical College: Yijishan Hospital of Wannan Medical University","correspondingAuthor":false,"prefix":"","firstName":"Shujun","middleName":"","lastName":"Wan","suffix":""},{"id":636077552,"identity":"7614d9de-a585-4c46-bde7-9f9cad452a09","order_by":4,"name":"Weiya Pei","email":"","orcid":"","institution":"The First Affiliated Hospital of Wannan Medical College: Yijishan Hospital of Wannan Medical University","correspondingAuthor":false,"prefix":"","firstName":"Weiya","middleName":"","lastName":"Pei","suffix":""},{"id":636077553,"identity":"bfea24fa-2712-4ee7-88b3-044fec973632","order_by":5,"name":"Xueqin Li","email":"","orcid":"","institution":"The First Affiliated Hospital of Wannan Medical College: Yijishan Hospital of Wannan Medical University","correspondingAuthor":false,"prefix":"","firstName":"Xueqin","middleName":"","lastName":"Li","suffix":""},{"id":636077554,"identity":"883267f2-9011-4048-989d-951418cd4434","order_by":6,"name":"Mengying Zhang","email":"","orcid":"","institution":"The First Affiliated Hospital of Wannan Medical College: Yijishan Hospital of Wannan Medical University","correspondingAuthor":false,"prefix":"","firstName":"Mengying","middleName":"","lastName":"Zhang","suffix":""},{"id":636077555,"identity":"0874e849-7df5-49f6-9199-da3951ca6bdd","order_by":7,"name":"Xiaolong Zhu","email":"","orcid":"","institution":"The First Affiliated Hospital of Wannan Medical College: Yijishan Hospital of Wannan Medical 
University","correspondingAuthor":false,"prefix":"","firstName":"Xiaolong","middleName":"","lastName":"Zhu","suffix":""},{"id":636077556,"identity":"51ba5187-b963-43fe-b1bf-cdee8354c3f9","order_by":8,"name":"Lei Li","email":"","orcid":"","institution":"The First Affiliated Hospital of Wannan Medical College: Yijishan Hospital of Wannan Medical University","correspondingAuthor":false,"prefix":"","firstName":"Lei","middleName":"","lastName":"Li","suffix":""},{"id":636077557,"identity":"d07540ae-3d62-49ff-86ab-b062a95f3e0d","order_by":9,"name":"Kun Lv","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAu0lEQVRIiWNgGAWjYBACPgY2IPlDop6fmfngA6K0sIG0MPZYJEi2syUbEK+Fga0iweA8j5kAcVok0tKkC3gk8owPM5gxMNTYRBOj5Zj0DAuJYrPDDGkPGI6l5TYQ1MJzvE2ah0eCcdthhuMGjA2HidXCJsG4uZmxTYI4Lextx0BaEjcwM7MRrSXZmrdHwljiMBuzQQIxfuFnZjO8zfOjTo6///zHBx9qbAhrQQUJpCkfBaNgFIyCUYALAABoojLxkZQJLAAAAABJRU5ErkJggg==","orcid":"","institution":"The First Affiliated Hospital of Wannan Medical College: Yijishan Hospital of Wannan Medical University","correspondingAuthor":true,"prefix":"","firstName":"Kun","middleName":"","lastName":"Lv","suffix":""}],"badges":[],"createdAt":"2026-05-07 09:37:43","currentVersionCode":1,"declarations":{"humanSubjects":true,"vertebrateSubjects":false,"conflictsOfInterestStatement":false,"humanSubjectEthicalGuidelines":true,"humanSubjectConsent":true,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":false},"doi":"10.21203/rs.3.rs-9640328/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-9640328/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":109107315,"identity":"e78b5783-8a87-48a1-ba7f-e857a8b74bbe","added_by":"auto","created_at":"2026-05-12 15:08:07","extension":"png","order_by":1,"title":"Figure 
1","display":"","copyAsset":false,"role":"figure","size":304722,"visible":true,"origin":"","legend":"\u003cp\u003eConventional gene-level expression analysis fails to distinguish survival groups in GBM.\u003c/p\u003e\n\u003cp\u003e(A-B) Volcano plots of differential gene expression between short-survival and long-survival GBM patients in the CGGA and CPTAC-GBM cohorts based on gene-level expression profiles. No genes remained significant after multiple testing correction; nominal P-values are shown to illustrate overall expression trends.\u003c/p\u003e\n\u003cp\u003e(C-D) PCA plots based on the top 20 genes ranked by nominal P-values in the CGGA and CPTAC-GBM cohorts. Samples from short-survival and long-survival groups showed substantial overlap without clear separation.\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-9640328/v1/af4a6de1d776d6a8bcecfd09.png"},{"id":109107314,"identity":"765eaad7-a789-4c3f-bbdd-c9a196b6f54e","added_by":"auto","created_at":"2026-05-12 15:08:07","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":277793,"visible":true,"origin":"","legend":"\u003cp\u003emqTrans-based regulatory deviation modeling reveals survival-associated signals in GBM\u003c/p\u003e\n\u003cp\u003e(A-B) Volcano plots of differential mqTrans features between short-survival and long-survival GBM patients in the CGGA and CPTAC-GBM cohorts. Significant mqTrans biomarkers remained detectable after multiple testing correction (adjusted P \u0026lt; 0.05), demonstrating stronger survival-associated signals compared with conventional gene-level analysis.\u003c/p\u003e\n\u003cp\u003e(C-D) PCA plots based on the top 20 significant mqTrans biomarkers in the CGGA and CPTAC-GBM cohorts. 
mqTrans-derived features improved separation between short-survival and long-survival groups, particularly in the CPTAC-GBM cohort.\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-9640328/v1/58c0d1c8741b6743f07a02ca.png"},{"id":109107317,"identity":"00045f15-9a54-494d-80f6-0034184e4b14","added_by":"auto","created_at":"2026-05-12 15:08:07","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":209558,"visible":true,"origin":"","legend":"\u003cp\u003eIdentification and comparative evaluation of mqTrans-derived dark biomarkers in GBM.\u003c/p\u003e\n\u003cp\u003e(A) Venn diagram showing the overlap of significant mqTrans-derived biomarkers identified in the CGGA (n = 115) and CPTAC-GBM (n = 277) cohorts. A total of 19 shared genes were identified and defined as “dark biomarkers”, representing regulatory signals detected by mqTrans but not by conventional gene-expression analysis.\u003c/p\u003e\n\u003cp\u003e(B) Comparison of clustering performance between the 19 dark biomarkers and the top-ranked 19 genes selected based on nominal P values from gene-level expression analysis. 
Clustering quality was evaluated using Silhouette score (higher is better) and Davies–Bouldin index (lower is better) in both cohorts, demonstrating improved group separation and compactness for dark biomarkers.\u003c/p\u003e\n\u003cp\u003e(C–F) Principal component analysis (PCA) plots comparing the discriminative ability of different feature sets.\u003c/p\u003e\n\u003cp\u003e(C) CGGA cohort using dark biomarkers.\u003c/p\u003e\n\u003cp\u003e(D) CPTAC-GBM cohort using dark biomarkers.\u003c/p\u003e\n\u003cp\u003e(E) CGGA cohort using top-ranked gene-expression features.\u003c/p\u003e\n\u003cp\u003e(F) CPTAC-GBM cohort using top-ranked gene-expression features.\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-9640328/v1/b8f62e31420f655a9bd96a81.png"},{"id":109107316,"identity":"fce92c6e-2a55-4511-a62d-3f80582c78e2","added_by":"auto","created_at":"2026-05-12 15:08:07","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":197201,"visible":true,"origin":"","legend":"\u003cp\u003eSurvival stratification performance of mqTrans-derived dark biomarker–based Cox models.\u003c/p\u003e\n\u003cp\u003eKaplan–Meier survival curves comparing high-risk and low-risk groups defined by multigene Cox proportional hazards models in the CGGA and CPTAC-GBM cohorts. Patients were stratified based on the median risk score calculated from model coefficients.\u003c/p\u003e\n\u003cp\u003e(A–B) Survival curves based on the full set of 19 dark biomarkers in the CGGA (A) and CPTAC-GBM (B) cohorts. While significant stratification was observed in the CGGA cohort, the CPTAC-GBM cohort exhibited extreme hazard ratios and unstable confidence intervals, indicating overfitting.\u003c/p\u003e\n\u003cp\u003e(C–D) Survival curves based on the LASSO-selected five-gene model in the CGGA (C) and CPTAC-GBM (D) cohorts. 
The reduced model demonstrated improved robustness, with more stable hazard ratios and consistent survival separation across cohorts.\u003c/p\u003e\n\u003cp\u003e(E–F) Survival curves based on the top-ranked 19 genes derived from conventional gene-expression analysis in the CGGA (E) and CPTAC-GBM (F) cohorts. These models showed weaker and less consistent stratification performance compared with mqTrans-derived features.\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-9640328/v1/04f0f8aeee952a8cba54d6ec.png"},{"id":109107352,"identity":"8edb972c-cd65-48f1-bf40-3068e9bf4232","added_by":"auto","created_at":"2026-05-12 15:08:28","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1293502,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-9640328/v1/a8d9f857-c2c0-46ad-8958-4bcd84ac7137.pdf"}],"financialInterests":"The authors declare no competing interests.","formattedTitle":"\u003cp\u003e\u003cstrong\u003eUncovering dark biomarkers of glioblastoma survival through regulatory-deviation–based transcriptomic modeling\u003c/strong\u003e\u003c/p\u003e","fulltext":[{"header":"Introduction","content":"\u003cp\u003eGlioblastoma (GBM) is one of the most common and aggressive primary malignant brain tumors in adults, characterized by diffuse cancer cell infiltration, vigorous angiogenesis, and extreme resistance to conventional drug therapies[\u003cspan additionalcitationids=\"CR2 CR3\" citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. 
Even with aggressive multimodal treatment\u0026mdash;including maximal safe resection, radiation therapy, and temozolomide chemotherapy\u0026mdash;the clinical course of glioblastoma remains highly heterogeneous[\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. Survival times vary widely: some patients progress rapidly, with only a few months between diagnosis and death, whereas a small number of long-term survivors live for several years[\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. This marked prognostic gap points to molecular determinants of tumor aggressiveness and host response that are not yet fully elucidated[\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. Elucidating the molecular differences between short-term and long-term survivors is therefore crucial, both for refining prognostic stratification and for guiding the development of personalized treatment strategies[\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e].
However, current analytical paradigms have not adequately resolved the molecular mechanisms that drive these starkly different clinical outcomes[\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThree main clinical strategies are currently used to assess GBM prognosis and guide treatment decisions: neuroimaging assessments (e.g., evaluation of tumor burden and progression via contrast-enhanced MRI), histopathological classification and grading based on WHO guidelines, and limited molecular biomarker testing, most notably isocitrate dehydrogenase (IDH) mutation status and O⁶-methylguanine-DNA methyltransferase (MGMT) promoter methylation[\u003cspan additionalcitationids=\"CR10 CR11\" citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]. Although these established methods provide a foundational framework for patient management, they capture the heterogeneity of GBM survival only partially[\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e, \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. Imaging and pathological assessments are inherently macroscopic or morphological and often lag behind the molecular evolution of the tumor[\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e, \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. Furthermore, despite their clinical significance, assays for MGMT promoter methylation and IDH mutation status are typically resource-intensive, time-consuming, and susceptible to variations in experimental conditions[\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e].
Crucially, these traditional markers have limited ability to resolve the complex layers of post-transcriptional and post-translational regulation, which may profoundly influence tumor cell behavior and treatment sensitivity, leaving a significant portion of survival heterogeneity unexplained[\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e, \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThe emergence and widespread adoption of high-throughput RNA sequencing (RNA-seq) have brought computational methods to the forefront of GBM research as a means of deciphering this unresolved heterogeneity[\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e, \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e]. Standard analytical workflows in this field rely extensively on differential expression (DE) analysis, unsupervised clustering and dimensionality reduction techniques (such as PCA and t-SNE), and survival regression models (such as univariate and multivariate Cox proportional hazards regression)[\u003cspan additionalcitationids=\"CR19\" citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]. However, the vast majority of these methods rest on a critical implicit assumption: they model biology at the gene level, treating the total abundance of all transcripts produced at a gene locus as the primary functional variable[\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e, \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. This perspective essentially posits that quantitative changes in total mRNA abundance are the primary drivers of phenotypic differences.
However, a growing body of evidence suggests that gene expression abundance alone is insufficient to fully characterize tumor-associated molecular abnormalities, and that complex transcriptional regulatory relationships and dysregulated networks may play a significant role in tumorigenesis and disease progression[\u003cspan additionalcitationids=\"CR23\" citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]. By merging transcript isoforms with potentially antagonistic functions into a single expression value, gene-level analyses risk systematic signal loss, obscuring critical regulatory changes that can occur even when overall gene expression remains largely unchanged[\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eHere, we propose mqTrans, a regulatory deviation modeling framework built on gene-level expression that captures discrepancies between observed expression and transcription factor\u0026ndash;inferred regulatory expectations. Unlike traditional abundance-centric approaches, mqTrans shifts the analytical focus from the absolute abundance of gene expression to the deviation between observed expression levels and those predicted from upstream transcriptional regulators[\u003cspan additionalcitationids=\"CR25\" citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e]. The framework is designed to identify biological signals that show no significant change in overall expression but whose transcriptional regulatory state is already perturbed.
Building on this framework, we defined and investigated a new class of molecular features termed \u0026ldquo;dark biomarkers.\u0026rdquo; These are operationally defined as genes that show no significant change in conventional gene-level differential expression analysis but differ significantly at the mqTrans regulatory-deviation level[\u003cspan additionalcitationids=\"CR26\" citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eUsing the mqTrans framework, we systematically analyzed two independent GBM cohorts (CGGA and CPTAC-GBM) and obtained the following key findings. First, regulatory-deviation modeling identified survival-related signals that are difficult to detect with conventional gene-expression analysis, offering a new perspective on prognostic heterogeneity in GBM. Second, we defined and identified a set of dark biomarkers whose survival association is evident only from the mqTrans perspective, suggesting that important survival-related molecular signals may reside at the regulatory-network level. Third, we explored the regulatory mechanisms that may give rise to these dark biomarkers, providing a basis for understanding their biological origins. Finally, we evaluated the potential clinical utility of these dark biomarkers for prognostic stratification of patients.\u003c/p\u003e \u003cp\u003eIn addition to providing a new analytical perspective, this framework may have important translational implications.
By capturing regulatory-deviation signals that conventional gene-expression approaches do not detect, mqTrans-derived biomarkers may improve patient stratification and prognostic assessment in glioblastoma. Furthermore, dark biomarkers may point to candidate therapeutic targets embedded within dysregulated regulatory networks, offering new opportunities for precision medicine. More broadly, these considerations underscore the need for analytical frameworks that capture regulatory-layer perturbations beyond gene-expression abundance.\u003c/p\u003e"},{"header":"Materials and Methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1 Datasets and Patient Selection\u003c/h2\u003e \u003cdiv id=\"Sec4\" class=\"Section3\"\u003e \u003ch2\u003e2.1.1 CGGA cohort\u003c/h2\u003e \u003cp\u003eRNA-seq expression data and corresponding clinical annotations were obtained from the Chinese Glioma Genome Atlas (CGGA) [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e, \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e], comprising the CGGA693 and CGGA325 datasets. Samples were retained if they met the following criteria: (i) primary tumor; (ii) histologically annotated glioblastoma, WHO grade IV; and (iii) available overall survival (OS) information. Survival groups were defined to contrast the extremes of the survival spectrum. Short survival was defined as OS\u0026thinsp;\u0026le;\u0026thinsp;360 days, and long survival as OS\u0026thinsp;\u0026ge;\u0026thinsp;720 days. Samples with intermediate survival (OS between 361 and 719 days) were excluded from the main analyses.
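As a minimal sketch of the survival grouping above, the cutoffs translate into a three-way rule in which intermediate cases are dropped. The helper name `survival_group` is hypothetical, and because the handling of censored patients with short follow-up is not described, this sketch assumes OS is a fully observed time in days.

```python
from typing import Optional

SHORT_MAX_DAYS = 360  # short survival: OS <= 360 days
LONG_MIN_DAYS = 720   # long survival: OS >= 720 days


def survival_group(os_days: float) -> Optional[str]:
    """Hypothetical helper: map overall survival (days) to a survival group.

    Returns 'short', 'long', or None for intermediate survival
    (361-719 days), which is excluded. Censoring is ignored here.
    """
    if os_days <= SHORT_MAX_DAYS:
        return 'short'
    if os_days >= LONG_MIN_DAYS:
        return 'long'
    return None


# Example: keep only the two extreme groups.
os_values = [120, 400, 365, 900, 720]
groups = [survival_group(d) for d in os_values]
kept = [(d, g) for d, g in zip(os_values, groups) if g is not None]
```

Excluding the intermediate band trades sample size for a cleaner contrast between the two ends of the survival distribution.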
After filtering, the CGGA cohort included 149 primary GBM samples and was used as the discovery cohort.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section3\"\u003e \u003ch2\u003e2.1.2 CPTAC‑GBM cohort\u003c/h2\u003e \u003cp\u003eAn independent validation cohort was assembled from the CPTAC-GBM dataset [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e]. The same sample selection criteria were applied, including primary GBM status, WHO grade IV annotation, and complete OS information. The same survival cutoffs were used to define the short-survival and long-survival groups, with OS\u0026thinsp;\u0026le;\u0026thinsp;360 days classified as short survival and OS\u0026thinsp;\u0026ge;\u0026thinsp;720 days classified as long survival. Samples with intermediate survival were excluded. After filtering, the CPTAC-GBM cohort included 63 GBM samples and served as the independent validation cohort.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section3\"\u003e \u003ch2\u003e2.1.3 Expression matrix preprocessing\u003c/h2\u003e \u003cp\u003eGene-level TPM expression matrices from the CGGA and CPTAC-GBM cohorts were used for conventional expression-based downstream analyses. After sample filtering, the CGGA matrix contained 47,886 gene features across 149 samples, whereas the CPTAC-GBM matrix contained 10,959 gene features across 63 samples. Expression values were transformed as log2(TPM\u0026thinsp;+\u0026thinsp;1) prior to modeling[\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e, \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e]. To ensure cross-cohort comparability, downstream analyses were restricted to the 10,628 genes shared by the two cohorts. After removal of genes with missing or invalid values, 10,309 genes remained in the CGGA cohort and 9,269 genes remained in the CPTAC-GBM cohort.
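The preprocessing steps (log2(TPM + 1) transformation, restriction to shared genes, and per-cohort removal of invalid genes) can be sketched with NumPy as follows. The text does not specify what counted as a missing or invalid value, so this sketch assumes any non-finite value (NaN or infinity) after transformation; all function names are illustrative, not the authors' code.

```python
import numpy as np


def log2_tpm(tpm):
    """Element-wise log2(TPM + 1) transform."""
    return np.log2(np.asarray(tpm, dtype=float) + 1.0)


def subset_rows(genes, mat, keep_genes):
    """Subset and reorder the rows of a genes x samples matrix to keep_genes."""
    index = {g: i for i, g in enumerate(genes)}
    return mat[[index[g] for g in keep_genes]]


def drop_invalid(genes, mat):
    """Drop genes (rows) that contain any NaN or infinite value."""
    keep = np.all(np.isfinite(mat), axis=1)
    return [g for g, k in zip(genes, keep) if k], mat[keep]


def preprocess(genes_a, tpm_a, genes_b, tpm_b):
    """Restrict both cohorts to shared genes, log-transform, then filter
    invalid genes separately per cohort (so final gene counts can differ)."""
    shared = sorted(set(genes_a) & set(genes_b))
    result = []
    for genes, tpm in ((genes_a, tpm_a), (genes_b, tpm_b)):
        mat = log2_tpm(subset_rows(genes, np.asarray(tpm, dtype=float), shared))
        result.append(drop_invalid(shared, mat))
    return result
```

Filtering after the intersection, separately per cohort, mirrors why the two cohorts end up with different gene counts (10,309 versus 9,269) from the same shared set.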
Batch effects were corrected before differential analysis, and available covariates were incorporated into the linear modeling framework[\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e, \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e2.2 The mqTrans View of Transcriptomes\u003c/h2\u003e \u003cdiv id=\"Sec8\" class=\"Section3\"\u003e \u003ch2\u003e2.2.1 Construction of the TF-based prediction model\u003c/h2\u003e \u003cp\u003eFor each target gene, a transcription factor-based regression model was established to estimate its expected expression from the expression profiles of its candidate upstream regulators. Candidate transcription factors were obtained from the TRRUST regulatory interaction database [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e], and their normalized expression values were used as model predictors.\u003c/p\u003e \u003cp\u003eFor a given target gene \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:j\\)\u003c/span\u003e\u003c/span\u003e, its expression was modeled as:\u003cdiv id=\"Equa\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equa\" name=\"EquationSource\"\u003e\n$$\\:\\begin{array}{c}{mRNA}_{j}={\\beta\\:}_{0}+\\sum\\:_{k}{\\beta\\:}_{k}{TF}_{k}+ϵ\\#\\left(1\\right)\\end{array}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{mRNA}_{j}\\)\u003c/span\u003e\u003c/span\u003e denotes the observed expression level of gene \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:j\\)\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{TF}_{k}\\)\u003c/span\u003e\u003c/span\u003e represents the expression level of the \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:k\\)\u003c/span\u003e\u003c/span\u003e-th transcription factor, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\beta\\:}_{k}\\)\u003c/span\u003e\u003c/span\u003e is its regression coefficient, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:ϵ\\)\u003c/span\u003e\u003c/span\u003e is the residual term.\u003c/p\u003e \u003cp\u003eThe fitted model, with the residual term omitted, was then applied to each sample to obtain the expected (predicted) expression \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\widehat{mRNA}}_{j}\\)\u003c/span\u003e\u003c/span\u003e of each target gene. The mqTrans feature was defined as the absolute difference between the observed and predicted expression levels:\u003cdiv id=\"Equb\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equb\" name=\"EquationSource\"\u003e\n$$\\:\\begin{array}{c}{mqTrans}_{j}=\\left|{mRNA}_{j}-{\\widehat{mRNA}}_{j}\\right|\\#\\left(2\\right)\\end{array}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eThis value quantifies the extent to which the observed expression of a gene deviates from the level expected from its transcription factor-based regulatory context. The models were pre-trained using transcriptomic data of healthy brain tissue samples from the UCSC Xena platform [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e] (GTEx dataset, n\u0026thinsp;=\u0026thinsp;1,157, tissue\u0026thinsp;=\u0026thinsp;brain). We selected healthy brain tissue because the core transcriptional regulatory relationships (e.g., TF\u0026rarr;target gene) are expected to be relatively stable in non-malignant tissue. For each target gene, a linear regression model was fitted using the expression levels of its known TFs (from TRRUST) as predictors.
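Eqs. (1) and (2) can be sketched for a single target gene with ordinary least squares in NumPy: coefficients are fitted on reference (healthy) samples and then applied to tumor samples. This is an illustrative sketch, not the authors' implementation; it assumes TF and target expression are already on a common normalized scale, omits the cross-validation step used by the authors, and the function names `fit_tf_model`, `predict_expected`, and `mqtrans_feature` are hypothetical.

```python
import numpy as np


def fit_tf_model(tf_ref, target_ref):
    """Fit Eq. (1) by OLS on reference samples.

    tf_ref: (n_samples, n_tfs) expression of the target gene's TFs.
    target_ref: (n_samples,) observed expression of the target gene.
    Returns the coefficient vector [beta_0, beta_1, ..., beta_K].
    """
    tf_ref = np.asarray(tf_ref, dtype=float)
    X = np.column_stack([np.ones(tf_ref.shape[0]), tf_ref])  # intercept column
    coef, *_ = np.linalg.lstsq(X, np.asarray(target_ref, dtype=float), rcond=None)
    return coef


def predict_expected(coef, tf):
    """Expected expression: Eq. (1) evaluated without the residual term."""
    return coef[0] + np.asarray(tf, dtype=float) @ coef[1:]


def mqtrans_feature(coef, tf_tumor, target_tumor):
    """Eq. (2): |observed - expected| for each tumor sample."""
    observed = np.asarray(target_tumor, dtype=float)
    return np.abs(observed - predict_expected(coef, tf_tumor))
```

For example, if a target gene follows expression = 1 + 2 × TF exactly in the reference samples, a tumor sample with TF = 1 and observed expression 3.5 has an expected value of 3 and an mqTrans value of 0.5.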
The model parameters were estimated in the healthy samples using ordinary least squares, with 5-fold cross-validation used to assess generalization and guard against overfitting[\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e]. The trained models were then applied to each GBM sample to predict the \u0026lsquo;expected\u0026rsquo; expression level of each target gene. This design assumes that deviations from the healthy regulatory baseline in tumor samples may reflect disease-associated dysregulation.\u003c/p\u003e \u003cp\u003eUnlike conventional transcriptomic analysis, which mainly evaluates absolute gene expression abundance, mqTrans focuses on regulatory deviation [\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e, \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e]. This allows the analysis to capture transcriptomic alterations that may not be evident from expression abundance alone. mqTrans thus provides a complementary view of transcriptome dysregulation by representing the discrepancy between observed expression and transcription factor-predicted expression.\u003c/p\u003e \u003cp\u003emqTrans features were generated independently for the CGGA and CPTAC-GBM cohorts using the same computational procedure. The resulting mqTrans matrices were then used for downstream differential analysis and for the identification of dark biomarkers.\u003c/p\u003e \u003cp\u003eAlthough transcriptional regulatory relationships may be partially altered in tumor contexts, a substantial proportion of core TF\u0026ndash;target interactions is expected to be conserved across physiological and pathological states.
Therefore, deviations from a normal regulatory baseline are more likely to reflect disease-associated dysregulation rather than random variation. This design allows mqTrans to capture biologically meaningful perturbations while maintaining a stable reference framework for cross-sample comparison.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e2.3 Differential Analysis\u003c/h2\u003e \u003cp\u003eTo compare the mqTrans framework with conventional expression-based analysis, differential analyses were performed in parallel at two analytical levels: gene-level expression abundance and mqTrans-derived regulatory deviation. The same survival grouping strategy was used in both analyses, where samples were classified as short-survival or long-survival according to the predefined overall survival cutoffs. Differential testing was conducted separately in the CGGA and CPTAC-GBM cohorts.\u003c/p\u003e \u003cp\u003eFor conventional expression-level analysis, each gene was evaluated independently using an ordinary least squares linear regression model. The processed gene expression value was used as the response variable, and the survival group indicator was included as the primary variable of interest. The model further incorporated available demographic, clinical, and technical covariates, including gender, age, OS days, OS event status, and batch, to account for potential confounding effects.\u003c/p\u003e \u003cp\u003eFor mqTrans-level analysis, differential testing was performed on the mqTrans feature matrix. Each mqTrans feature quantified the regulatory deviation magnitude of a gene, defined as the discrepancy between its observed expression and the expression predicted from transcription factor-based regulation. 
Because mqTrans values represent deviation magnitudes rather than direct expression abundance, differences between the short-survival and long-survival groups were assessed for each feature using a two-sided Mann\u0026ndash;Whitney U test[\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eMultiple testing correction was performed separately for each cohort and each analytical level. For the gene-level expression analysis, nominal \u003cem\u003eP\u003c/em\u003e values from the linear models were adjusted using the Bonferroni correction[\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e], whereas \u003cem\u003eP\u003c/em\u003e values from the mqTrans-level Mann\u0026ndash;Whitney U tests were adjusted using the Benjamini\u0026ndash;Hochberg procedure[\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e]. Features with adjusted \u003cem\u003eP\u003c/em\u003e values below 0.05 were considered statistically significant. The threshold for minimal expression change was defined as |log2 fold change| \u0026lt; 0.2.
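The two corrections differ in what they control: Bonferroni bounds the family-wise error rate by multiplying each P value by the number of tests, whereas Benjamini–Hochberg controls the false discovery rate and is less conservative. A minimal NumPy sketch of both adjustments follows, with hypothetical function names; the per-feature Mann–Whitney P values themselves would typically come from a routine such as `scipy.stats.mannwhitneyu(x, y, alternative='two-sided')`, which this sketch does not call.

```python
import numpy as np


def bonferroni(pvals):
    """Bonferroni adjustment: p * m, capped at 1."""
    p = np.asarray(pvals, dtype=float)
    return np.minimum(p * p.size, 1.0)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjustment (FDR), returned in input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)  # p_(i) * m / i
    # Step-up: the adjusted value at rank i is the minimum over ranks >= i.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted
```

For P values (0.01, 0.04, 0.03, 0.20), Bonferroni gives (0.04, 0.16, 0.12, 0.80), while Benjamini–Hochberg gives approximately (0.04, 0.053, 0.053, 0.20), illustrating the smaller penalty of the FDR procedure.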
Dark biomarkers were defined as genes that were significant at the mqTrans level in both the CGGA and CPTAC-GBM cohorts but were not significant in conventional gene-level expression analysis in either cohort, with expression changes below this minimal-change threshold.\u003c/p\u003e \u003c/div\u003e"},{"header":"Results and Discussion","content":"\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\n\u003ch2\u003e3.1 Conventional gene-expression abundance provides limited survival-associated signals in GBM\u003c/h2\u003e\n\u003cp\u003eTo evaluate whether conventional gene-level transcriptomic analysis could identify survival-associated molecular signals in glioblastoma, we first performed differential expression analysis between short-survival (OS\u0026thinsp;\u0026le;\u0026thinsp;360 days) and long-survival (OS\u0026thinsp;\u0026ge;\u0026thinsp;720 days) patients in the CGGA and CPTAC-GBM cohorts based on normalized gene-level expression profiles.\u003c/p\u003e\n\u003cp\u003eIn both cohorts, no genes remained significant after multiple testing correction, indicating that conventional gene-level differential expression analysis failed to identify robust survival-associated biomarkers[\u003cspan class=\"CitationRef\"\u003e30\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e41\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e42\u003c/span\u003e]. To visualize potential expression trends, volcano plots based on nominal P-values were examined (Fig.\u0026nbsp;1A\u0026ndash;B). In the CGGA cohort, most genes were distributed symmetrically around zero fold change, with only a few showing weak differential trends.
Similarly, although a limited number of genes in the CPTAC-GBM cohort reached nominal significance, these signals were sparse and inconsistent, suggesting that survival-associated expression differences at the gene level were generally weak and unstable across datasets.\u003c/p\u003e\n\u003cp\u003eTo further assess whether the survival groups could be separated by the strongest gene-level candidates, principal component analysis (PCA) was performed using the top 20 genes ranked by nominal P-values in each cohort (Fig.\u0026nbsp;1C\u0026ndash;D). In both the CGGA and CPTAC-GBM datasets, short-survival and long-survival samples overlapped substantially in PCA space, without clear clustering patterns. This result indicates that gene-expression patterns were insufficient to distinguish survival phenotypes, even when restricted to the top-ranked candidate genes[\u003cspan class=\"CitationRef\"\u003e30\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e43\u003c/span\u003e].\u003c/p\u003e\n\u003cp\u003eTogether, these findings indicate that conventional gene-level expression profiling has limited power to identify survival-associated molecular signatures in GBM. The lack of reproducible differential expression signals across independent cohorts, a pattern frequently observed in highly heterogeneous tumors such as glioblastoma[\u003cspan class=\"CitationRef\"\u003e44\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e45\u003c/span\u003e], suggests that prognostic information may reside in regulatory perturbations that absolute expression abundance does not capture, rather than being attributable solely to limited statistical power.
These observations motivated the application of the mqTrans framework to detect hidden survival-associated signals from a regulatory deviation perspective. We further verified that the observed trends were robust to alternative survival grouping strategies, supporting the stability of the findings.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e\n\u003ch2\u003e3.2 mqTrans-based regulatory deviation modeling reveals hidden survival-associated signals\u003c/h2\u003e\n\u003cp\u003eTo determine whether regulatory deviation\u0026ndash;based transcriptomic modeling could reveal survival-associated signals that were undetectable at the conventional gene-expression level, we applied the mqTrans framework to the same short-survival and long-survival GBM cohorts.\u003c/p\u003e\n\u003cp\u003eIn contrast to the absence of significant signals in conventional gene-level analysis, differential analysis of mqTrans features identified multiple significantly altered genes in both the CGGA and CPTAC-GBM cohorts after multiple testing correction (adjusted P\u0026thinsp;\u0026lt;\u0026thinsp;0.05). Volcano plots showed a substantially larger number of significant features, with clearer differences in deviation magnitude between the two survival groups in both cohorts (Fig.\u0026nbsp;2A\u0026ndash;B). These findings suggest that mqTrans captures a distinct layer of transcriptomic variation by quantifying deviations between observed expression and model-inferred regulatory expectations, thereby shifting the analytical focus from abundance-centric measurements to regulation-centric representations [\u003cspan class=\"CitationRef\"\u003e37\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e46\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e47\u003c/span\u003e].
This framework therefore complements and extends conventional gene-expression\u0026ndash;based transcriptomic analysis.\u003c/p\u003e\n\u003cp\u003eTo further evaluate the discriminatory ability of mqTrans-derived features, principal component analysis (PCA) was performed using the top 20 significant mqTrans biomarkers ranked by adjusted P-values. In the CGGA cohort, mqTrans features showed a visible trend toward separation between short-survival and long-survival samples, whereas in the CPTAC-GBM cohort the separation was more pronounced, with the two survival groups forming distinct clusters in PCA space (Fig.\u0026nbsp;2C\u0026ndash;D). Because these features were selected on the same samples, PCA is used here as a descriptive visualization tool rather than as an independent classification method. Compared with the substantial overlap observed in the gene-level PCA, mqTrans-derived features showed visibly better separation of patient survival phenotypes[\u003cspan class=\"CitationRef\"\u003e47\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e48\u003c/span\u003e].\u003c/p\u003e\n\u003cp\u003eTaken together, these results indicate that regulatory deviation modeling reveals a distinct layer of survival-associated molecular information in GBM that is not captured by gene-expression abundance. By quantifying deviations between observed expression and model-inferred regulatory expectations, mqTrans detects regulatory perturbations that may underlie survival heterogeneity even in the absence of overt expression changes. This increased detectability of survival-associated deviation features provided the basis for defining dark biomarkers[\u003cspan class=\"CitationRef\"\u003e37\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e46\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e49\u003c/span\u003e].
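The PCA views in Figs. 1C–D and 2C–D (top 20 features ranked by P value, projected onto the leading principal components) can be sketched through a singular value decomposition of the centered feature matrix. The text does not state whether features were scaled before PCA, so this sketch only centers them; the function name `top_k_pca` is hypothetical.

```python
import numpy as np


def top_k_pca(X, pvals, k=20, n_components=2):
    """Project samples onto the leading PCs of the k lowest-P features.

    X: (n_samples, n_features) matrix; pvals: one P value per feature.
    Features are centered but not scaled (an assumption).
    """
    X = np.asarray(X, dtype=float)
    top = np.argsort(np.asarray(pvals, dtype=float))[:k]  # k smallest P values
    Z = X[:, top] - X[:, top].mean(axis=0)                # column-center
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]       # PC scores per sample
    explained = (S ** 2) / np.sum(S ** 2)                 # variance ratio per PC
    return scores, explained[:n_components]
```

In a plot, `scores[:, 0]` and `scores[:, 1]` would be drawn per sample and colored by survival group; because features are ranked on the same samples, such a plot is descriptive rather than a test of separability.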
Conceptually, mqTrans can thus be interpreted as a regulatory deviation\u0026ndash;based modeling framework that captures discrepancies between observed expression and inferred regulatory expectations.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\n\u003ch2\u003e3.3 Cross-cohort identification of 19 expression-invisible dark biomarkers\u003c/h2\u003e\n\u003cp\u003eBuilding on the mqTrans-derived survival-associated signals identified above, we next sought a stable set of features reproducible across cohorts. To capture prognostic molecular signals that conventional differential expression analysis cannot detect, we defined \u0026ldquo;dark biomarkers\u0026rdquo; rigorously, imposing constraints beyond statistical non-significance at the gene-expression level. Specifically, dark biomarkers were defined as genes that (i) showed significant differences in mqTrans features (adjusted P\u0026thinsp;\u0026lt;\u0026thinsp;0.05) in both cohorts, and (ii) exhibited minimal changes in gene-expression abundance, as indicated by small expression effect sizes (|log2 fold change|\u0026thinsp;\u0026lt;\u0026thinsp;0.2) and no consistent directional trend across cohorts. Conceptually, these dark biomarkers are analogous to the hidden or latent regulatory signals described in network-based biomarker studies, which are not directly observable at the level of gene-expression abundance. Notably, the focus here is on identifying features that are reproducible across cohorts rather than on further optimizing discriminatory performance.\u003c/p\u003e\n\u003cp\u003eAlthough no genes remained significant after multiple testing correction at the gene-expression level, a subset of genes exhibited weak but variable expression changes, with some showing nominal significance (unadjusted P\u0026thinsp;\u0026lt;\u0026thinsp;0.01) and partially consistent fold-change directions across cohorts.
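The selection rule can be sketched as a per-cohort filter followed by a set intersection. This is an illustrative sketch under stated assumptions: it applies the adjusted P cutoff of 0.05 at both analytical levels and the |log2 fold change| < 0.2 threshold in every cohort, and it does not implement the directional-consistency criterion, which the text does not operationalize. The function names and dictionary keys are hypothetical.

```python
ALPHA = 0.05          # adjusted P value cutoff at both analytical levels
MAX_ABS_LOG2FC = 0.2  # minimal expression change threshold


def cohort_hits(mq_padj, expr_padj, log2fc):
    """Genes passing the dark-biomarker filters within one cohort.

    Each argument maps gene -> value for that cohort (hypothetical layout).
    """
    return {
        g for g, q in mq_padj.items()
        if q < ALPHA                                          # significant mqTrans deviation
        and g in expr_padj and expr_padj[g] >= ALPHA          # not significant by expression
        and g in log2fc and abs(log2fc[g]) < MAX_ABS_LOG2FC   # small expression effect size
    }


def dark_biomarkers(*cohorts):
    """Intersect per-cohort hits; each cohort is a dict with keys
    'mq_padj', 'expr_padj' and 'log2fc'."""
    hits = [cohort_hits(**c) for c in cohorts]
    return sorted(set.intersection(*hits)) if hits else []
```

Requiring the filters in every cohort before intersecting is what makes the resulting set reproducible by construction, at the cost of discarding cohort-specific signals.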
Because of these weak nominal signals, the stringent definition above was applied: dark biomarkers were identified as genes satisfying both criteria in both cohorts, so that the selected features would reflect regulatory perturbations largely independent of expression abundance rather than expression changes missed owing to limited statistical power. Under this framework, we identified 19 genes consistently detected across the CGGA and CPTAC-GBM cohorts. None of these genes reached adjusted significance in conventional gene-level analysis, consistent with a class of regulatory-deviation signals that traditional abundance-focused approaches do not detect.\u003c/p\u003e\n\u003cp\u003emqTrans-based differential analysis identified 115 candidate biomarkers in the CGGA cohort and 277 in the CPTAC-GBM cohort. Cross-cohort comparison revealed 19 overlapping genes, which were consistently detected in both datasets and were therefore defined as dark biomarkers (Fig.\u0026nbsp;3A). None of these biomarkers was identified by conventional gene-level differential expression analysis, indicating that they carry prognostic signals not reflected in absolute gene-expression changes.\u003c/p\u003e\n\u003cp\u003eTo evaluate whether these 19 dark biomarkers provide better discrimination between survival groups, we compared them with the 19 top-ranked genes selected by nominal P value from the raw-expression analysis. Clustering quality was assessed using the Silhouette score (higher values indicate tighter, better-separated groups) and the Davies\u0026ndash;Bouldin index (lower values indicate better separation). 
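\u003c/p\u003e\n\u003cp\u003eBoth metrics can be computed from a samples-by-features matrix and the survival-group labels. The plain-Python sketch below follows the standard definitions, with Euclidean distance. These are the same definitions implemented by scikit-learn's silhouette_score and davies_bouldin_score. The four toy points are hypothetical, not study data, and each group is assumed to contain at least two samples.\u003c/p\u003e

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette width with Euclidean distance; every group needs at least 2 members."""
    widths = []
    for i, p in enumerate(points):
        # a: mean distance to the other members of the sample's own group
        a = mean(dist(p, q) for j, q in enumerate(points)
                 if j != i and labels[j] == labels[i])
        # b: mean distance to the members of the nearest other group
        b = min(mean(dist(p, q) for j, q in enumerate(points) if labels[j] == other)
                for other in set(labels) if other != labels[i])
        widths.append((b - a) / max(a, b))
    return mean(widths)


def davies_bouldin_index(points, labels):
    """Mean over groups of the worst (S_i + S_j) / d(c_i, c_j) ratio; lower is better."""
    groups = sorted(set(labels))
    members = {g: [p for p, lab in zip(points, labels) if lab == g] for g in groups}
    centroid = {g: [mean(axis) for axis in zip(*members[g])] for g in groups}
    scatter = {g: mean(dist(p, centroid[g]) for p in members[g]) for g in groups}
    return mean(
        max((scatter[i] + scatter[j]) / dist(centroid[i], centroid[j])
            for j in groups if j != i)
        for i in groups
    )


# Hypothetical 2-D projections of four samples and two labelings of them.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
by_survival = ["long", "long", "short", "short"]
mixed = ["long", "short", "long", "short"]
print(silhouette_score(points, by_survival), silhouette_score(points, mixed))
print(davies_bouldin_index(points, by_survival))  # 0.25
```

\u003cp\u003eThe labeling that matches the geometry gives a positive silhouette and a small Davies\u0026ndash;Bouldin index. The mixed labeling gives a negative silhouette and a much larger index, which is the contrast the comparison above relies on.\u003c/p\u003e\n\u003cp\u003e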
In both cohorts, the dark biomarker set achieved higher Silhouette scores and lower Davies\u0026ndash;Bouldin indices than the top-ranked raw-expression genes (Fig.\u0026nbsp;3B), indicating a more consistent clustering structure and a more stable cross-cohort feature set.\u003c/p\u003e\n\u003cp\u003ePrincipal component analysis further showed that the 19 dark biomarkers enabled clearer separation between long-survival and short-survival patients in both the CGGA and CPTAC-GBM cohorts (Fig.\u0026nbsp;3C\u0026ndash;D), whereas the top-ranked raw-expression genes showed weaker and less stable separation. These results suggest that mqTrans-derived dark biomarkers define a feature space with consistent separation patterns across independent cohorts.\u003c/p\u003e\n\u003cp\u003eTogether, these findings support the existence of a class of latent regulatory biomarkers that are not detectable by conventional differential expression analysis but capture survival-associated molecular differences in glioblastoma, highlighting the importance of regulatory relationships beyond absolute expression abundance[\u003cspan class=\"CitationRef\"\u003e50\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e51\u003c/span\u003e]. The consistency of these signals across independent cohorts also makes insufficient statistical power an unlikely primary explanation and is consistent with a regulatory origin.\u003c/p\u003e\n\u003cp\u003eNotably, several of the identified dark biomarkers have previously been implicated in glioblastoma or broader cancer biology. 
For example, PDPK1 is a key regulator of the PI3K/AKT signaling pathway and has been associated with tumor cell survival and therapeutic resistance[\u003cspan class=\"CitationRef\"\u003e52\u003c/span\u003e]. SMARCA2, a chromatin remodeling factor, has been reported to influence transcriptional regulation and tumor progression[\u003cspan class=\"CitationRef\"\u003e53\u003c/span\u003e]. SRPK2 plays a role in RNA splicing and has been linked to oncogenic processes in multiple cancers[\u003cspan class=\"CitationRef\"\u003e54\u003c/span\u003e]. The presence of these genes further supports the biological relevance of mqTrans-derived dark biomarkers and suggests that regulatory-deviation signals may capture functionally important drivers of tumor behavior.\u003c/p\u003e\n\u003cp\u003eSamples are colored by survival group (blue: long-survival; red: short-survival). mqTrans-derived dark biomarkers exhibit clearer separation between survival groups than conventional gene-expression features, with consistent separation patterns across cohorts.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\n\u003ch2\u003e3.4 Dark biomarkers converge on cholesterol homeostasis and miRNA-mediated regulatory programs\u003c/h2\u003e\n\u003cp\u003eTo investigate the biological relevance of the identified dark biomarkers, we performed functional enrichment analyses against a curated pathway collection (MSigDB Hallmark) and a miRNA target database (miRTarBase), assessing statistical significance with a hypergeometric test followed by Benjamini\u0026ndash;Hochberg correction. 
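\u003c/p\u003e\n\u003cp\u003eThe over-representation test described above can be sketched with the Python standard library alone. In this sketch, the hypergeometric upper-tail P value is computed exactly with math.comb. Benjamini\u0026ndash;Hochberg adjustment is applied across all tested terms, and the selection thresholds stated in this section (adjusted P\u0026thinsp;\u0026lt;\u0026thinsp;0.05, at least two overlapping genes) are applied at the end. The overlap is reported in the same hits/term-size form as the enrichment table of this section. The background size and gene sets are hypothetical; a real analysis would use the MSigDB Hallmark and miRTarBase gene sets with an appropriate gene background.\u003c/p\u003e

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws), computed exactly."""
    tail = sum(comb(successes, x) * comb(population - successes, draws - x)
               for x in range(k, min(successes, draws) + 1))
    return tail / comb(population, draws)


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(query, terms, background_size, alpha=0.05, min_overlap=2):
    """Over-representation of query genes per term; keeps adjusted P < alpha and enough overlap."""
    rows = []
    for name, members in terms.items():
        k = len(query & members)
        p = hypergeom_sf(k, background_size, len(members), len(query))
        rows.append((name, k, len(members), p))
    adjusted = bh_adjust([row[3] for row in rows])
    return [(name, f"{k}/{size}", p, q)
            for (name, k, size, p), q in zip(rows, adjusted)
            if q < alpha and k >= min_overlap]


# Hypothetical example: 100 background genes, 5 query genes, three 10-gene terms.
query = {f"g{i}" for i in range(5)}
terms = {
    "T1": {"g0", "g1", "g2"} | {f"g{i}" for i in range(10, 17)},
    "T2": {"g3"} | {f"g{i}" for i in range(20, 29)},
    "T3": {f"g{i}" for i in range(30, 40)},
}
for row in enrich(query, terms, background_size=100):
    print(row)
```

\u003cp\u003eOnly T1 survives here. T2 shares a single gene with the query and would be removed both by its adjusted P value and by the minimum-overlap rule.\u003c/p\u003e\n\u003cp\u003e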
Significant terms were defined by an adjusted P value\u0026thinsp;\u0026lt;\u0026thinsp;0.05, and terms supported by only a single dark biomarker were excluded to reduce spurious hits.\u003c/p\u003e\n\u003cp\u003eAmong pathway terms, dark biomarkers were enriched for Hallmark cholesterol homeostasis (2 of the 74 term genes; adjusted P\u0026thinsp;=\u0026thinsp;0.013), suggesting that survival-associated transcriptional dysregulation may converge on lipid metabolic reprogramming in glioblastoma[\u003cspan class=\"CitationRef\"\u003e55\u003c/span\u003e\u0026ndash;\u003cspan class=\"CitationRef\"\u003e57\u003c/span\u003e]. This is consistent with the known metabolic plasticity of glioblastoma, in which lipid metabolism has been implicated in tumor growth and therapy resistance.\u003c/p\u003e\n\u003cp\u003eAt the post-transcriptional level, dark biomarkers were significantly enriched among the targets of several miRNAs, including hsa-miR-16-5p (8 dark biomarkers) and hsa-miR-95-5p (3 dark biomarkers)[\u003cspan class=\"CitationRef\"\u003e58\u003c/span\u003e]. 
This suggests that these genes may share post-transcriptional regulators rather than varying as independent expression units.\u003c/p\u003e\n\u003cp\u003eNo comparable enrichment was observed among conventional Gene Ontology categories, suggesting that the functional convergence of dark biomarkers is not reflected in standard gene-centric annotations but instead emerges at the level of regulatory interactions[\u003cspan class=\"CitationRef\"\u003e59\u003c/span\u003e]. This pattern is consistent with regulatory convergence arising at the network level rather than through individual gene annotations, and it supports the interpretation that mqTrans captures a distinct layer of biological organization beyond absolute gene-expression changes.\u003c/p\u003e\n\u003cp\u003eCollectively, these findings suggest that dark biomarkers converge on metabolic and post-transcriptional regulatory programs linked to glioblastoma progression and survival heterogeneity, supporting their interpretation as regulatory-network-level biomarkers. 
Detailed enrichment results are provided in Supplementary Table S4.\u003c/p\u003e\n\u003cdiv class=\"gridtable\"\u003e\n\u003cdiv class=\"colspec\" align=\"left\"\u003e\u0026nbsp;\u003c/div\u003e\n\u003cdiv class=\"colspec\" align=\"left\"\u003e\u0026nbsp;\u003c/div\u003e\n\u003ctable id=\"Taba\" border=\"1\"\u003e\n\u003cthead\u003e\n\u003ctr\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eTerm\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eSource\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eOverlap (dark biomarkers/term genes)\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eP value\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eAdjusted P value (BH)\u003c/p\u003e\n\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eCholesterol Homeostasis\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eMSigDB_Hallmark_2020\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e2/74\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e2.22E-03\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e1.33E-02\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003ehsa-miR-16-5p\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003emiRTarBase_2017\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e8/1555\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e4.55E-05\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e2.59E-02\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd 
align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003emmu-miR-3074-2-3p\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003emiRTarBase_2017\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e2/22\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e1.95E-04\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e4.37E-02\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003ehsa-miR-95-5p\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003emiRTarBase_2017\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e3/128\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e2.30E-04\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e4.37E-02\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e\n\u003ch2\u003e3.5 Dark biomarker loci overlap with non-coding transcripts, suggesting transcriptomic structural complexity\u003c/h2\u003e\n\u003cp\u003eHaving established that dark biomarkers converge on regulatory programs, we next asked whether their invisibility at the gene-expression level could be partly explained by transcriptomic structural complexity, that is, whether dark biomarkers are preferentially located in transcriptionally complex genomic regions. 
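\u003c/p\u003e\n\u003cp\u003eThe sketch below shows a minimal version of such a locus-overlap screen. It assumes that gene and non-coding features have already been parsed from a GENCODE GTF into name, chromosome, start, end, strand, and biotype fields. GTF coordinates are 1-based and inclusive, so two features overlap when each starts no later than the other ends. The feature names and coordinates in the example are hypothetical, not real GENCODE records. A genome-wide scan would typically use an interval tree or a sorted sweep rather than this linear loop.\u003c/p\u003e

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    chrom: str
    start: int   # 1-based, inclusive (GTF convention)
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"
    biotype: str


def overlapping_features(gene, annotations):
    """Annotated features on the same chromosome whose closed interval intersects the gene."""
    hits = []
    for f in annotations:
        if f.name == gene.name or f.chrom != gene.chrom:
            continue
        if f.start <= gene.end and gene.start <= f.end:  # closed-interval overlap test
            relation = "same strand" if f.strand == gene.strand else "opposite strand"
            hits.append((f.name, f.biotype, relation))
    return hits


# Hypothetical coordinates, not real GENCODE records.
gene = Feature("GENE_X", "chr16", 1_000, 5_000, "+", "protein_coding")
annotations = [
    Feature("NC_A", "chr16", 4_500, 6_000, "-", "antisense"),
    Feature("NC_B", "chr16", 2_000, 2_500, "+", "sense_intronic"),
    Feature("NC_C", "chr16", 5_001, 7_000, "-", "lincRNA"),
    Feature("NC_D", "chr7", 1_000, 5_000, "+", "TEC"),
]
print(overlapping_features(gene, annotations))
# [('NC_A', 'antisense', 'opposite strand'), ('NC_B', 'sense_intronic', 'same strand')]
```

\u003cp\u003eNC_C begins one base after the gene ends, so it is excluded, and NC_D is on another chromosome. Reporting strand alongside biotype separates antisense from same-strand (for example, sense-intronic) overlaps.\u003c/p\u003e\n\u003cp\u003e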
Using GENCODE annotation, we identified multiple non-coding transcripts overlapping dark biomarker loci, including antisense, sense-intronic, TEC (to be experimentally confirmed), and other transcript types[\u003cspan class=\"CitationRef\"\u003e60\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e61\u003c/span\u003e].\u003c/p\u003e\n\u003cp\u003eThe table below lists these overlapping elements for ten of the 19 dark biomarkers, indicating that many dark biomarkers reside in loci with complex transcriptional architectures. The overlapping elements are not limited to long non-coding RNAs but span diverse classes, including snRNAs, miRNAs (for example, MIR21 within VMP1), and processed pseudogenes. In such regions, RNA-seq reads may originate from multiple partially overlapping transcripts, leading to ambiguity in read assignment during conventional gene-level quantification.\u003c/p\u003e\n\u003cp\u003eInspection of representative loci in genome browser data revealed RNA-seq coverage spanning overlapping transcript boundaries, consistent with active transcription of these non-coding elements rather than annotation artifacts. This suggests that transcriptional overlap may introduce systematic uncertainty into gene-level expression estimates, in line with the transcriptional interference and isoform ambiguity reported at complex genomic loci.\u003c/p\u003e\n\u003cp\u003eIn this context, conventional abundance-based analyses may obscure biologically relevant regulatory perturbations arising from transcriptomic complexity, because signal contributions from overlapping transcripts cannot be fully disentangled. 
By contrast, mqTrans captures deviations in transcriptional relationships, enabling the detection of regulatory perturbations that may arise from such structurally complex regions.\u003c/p\u003e\n\u003cp\u003eNotably, several key dark biomarkers, including PDPK1, SRPK2, and SMARCA2, overlapped multiple non-coding transcripts (nine, six, and two, respectively), consistent with pronounced transcriptional complexity at these loci.\u003c/p\u003e\n\u003cp\u003eTogether, these findings suggest that dark biomarkers tend to reside in transcriptionally complex genomic regions, where overlapping non-coding transcription introduces quantification ambiguity. This offers a plausible mechanistic explanation for why such regulatory signals remain obscured in conventional analyses yet become detectable under a regulatory-deviation framework. More broadly, it highlights a fundamental limitation of gene-centric quantification in resolving transcriptional complexity.\u003c/p\u003e\n\u003cdiv class=\"gridtable\"\u003e\n\u003cdiv class=\"colspec\" align=\"left\"\u003e\u0026nbsp;\u003c/div\u003e\n\u003cdiv class=\"colspec\" align=\"left\"\u003e\u0026nbsp;\u003c/div\u003e\n\u003ctable id=\"Tabb\" border=\"1\"\u003e\n\u003cthead\u003e\n\u003ctr\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eDark biomarker (Ensembl ID)\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eChromosome\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eGene symbol\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eOverlapping element (Ensembl ID)\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eElement strand\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eElement type\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eElement name\u003c/p\u003e\n\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\n\u003ctd 
align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eENSG00000137449\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003echr4\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eCPEB2\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eENSG00000249252\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e-\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eantisense\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eRP11-665G4.1\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eENSG00000177706\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003echr7\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eFAM20C\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eENSG00000240093\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e-\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eantisense\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eAC093627.12\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eENSG00000177706\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003echr7\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eFAM20C\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eENSG00000249852\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e-\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eTEC\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd 
align=\"left\"\u003e\n\u003cp\u003eAC145676.2\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eENSG00000138757\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003echr4\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eG3BP2\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eENSG00000201644\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e-\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003emisc_RNA\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eY_RNA\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eENSG00000125734\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003echr19\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eGPR108\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eENSG00000283950\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e-\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003emiRNA\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eMIR6791\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eENSG00000173110\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003echr1\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eHSPA6\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eENSG00000273112\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e+\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd 
align=\"left\"\u003e\n\u003cp\u003eprocessed_transcript\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eRP11-25K21.6\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eENSG00000140992\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003echr16\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003ePDPK1\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eENSG00000261613\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e-\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eantisense\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eRP11-20I23.13\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eENSG00000140992\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003echr16\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003ePDPK1\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eENSG00000261288\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e+\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003esense_intronic\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eRP11-20I23.11\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eENSG00000140992\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003echr16\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003ePDPK1\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eENSG00000269937\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd 
align=\"left\"\u003e\n\u003cp\u003e-\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eantisense\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eRP11-20I23.8\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eENSG00000140992\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003echr16\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003ePDPK1\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eENSG00000261140\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e-\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eantisense\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eRP11-20I23.6\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eENSG00000140992\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003echr16\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003ePDPK1\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eENSG00000260436\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e-\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eantisense\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eRP11-20I23.7\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eENSG00000140992\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003echr16\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003ePDPK1\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd 
align=\"left\"\u003e\n\u003cp\u003eENSG00000279568\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e-\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eTEC\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eRP11-20I23.5\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eENSG00000140992\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003echr16\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003ePDPK1\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eENSG00000280402\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e+\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eTEC\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eRP11-20I23.10\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eENSG00000140992\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003echr16\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003ePDPK1\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eENSG00000279162\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e+\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eTEC\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eCTD-3126B10.2\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eENSG00000140992\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003echr16\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003ePDPK1\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd 
align=\"left\"\u003e\n\u003cp\u003eENSG00000261093\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e-\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eantisense\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eCTD-3126B10.1\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eENSG00000080503\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003echr9\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eSMARCA2\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eENSG00000236199\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e-\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eantisense\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eRP11-264I13.2\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eENSG00000080503\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003echr9\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eSMARCA2\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eENSG00000222973\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e-\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003esnRNA\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eRNU2-25P\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eENSG00000135250\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003echr7\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd 
align=\"left\"\u003e\n\u003cp\u003eSRPK2\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eENSG00000270764\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e+\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eprocessed_pseudogene\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eCTB-152G17.5\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eENSG00000135250\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003echr7\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eSRPK2\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eENSG00000271482\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e+\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eprocessed_pseudogene\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eCTB-152G17.4\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eENSG00000135250\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003echr7\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eSRPK2\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eENSG00000213361\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e+\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eprocessed_pseudogene\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eRP4-778K6.1\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eENSG00000135250\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd 
align=\"left\"\u003e\n\u003cp\u003echr7\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eSRPK2\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eENSG00000244490\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e+\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eprocessed_pseudogene\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eRWDD4P1\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eENSG00000135250\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003echr7\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eSRPK2\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eENSG00000242154\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e+\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eantisense\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eRP4-778K6.3\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eENSG00000135250\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003echr7\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eSRPK2\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eENSG00000201179\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e+\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003esnRNA\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eRNU6-1322P\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd 
align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eENSG00000062716\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003echr17\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eVMP1\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eENSG00000267637\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e+\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003esense_intronic\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eRP11-619I22.1\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eENSG00000062716\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003echr17\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eVMP1\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eENSG00000284190\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e+\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003emiRNA\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eMIR21\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u003cstrong\u003eENSG00000167962\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003echr16\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eZNF598\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eENSG00000260107\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e+\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003elincRNA\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd 
align=\"left\"\u003e\n\u003cp\u003eAC005606.15\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e\n\u003ch2\u003e3.6 Reduced mqTrans-derived biomarker models show survival-stratification potential\u003c/h2\u003e\n\u003cp\u003eTo evaluate the prognostic value of mqTrans-derived dark biomarkers, multigene Cox proportional hazards models were constructed and assessed in both the CGGA and CPTAC-GBM cohorts[\u003cspan class=\"CitationRef\"\u003e35\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e62\u003c/span\u003e].\u003c/p\u003e\n\u003cp\u003eUsing the full set of 19 dark biomarkers, the model achieved significant survival stratification in the CGGA cohort, as evidenced by clearly separated Kaplan\u0026ndash;Meier curves and a significant hazard ratio. However, when applied to the CPTAC-GBM cohort, the model yielded extreme hazard ratios and unstable confidence intervals, indicating severe overfitting, likely due to the large number of features relative to the sample size[\u003cspan class=\"CitationRef\"\u003e63\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e64\u003c/span\u003e].\u003c/p\u003e\n\u003cp\u003eTo mitigate overfitting and improve model robustness, feature selection was performed with LASSO-penalized Cox regression, with the penalty parameter (\u0026lambda;) chosen by 10-fold cross-validation to minimize the partial-likelihood deviance. To avoid information leakage, model training and parameter tuning were performed within the training set only. 
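\u003c/p\u003e\n\u003cp\u003eThe standard-library sketch below illustrates how LASSO-penalized Cox regression with a cross-validated \u0026lambda; works. It minimizes the Breslow negative log partial likelihood (scaled by sample size) plus an L1 penalty by proximal gradient descent (ISTA). Each candidate \u0026lambda; is scored by the summed deviance on held-out folds. This is an assumption-laden stand-in, not the procedure used in this study: established tools such as the R package glmnet use coordinate descent and a different cross-validated deviance, and the toy data are hypothetical. In keeping with the leakage safeguard above, select_lambda_cv should be called on training data only. Any feature standardization should likewise be estimated on the training set.\u003c/p\u003e

```python
import math
import random


def linear_predictor(beta, X):
    return [sum(b * x for b, x in zip(beta, row)) for row in X]


def cox_deviance(beta, X, time, event):
    """-2 times the Breslow log partial likelihood."""
    eta = linear_predictor(beta, X)
    loglik = 0.0
    for i in range(len(X)):
        if event[i]:
            denom = sum(math.exp(eta[j]) for j in range(len(X)) if time[j] >= time[i])
            loglik += eta[i] - math.log(denom)
    return -2.0 * loglik


def cox_gradient(beta, X, time, event):
    """Gradient of the negative log partial likelihood divided by the sample size."""
    n = len(X)
    w = [math.exp(e) for e in linear_predictor(beta, X)]
    grad = [0.0] * len(beta)
    for i in range(n):
        if not event[i]:
            continue
        risk = [j for j in range(n) if time[j] >= time[i]]  # risk set at time[i]
        denom = sum(w[j] for j in risk)
        for k in range(len(beta)):
            grad[k] -= X[i][k] - sum(w[j] * X[j][k] for j in risk) / denom
    return [g / n for g in grad]


def fit_lasso_cox(X, time, event, lam, step=0.1, iters=300):
    """ISTA: gradient step on the smooth loss, then L1 soft-thresholding."""
    beta = [0.0] * len(X[0])
    for _ in range(iters):
        grad = cox_gradient(beta, X, time, event)
        z = [b - step * g for b, g in zip(beta, grad)]
        beta = [math.copysign(max(abs(v) - step * lam, 0.0), v) for v in z]
    return beta


def select_lambda_cv(X, time, event, lambdas, n_folds=10, seed=0):
    """Lambda with the lowest summed held-out deviance; call on training data only."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    scores = []
    for lam in lambdas:
        total = 0.0
        for held in folds:
            held_set = set(held)
            train = [i for i in idx if i not in held_set]
            beta = fit_lasso_cox([X[i] for i in train], [time[i] for i in train],
                                 [event[i] for i in train], lam)
            total += cox_deviance(beta, [X[i] for i in held],
                                  [time[i] for i in held], [event[i] for i in held])
        scores.append((total, lam))
    return min(scores)[1]  # ties resolve to the smaller lambda


# Hypothetical toy data: feature 0 orders the event times, feature 1 is tiny noise.
X = [[(i - 9.5) / 6, ((i * 7) % 5 - 2) * 0.005] for i in range(20)]
time = [20 - i for i in range(20)]
event = [0 if i % 7 == 3 else 1 for i in range(20)]
lam = select_lambda_cv(X, time, event, lambdas=[0.05, 1.0, 10.0])
beta = fit_lasso_cox(X, time, event, lam)
print(lam, beta)
```

\u003cp\u003eOn the toy data, the noise feature stays exactly at zero while the informative feature receives a positive coefficient. This is the same mechanism by which an L1 penalty can shrink a candidate set, such as the 19 dark biomarkers, to a smaller panel.\u003c/p\u003e\n\u003cp\u003e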
This procedure retained five dark biomarkers for model construction[\u003cspan class=\"CitationRef\"\u003e62\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e65\u003c/span\u003e].\u003c/p\u003e\n\u003cp\u003eFor each patient, a risk score was calculated as the sum of the selected features weighted by their Cox regression coefficients. Patients were then divided into high- and low-risk groups at the median risk score. The resulting five-gene model stratified survival consistently in both the CGGA and CPTAC-GBM cohorts, with well-separated Kaplan\u0026ndash;Meier curves and hazard ratios of more plausible magnitude than those of the 19-gene model[\u003cspan class=\"CitationRef\"\u003e64\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e66\u003c/span\u003e].\u003c/p\u003e\n\u003cp\u003eIn contrast, models built from the 19 top-ranked genes of conventional gene-expression analysis stratified survival less strongly and less consistently, particularly in the CPTAC-GBM cohort[\u003cspan class=\"CitationRef\"\u003e67\u003c/span\u003e].\u003c/p\u003e\n\u003cp\u003eCollectively, these results suggest that, in the analyzed cohorts, mqTrans-derived dark biomarkers support more favorable survival stratification than abundance-based features. They also indicate that controlling feature dimensionality is essential for stable cross-cohort performance in survival modeling[\u003cspan class=\"CitationRef\"\u003e35\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e64\u003c/span\u003e].\u003c/p\u003e\n\u003cp\u003eEach panel reports the hazard ratio (HR) with its 95% confidence interval and the log-rank P value. High-risk and low-risk groups are shown in blue and orange, respectively. 
The mqTrans-derived dark biomarker model provides improved and more generalizable prognostic stratification compared with conventional gene-expression\u0026ndash;based models.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"Conclusion","content":"\u003cp\u003eIn this study, we developed mqTrans, a regulatory-deviation modeling framework that operates on gene-level expression data. It is designed to detect regulatory perturbations missed by conventional abundance-based transcriptomic analyses. By quantifying the deviation between observed gene expression and model-inferred regulatory expectations, mqTrans provides a regulatory-layer view of the transcriptome that emphasizes regulatory dysregulation rather than absolute expression abundance[\u003cspan additionalcitationids=\"CR69\" citationid=\"CR68\" class=\"CitationRef\"\u003e68\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR70\" class=\"CitationRef\"\u003e70\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eWe applied this framework to two independent glioblastoma cohorts, from the Chinese Glioma Genome Atlas (CGGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC-GBM). We identified a set of dark biomarkers that were reproducibly associated with patient survival despite showing no significant differences in conventional gene-level expression analyses[\u003cspan citationid=\"CR69\" class=\"CitationRef\"\u003e69\u003c/span\u003e, \u003cspan citationid=\"CR71\" class=\"CitationRef\"\u003e71\u003c/span\u003e]. 
These biomarkers stratified survival more effectively than abundance-based features. This suggests that clinically relevant prognostic information may reside in regulatory signals that traditional transcriptomic methods overlook[\u003cspan citationid=\"CR70\" class=\"CitationRef\"\u003e70\u003c/span\u003e, \u003cspan citationid=\"CR72\" class=\"CitationRef\"\u003e72\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eFunctional analyses further indicated that these dark biomarkers converge on post-transcriptional regulatory programs. Structural analyses revealed their enrichment in transcriptionally complex genomic loci, particularly regions overlapping non-coding transcripts[\u003cspan citationid=\"CR69\" class=\"CitationRef\"\u003e69\u003c/span\u003e, \u003cspan citationid=\"CR73\" class=\"CitationRef\"\u003e73\u003c/span\u003e]. These findings offer a plausible mechanistic explanation for why important prognostic signals may remain undetected in standard gene-level analyses. They support the view that regulatory-layer perturbation is an important dimension of glioblastoma heterogeneity[\u003cspan citationid=\"CR73\" class=\"CitationRef\"\u003e73\u003c/span\u003e, \u003cspan citationid=\"CR74\" class=\"CitationRef\"\u003e74\u003c/span\u003e]. They also point to potential applications in clinical prognostic assessment and precision medicine.\u003c/p\u003e \u003cp\u003eNotably, the mqTrans-derived biomarkers retained prognostic relevance across independent cohorts and stratified survival better than conventional expression-based models. This indicates their potential utility for molecular prognostic assessment in glioblastoma[\u003cspan citationid=\"CR69\" class=\"CitationRef\"\u003e69\u003c/span\u003e, \u003cspan citationid=\"CR75\" class=\"CitationRef\"\u003e75\u003c/span\u003e]. 
More broadly, this work suggests that regulatory-layer transcriptomic modeling can reveal biologically and clinically meaningful signals beyond gene-expression abundance alone[\u003cspan citationid=\"CR70\" class=\"CitationRef\"\u003e70\u003c/span\u003e, \u003cspan citationid=\"CR76\" class=\"CitationRef\"\u003e76\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eSeveral limitations should be acknowledged. The framework was evaluated retrospectively in public cohorts with limited sample sizes. In addition, the biological functions of the identified dark biomarkers have not yet been experimentally validated. Larger prospective cohorts and functional experiments will be needed to establish the mechanistic roles and translational applicability of these biomarkers.\u003c/p\u003e \u003cp\u003eIn summary, our findings indicate that regulatory-deviation modeling can uncover survival-associated signals that conventional gene-expression analysis does not capture. It provides a new computational strategy for identifying prognostic biomarkers and for understanding molecular heterogeneity in aggressive cancers.\u003c/p\u003e"},{"header":"Declarations","content":" \u003cp\u003e \u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e \u003cp\u003eThis study is a secondary analysis of publicly available, de-identified transcriptomic datasets. The original studies (CGGA and CPTAC-GBM) from which the data were obtained each received ethical approval from their respective institutional review boards, and informed consent was obtained from all participants. No additional ethics approval was required for this study.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eConsent for publication\u003c/strong\u003e \u003cp\u003eNot applicable. 
This study contains no individual person's data in any form.\u003c/p\u003e \u003c/p\u003e\u003cp\u003e \u003ch2\u003eCompeting interests\u003c/h2\u003e \u003cp\u003eThe authors declare that they have no competing interests.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eFunding\u003c/h2\u003e \u003cp\u003eThis work was supported by the Natural Science Foundation of Anhui Province (Grant No. 2408085JX011), the Program for Excellent Scitech Innovation Teams of Universities in Anhui Province (Grant No. 2022AH010074), the Excellent Young Teacher Training Program of the Department of Education of Anhui Province (Grant No. YQZD202403), the Anhui Provincial Department of Education Higher Education Research Project (Grant No. 2025AHGXZK31534), the Open Research Fund of Anhui Province Key Laboratory of Non-coding RNA Basic and Clinical Transformation (Grant No. NcRNA202509), the Major Projects of Natural Science Research of Universities in Anhui Province (Grant No. 2025AHGXZK20259), and the General Projects of Natural Science Research of the Health Commission of Anhui Province (Grant Nos. AHWJ2024Aa20054 and AHWJ2024Aa20251).\u003c/p\u003e\u003ch2\u003eAuthors' contributions\u003c/h2\u003e \u003cp\u003eY.N. conceived the study, developed the methodology, performed all computational analyses, and wrote the manuscript. All other authors contributed through discussion, data interpretation, and manuscript review. K.L. supervised the project and acquired funding. All authors read and approved the final manuscript.\u003c/p\u003e\u003ch2\u003eAcknowledgements\u003c/h2\u003e \u003cp\u003eWe acknowledge the Chinese Glioma Genome Atlas (CGGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC) for providing publicly available data. 
We also thank the TRRUST database and the UCSC Xena platform for their open-access resources.\u003c/p\u003e\u003ch2\u003eAvailability of data and material\u003c/h2\u003e \u003cp\u003eThe datasets analyzed in this study are available in the following public repositories: CGGA (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://www.cgga.org.cn\u003c/span\u003e\u003cspan address=\"http://www.cgga.org.cn\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) and CPTAC-GBM (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://proteomics.cancer.gov/programs/cptac\u003c/span\u003e\u003cspan address=\"https://proteomics.cancer.gov/programs/cptac\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). Additional reference data were obtained from the UCSC Xena platform (GTEx dataset, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://xena.ucsc.edu\u003c/span\u003e\u003cspan address=\"https://xena.ucsc.edu\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). Transcription factor\u0026ndash;target interactions were retrieved from the TRRUST database (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.grnpedia.org/trrust\u003c/span\u003e\u003cspan address=\"https://www.grnpedia.org/trrust\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). GENCODE annotation was obtained from \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.gencodegenes.org\u003c/span\u003e\u003cspan address=\"https://www.gencodegenes.org\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. 
All analysis code and processed data supporting the findings of this study are available from the corresponding author upon reasonable request.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eLouis DN et al (2021) The 2021 WHO Classification of Tumors of the Central Nervous System: a summary. Neuro Oncol 23(8):1231\u0026ndash;1251\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTan AC et al (2020) Management of glioblastoma: State of the art and future directions. CA Cancer J Clin 70(4):299\u0026ndash;312\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWen PY, Kesari S (2008) Malignant gliomas in adults. N Engl J Med 359(5):492\u0026ndash;507\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOstrom QT et al (2023) CBTRUS Statistical Report: Primary Brain and Other Central Nervous System Tumors Diagnosed in the United States in 2016\u0026ndash;2020. Neuro Oncol 25(12 Suppl 2):iv1\u0026ndash;iv99\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStupp R et al (2005) Radiotherapy plus concomitant and adjuvant temozolomide for glioblastoma. N Engl J Med 352(10):987\u0026ndash;996\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWeller M et al (2021) EANO guidelines on the diagnosis and treatment of diffuse gliomas of adulthood. Nat Rev Clin Oncol 18(3):170\u0026ndash;186\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBrennan CW et al (2013) The somatic genomic landscape of glioblastoma. Cell 155(2):462\u0026ndash;477\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePhillips HS et al (2006) Molecular subclasses of high-grade glioma predict prognosis, delineate a pattern of disease progression, and resemble stages in neurogenesis. 
Cancer Cell 9(3):157\u0026ndash;173\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEckel-Passow JE et al (2015) Glioma Groups Based on 1p/19q, IDH, and TERT Promoter Mutations in Tumors. N Engl J Med 372(26):2499\u0026ndash;2508\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHegi ME et al (2005) MGMT gene silencing and benefit from temozolomide in glioblastoma. N Engl J Med 352(10):997\u0026ndash;1003\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCeccarelli M et al (2016) Molecular Profiling Reveals Biologically Discrete Subsets and Pathways of Progression in Diffuse Glioma. Cell 164(3):550\u0026ndash;563\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eParsons DW et al (2008) An integrated genomic analysis of human glioblastoma multiforme. Science 321(5897):1807\u0026ndash;1812\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVerhaak RG et al (2010) Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell 17(1):98\u0026ndash;110\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eReifenberger G et al (2017) Advances in the molecular genetics of gliomas: implications for classification and therapy. Nat Rev Clin Oncol 14(7):434\u0026ndash;452\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCancer Genome Atlas Research Network (2008) Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455(7216):1061\u0026ndash;1068\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLove MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. 
Genome Biol 15(12):550\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRobinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26(1):139\u0026ndash;140\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRingn\u0026eacute;r M (2008) What is principal component analysis? Nat Biotechnol 26(3):303\u0026ndash;304\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTrapnell C et al (2012) Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7(3):562\u0026ndash;578\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePertea M et al (2016) Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc 11(9):1650\u0026ndash;1667\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang ET et al (2008) Alternative isoform regulation in human tissue transcriptomes. Nature 456(7221):470\u0026ndash;476\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePan Q et al (2008) Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet 40(12):1413\u0026ndash;1415\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKatz Y et al (2010) Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat Methods 7(12):1009\u0026ndash;1015\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSoneson C, Love MI, Robinson MD (2015) Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 4:1521\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRobert C, Watson M (2015) Errors in RNA-Seq quantification affect genes of relevance to human disease. 
Genome Biol 16(1):177\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMarbach D et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9(8):796\u0026ndash;804\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAlvarez MJ et al (2016) Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nat Genet 48(8):838\u0026ndash;847\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhao Z et al (2021) Chinese Glioma Genome Atlas (CGGA): a comprehensive resource with functional genomic data from Chinese glioma patients. Genom Proteom Bioinform 19(1):1\u0026ndash;12\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhao Z et al (2017) Comprehensive RNA-seq transcriptomic profiling in the malignant progression of gliomas. Sci Data 4(1):170024\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang LB et al (2021) Proteogenomic and metabolomic characterization of human glioblastoma. Cancer Cell 39(4):509\u0026ndash;528.e20\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eConesa A et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17:13\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJohnson WE, Li C, Rabinovic A (2007) Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8(1):118\u0026ndash;127\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLeek JT et al (2010) Tackling the widespread and critical impact of batch effects in high-throughput data. Nat Rev Genet 11(10):733\u0026ndash;739\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHan H et al (2018) TRRUST v2: an expanded reference database of human and mouse transcriptional regulatory interactions. 
Nucleic Acids Res 46(D1):D380\u0026ndash;D386\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSimon N et al (2011) Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent. J Stat Softw 39(5):1\u0026ndash;13\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDuan M et al (2023) Orchestrating information across tissues via a novel multitask GAT framework to improve quantitative gene regulation relation modeling for survival analysis. Brief Bioinform 24(4):bbad238\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi K et al (2024) Generating the Transcriptional Regulation View of Transcriptomic Features for Prediction Task and Dark Biomarker Detection on Small Datasets. J Vis Exp (205)\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFay MP, Proschan MA (2010) Wilcoxon-Mann-Whitney or t-test? On assumptions for hypothesis tests and multiple interpretations of decision rules. Stat Surv 4:1\u0026ndash;39\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBland JM, Altman DG (1995) Multiple significance tests: the Bonferroni method. BMJ 310(6973):170\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBenjamini Y, Hochberg Y (1995) Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J Roy Stat Soc: Ser B (Methodol) 57(1):289\u0026ndash;300\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNeftel C et al (2019) An Integrative Model of Cellular States, Plasticity, and Genetics for Glioblastoma. Cell 178(4):835\u0026ndash;849.e21\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSingh S et al (2025) Glioblastoma at the crossroads: current understanding and future therapeutic horizons. Signal Transduct Target Ther 10(1):213\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCouturier CP et al (2020) Single-cell RNA-seq reveals that glioblastoma recapitulates a normal neurodevelopmental hierarchy. 
Nat Commun 11(1):3406\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYan Y et al (2024) Suppression of ITPKB degradation by Trim25 confers TMZ resistance in glioblastoma through ROS homeostasis. Signal Transduct Target Ther 9(1):58\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDarmanis S et al (2017) Single-Cell RNA-Seq Analysis of Infiltrating Neoplastic Cells at the Migrating Front of Human Glioblastoma. Cell Rep 21(5):1399\u0026ndash;1410\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHan J et al (2022) Multilayered control of splicing regulatory networks by DAP3 leads to widespread alternative splicing changes in cancer. Nat Commun 13(1):1793\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGlinos DA et al (2022) Transcriptome variation in human tissues revealed by long-read sequencing. Nature 608(7922):353\u0026ndash;359\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStuart T et al (2019) Comprehensive Integration of Single-Cell Data. Cell 177(7):1888\u0026ndash;1902.e21\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVaquero-Garcia J et al (2016) A new view of transcriptome complexity and regulation through the lens of local splicing variations. eLife 5:e11752\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang Y et al (2022) Identifying network biomarkers of cancer by sample-specific differential network. BMC Bioinformatics 23(1):230\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTang S, Yuan K, Chen L (2022) Molecular biomarkers, network biomarkers, and dynamic network biomarkers for diagnosis and prediction of rare diseases. Fundam Res 2(6):894\u0026ndash;902\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMigliozzi S et al (2023) Integrative multi-omics networks identify PKCδ and DNA-PK as master kinases of glioblastoma subtypes and guide targeted cancer therapy. 
Nat Cancer 4(2):181\u0026ndash;202\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHarling JD, Tinworth CP (2023) A two-faced selectivity solution to target SMARCA2 for cancer therapy. Nat Commun 14(1):515\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWu Y et al (2024) LTR retrotransposon-derived LncRNA LINC01446 promotes hepatocellular carcinoma progression and angiogenesis by regulating the SRPK2/SRSF1/VEGF axis. Cancer Lett 598:217088\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVenneti S, Thompson CB (2017) Metabolic Reprogramming in Brain Tumors. Annu Rev Pathol 12:515\u0026ndash;545\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMiska J, Chandel NS (2023) Targeting fatty acid metabolism in glioblastoma. J Clin Invest 133(1)\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKao TJ et al (2023) Dysregulated lipid metabolism in TMZ-resistant glioblastoma: pathways, proteins, metabolites and therapeutic opportunities. Lipids Health Dis 22(1):114\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen M, Medarova Z, Moore A (2021) Role of microRNAs in glioblastoma. Oncotarget 12(17):1707\u0026ndash;1723\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiberzon A et al (2015) The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst 1(6):417\u0026ndash;425\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStatello L et al (2021) Gene regulation by long non-coding RNAs and its biological functions. Nat Rev Mol Cell Biol 22(2):96\u0026ndash;118\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRansohoff JD, Wei Y, Khavari PA (2018) The functions and unique features of long intergenic non-coding RNA. Nat Rev Mol Cell Biol 19(3):143\u0026ndash;157\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTibshirani R (1997) The lasso method for variable selection in the Cox model. 
Stat Med 16(4):385\u0026ndash;395\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSauerbrei W, Royston P, Binder H (2007) Selection of important variables and determination of functional form for continuous predictors in multivariable model building. Stat Med 26(30):5512\u0026ndash;5528\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWaldron L et al (2014) Comparative meta-analysis of prognostic gene signatures for late-stage ovarian cancer. J Natl Cancer Inst 106(5)\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang HH, Lu W (2007) Adaptive Lasso for Cox's proportional hazards model. Biometrika 94(3):691\u0026ndash;703\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eB\u0026oslash;velstad HM et al (2007) Predicting survival from microarray data: a comparative study. Bioinformatics 23(16):2080\u0026ndash;2087\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGeeleher P, Cox NJ, Huang RS (2014) Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines. Genome Biol 15(3):R47\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAibar S et al (2017) SCENIC: single-cell regulatory network inference and clustering. Nat Methods 14(11):1083\u0026ndash;1086\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXu Z et al (2025) Integration of Single-Cell RNA and Bulk RNA Sequencing Reveals Cellular Heterogeneity and Identifies Survival-Associated Regulatory Networks in Glioblastoma. IET Syst Biol 19(1):e70025\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZheng Y et al (2023) Spatial cellular architecture predicts prognosis in glioblastoma. Nat Commun 14(1):4122\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAhmed YB et al (2024) Identification of Hypoxia Prognostic Signature in Glioblastoma Multiforme Based on Bulk and Single-Cell RNA-Seq. Cancers 
16(3):633\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRuffle JK et al (2023) Brain tumour genetic network signatures of survival. Brain 146(11):4736\u0026ndash;4754\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNi B et al (2023) The short isoform of MS4A7 is a novel player in glioblastoma microenvironment, M2 macrophage polarization, and tumor progression. J Neuroinflammation 20(1):80\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHoogstrate Y et al (2023) Transcriptome analysis reveals tumor microenvironment changes in glioblastoma. Cancer Cell 41(4):678\u0026ndash;692.e7\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWan Z et al (2023) Identification of angiogenesis-related genes signature for predicting survival and its regulatory network in glioblastoma. Cancer Med 12(16):17445\u0026ndash;17467\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOliveira FD et al (2024) Expression of key unfolded protein response genes predicts patient survival and an immunosuppressive microenvironment in glioblastoma. Translational Med Commun 9(1):5\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"Wannan Medical College","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Glioblastoma, Transcriptomics, Regulatory deviation, Dark biomarkers, Prognostic biomarkers, Precision medicine","lastPublishedDoi":"10.21203/rs.3.rs-9640328/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9640328/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptTitle":"Uncovering dark biomarkers of glioblastoma survival through regulatory-deviation–based transcriptomic modeling","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-05-12 15:08:02","doi":"10.21203/rs.3.rs-9640328/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"81986144-716a-4997-9ffa-83e07c779ad1","owner":[],"postedDate":"May 12th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":67699726,"name":"Cancer Biology"},{"id":67699727,"name":"Computational Biology"}],"tags":[],"updatedAt":"2026-05-12T15:08:03+00:00","versionOfRecord":[],"versionCreatedAt":"2026-05-12 15:08:02","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-9640328","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-9640328","identity":"rs-9640328","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
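The Conclusion describes mqTrans as quantifying the deviation between observed gene expression and a model-inferred regulatory expectation. The Python sketch below illustrates that idea. It rests on simplifying assumptions of our own: a target gene's expectation is modeled as a linear function of its regulators' expression (for example, TFs listed in TRRUST), fitted on reference samples such as GTEx. The deviation is then the observed value minus that expectation. The helper names (`solve`, `fit_tf_model`, `regulatory_deviation`) are hypothetical, and the model actually used by mqTrans may differ.

```python
"""Hypothetical sketch of a regulatory-deviation feature (not the mqTrans implementation).

Assumption: a target gene's expected expression is a linear function of the
expression of its transcription factors (TFs), fitted on reference samples.
"""
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_tf_model(tf_expr: Sequence[Sequence[float]], target: Sequence[float],
                 ridge: float = 1e-6) -> List[float]:
    """Fit target ~ b0 + sum_k b_k * TF_k on reference samples.

    tf_expr[i][k] is the expression of TF k in reference sample i. A tiny ridge
    term (not applied to the intercept) keeps the normal equations solvable.
    Returns [b0, b1, ..., bK].
    """
    design = [[1.0] + list(row) for row in tf_expr]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) + (ridge if i == j and i > 0 else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, target)) for i in range(p)]
    return solve(xtx, xty)


def regulatory_deviation(coef: Sequence[float], tf_expr: Sequence[Sequence[float]],
                         observed: Sequence[float]) -> List[float]:
    """Observed target expression minus its TF-inferred expectation, per sample."""
    return [y - (coef[0] + sum(b * v for b, v in zip(coef[1:], row)))
            for row, y in zip(tf_expr, observed)]


if __name__ == "__main__":
    # Toy reference data in which the target follows 1 + 2*TF1 - 0.5*TF2 exactly.
    ref_tf = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
    ref_target = [1 + 2 * a - 0.5 * b for a, b in ref_tf]
    coef = fit_tf_model(ref_tf, ref_target)
    # Two tumor samples: one above and one below its TF-inferred expectation.
    print(regulatory_deviation(coef, [[2, 2], [0, 2]], [5.0, -0.5]))  # approx. [1.0, -0.5]
```

In this toy example, the first tumor sample sits about one unit above its TF-inferred expectation. The deviation, not the raw abundance, is the quantity that a regulatory-layer analysis would carry forward.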
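Section 3.6 selects features with LASSO-penalized Cox regression and chooses λ by 10-fold cross-validation on the partial likelihood deviance. The pure-Python sketch below shows one way such a procedure can work; it is an illustration, not the authors' code, and all function names are hypothetical. It differs from the cited method in two ways:

- It fits the Breslow partial likelihood by proximal gradient descent (ISTA with backtracking). The Simon et al. paper cited for the regularization paths (glmnet) uses coordinate descent instead.
- It scores each held-out fold with that fold's own partial likelihood. This is simpler than the grouped cross-validated deviance that `cv.glmnet` uses by default for Cox models.

```python
"""Hypothetical sketch: LASSO-penalized Cox regression with k-fold cross-validated lambda.

Proximal-gradient (ISTA) fit of the Breslow partial likelihood; a simplified
stand-in for glmnet-style coordinate descent, not the authors' pipeline.
"""
import math
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def cox_loss_grad(beta: Sequence[float], x: Matrix, time: Sequence[float],
                  event: Sequence[int]) -> Tuple[float, List[float]]:
    """Negative Breslow log partial likelihood divided by n, and its gradient."""
    n, p = len(time), len(beta)
    eta = [sum(b * v for b, v in zip(beta, row)) for row in x]
    shift = max(eta)  # stabilizes exp(); added back inside the log term
    w = [math.exp(e - shift) for e in eta]
    order = sorted(range(n), key=lambda i: -time[i])  # longest follow-up first
    s0, s1 = 0.0, [0.0] * p  # running sums of w and w*x over the current risk set
    loss, grad, pos = 0.0, [0.0] * p, 0
    while pos < n:
        end = pos
        # Add every sample tied at this time before scoring its events (Breslow ties).
        while end < n and time[order[end]] == time[order[pos]]:
            j = order[end]
            s0 += w[j]
            s1 = [a + w[j] * v for a, v in zip(s1, x[j])]
            end += 1
        for i in order[pos:end]:
            if event[i]:
                loss -= eta[i] - (math.log(s0) + shift)
                grad = [g - (v - a / s0) for g, v, a in zip(grad, x[i], s1)]
        pos = end
    return loss / n, [g / n for g in grad]


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t * |z| (the L1 penalty)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def fit_lasso_cox(x: Matrix, time: Sequence[float], event: Sequence[int],
                  lam: float, n_iter: int = 100) -> List[float]:
    """Minimize loss(beta) + lam * ||beta||_1 by ISTA with backtracking line search."""
    beta, step = [0.0] * len(x[0]), 1.0
    for _ in range(n_iter):
        f0, g = cox_loss_grad(beta, x, time, event)
        while True:
            z = [soft_threshold(b - step * gk, step * lam) for b, gk in zip(beta, g)]
            d = [zk - bk for zk, bk in zip(z, beta)]
            f1, _ = cox_loss_grad(z, x, time, event)
            bound = f0 + sum(gk * dk for gk, dk in zip(g, d)) + sum(dk * dk for dk in d) / (2 * step)
            if f1 <= bound + 1e-12:  # sufficient-decrease condition met
                break
            step *= 0.5
        beta = z
    return beta


def _subset(rows: Sequence[int], seq: Sequence) -> list:
    return [seq[i] for i in rows]


def cv_select_lambda(x: Matrix, time: Sequence[float], event: Sequence[int],
                     lambdas: Sequence[float], k: int = 10,
                     seed: int = 0) -> Tuple[float, List[float]]:
    """Choose lambda minimizing the summed held-out partial-likelihood deviance over k folds."""
    idx = list(range(len(time)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    scores = []
    for lam in lambdas:
        dev = 0.0
        for test in folds:
            held = set(test)
            train = [i for i in idx if i not in held]
            beta = fit_lasso_cox(_subset(train, x), _subset(train, time),
                                 _subset(train, event), lam)
            loss, _ = cox_loss_grad(beta, _subset(test, x), _subset(test, time),
                                    _subset(test, event))
            dev += 2.0 * loss * len(test)  # deviance = -2 * log partial likelihood
        scores.append(dev)
    best = min(range(len(lambdas)), key=lambda m: scores[m])
    return lambdas[best], scores
```

A typical workflow has three steps:

1. Call `cv_select_lambda(x, time, event, lambdas)` on the training data.
2. Refit with `fit_lasso_cox` at the chosen λ.
3. Keep the features whose coefficients are non-zero.

In practice, features are usually standardized first so that a single λ penalizes them comparably.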
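The risk-score and stratification steps of Section 3.6 can be sketched in Python. Those steps are a coefficient-weighted sum of the selected features, a median cutoff, and a log-rank comparison of the two groups. The sketch is illustrative only, and its function names are hypothetical. The hazard ratio it returns is the one-step log-rank approximation exp((O - E)/V). This is not the estimate from a fitted Cox model, which would normally be reported with a 95% confidence interval.

```python
"""Hypothetical sketch: linear risk score, median split, and a two-group log-rank test."""
import math
from statistics import median
from typing import List, Sequence, Tuple


def risk_scores(x: Sequence[Sequence[float]], beta: Sequence[float]) -> List[float]:
    """Risk score = sum of the selected features weighted by their Cox coefficients."""
    return [sum(b * v for b, v in zip(beta, row)) for row in x]


def median_split(scores: Sequence[float]) -> List[int]:
    """1 = high risk (score above the median), 0 = low risk."""
    cut = median(scores)
    return [1 if s > cut else 0 for s in scores]


def log_rank(time: Sequence[float], event: Sequence[int],
             group: Sequence[int]) -> Tuple[float, float, float]:
    """Return (chi-square, P value, approximate HR of group 1 versus group 0).

    The HR is the one-step log-rank approximation exp((O - E) / V),
    not a fitted Cox-model estimate.
    """
    o_minus_e, var = 0.0, 0.0
    for t in sorted({ti for ti, e in zip(time, event) if e}):
        at_risk = [g for ti, g in zip(time, group) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        deaths = [g for ti, e, g in zip(time, event, group) if ti == t and e]
        d, d1 = len(deaths), sum(deaths)
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0.0:  # e.g. all patients in one group: no information
        return 0.0, 1.0, 1.0
    chi2 = o_minus_e ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return chi2, p_value, math.exp(o_minus_e / var)
```

For example, `log_rank(time, event, median_split(risk_scores(x, beta)))` compares the high- and low-risk groups defined by a fitted coefficient vector `beta`. A median cutoff is data-dependent, so applying the training-set median to an external cohort is a separate design choice from re-deriving the median in each cohort.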