Drug-Gene Network Signature Modeling Predicts Breast Cancer Patient Response to Neoadjuvant Chemotherapy

doi:10.21203/rs.3.rs-6130021/v1

2025 · doi:10.21203/rs.3.rs-6130021/v1

preprint OA: closed CC-BY-4.0

Research Square · Article

Romano Flores, Rahul Nihalani, Sevgi Umur, Frederic Vigneault, and 1 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6130021/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted. Version 1 posted.

Abstract

Neoadjuvant chemotherapy (NAC) has been a staple treatment for breast cancer (BRCA) patients regardless of the tumor histological type. While this treatment can be effective on a population level, the pathologic complete response (pCR) rate post-NAC for individual patients varies widely throughout various clinical demographic groups and has not dramatically changed in practice.
Improving stratification methods for therapeutic interventions could avoid the physical side effects as well as the psychological stress of undergoing NAC treatment if a patient is unlikely to respond [ 1 , 2 ]. Given the rapid advancements in sequencing technologies and the availability of RNA expression data, medical solutions based on transcriptomics data are becoming increasingly prevalent [ 3 ]. Here, we present a novel method to stratify the prognosis for individual breast cancer patients for NAC therapy using RNA expression data from pre-treatment tumor biopsies by relying on network biology interactions rather than individual gene panels. We processed the datasets through the BioNAV™ pipeline to generate BioNAV™ network signatures (BioNAV™ NS) combined with a random forest machine learning model and incorporating demographic and other metadata, including patient race, specific drugs used in NAC treatment, and tumor histological subtyping. These network signatures offer insights into the gene-gene and drug-gene interactions occurring within each patient’s biopsy. This study demonstrates the capability of BioNAV™ NS to help guide BRCA prognoses through a comprehensive, network-level view of the gene expression data. Using BioNAV™ NS, we were able to accurately predict patient response to NAC with a mean area under the receiver operator characteristic (AUROC) of 82.4%. The addition of demographic and tumor receptor type stratification further increased performance to as high as an AUROC of 93.7% for patients who are progesterone receptor positive (PR+). Additionally, classifier performance was maintained when combining datasets from multiple studies and various transcriptomics platforms and heterogeneous preprocessing steps prior to BioNAV™ pipeline processing. Stratification by histological subgroups enhanced the predictive accuracy and AUROC of BioNAV™, outperforming two leading models in recent literature by 18.6% and 12.9%, respectively. 
BioNAV™ NS significantly enhances the predictive value of transcriptomic data to determine patient response to NAC. This approach offers the integration of multiple biological data and clinical metadata layers to improve clinical outcome prediction, highlighting potentially novel therapeutic mechanisms that have been hidden inside a heterogeneous patient population. A transition towards personalized treatment plans and adjuvant treatments may further enhance efficacy and reduce adverse events.

Biological sciences/Cancer/Breast cancer; Biological sciences/Cancer/Tumour biomarkers; Health sciences/Oncology/Cancer/Tumour heterogeneity

Introduction

As a persistently leading malignancy, BRCA presents an ongoing therapeutic challenge. As one of the most common cancers affecting the quality of life and survival of millions of women, it has been a focus of extensive research [1, 4, 5]. The current guidelines from the National Comprehensive Cancer Network (NCCN) for treating BRCA emphasize a strategy that considers tumor biology, genetic markers, and clinical factors, including the disease stage and patient demographics [5]. Specific treatments and standard of care protocols exist for various subtypes and stages of cancer, but overall treatment success remains less than 30% [1, 2], with the majority of patients not benefiting from any particular regimen. Although triple negative tumors have worse survival probabilities overall, clinical practice has continued to pursue NAC regardless of the tumor receptor histology subgrouping, since the benefit of a potential treatment success outweighs the risks of adverse events. For nonmetastatic BRCA, NAC to reduce the tumor size is widely applied before proceeding with the traditional route of surgery followed by adjuvant treatment [6].
Opting for NAC is advantageous for many reasons – not only can a locally advanced inoperable BRCA be downstaged and become operable [ 7 , 8 , 9 ], but it can also provide a cost-effective overall solution [ 10 ]. Additionally, the treatment response can be observed early, allowing for personalized modifications in the treatment plan [ 11 ]. While NAC can be beneficial, it causes physical and psychological stress, and the response varies based on molecular subtypes [ 1 , 2 ]. Moreover, patients can reach chemoresistance after NAC, adding further challenges to BRCA treatment [ 12 , 13 ]. An early prediction of pCR to NAC could be beneficial in tailoring the treatment. In this study, we focus on predicting the pCR to NAC for patients using transcriptomics data collected from breast tissue biopsies prior to NAC treatment. While transcriptomics is currently not utilized in the strategies described by NCCN, research shows that it can be an effective tool to personalize treatment in chronic diseases, as RNA is a dynamic indicator of tumor state. In a 2021 study [ 14 ], researchers employed a generalized linear model to identify a panel of 18 genes with the highest predictive value for response to NAC. The gene expression panel for each patient was then incorporated into a machine learning model creating the first RNA expression classifier for NAC response prediction (RNAec1). Subsequently, a 15-gene classifier (RNAec2) was similarly developed, but using only samples predicted to achieve pCR by RNAec1. Additionally, the genes for RNAec2 were selected from a pool of 348 identified biomarkers. These biomarkers were based on an experimental model of breast epithelial cells and excluded the 18 genes used in the first classifier. This method is collectively referred to as “RNAec” in this study. 
In a more recent study [15], a similar approach was employed for RNA expression analysis, except that this method relied on a specified set of 1,087 immunological genes manually curated to focus on immune system recognition of the tumor cells. This gene set was further narrowed to 62 genes identified by their strong Spearman rank correlation with pCR, generating the “Ipredictor” model. Following this, the “ICpredictor” model was formulated by enhancing the Ipredictor with clinicopathological metadata. Ipredictor and ICpredictor achieved AUROCs of 80.0% and 84.0%, respectively, for ER+ and HER2- stratification, while RNAec demonstrated an accuracy of 85.5% for the same stratification, though the AUROC for this model was not reported. When applied without stratification, Ipredictor’s AUROC dropped to 74.9%, which ICpredictor improved to 80.1%, and RNAec accuracy showed little change (accuracy = 85.6%). Both models incorporated a model stacking strategy, utilizing a combination of the most effective machine learning models together with clinical metadata for predicting NAC responses. This demonstrated that the inclusion of additional data layers such as clinical metadata could further improve classifier performance.

Despite the promising results, these methods have limitations. First, they require specific domain knowledge to reduce the genes used in the analyses, relying on experts to accurately identify genes with the highest predictive value, which may not translate broadly to novel datasets. Second, predictive models based on clinical outcomes fail to make accurate predictions when analyzing data aggregated from independent experiments, indicating potential issues with model bias [16] or dataset generation heterogeneity that may not be possible to compensate for. Machine learning models in general tend to be biased towards the majority population.
In the case of BRCA, there are various subgroups, with certain subgroups being more common than others. This can lead to subgroups being overlooked, as these models tend to overfit on dominant characteristics while neglecting nuanced characteristics of minority groups, thus hindering the predictive ability of the model to benefit individual patients [ 16 , 17 ]. Lastly, while traditional methods of predicting BRCA patient response to NAC have often been based on broad histological classification by receptor types, histological subtypes amongst BRCA patients have previously been shown to produce significantly different rates of pathological complete response (pCR) in response to NAC [ 1 ]. This may be partially explained by intricate tumor microenvironment heterogeneity, immune response, and background genetics of individual patients [ 18 ]. Additionally, the same tools can be used to further our understanding of the underlying biology and discover hidden mechanisms, leading to targeted diagnostic, treatments, and clinical outcomes. Specifying the drug treatment molecule itself rather than the drug class is also crucial, although the importance of this is often downplayed [ 16 ]. Drug treatments that fall into a drug class categorization have traditionally been treated as interchangeable. However, in a recent study, we used transcriptomics to identify differential activity of 5 statins, and this was found to significantly impact treatment success and survival of COVID-19 patients based on electronic medical record analysis [ 19 ]. In BRCA patients, the CDK4/6 inhibitors palbociclib, ribociclib, and abemaciclib have exhibited highly divergent clinical success rates despite sharing the same therapeutic target and a high degree of structural similarity [ 20 ]. 
These examples highlight the idea that molecules, despite belonging to the same pharmacological class, can exert markedly distinct impacts on biological networks through molecule-specific on- and off-target activities [19].

Here, we present a predictive model based on network signature prioritization derived from gene-drug and gene-gene interactions to effectively stratify BRCA patients based on their treatment response to NAC. Our BioNAV™ model takes gene expression data as input and applies machine learning techniques to extract relevant networks of interest. The BioNAV™ pipeline transforms RNA expression data into actionable network signatures (BioNAV™ NS). These signatures go beyond traditional gene expression profiles, offer a richer and more accurate understanding of the tumor biology without a priori knowledge, and compare favorably to previous results obtained with the RNAec, Ipredictor and ICpredictor models proposed to assess personalized drug response performance. As a clinical discovery tool, BioNAV™ can prospectively predict the response to a treatment, including the numerous treatments utilized in NAC. Unlike traditional methods that filter for specific genes, BioNAV™ NS are developed through automated feature creation, allowing for a generalizable application to various diseases and treatment regimens [21]. Additionally, use of these network signatures allows for effective aggregation of multiple datasets without compromising accuracy.

Methods

Data

Dataset Acquisition: Five distinct BRCA-related datasets were sourced from Gene Expression Omnibus (GEO) [22]. Two datasets, GSE123845 (D1) and GSE163882 (D2), employed high-throughput sequencing using Illumina sequencers, while the remaining three utilized Affymetrix Human Genome Arrays for expression profiling. Additionally, each dataset provides clinical features of the patients.
GSE112825 (DH) contained only healthy subjects, while the other datasets contained a mixture of patients who either achieved pCR or had residual disease (RD). Detailed information is presented in Table 1.

Preprocessing: Microarray preprocessing was performed following published guidelines [23] for GSE20194 (D3) and GSE20271 (D4). Outliers were removed based on strict quality control (QC) criteria, which included both statistical thresholds and visual inspection of sample intensity distributions. Following QC, normalization was carried out using robust multi-array average (RMA). For the high-throughput sequencing datasets, transcripts per million (TPM) normalization and QC had been performed prior to deposition in GEO.

Table 1. Overview of datasets utilized in the study, presented at three stages: initial dataset, after QC, and after BioNAV™ NS processing. The table includes five datasets sourced from GEO, detailing the total number of samples in each dataset, as well as the breakdown of patient responses (pCR or RD). GEO accession numbers and corresponding dataset names, as referenced in the study, are provided.

                     Initial             Post-QC             Post-BioNAV™
Accession   Name   pCR   RD   Total   pCR   RD   Total   pCR   RD   Total
GSE112825   DS-H   N/A   N/A    109   N/A   N/A     99   N/A   N/A    N/A
GSE123845   DS-1   159    68    227    70    42    112    59    31     90
GSE163882   DS-2    80   142    222    80   142    222    61   116    177
GSE20194    DS-3    56   222    278    53   183    236    13    73     86
GSE20271    DS-4    26   152    178    22   146    168    22   143    165

BioNAV™ RNA Expression Processing

BioNAV™, a derivative of NeMoCAD [19, 24], is a tool to model gene and perturbant interaction networks for a given set of gene expression data. Briefly, BioNAV™ uses the transcriptomics data, along with drug-gene and gene-gene interaction databases (BioNAV™ DBs), to compute several statistics (correlation, entropy) and network signatures for each RNA-seq sample. BioNAV™ DBs are internally constructed using public databases such as LINCS and CTD [25, 26].
Additionally, Bayesian inference is performed on a drug-gene perturbation network constructed from differential genes and the drugs interacting with those genes. The statistics and the network inference feed into computing the BioNAV™ network signature. At a high level, a network signature encapsulates how a sample interacts with an array of drugs. This contrasts with traditional approaches, where the focus is generally on understanding differences between experimental and control groups. The BioNAV™ pipeline transforms the data from transcriptomics space to a latent space consisting of network signatures that capture not only the expression profile of the genes, but also the interactions among them as well as their interactions with drugs. Each network signature comprises latent features specific to the BioNAV™ pipeline, with generated values specific to each patient. These signatures are then used for all further intra-patient comparative analyses.

For this study, the gene expression data for each dataset was processed through the BioNAV™ computational pipeline, generating a network signature for each patient (Fig. 1). This leads to an automated reduction from tens of thousands of genes to a few hundred latent features, as well as a reduction in samples based on specific BioNAV™ pipeline criteria designed to ensure data quality and consistency, illustrated as “sample reduction” in Fig. 1. The number of samples for each dataset after BioNAV™ pipeline reductions can be seen in Supplementary Table 1.

Evaluating BioNAV™

Establishing Consistent Race and Treatment Subgroups

To demonstrate the importance of specifying race and treatment when attempting to predict patient outcome to NAC, patient samples were categorized by race, drug class combinations, and the specific drug combinations that the patients were treated with. D3 and D4 were the most aligned datasets with respect to the most prevalent race and treatment.
The most prevalent race found in D3 and D4 was described as “white”. The most prevalent treatment drug class combination in D3 and D4 incorporated taxanes, pyrimidine antagonists, anthracyclines, and alkylating agents. The two most prevalent treatment drug combinations specified the anthracycline as either doxorubicin or epirubicin, with doxorubicin being most prevalent. For D1, race data was not provided. The most prevalent drug class combination used to treat patients in D1 aligned with that of D3 and D4, differing only in that pyrimidine antagonists were excluded for D1. D1 did not provide data on the specific drug combinations used. D2 was excluded from this analysis, as it did not contain the treatment or race data critical for this assessment. D3 comprised 4 different sub-datasets according to the authors’ description [27]. To avoid a single dataset prejudicing the results when analyzing combined datasets, only a single sub-dataset from D3 was used for the analyses. The MAQC_V dataset (D3V) was selected from D3, as it contained a more balanced proportion of samples treated with doxorubicin and epirubicin, necessary for contrasting stratification by drug class combination with stratification by specific drug combination.

Combining Datasets

To increase the total number of patients for analysis and reduce the bias that can arise from single dataset modeling, the power set of all three datasets (D1, D3V, and D4) was analyzed (Supplementary Table 1).

Establishing Consistent Histological Subgroups

The histological groups selected for analysis in this study were chosen due to their frequent appearance in recent literature as key criteria for stratification [14, 15].
Due to missing clinical data in some datasets, the histological subgroups considered for stratification were limited to progesterone receptor positive (PR+), progesterone receptor negative (PR-), estrogen receptor positive (ER+), estrogen receptor negative (ER-), human epidermal growth factor receptor 2 positive (HER2+), human epidermal growth factor receptor 2 negative (HER2-), triple negative for all 3 receptors (TN+), non-triple negative (TN-), hormone receptor positive (HR+), and hormone receptor negative (HR-).

Stratification

To assess the predictive power of race, treatment and histological data, we stratified the datasets described in the previous section in multiple settings (shown in Fig. 2) and evaluated the random forest machine learning (RFML) model on them. The largest subgroup observed at each stratification step was used in the next layer of stratification. We assigned labels to these settings for ease of narration. R1 indicates the baseline datasets with no stratification. Single-layer stratifications were evaluated (where applicable) by the most prevalent drug class combinations (R2), drug combinations (R3), and race (R4). We later describe the specific drug and race criteria used for various datasets and dataset combinations. Lastly, we evaluated a two-layer stratification setting (Fig. 3), stratified by the most prevalent race and drug combinations (R5). The stratification in R5 was then further combined with stratification by the histological groups described in the previous section to assess their effects. Histological subgroup stratification was limited to the dataset combination D3V-NS, D4-NS, D1-NS, and to a single additional stratification layer, due to an insufficient number of samples in all other cases.

Random Forest Model

To retain clinical understanding, as well as to demonstrate the richness of BioNAV™ NS, analyses were conducted using a simple Random Forest regressor machine learning model (RFML) to predict responses to NAC.
The model was configured with a Poisson criterion to measure the quality of each split, with the number of features considered at each split set to log2 of the total number of features. No bootstrapping was used. We utilized stratified 3-fold cross-validation to maintain the proportion of classes (pCR and RD) across splits and ensure that each fold contained a representative distribution of the target classes (Fig. 4). To address the imbalance between the two classes (pCR and RD), class weights were computed from the training data in each fold. The inverses of the class frequencies were used as weights, ensuring equal contribution of both classes during model training. These class weights were normalized to sum to one to avoid bias towards the larger class. After model fitting, the predicted probabilities for the test set were computed and converted into binary class predictions using a threshold of 0.50. Various performance metrics consistently used in the literature on predicting patient response to NAC were then calculated for each fold, including accuracy, area under the receiver operating characteristic curve (AUROC), F1 score, Positive Predictive Value (PPV), Negative Predictive Value (NPV), Sensitivity (SEN), and Specificity (SPE) [28]. After completion of all 3 folds, each metric was averaged to generate summary statistics.

Applying the RFML model

The BioNAV™ NS were evaluated by applying the previously described RFML model to predict patient responses to NAC. For each dataset and dataset combination listed in Supplementary Table 1, the BioNAV™ NS were used as input features, with patient response (pCR vs. RD) as the target variable. We assessed each dataset individually and in combination to evaluate the generalizability of the network signatures. Combining datasets was done to mitigate biases inherent in individual datasets and to increase the overall sample size.
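The class weighting and thresholded metrics described in the Random Forest Model section can be illustrated with a minimal, standard-library sketch. It is not the BioNAV™ implementation: the function names, the toy labels and probabilities, and the choice of `>=` at the 0.50 threshold are assumptions made only for illustration, and AUROC is omitted for brevity.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Inverse class-frequency weights, normalized so they sum to one."""
    counts = Counter(labels)
    raw = {cls: 1.0 / n for cls, n in counts.items()}
    total = sum(raw.values())
    return {cls: w / total for cls, w in raw.items()}


def binary_metrics(y_true, y_prob, threshold=0.50):
    """Threshold probabilities (1 = pCR, 0 = RD) and compute fold metrics.

    Ratios with a zero denominator are returned as None.
    Using >= at the threshold is an assumption of this sketch.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else None

    return {
        "ACC": ratio(tp + tn, len(pairs)),
        "PPV": ratio(tp, tp + fp),
        "NPV": ratio(tn, tn + fn),
        "SEN": ratio(tp, tp + fn),
        "SPE": ratio(tn, tn + fp),
        "F1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical training fold: 2 pCR samples and 6 RD samples.
train_labels = [1, 1, 0, 0, 0, 0, 0, 0]
weights = inverse_frequency_weights(train_labels)

# Hypothetical test fold: true labels and predicted probabilities of pCR.
y_true = [1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3]
metrics = binary_metrics(y_true, y_prob)
```

With these toy labels the minority class (pCR) receives the larger weight, and weight times class size is the same for both classes, which is the "equal contribution" property described above.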
Evaluating BioNAV™ across different datasets also enabled us to examine the performance of its network signatures across various sequencing platforms. By combining datasets, we were able to assess BioNAV™'s ability to account for inter-dataset variability, including batch effects. All assessments were conducted using the summary statistics mentioned previously.

BioNAV™-NS Genes

The D3V-NS D4-NS D1-NS dataset with R5 PR+ stratification was used to compare the top genes derived from NS with those identified through differential gene expression (DGE) analysis for each sample group (pCR, RD). For the DGE analysis, true responses to NAC were used to separate pCR and RD groups, while predicted responses were used for the NS analysis. For the DGE method, genes were filtered based on a p-value threshold (< 0.05) and a log2 Fold Change (log2FC) threshold (≥ 1.5), ensuring the inclusion of genes with substantial and statistically significant expression changes. The occurrences of each remaining gene across the samples in the group were then counted, and the top 100 most frequently occurring genes were selected.

The NS method followed a similar approach. A two-tailed t-test was performed, and genes were filtered using the same p-value (< 0.05) and log2FC (≥ 1.5) thresholds. Network signatures were then sorted primarily by p-value and secondarily by log2FC. The top 10 network signatures were selected, from which genes were extracted. For each network signature, the top 10% most heavily weighted genes were retained. The occurrences of these genes across the samples in the group were then counted, and the top 100 most frequently occurring genes were selected. Both methods were applied separately to the pCR and RD groups. This analysis resulted in four gene lists, detailed in Supplementary Table 2: the top 100 genes derived from network signatures (NS) and DGE for each group (pCR and RD).
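The filter-count-rank step shared by the DGE and NS gene lists can be sketched as follows. This is an illustrative reading of the description above, not the authors' code: the per-sample data structure, the placeholder gene names, the `top_genes` helper, and the alphabetical tie-break are all assumptions, and the log2FC threshold is applied as written (≥ 1.5, without taking an absolute value).

```python
from collections import Counter


def top_genes(per_sample_results, p_max=0.05, log2fc_min=1.5, top_n=100):
    """Count how often each gene passes the filters across samples.

    per_sample_results: list of dicts mapping gene -> (p_value, log2fc),
    one dict per sample in the group (a hypothetical layout).
    Returns the top_n most frequently passing genes.
    """
    counts = Counter()
    for sample in per_sample_results:
        passing = {
            gene
            for gene, (p_value, log2fc) in sample.items()
            if p_value < p_max and log2fc >= log2fc_min
        }
        counts.update(passing)
    # Sort by frequency (descending), then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in ranked[:top_n]]


# Hypothetical group of three samples with placeholder gene names.
group = [
    {"GENE_A": (0.01, 2.0), "GENE_B": (0.20, 3.0), "GENE_C": (0.03, 1.6)},
    {"GENE_A": (0.04, 1.5), "GENE_B": (0.01, 1.0), "GENE_C": (0.001, 2.2)},
    {"GENE_A": (0.02, 1.9), "GENE_C": (0.06, 2.5), "GENE_D": (0.01, 1.7)},
]

top_two = top_genes(group, top_n=2)
all_passing = top_genes(group)
```

In the study the same procedure would be run once per group (pCR, RD) and per method (NS, DGE) with `top_n=100`, giving the four lists in Supplementary Table 2.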
To assess the biological significance of these gene signatures, Gene Ontology (GO) terms and pathway enrichments were determined using g:Profiler [29]. To supplement the primary g:Profiler-based enrichment analysis, we conducted additional GO term and Reactome pathway enrichment analyses using the clusterProfiler and ReactomePA R packages (DOI: 10.18129/B9.bioc.clusterProfiler, DOI: 10.18129/B9.bioc.ReactomePA). GO enrichment analysis was performed separately for the pCR_NS, pCR_DGE, RD_NS, and RD_DGE gene lists using enrichGO(), with Biological Process (BP) and Molecular Function (MF) terms as the primary focus. Reactome pathway enrichment was conducted using enrichPathway() to explore pathway-level relationships. Additionally, MF terms and Reactome pathways were analyzed using compareCluster(), enabling a comparative functional assessment across groups. Terms were selected by applying a p-value threshold of 0.05. Redundancy in GO terms was minimized by using the Wang method for simplification. Pairwise similarity was then calculated using the Jaccard similarity coefficient to illustrate the similarities and differences between terms. “showCategories” was set to 10 to focus on the top terms for each. To improve readability and consistency, an abbreviation standardization (Supplementary Table XX) was applied to all GO and Reactome enrichment results. q-scores were calculated as the -log10 transformation of adjusted p-values.

Benchmarking BioNAV™ against recent ML models

To gauge the effectiveness of BioNAV™ NS, we compared BioNAV™’s predictive metrics with those obtained from the RNAec, Ipredictor and ICpredictor models [14, 15] (Fig. 5). To perform an accurate benchmark analysis, we matched the dataset used in the two studies for developing their respective models; therefore, only D2 was used.
In addition to unstratified results, these studies presented results on stratified subgroups based on ER-positive (ER+) and HER2-negative (HER2-) statuses. Accordingly, we stratified D2 by the same criteria. To produce metrics, Ipredictor and ICpredictor utilized bootstrap resampling with 2,000 replicates, while RNAec employed 5-fold cross-validation. To mitigate computational burden, we opted for cross-validation over bootstrap resampling, as cross-validation is generally less computationally intensive while still providing reliable model evaluation [30, 31]. Due to the reduced sample size of D2 after BioNAV™ processing, we used 3-fold cross-validation rather than 5-fold as done for RNAec. Employing a higher number of folds would have resulted in smaller held-out test subsets, which can increase the variance of per-fold performance estimates and potentially compromise the reliability of model evaluation [32, 33]. Previous studies have demonstrated that with smaller datasets, reducing the number of folds helps maintain sufficient data in each held-out fold, thereby enhancing the stability and reliability of the model's performance metrics [34, 35].

Results

BioNAV™ RNA Expression Processing

We first addressed the inherent heterogeneity of working with datasets from multiple studies, each with its own data generation methods and platforms. Traditional stratification of patients based on normalized gene expression clustered samples by their respective study or dataset, with low information content for each subgroup and high contrast between subgroups. This finding was not surprising given the heterogeneous data sources, which dataset normalization was unable to sufficiently overcome. In contrast, transformation of gene expression into network signatures using BioNAV™ NS resulted in greater discrimination of patient subgroups that spanned multiple datasets (Fig. 6). These NS-transformed data were used in subsequent patient stratification and machine learning.
Evaluating BioNAV™

Applying the RFML model

Supplementary Table 3 presents the results on various metrics obtained with our updated approach, including accuracy (ACC), area under the receiver operating characteristic curve (AUROC), positive predictive value (PPV), negative predictive value (NPV), specificity (SPE), sensitivity (SEN), and the F1 score (F1). The standard deviation observed for AUROC during the 3-fold cross-validation (STDEV) is also displayed. Results are organized by groups (R1–R5), which represent different stratifications of the datasets. The presence of 0 values in the PPV and SEN metrics reflects that performance was severely constrained, likely due to extreme class imbalance. NA values indicate that the data were not sufficient to produce a meaningful result.

Without stratification (R1), model performance was highly variable, though combined datasets improved AUROC (e.g., 72.6 for D3V-NS D4-NS D1-NS vs. 54.3 for D1-NS) and reduced variability (STDEV: 9.0 vs. 11.6). When filtering for drug treatment category (R2), stratification improved AUROC and F1 scores, and combined datasets balanced sensitivity and specificity. Stratifying by drug treatment molecule (R3) increased AUROC (78.1 for D3V-NS D4-NS D1-NS) and reduced variability. Similar to R1, stratifying by race (R4) showed wide performance variation, but combined datasets improved balance (AUROC: 78.8, F1: 67.9 for D3V-NS D4-NS D1-NS). The combined stratification by race and drug (R5) achieved the highest performance (AUROC: 82.4, F1: 73.8) with combined datasets.

Despite batch variations, combining datasets consistently enhanced AUROC and stability across all stratification groups. For example, in R1, combined datasets improved AUROC by 33.7% in relative terms (72.6 vs. 54.3) compared to D1-NS alone. Similar trends were observed in R2 and R5.
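Because the results mix relative improvements with differences in AUROC points, a short worked calculation may help. The sketch below uses only the two R1 AUROC values quoted above (54.3 for D1-NS and 72.6 for the three-dataset combination); the helper names are illustrative.

```python
def absolute_gain(new, baseline):
    """Difference in AUROC points (percentage points)."""
    return new - baseline


def relative_gain_pct(new, baseline):
    """Improvement expressed as a percentage of the baseline value."""
    return 100.0 * (new - baseline) / baseline


auroc_d1_ns = 54.3        # R1, D1-NS alone
auroc_combined = 72.6     # R1, D3V-NS D4-NS D1-NS

points = absolute_gain(auroc_combined, auroc_d1_ns)
percent = relative_gain_pct(auroc_combined, auroc_d1_ns)

print(f"absolute gain: {points:.1f} AUROC points")
print(f"relative gain: {percent:.1f}% of the D1-NS baseline")
```

The 33.7% figure therefore corresponds to a gain of 18.3 AUROC points over the D1-NS baseline.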
Individual datasets often exhibited imbalanced metrics, such as high specificity but low sensitivity (e.g., D3V-NS D4-NS in R1, SPE: 92.6, SEN: 3.0), which were mitigated through dataset combinations.

The overall stratification trends became particularly evident for the combined dataset D3V-NS D4-NS D1-NS. As stratification became more specific, AUROC improved and variability decreased. For example, without stratification (R1), D3V-NS D4-NS D1-NS achieved an AUROC of 72.6 with noticeable variability (STDEV: 9.0). In contrast, the most specific stratification (R5), which combined race and drug treatment molecule stratification, increased AUROC by 9.8 points (to 82.4) and reduced STDEV to 3.3, highlighting the observed improvements in performance metrics with increased stratification specificity. The few observations that do not follow this trend need further investigation but are likely attributable to insufficient data in some subgroups to properly train the ML model.

Figure 7 displays the AUROC values for each stratification strategy: R1 (no stratification), R2, R3, R4 (single-layer stratification) and R5 (two-layer stratification). R1 (red) appeared inconsistent in AUROC, with values ranging from 47.2% to 78.2%. Models with one layer of stratification mostly improved over no stratification, with values ranging from 52.6% to 83.4%. The 2-layer model appeared to drop in AUROC, with scores from 53.6% to 83.3%. The D3V, D4, D1 combined model was the only model that displayed a consistent positive correlation between the number of layers of stratification and AUROC.

Combining R5 with histological subgroup stratification: After applying a third layer of stratification based on histological receptor statuses (ER, PR, HER2, HR, and TN) to the three-dataset combination (D3V-NS, D4-NS, D1-NS) from R5, we observed notable differences in performance relative to the unstratified R5 baseline (ACC: 74.7%, AUROC: 82.4%) (Supplementary Table 4 and Fig. 8).
Subgroups characterized by positive receptor expression generally showed marked improvements. For instance, the PR+ subgroup achieved an ACC of 94.9% and an AUROC of 93.7%, increases of 20.2 percentage points in ACC and 11.3 points in AUROC over the baseline R5. Similarly, ER+ improved by 12.8 points in ACC (87.5%) and 6.5 points in AUROC (88.9%), and HR+ showed gains of 15.7 points in ACC (90.4%) and 5.1 points in AUROC (87.5%). Even the TN- subgroup (i.e., those not classified as triple-negative) exhibited improvements, reaching 87.0% ACC (+12.3 points) and 87.6% AUROC (+5.2 points).

In contrast, subgroups defined by negative receptor expression saw declines. ER- decreased by 14.4 points in ACC (60.3%) and 15.5 points in AUROC (66.9%). PR- declined by 7.2 points in ACC (67.5%) and 7.6 points in AUROC (74.8%), while HR- dropped substantially, by 18.3 points in ACC (56.4%) and 20.3 points in AUROC (62.1%). The TN+ subgroup also showed reduced performance (59.4% ACC and 66.0% AUROC), a drop of 15.3 points in ACC and 16.4 points in AUROC. For the HER2- subgroup, ACC increased slightly (+1.2 points, to 75.9%), but AUROC fell by 4.1 points (to 78.3%). HER2+ data were unavailable for evaluation.

These results underscore the interplay between receptor status and model performance. Positive receptor subgroups (ER+, PR+, HR+, TN-) tend to yield higher accuracy and discriminative ability when layered onto the R5 stratification, whereas negative receptor subgroups (ER-, PR-, HR-, TN+) show declines. This trend aligns with the known biological heterogeneity and complexity associated with negative receptor expression in breast cancer [36, 37, 38, 39, 40]. Addressing these challenges may require further refinements, including additional stratification layers or the incorporation of more targeted molecular features, to enhance the AUROC and improve predictive stability across all subgroups.
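The subgroup changes quoted above can be recomputed directly from the reported ACC and AUROC values. The short sketch below does this, assuming (as the arithmetic suggests) that each change is an absolute difference in percentage points from the R5 baseline rather than a relative change. The variable names are illustrative only.

```python
# R5 baseline for the three-dataset combination, as quoted in the text (ACC %, AUROC %).
BASELINE_ACC, BASELINE_AUROC = 74.7, 82.4

# Subgroup (ACC %, AUROC %) values as quoted in the text.
subgroups = {
    "PR+": (94.9, 93.7),
    "ER+": (87.5, 88.9),
    "HR+": (90.4, 87.5),
    "TN-": (87.0, 87.6),
    "ER-": (60.3, 66.9),
    "PR-": (67.5, 74.8),
    "HR-": (56.4, 62.1),
    "TN+": (59.4, 66.0),
    "HER2-": (75.9, 78.3),
}

# Absolute difference from baseline, in percentage points, rounded to one decimal.
deltas = {
    name: (round(acc - BASELINE_ACC, 1), round(auroc - BASELINE_AUROC, 1))
    for name, (acc, auroc) in subgroups.items()
}

for name, (d_acc, d_auroc) in deltas.items():
    print(f"{name:6s} ACC {d_acc:+.1f} pts, AUROC {d_auroc:+.1f} pts")
```

Reading the changes as percentage points matters for interpretation: a gain of 20.2 points on a 74.7% baseline is a considerably larger relative improvement than 20.2%.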
Stratification and Number of Subjects: The number of subjects in each analysis had variable impacts on the AUROC, depending on the number of stratification layers (Fig. 9). The unstratified model (R1) exhibits a negligible relationship between AUROC and the number of patients (slope = 0.03, R² = 0.06), indicating minimal improvement in performance as the patient count increases. Among the single-layer stratification models (R2, R3, R4), R3 shows the strongest correlation (R² = 0.56), followed by R4 (R² = 0.38) and R2 (R² = 0.27). The two-layer stratification model (R5) displayed the strongest relationship overall (R² = 0.60), suggesting a positive trend between AUROC and patient numbers and highlighting the potential to uncover more information about underlying drug response pathways from fewer patients.

Across the exhaustive experiments conducted for all possible combinations of stratification settings and datasets, the number of patients decreases as stratification layers are added. This presents a challenge, as a low sample count deprives the machine learning models of the ability to learn accurate patterns and produce accurate predictions. As expected, the performance of our model increases as the number of patients increases; however, increasing stratification through the incorporation of available demographic and tumor metadata also enhances model performance (Fig. 9). Moreover, where the number of samples is sufficient, model accuracy increases as stratification layers are added. This is especially clear for the three-dataset combination D3V-NS, D4-NS, D1-NS, where performance increases as we progressively go from no stratification to three layers of stratification (Fig. 7, Fig. 8B). The drop in ER-, HER- and PR- cases indicates the presence of complex subgroups within these groups. As can be seen in Fig. 13, our model shows increased performance when two layers of histological subgroupings are applied.
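The slope and R² values reported for Fig. 9 are the usual summaries of an ordinary least-squares line through (patient count, AUROC) points. The sketch below shows how such values are obtained. The `linear_fit` helper and the example points are hypothetical and are not the data behind Fig. 9.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept; returns (slope, intercept, r2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2 points)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R² = 1 - SS_res / SS_tot (defined as 1.0 here when y is constant and fitted exactly).
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return slope, intercept, r2


# Hypothetical (patient count, AUROC %) points for one stratification strategy.
patients = [25, 40, 60, 85, 120, 150]
auroc = [58.0, 66.5, 63.0, 74.0, 79.5, 82.0]

slope, intercept, r2 = linear_fit(patients, auroc)
print(f"slope = {slope:.3f} AUROC points per patient, R² = {r2:.2f}")
```

A slope near zero with a low R², as reported for R1, means that adding patients explains little of the variation in AUROC. A higher R², as reported for R5, means that patient count accounts for more of it. Neither quantity on its own establishes statistical significance.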
BioNAV™-NS Genes: The gene lists were mostly unique, with pCR DGE and RD DGE sharing the most genes (3) (Fig. 10A). GO enrichment of the pCR-NS genes aligned with processes related to apoptosis, chromatin accessibility, inflammation, autophagy and cell cycle regulation. Additionally, the majority of the enriched biological process (BP) GO terms are associated with responses to specific types of molecules, i.e., steroids and organonitrogens (Fig. 10B). For the pCR-DGE list, the enrichment profile is primarily neuronal and ion channel related (Fig. 10C). A closer look reveals that the genes are involved in proliferation, differentiation, cell cycle and plasticity. As such, their classification in GO could be an artifact of how these genes have been studied. The RD-NS list also aligns with apoptosis and chromatin accessibility, but it is distinct from pCR-NS in showing enrichment in DNA damage response and cell cycle arrest processes (Fig. 10D). Finally, the RD-DGE list covers a broad set of processes, including apoptosis, chromatin accessibility, neuronal differentiation and proliferation, with Wnt signaling setting it apart (Fig. 10E).

Rho GTPase signaling pathways are the primary Reactome pathways specifically enriched in the pCR-NS group (Fig. 11). This family of signaling proteins regulates many cellular functions, but is mostly known for regulating the cytoskeleton, with direct effects on cellular trafficking and cell cycle progression. Rho GTPases are typically overexpressed in cancer cells, which has been linked to inhibition of apoptotic pathways and increased metastatic activity. Rho GTPase expression and activity are modulated by their localization (nucleus, cytosol, membrane) and by several post-translational modifications, including AMPylation, palmitoylation, phosphorylation, prenylation, SUMOylation and transglutamination.
As such, future research should determine whether the specific localization and post-translational modification of Rho family members can serve as predictors of pCR [41, 42, 43]. Rho GTPases have so far escaped direct therapeutic targeting. Within the small GTPase family, KRAS has been targeted. Sotorasib, a mutation-specific covalent inhibitor of the KRAS G12C variant, was approved in 2021 for previously treated KRAS G12C-mutated non-small-cell lung cancer [44]. Other covalent variant-specific modalities are in development or pending approval. Otherwise, successful strategies have targeted the upstream Rho-associated coiled-coil containing protein kinase (ROCK), for which only a few inhibitors have been approved since 2017, as well as downstream MEK-ERK-BRAF and PI3K-AKT-mTOR signaling pathways. The enrichment results are provided in Supplementary Table 2.

Enrichment for Peptide YY in the RD-NS group is of particular interest. This short 36-amino-acid peptide is primarily known to be expressed in the intestines, but it is also expressed in other organs, including the pancreas and the brain stem. It is associated with endocrine signaling and secretion, glucagon response and appetite suppression.

The gene concept network illustrates the functional enrichment of genes in the molecular function (MF) category, as determined by GO analysis (Fig. 12). The plot reveals distinct molecular pathways identified by the NS and DGE groups. The NS RD nodes highlight pathways such as histone acetyltransferase binding and extracellular matrix structural constituent, suggesting foundational processes that may contribute to tumor resilience to chemotherapy. Conversely, NS pCR nodes emphasize transcriptional pathways such as promoter-specific chromatin binding, supporting baseline tumor activity that does not appear to act as a barrier to chemotherapy sensitivity, in line with what would be expected for a pCR outcome.
DGE predominantly identifies pathways such as calcium ion transmembrane transporter activity. This may provide insights into tumor biology; however, these enriched pathway groups do not appear unique to either the RD or the pCR group. This overlap indicates that DGE captures broader tumor processes rather than group-specific differences. In contrast, NS predominantly captures pathways that are more distinctly associated with RD or pCR, indicating potential for identifying biomarkers that differentiate the two groups.

Benchmarking BioNAV™ against recent ML models

The performance metrics comparing BioNAV™ against the RNAec, Ipredictor and ICpredictor models (collectively referred to as benchmark models) are shown in Supplementary Table 5. Missing values indicate that the data were not reported in the original studies.

ER+ & HER2- (2-layer): The 2-layer stratification produced the highest AUROC and accuracies for all the models, except for RNAec. BioNAV™ NS outperformed all benchmark models with an AUROC of 89.0%, compared to 80.0% for Ipredictor and 84.0% for ICpredictor. BioNAV™ NS also achieved the highest accuracy, 87.7%, while RNAec achieved 85.5%. Additionally, BioNAV™ NS reported a PPV of 63.6%, NPV of 93.5%, specificity of 91.5%, and sensitivity of 70.0%. In contrast, RNAec had a slightly lower PPV of 58.8%, but a higher NPV (94.2%) and sensitivity (87.5%), albeit with lower specificity (76.9%). These results suggest that BioNAV™ NS is better at avoiding false positives (higher specificity and PPV), while RNAec is more sensitive in detecting positives in this stratified group. Metrics aside from AUROC were not provided for Ipredictor and ICpredictor for this stratification.

ER+ (1-layer): For the ER+ (1-layer) stratification, BioNAV™ NS demonstrated an AUROC of 78.9%, outperforming Ipredictor (69.8%) and closely matching ICpredictor (79.4%).
The accuracy for BioNAV™ NS was 81.7%, although comparison to RNAec was not possible for this group due to a lack of available metrics. BioNAV™ NS also reported a PPV of 54.6% and an NPV of 85.9%, while achieving a high specificity of 92.4% but a lower sensitivity of 37.5%. These metrics were not provided for this stratification for any of the benchmark models.

No Stratification: In the non-stratified case, BioNAV™ NS achieved an accuracy of 69.2% and an AUROC of 71.0%. In comparison, Ipredictor and ICpredictor showed AUROC values of 74.9% and 80.1%, respectively. The RNAec model, although lacking AUROC data, achieved the highest accuracy of 85.6%, followed by ICpredictor (72.1%) and Ipredictor (70.2%).

Figure 13 presents the comparison of predictive performance between the BioNAV™ NS, Ipredictor, ICpredictor, and RNAec models, illustrating the effect of additional stratification layers on AUROC and accuracy. For AUROC (left panel), BioNAV™ NS demonstrated a clear, consistent increase in performance across all stratifications, from 71.0% in the non-stratified case (D2-NS) to 89.0% in the ER+ & HER2- (2-layer) group. This consistent improvement contrasts with the trends observed for both Ipredictor and ICpredictor, where performance fluctuated across stratifications. Ipredictor's AUROC decreased from 74.9% in the non-stratified group to 69.8% in the ER+ (1-layer) group, followed by a recovery to 80.0% in the ER+ & HER2- (2-layer) group. Similarly, ICpredictor exhibited an initial drop from 80.1% to 79.4% between the non-stratified and ER+ (1-layer) groups, before improving to 84.0% in the 2-layer ER+ & HER2- group. Along similar lines, the right panel, which compares the accuracy of BioNAV™ NS and RNAec, shows a marked improvement in BioNAV™ NS performance with increasing stratification.
Starting at 69.2% accuracy in the non-stratified group, BioNAV™ NS increased to 81.7% with ER+ (1-layer) stratification and reached a peak of 87.7% in the ER+ & HER2- (2-layer) group. In contrast, RNAec showed no meaningful variation across the stratifications for which it was reported, maintaining a high but stable accuracy of around 85.5%. Overall, BioNAV™ NS improved steadily in both AUROC and accuracy as more layers were applied. Ipredictor and ICpredictor, on the other hand, showed more variable results, with no consistent trend across stratifications, while RNAec's performance remained unchanged. [Supplementary Table 5]

Discussion

In this paper, we introduce BioNAV™ network signatures (BioNAV™ NS) that transform highly heterogeneous datasets from treatment-naive biopsies and use them to predict response to NAC for individual breast cancer patients. BioNAV™ NS outperforms the other methods by up to 18.6% in AUROC. We evaluated BioNAV™ NS on individual datasets sourced from GEO, as well as on all possible dataset combinations. We showed that our approach is not hampered when combining independent datasets from multiple experiments. Additionally, we stratified the data on multiple criteria, such as drug class, drug, race and histological groupings, and tested our approach on various stratification settings. Our approach shows increased performance as stratification layers are added (as high as 95.2% AUROC), something that other approaches are unable to achieve, as seen in Fig. 13.

As a key limitation, we note that performance can be impeded when either the number of samples or the number of stratification layers is insufficient. All machine learning models rely on sufficient data to learn accurate patterns. Similarly, multiple unidentified subgroups categorized as a single group can mislead the machine learning models into learning incorrect patterns. While Fig.
9 projects a trend of improving performance as more data is utilized for analysis, its validation would require additional transcriptome datasets from treatment-naive BRCA tumors. Additional data with a balanced proportion of samples under various clinical and histological criteria would be beneficial in addressing the impediments noted. Another limitation is the inability of our approach to directly interrogate non-RNA mechanisms. While all biological mechanisms will eventually perturb gene expression, some localized mechanisms at the protein or metabolite levels may be more diluted in the analysis. As more proteomics and metabolomics (and other multiomics) datasets become more widely generated, this represents a future opportunity for cross validation. The findings of this study highlight the critical role of advanced transcriptomic analysis in enhancing the precision of NAC response predictions for breast cancer patients. The study aligns with the broader vision of advancing the state of the art in personalized medicine, emphasizing the transition towards breast cancer treatments tailored to individual patient profiles. By accurately predicting patient response to NAC, clinicians can tailor treatment plans to individual patient profiles, potentially improving outcomes and reducing unnecessary treatments. The application of this technology could lead to a reduction in the physical and psychological burden experienced by patients undergoing NAC, especially for those unlikely to respond to such treatments. In this study, we describe a robust process for network-level feature discovery for predicting response to NAC. It is evident that BioNAV™ NS capture a richer stack of information in a condensed form. Specifically, BioNAV™ NS encapsulate a holistic view of gene and drug interactions and therefore, offer comparison of samples based on information that is absent in methods based on using a gene panel. 
Additionally, the GO analysis presented earlier supports the idea that the network signature approach is more relevant to underlying BP in tumors than standard DGE profiles. Furthermore, some processes are interesting candidate mechanisms unique to the pCR-NS and the RD-NS groups and are potentially actionable as novel therapeutic targets or drug response biomarkers. For the former, Rho GTPase signaling and BPs in response to specific types of molecules such as steroid and organonitrogen were highlighted. For the latter, Peptide YY association with the GI and appetite suppression raise the hypothesis that a particular metabolic predisposition exists in the RD group. An interesting question to explore is whether this predisposition is diet, gut microbiome, or activity related, i.e., something that we can act on prior to the treatment, or is it genetically related in which case new therapeutic molecules or gene therapy will be required, which is a much higher bar to reach to help these patients. BioNAV™ NS was effective at discriminating responders and non-responders to NAC treatment, thereby offering potential for personalized oncology to reduce needless side effects in patients while improving outcomes and identifying patient subgroups lacking effective therapies. Additionally, developing biomarkers from NS genes could enhance the comprehensiveness and accuracy of stratification in clinical practice while enabling higher fidelity efficacy biomarkers in clinical trials. More broadly, BioNAV™ NS can facilitate understanding and treatment of diseases beyond breast cancer, with opportunities in rare oncology, drug-resistant tumors, and non-oncology diseases with complex patient populations and divergent treatment outcomes. Declarations Competing Interests The authors are or were employees of Unravel Biosciences, Inc. and hold equity in the company. Author Contribution R.F. and R. Ni. wrote the manuscript with contributions from all authors. R.F. 
conceptualized the study and developed the code for data preprocessing, machine learning modeling, and performance evaluation. S.U provided insights into the machine learning strategies. R.F. and R. Ni conducted metric evaluations. R.F., R. Ni., and F.V. performed the analysis of BioNAV™ network signatures. F.V. and R. No. provided key insights on the clinical relevance of the findings and supervised the integration of clinical metadata. All authors contributed to the study design. Data Availability Statement All datasets were sourced from publicly accessible databases as referenced in the manuscript. Code Availability The custom code used for data processing and analysis in this study is available on GitHub at https://github.com/romanf24/Drug-Gene-Network-Signature-Modeling-Predicts-BC-Patient-Response-to-NAC . Certain proprietary algorithms used in this study are part of Unravel Biosciences, Inc.’s intellectual property and are not publicly available. References Zhang, H., Zhang, X., Jin, L. & Wang, Z. The neoadjuvant chemotherapy responses and survival rates of patients with different molecular subtypes of breast cancer. Am J Transl Res 14, 4648–4656 (2022). Romeo, V. et al. Assessment and Prediction of Response to Neoadjuvant Chemotherapy in Breast Cancer: A Comparison of Imaging Modalities and Future Perspectives. Cancers 13, (2021). Supplitt, S., Karpinski, P., Sasiadek, M. & Laczmanska, I. Current Achievements and Applications of Transcriptomics in Personalized Cancer Medicine. International Journal of Molecular Sciences 22, (2021). Smolarz, B., Nowak, A. Z. & Romanowicz, H. Breast Cancer—Epidemiology, Classification, Pathogenesis and Treatment (Review of Literature). Cancers 14, (2022). Trayes, K. P. & Cokenakes, S. E. H. Breast Cancer Treatment. Am Fam Physician 104, 171–178 (2021). Xu, W. et al. Predictors of Neoadjuvant Chemotherapy Response in Breast Cancer: A Review. Onco Targets Ther 13, 5887–5899 (2020). Charfare, H., Limongelli, S. & Purushotham, A. 
D. Neoadjuvant chemotherapy in breast cancer. British Journal of Surgery 92, 14–23 (2005). Costa, S. D. et al. Neoadjuvant Chemotherapy Shows Similar Response in Patients With Inflammatory or Locally Advanced Breast Cancer When Compared With Operable Breast Cancer: A Secondary Analysis of the GeparTrio Trial Data. Journal of Clinical Oncology 28, 83–91 (2010). Schott, A. F. & Hayes, D. F. Defining the Benefits of Neoadjuvant Chemotherapy for Breast Cancer. Journal of Clinical Oncology 30, 1747–1749 (2012). Schegerin, M., Tosteson, A. N. A., Kaufman, P. A., Paulsen, K. D. & Pogue, B. W. Prognostic imaging in neoadjuvant chemotherapy of locally-advanced breast cancer should be cost-effective. Breast Cancer Research and Treatment 114, 537–547 (2009). Sinn, B. V. et al. On-treatment biopsies to predict response to neoadjuvant chemotherapy for breast cancer. Breast Cancer Research 26, 138 (2024). Cao, J. et al. Chemoresistance and Metastasis in Breast Cancer Molecular Mechanisms and Novel Clinical Strategies. Frontiers in Oncology 11, (2021). Sannachi, L. et al. Response monitoring of breast cancer patients receiving neoadjuvant chemotherapy using quantitative ultrasound, texture, and molecular features. PLoS One 13, e0189634 (2018). Chen, J. W. et al. RNA expression classifiers from a model of breast epithelial cell organization to predict pathological complete response in triple negative breast cancer. medRxiv (2021) doi: 10.1101/2021.02.10.21251517 . Chen, J. et al. Machine learning models based on immunological genes to predict the response to neoadjuvant therapy in breast cancer patients. Frontiers in Immunology 13, (2022). Chekroud, A. M. et al. Illusory generalizability of clinical prediction models. Science 383, 164–167 (2024). Sharma, S. et al. The impact of self-identified race on epidemiologic studies of gene expression. Genetic Epidemiology 35, 93–101 (2011). Lehmann, B. D., Pietenpol, J. A. & Tan, A. R. 
Triple-Negative Breast Cancer: Molecular Subtypes and New Targets for Therapy. Am Soc Clin Oncol Educ Book e31–e39 doi: 10.14694/EdBook_AM.2015.35.e31 . Sperry, M. M. et al. Target-agnostic drug prediction integrated with medical record analysis uncovers differential associations of statins with increased survival in COVID-19 patients. PLOS Computational Biology 19, 1–21 (2023). Braal, C. L. et al. Inhibiting CDK4/6 in Breast Cancer with Palbociclib, Ribociclib, and Abemaciclib: Similarities and Differences. Drugs 81, 317–331 (2021). Novak, R. et al. Target-agnostic discovery of Rett Syndrome therapeutics by coupling computational network analysis and CRISPR-enabled in vivo disease modeling. bioRxiv (2022) doi: 10.1101/2022.03.20.485056 . Barrett, T. et al. NCBI GEO: archive for functional genomics data sets—10 years on. Nucleic Acids Research 39, D1005–D1010 (2010). Klaus, B. & Reisenauer, S. An end to end workflow for differential gene expression using Affymetrix microarrays. F1000Res 5, 1384 (2016). Plaza Oliver, M. et al. Donepezil Nanoemulsion Induces a Torpor-like State with Reduced Toxicity in Nonhibernating Xenopus laevis Tadpoles. ACS Nano 18, 23991–24003 (2024). NIH LINCS Program. https://lincsproject.org/ Davis, A. P. et al. Comparative Toxicogenomics Database (CTD): update 2023. Nucleic Acids Res 51, D1257–D1262 (2023). Shi, L. et al. The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nat Biotechnol 28, 827–838 (2010). Zuo, D. et al. Machine learning-based models for the prediction of breast cancer recurrence risk. BMC Medical Informatics and Decision Making 23, 276 (2023). Kolberg, L. et al. g:Profiler—interoperable web service for functional enrichment analysis and gene identifier mapping (2023 update). Nucleic Acids Research 51, W207–W212 (2023). Efron, B. & Tibshirani, R. J. An Introduction to the Bootstrap . (Taylor & Francis, 1994). Kohavi, R. 
A study of cross-validation and bootstrap for accuracy estimation and model selection. in Proceedings of the 14th International Joint Conference on Artificial Intelligence - Volume 2 1137–1143 (Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1995). Hastie, T., Tibshirani, R. & Friedman, J. H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction . (Springer, 2001). Arlot, S. & Celisse, A. A survey of cross-validation procedures for model selection. Statistics Surveys 4, (2010). Laan, M. J. van der, Polley, E. C. & Hubbard, A. E. Super Learner. Statistical Applications in Genetics and Molecular Biology 6, (2007). Molinaro, A. M., Simon, R. & Pfeiffer, R. M. Prediction error estimation: a comparison of resampling methods. Bioinformatics 21, 3301–3307 (2005). Weigelt, B., Baehner, F. L. & Reis-Filho, J. S. The contribution of gene expression profiling to breast cancer classification, prognostication and prediction: a retrospective of the last decade. The Journal of Pathology 220, 263–280 (2010). Sørlie, T. et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci U S A 98, 10869–10874 (2001). Dai, X. et al. Breast cancer intrinsic subtype classification, clinical use and future trends. Am J Cancer Res 5, 2929–2943 (2015). Rakha, E. A., Reis-Filho, J. S. & Ellis, I. O. Basal-Like Breast Cancer: A Critical Review. JCO 26, 2568–2581 (2008). Weigelt, B. & Reis-Filho, J. S. Histological and molecular types of breast cancer: is there a unifying taxonomy? Nat Rev Clin Oncol 6, 718–730 (2009). Cho, H. J., Kim, J.-T., Baek, K. E., Kim, B.-Y. & Lee, H. G. Regulation of Rho GTPases by RhoGDIs in Human Cancers. Cells 8, (2019). Mosaddeghzadeh, N. & Ahmadian, M. R. The RHO Family GTPases: Mechanisms of Regulation and Signaling. Cells 10, (2021). Navarro-Lérida, I., Sánchez-Álvarez, M. & Del Pozo, M. Á. 
Post-Translational Modification and Subcellular Compartmentalization: Emerging Concepts on the Regulation and Physiopathological Relevance of RhoGTPases. Cells 10, (2021). Blair, H. A. Sotorasib: First Approval. Drugs 81, 1573–1579 (2021).

Supplementary Files: SupplementaryTables135.docx, SupplementaryTable2.xlsx
Article record: Romano Flores, Rahul Nihalani, Sevgi Umur, Frederic Vigneault and Richard Novak (corresponding author), all at Unravel Biosciences, Inc. Record created 2025-02-28. DOI: https://doi.org/10.21203/rs.3.rs-6130021/v1

Figure legends

Figure 1: Diagram depicting the BioNAV™ processing of the independent datasets. BioNAV™ can generate network signatures with (A) or without (B) use of an external healthy breast tissue control dataset (DH). The labels for datasets processed by the BioNAV™ pipeline include the "NS" suffix.

Figure 2: The stratification parameters utilized on each individual dataset (first 3 rows), each dataset pair (next 3 rows), and all together (last row) to produce each set of prediction metrics. R1 indicates no stratification, R2 represents stratification by treatment, R3 denotes stratification by treatment with more specificity, R4 shows stratification by race, and R5 combines stratification by race and specific treatment.

Figure 3: Diagram illustrating 2 layers of patient stratification by histological subgroup.

Figure 4: The 3-fold cross validation method utilized to produce the prediction metrics used in the analyses.

Figure 5: The response prediction process comparing BioNAV™ NS with RNAec, Ipredictor, and ICpredictor methodologies. RNAec uses a combination of 2 sets of unique independent gene filtering amounting to 33 unique genes (15 genes and 18 genes for RNAec1 and RNAec2, respectively), while Ipredictor and ICpredictor both use the same 62.

Figure 6: Heatmaps representing individual patient data with hierarchical clustering of gene expression (A) and BioNAV™ NS transformation (B) to highlight impact on dataset comparability and intragroup analysis. Color mapping represents Pearson's correlation values between individual patients, with red denoting positive correlations and blue denoting negative correlations, ranging from -1.0 to 1.0. The side annotations indicate clinical metadata, including response to NAC (Response), ER, PR, HER2 status, triple negative receptor status (TN), hormone receptor status (HR) and Race.

Figure 7: Bar chart comparing the predictive performance (AUROC, %) across the five stratification strategies (R1-R5) applied to the various datasets and their combinations outlined in Figure 2. Each color represents a specific stratification strategy: R1 (red), R2 (light blue), R3 (blue), R4 (dark blue), and R5 (orange). The chart demonstrates the incremental improvement in predictive performance with progressive stratification, highlighting its impact on model performance when applied to BioNAV™ NS data.

Figure 8: Plots illustrating the significance of histological stratification (left) on the predictive ability of BioNAV™. The workflow (A) depicts the layered stratification applied to the D3V-NS D4-NS D1-NS dataset, first undergoing R5 stratification as described in Figure 2, followed by stratification into various histological subgroups. The bar chart (B) presents the predictive performance (AUROC) for each group, emphasizing model performance for specific histological subgroups such as R5 PR+ and R5 ER+. R5 HER2+ is omitted from the chart due to insufficient sample size.

Figure 9: Relationship between the number of patients and AUROC. R1-R5 represent the layered stratification strategies described in Figure 2. The coefficient of determination (R²) and slope (m) of each linear regression line are shown, providing insight into the significance of the correlation between AUROC and patient count.

Figure 10: (A) Venn diagram illustrating the overlap of the top 100 significant genes selected through DGE analysis (blue and green) and those derived from BioNAV™ NS (purple and yellow). Significant genes were identified separately for pCR and RD patients. Values represent the number of genes from each approach, with the number of shared genes shown in the overlapping regions. (B-E) Bar plots illustrating the top 10 enriched GO terms associated with BP for pCR NS, pCR DGE, RD NS, and RD DGE.
Each bar represents an individual GO term, with its length corresponding to the q-score (negative log-transformed adjusted p-value), indicating the significance of enrichment. Colors within bars reflect the adjusted p-values, with redder shades representing more significant terms. Panels (B) and (D) display pathways identified by the NS method for pCR and RD groups, respectively, while panels (C) and (E) show pathways identified by the DGE method for pCR and RD groups. The figure highlights distinct functional pathways associated with each group and method, illustrating their potential roles in differentiating tumor behavior in RD and pCR patients.\u003c/em\u003e\u003c/p\u003e","description":"","filename":"Picture10.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6130021/v1/704331f53625d7135797de19.jpg"},{"id":78754316,"identity":"deed84f2-0485-4c2a-b455-30b527118df7","added_by":"auto","created_at":"2025-03-18 12:29:34","extension":"jpg","order_by":11,"title":"Figure 11","display":"","copyAsset":false,"role":"figure","size":150209,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eReactome pathways related to the respective genes. Nodes represent individual GO terms, with their size proportional to the number of genes associated with each term. Colors within nodes correspond to specific gene expression categories, including pCR NS, pCR DGE, RD NS, and RD DGE. Edges indicate relationships or shared genes between GO terms, highlighting functional connectivity. 
The legend indicates the gene count range for enriched terms across conditions.\u003c/em\u003e\u003c/p\u003e","description":"","filename":"Picture11.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6130021/v1/78cf50a8b23e441006c8b470.jpg"},{"id":78754309,"identity":"dc2d1d03-e7ef-443b-8b08-e48011f22864","added_by":"auto","created_at":"2025-03-18 12:29:33","extension":"jpg","order_by":12,"title":"Figure 12","display":"","copyAsset":false,"role":"figure","size":227548,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eGene concept network illustrating enriched GO terms associated with MF. Nodes represent individual GO terms, with their size proportional to the number of genes associated with each term. Colors within nodes correspond to specific gene expression categories, including pCR NS, pCR DGE, RD NS, and RD DGE. Edges indicate relationships or shared genes between GO terms, highlighting functional connectivity. The legend indicates the gene count range for enriched terms across conditions.\u003c/em\u003e\u003c/p\u003e","description":"","filename":"Picture12.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6130021/v1/b103ac6c29624e14c105c562.jpg"},{"id":78754315,"identity":"436be81f-a80b-4302-a3c4-74dc9899f393","added_by":"auto","created_at":"2025-03-18 12:29:34","extension":"jpg","order_by":13,"title":"Figure 13","display":"","copyAsset":false,"role":"figure","size":113561,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003ePlots illustrating the effect each additional layer stratification has on the predictive abilities of each method. AUROC (A) was used for comparison in BioNAV™ NS vs Ipredictor vs ICpredictor, while accuracy (B) was used for BioNAV™ NS vs RNAec. 
As stratification layers are added, BioNAV™ NS (light blue) shows a consistent significant increase in predictive performance, while Ipredictor (\u003c/em\u003eteal\u003cem\u003e) and ICpredictor (\u003c/em\u003eblue-green\u003cem\u003e) show inconsistent trends, and RNAec (purple) shows no significant change.\u003c/em\u003e\u003c/p\u003e","description":"","filename":"Picture13.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6130021/v1/6696b5aec275896473989ac8.jpg"},{"id":80409628,"identity":"60928d7e-bb38-44df-a792-ff4a4b9cb760","added_by":"auto","created_at":"2025-04-11 15:31:46","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":3077782,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6130021/v1/ec6dc50f-9c0d-4261-82ce-c97dff54454b.pdf"},{"id":78755280,"identity":"e723b0a6-573c-4405-a3d1-05920b88d207","added_by":"auto","created_at":"2025-03-18 12:45:33","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":37654,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryTables135.docx","url":"https://assets-eu.researchsquare.com/files/rs-6130021/v1/4f084ac77900cd3982d96a75.docx"},{"id":78753590,"identity":"7303cdcb-a412-4d60-94fe-66f81c9f5e0a","added_by":"auto","created_at":"2025-03-18 12:21:33","extension":"xlsx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":2323554,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryTable2.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6130021/v1/d198d39b7a0cef1fa96839a6.xlsx"}],"financialInterests":"Competing interest reported. The authors are or were employees of Unravel Biosciences, Inc. 
and hold equity in the company.","formattedTitle":"Drug-Gene Network Signature Modeling Predicts Breast Cancer Patient Response to Neoadjuvant Chemotherapy","fulltext":[{"header":"Introduction","content":"\u003cp\u003eAs a persistently leading malignancy, BRCA presents an ongoing therapeutic challenge. As one of the most common cancers affecting the quality of life and survival of millions of women, it has been a focus of extensive research [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. The current guidelines from the National Comprehensive Cancer Network (NCCN) for treating BRCA emphasize a strategy that considers tumor biology, genetic markers, and clinical factors, including the disease stage and patient demographics [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. Specific treatments and standard of care protocols exist for various subtypes and stages of cancer, but overall treatment success remains below 30% [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e], with the majority of patients not benefiting from any particular regimen. Although triple negative tumors have worse survival probabilities overall, clinical practice has continued to pursue NAC regardless of the tumor receptor histology subgroup, since the benefit of a potential treatment success outweighs the risk of adverse events. For nonmetastatic BRCA, NAC is widely applied to reduce tumor size before proceeding with the traditional route of surgery followed by adjuvant treatment [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. 
Opting for NAC is advantageous for many reasons \u0026ndash; not only can a locally advanced inoperable BRCA be downstaged and become operable [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e, \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e], but it can also provide a cost-effective overall solution [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. Additionally, the treatment response can be observed early, allowing for personalized modifications in the treatment plan [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. While NAC can be beneficial, it causes physical and psychological stress, and the response varies based on molecular subtypes [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. Moreover, patients can reach chemoresistance after NAC, adding further challenges to BRCA treatment [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e, \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. An early prediction of pCR to NAC could be beneficial in tailoring the treatment. In this study, we focus on predicting the pCR to NAC for patients using transcriptomics data collected from breast tissue biopsies prior to NAC treatment.\u003c/p\u003e \u003cp\u003eWhile transcriptomics is currently not utilized in the strategies described by NCCN, research shows that it can be an effective tool to personalize treatment in chronic diseases, as RNA is a dynamic indicator of tumor state. In a 2021 study [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e], researchers employed a generalized linear model to identify a panel of 18 genes with the highest predictive value for response to NAC. 
The gene expression panel for each patient was then incorporated into a machine learning model, creating the first RNA expression classifier for NAC response prediction (RNAec1). Subsequently, a 15-gene classifier (RNAec2) was similarly developed, but using only samples predicted to achieve pCR by RNAec1. Additionally, the genes for RNAec2 were selected from a pool of 348 identified biomarkers. These biomarkers were based on an experimental model of breast epithelial cells and excluded the 18 genes used in the first classifier. This method is collectively referred to as \u0026ldquo;RNAec\u0026rdquo; in this study.\u003c/p\u003e \u003cp\u003eIn a more recent study [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e], a similar approach was employed for RNA expression analysis, except that this method relied on genes included in a specified set of 1,087 immunological genes manually curated to focus on immune system recognition of the tumor cells. This gene set was further refined to 62 genes identified by their strong Spearman rank correlation with pCR, generating the \u0026ldquo;Ipredictor\u0026rdquo; model. Following this, the \u0026ldquo;ICpredictor\u0026rdquo; model was formulated by enhancing the Ipredictor with clinicopathological metadata. Ipredictor and ICpredictor achieved an AUROC of 80.0% and 84.0%, respectively, for ER+ \u0026amp; HER2- stratification, while RNAec demonstrated an accuracy of 85.5% for ER+ \u0026amp; HER2- stratification, though the AUROC for this model was not reported. When applied without stratification, Ipredictor\u0026rsquo;s AUROC dropped to 74.9%, while ICpredictor reached 80.1%, and RNAec accuracy showed little change (Accuracy\u0026thinsp;=\u0026thinsp;85.6%). Both models incorporated a model stacking strategy, utilizing a combination of the most effective machine learning models together with clinical metadata for predicting NAC responses. 
This demonstrated that the inclusion of additional data layers such as clinical metadata could further improve classifier performance.\u003c/p\u003e \u003cp\u003eDespite the promising results, these methods require specific domain knowledge to reduce the genes used in the analyses, relying on experts to accurately identify genes with the highest predictive value, which may not translate broadly to novel datasets. Secondly, predictive models based on clinical outcomes fail to make accurate predictions when analyzing data aggregated from independent experiments, indicating potential issues with model bias [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e] or dataset generation heterogeneity that may not be possible to compensate for. Machine learning models in general tend to be biased towards the majority population. In the case of BRCA, there are various subgroups, with certain subgroups being more common than others. This can lead to subgroups being overlooked, as these models tend to overfit on dominant characteristics while neglecting nuanced characteristics of minority groups, thus hindering the predictive ability of the model to benefit individual patients [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e, \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. Lastly, while traditional methods of predicting BRCA patient response to NAC have often been based on broad histological classification by receptor types, histological subtypes amongst BRCA patients have previously been shown to produce significantly different rates of pathological complete response (pCR) in response to NAC [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. This may be partially explained by intricate tumor microenvironment heterogeneity, immune response, and background genetics of individual patients [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e]. 
Additionally, the same tools can be used to further our understanding of the underlying biology and discover hidden mechanisms, leading to targeted diagnostics and treatments and improved clinical outcomes.\u003c/p\u003e \u003cp\u003eSpecifying the drug treatment molecule itself rather than the drug class is also crucial, although the importance of this is often downplayed [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]. Drug treatments that fall into a drug class categorization have traditionally been treated as interchangeable. However, in a recent study, we used transcriptomics to identify differential activity of 5 statins, and this was found to significantly impact treatment success and survival of COVID-19 patients based on electronic medical record analysis [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]. In BRCA patients, the CDK4/6 inhibitors palbociclib, ribociclib, and abemaciclib have exhibited highly divergent clinical success rates despite sharing the same therapeutic target and a high degree of structural similarity [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]. These examples highlight the idea that molecules, despite belonging to the same pharmacological class, can exert markedly distinct impacts on biological networks through molecule-specific on- and off-target activities [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eHere, we present a predictive model based on network signature prioritization derived from gene-drug and gene-gene interactions to effectively stratify BRCA patients based on their treatment response to NAC. Our BioNAV\u0026trade; model takes gene expression data as input and applies machine learning techniques to extract relevant networks of interest. The BioNAV\u0026trade; pipeline transforms RNA expression data into actionable network signatures (BioNAV\u0026trade; NS). 
These signatures go beyond traditional gene expression profiles, offering a richer and more accurate understanding of the tumor biology without \u003cem\u003ea priori\u003c/em\u003e gene selection, and compare favorably with previous results obtained with the RNAec, Ipredictor, and ICpredictor models proposed for assessing personalized drug response. As a clinical discovery tool, BioNAV\u0026trade; can prospectively predict the response to a treatment, including the numerous treatments utilized in NAC. Unlike traditional methods that filter for specific genes, BioNAV\u0026trade; NS are developed through automated feature creation, allowing for a generalizable application to various diseases and treatment regimens [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. Additionally, use of these network signatures allows for effective aggregation of multiple datasets without compromising accuracy.\u003c/p\u003e "},{"header":"Methods","content":"\u003cp\u003eData\u003c/p\u003e \u003cp\u003eDataset Acquisition: Five distinct BRCA-related datasets were sourced from Gene Expression Omnibus (GEO) [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e]. Two datasets, GSE123845 (D1) and GSE163882 (D2), employed high-throughput sequencing using Illumina sequencers, while the remaining three utilized Affymetrix Human Genome Arrays for expression profiling. Additionally, each dataset provides clinical features of the patients. GSE112825 (DH) contained only healthy subjects, while the other datasets contained a mixture of patients who either achieved pCR or had residual disease (RD). Detailed information is presented in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003ePreprocessing: Microarray preprocessing was performed following published guidelines [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e] for GSE20194 (D3) and GSE20271 (D4). 
Outliers were removed based on strict quality control (QC) criteria, which included both statistical thresholds and visual inspection of sample intensity distributions. Following QC, normalization was carried out using robust multi-array average (RMA). For the high-throughput sequencing datasets, transcripts per million (TPM) normalization and QC had been performed prior to deposition in GEO.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eOverview of datasets utilized in the study, presented at three stages: initial dataset, after QC, and after BioNAV\u0026trade; NS processing. The table includes five datasets sourced from GEO, detailing the total number of samples in each dataset, as well as the breakdown of patient responses (pCR or RD). GEO accession numbers and corresponding dataset names, as referenced in the study, are provided.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"11\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c9\" 
colnum=\"9\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c11\" colnum=\"11\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c2\" namest=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c5\" namest=\"c3\"\u003e \u003cp\u003eInitial\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c8\" namest=\"c6\"\u003e \u003cp\u003ePost-QC\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c11\" namest=\"c9\"\u003e \u003cp\u003ePost-Bionav\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAccession\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eName\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003epCR\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eRD\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eTotal\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003epCR\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eRD\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eTotal\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003epCR\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c10\"\u003e \u003cp\u003eRD\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c11\"\u003e \u003cp\u003eTotal\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eGSE112825\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDS-H\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eN/A\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eN/A\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e109\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eN/A\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eN/A\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eN/A\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003eN/A\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003eN/A\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eGSE123845\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDS-1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e159\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e68\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e227\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e70\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e42\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e112\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e59\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e31\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e 
\u003cp\u003e90\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eGSE163882\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDS-2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e80\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e142\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e222\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e80\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e142\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e222\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e61\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e116\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e177\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eGSE20194\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDS-3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e56\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e222\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e278\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e53\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e183\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e236\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c9\"\u003e \u003cp\u003e13\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e73\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e86\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eGSE20271\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDS-4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e26\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e152\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e178\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e22\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e146\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e168\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e22\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e143\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e165\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eBioNAV\u0026trade; RNA Expression Processing\u003c/p\u003e \u003cp\u003eBioNAV\u0026trade;, a derivative of NeMoCAD [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e, \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e], is a tool to model gene and perturbant interactions networks for a given set of gene expression data. 
Briefly, BioNAV\u0026trade; uses the transcriptomics data, along with drug-gene and gene-gene interaction databases (BioNAV\u0026trade; DBs), to compute several statistics (correlation, entropy) and network signatures for each RNA-seq sample. BioNAV\u0026trade; DBs are internally constructed using public databases like LINCS and CTD [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e, \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e]. Additionally, Bayesian inference is performed on a drug-gene perturbation network constructed from differential genes and the drugs interacting with those genes. The statistics and the network inference feed into computing the BioNAV\u0026trade; network signature. At a high level, a network signature encapsulates how a sample interacts with an array of drugs. This contrasts with traditional approaches, where the focus is generally on understanding differences between experimental and control groups. The BioNAV\u0026trade; pipeline transforms the data from transcriptomics space to a latent space consisting of network signatures that capture not only the expression profile of the genes, but also the interactions among them and their interactions with drugs. Each network signature is composed of latent features specific to the BioNAV\u0026trade; pipeline, with generated values specific to each patient. These signatures are then used for all further inter-patient comparative analyses. For this study, the gene expression data for each dataset was processed through the BioNAV\u0026trade; computational pipeline, generating a network signature for each patient (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). 
This leads to an automated reduction from tens of thousands of genes to a few hundred latent features, as well as a reduction in samples based on specific BioNAV\u0026trade; pipeline criteria designed to ensure data quality and consistency, illustrated as \u0026ldquo;sample reduction\u0026rdquo; in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. The number of samples for each dataset after BioNAV\u0026trade; pipeline reductions can be seen in Supplementary Table\u0026nbsp;1.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eEvaluating BioNAV\u0026trade;\u003c/p\u003e \u003cp\u003e \u003cstrong\u003eEstablishing Consistent Race and Treatment Subgroups\u003c/strong\u003e \u003cp\u003eTo demonstrate the importance of specifying race and treatment when attempting to predict patient outcome to NAC, patient samples were categorized by race, drug class combinations, and the specific drug combinations that the patients were treated with. D3 and D4 were the most aligned datasets with respect to the most prevalent race and treatment. The most prevalent race found in D3 and D4 was described as \u0026ldquo;white\u0026rdquo;. The most prevalent treatment drug class combination in D3 and D4 incorporated taxanes, pyrimidine antagonists, anthracyclines, and alkylating agents. The two most prevalent treatment drug combinations specified the anthracycline as either doxorubicin or epirubicin, with doxorubicin being most prevalent. For D1, race data was not provided. The most prevalent drug class combination used to treat patients in D1 aligned with that of D3 and D4, only differing in pyrimidine antagonists being excluded for D1. D1 did not provide data on the specific drug combinations used. D2 was excluded from this analysis, as it did not contain the treatment or race data critical for this assessment. 
D3 was comprised of 4 different sub-datasets according to the authors\u0026rsquo; description [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e]. To avoid a single dataset prejudicing the results when analyzing combined datasets, only a single sub-dataset from D3 was used for the analyses. The MAQC_V dataset (D3V) was selected from D3, as it contained a more balanced proportion of samples treated with doxorubicin and epirubicin, necessary for contrasting treatment class combination stratification and treatment class combination.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eCombining Datasets\u003c/strong\u003e \u003cp\u003eTo increase the total number of patients for analysis and reduce biasing that can arise from single dataset modeling, the power set of all three datasets was analyzed (Supplementary Table\u0026nbsp;1).\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eEstablishing Consistent Histological Subgroups\u003c/strong\u003e \u003cp\u003eThe histological groups selected for analysis in this study were chosen due to their frequent appearance in recent literature as key criteria for stratification [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. 
Due to missing clinical data in some datasets, the histological subgroups considered for stratification were limited to progesterone receptor positive (PR+), progesterone receptor negative (PR-), estrogen receptor positive (ER+), estrogen receptor negative (ER-), human epidermal growth factor receptor 2 positive (HER2+), human epidermal growth factor receptor 2 negative (HER2-), triple negative for all 3 receptors (TN+), non-triple negative (TN-), hormone receptor positive (HR+), and hormone receptor negative (HR-).\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eStratification\u003c/strong\u003e \u003cp\u003eTo assess the predictive power of race, treatment and histological data, we stratified the datasets described in the previous section in multiple settings (shown in Fig.\u0026nbsp;2) and evaluated the RFML model on them. The largest subgroup observed at each stratification step was used in the next layer of stratification. We assigned labels to these settings for ease of narration. R1 indicates the baseline datasets with no stratification. Single-layer stratifications were evaluated (where applicable) by the most prevalent drug class combinations (R2), drug combinations (R3), and race (R4). We later describe the specific drug and race criteria used for various datasets and dataset combinations. Lastly, we evaluated a two-layer stratification setting (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e3\u003c/span\u003e), stratified by the most prevalent race and drug combinations (R5). The stratification in R5 was then further combined with stratification by the histological groups described in the previous section to assess their effects. 
Histological subgroup stratification was limited to the dataset combination D3V-NS, D4-NS, D1-NS, and a single additional stratification layer due to an insufficient number of samples in all other cases.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eRandom Forest Model\u003c/strong\u003e \u003cp\u003eTo retain clinical understanding, as well as to demonstrate the richness of BioNAV\u0026trade; NS, analyses were conducted using a simple Random Forest regressor machine learning model (RFML) to predict responses to NAC. The model was configured using the Poisson criterion to measure the quality of each split, with the number of features used to determine each split set as log\u003csub\u003e2\u003c/sub\u003e of the total number of features. No bootstrapping was used. We utilized stratified 3-fold cross-validation to maintain the proportion of classes (pCR and RD) across splits and ensure that each fold contained a representative distribution of the target classes (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003e). To address the imbalance between the two classes (pCR and RD), class weights were computed from the training data in each fold. The inverses of the class frequencies were used as weights, ensuring equal contribution of both classes during model training. These class weights were normalized to sum to one to avoid bias towards the larger class. After model fitting, the predicted probabilities for the test set were computed and converted into binary class predictions using a threshold of 0.50. 
Various performance metrics consistently used in literature related to predicting patient response to NAC were then calculated for each fold, including accuracy, area under the receiver operating characteristic curve (AUROC), F1 score, and Positive Predictive Value (PPV), Negative Predictive Value (NPV), Sensitivity (SEN), and Specificity (SPE) [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e]. After completion of all 3 folds, each metric was averaged to generate summary statistics.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eApplying the RFML model\u003c/strong\u003e \u003cp\u003eThe BioNAV\u0026trade; NS were evaluated by applying the previously described RFML model to predict patient responses to NAC. For each dataset and dataset combination listed in Supplementary Table\u0026nbsp;1, the BioNAV\u0026trade; NS were used as input features, with patient response (pCR vs. RD) as the target variable. We assessed each dataset individually and in combination to evaluate the generalizability of the network signatures. Combining datasets was done to mitigate biases inherent in individual datasets and to increase the overall sample size. Additionally, evaluating BioNAV\u0026trade; across different datasets enabled us to examine the performance of its network signatures across various sequencing platforms. By combining datasets, we were able to assess BioNAV\u0026trade;'s ability to account for inter-dataset variability, including batch effects. All assessments were conducted using the summary statistics mentioned previously.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eBioNAV\u0026trade;-NS Genes\u003c/strong\u003e \u003cp\u003eThe D3V-NS D4-NS D1-NS dataset with R5 PR\u0026thinsp;+\u0026thinsp;stratification was used to compare the top genes derived from NS with those identified through differential gene expression (DGE) analysis for each sample group (pCR, RD). 
For the DGE analysis, true responses to NAC were used to separate pCR and RD groups, while predicted responses were used for the NS analysis.\u003c/p\u003e \u003c/p\u003e \u003cp\u003eFor the DGE method, genes were filtered based on a p-value threshold (\u0026lt;\u0026thinsp;0.05) and a log2 Fold Change (log2FC) threshold (\u0026ge;\u0026thinsp;1.5), ensuring the inclusion of genes with substantial and statistically significant expression changes. The occurrences of each remaining gene across the samples in the group were then counted, and the top 100 most frequently occurring genes were selected. The NS method followed a similar approach. A two-tailed t-test was performed, and genes were filtered using the same p-value (\u0026lt;\u0026thinsp;0.05) and log2FC (\u0026ge;\u0026thinsp;1.5) thresholds. Network signatures were then sorted primarily by p-value and secondarily by log2FC. The top 10 network signatures were selected, from which genes were extracted. For each network signature, the top 10% most heavily weighted genes were retained. The occurrences of these genes across the samples in the group were then counted, and the top 100 most frequently occurring genes were selected.\u003c/p\u003e \u003cp\u003eBoth methods were applied separately to the pCR and RD groups. This analysis resulted in four gene lists, detailed in Supplementary Table\u0026nbsp;2: the top 100 genes derived from network signatures (NS) and DGE for each group (pCR and RD). 
To assess the biological significance of these gene signatures, Gene Ontology (GO) terms and pathway enrichments were determined using g:Profiler [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eTo supplement the primary g:Profiler-based enrichment analysis, we conducted additional GO term and Reactome pathway enrichment analyses using the clusterProfiler and ReactomePA R packages (DOI: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.18129/B9.bioc.clusterProfiler\u003c/span\u003e\u003cspan address=\"10.18129/B9.bioc.clusterProfiler\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e, DOI: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.18129/B9.bioc.ReactomePA\u003c/span\u003e\u003cspan address=\"10.18129/B9.bioc.ReactomePA\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). GO enrichment analysis was performed separately for the pCR_NS, pCR_DGE, RD_NS, and RD_DGE gene lists using enrichGO(), with Biological Process (BP) and Molecular Function (MF) terms as the primary focus. Reactome pathway enrichment was conducted using enrichPathway() to explore pathway-level relationships. Additionally, MF terms and Reactome pathways were analyzed using compareCluster(), enabling a comparative functional assessment across groups. Terms were selected by applying a p-value threshold of 0.05. Redundancy among GO terms was minimized by using the Wang method for simplification. Pairwise similarity was then calculated using the Jaccard similarity coefficient to illustrate the similarities and differences between terms. \u0026ldquo;showCategories\u0026rdquo; was set to 10 to focus on the top terms for each. To improve readability and consistency across enrichment results, an abbreviation standardization (supplementary table XX) was applied to all GO and Reactome enrichment results. 
q-scores were calculated as the -log10 transformation of adjusted p-values.\u003c/p\u003e \u003cp\u003eBenchmarking BioNAV\u0026trade; against recent ML models\u003c/p\u003e \u003cp\u003eTo gauge the effectiveness of BioNAV\u0026trade; NS, we compared BioNAV\u0026trade;\u0026rsquo;s predictive metrics with those obtained from RNAec, Ipredictor and ICpredictor models [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e] (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e5\u003c/span\u003e). To perform an accurate benchmark analysis, we matched the dataset used in the two studies for developing their respective models; therefore, only D2 was used. In addition to unstratified results, these studies presented results on stratified subgroups based on ER-positive (ER+) and HER2-negative (HER2-) statuses. Accordingly, we stratified D2 by the same criteria. To produce metrics, Ipredictor and ICpredictor utilized bootstrap resampling with 2,000 replicates, while RNAec employed 5-fold cross-validation. To mitigate computational burden, we opted for cross-validation over bootstrap resampling, as cross-validation is generally less computationally intensive while still providing reliable model evaluation [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e, \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eDue to the reduced sample size of D2 after BioNAV processing, we used 3-fold cross-validation rather than 5-fold as done for RNAec. 
Employing a higher number of folds would have resulted in smaller training subsets, which can increase the variance of performance estimates and potentially compromise the reliability of model evaluation [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e, \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e]. Previous studies have demonstrated that with smaller datasets, reducing the number of folds helps maintain sufficient data in each training set, thereby enhancing the stability and reliability of the model's performance metrics [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e, \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003eBioNAV\u0026trade; RNA Expression Processing\u003c/p\u003e \u003cp\u003eWe first addressed the inherent heterogeneity of working with datasets from multiple studies, each with respectively unique data generation methods and platforms. Traditional stratification of patients based on normalized gene expression clustered samples by the respective study or dataset, with low information content for each subgroup and high contrast between subgroups. This finding was not surprising given the sources of data that dataset normalization was unable to sufficiently overcome. In contrast, transformation of gene expression into network signatures using BioNAV\u0026trade; NS resulted in greater discrimination of patient subgroups that spanned multiple datasets (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e6\u003c/span\u003e). 
These NS-transformed data were used in subsequent patient stratification and machine learning.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eEvaluating BioNAV\u0026trade;\u003c/p\u003e \u003cp\u003e \u003cstrong\u003eApplying the RFML model\u003c/strong\u003e \u003cp\u003eSupplementary Table\u0026nbsp;3 presents the results on various metrics obtained with our updated approach, including accuracy (ACC), area under the receiver operating characteristic curve (AUROC), positive predictive value (PPV), negative predictive value (NPV), specificity (SPE), sensitivity (SEN), and the F1 score (F1). The standard deviation observed for AUROC during the 3-fold cross validation (STDEV) is also displayed. Results are organized by groups (R1\u0026ndash;R5), which represent different stratifications of the datasets. The presence of 0 values in the PPV and SEN metrics reflects that the performance was severely constrained, likely due to extreme class imbalance. NA values indicate that data wasn\u0026rsquo;t sufficient to produce a meaningful result.\u003c/p\u003e \u003c/p\u003e \u003cp\u003eWithout stratification (R1), model performance was beset with high variability, though combined datasets improved AUROC (e.g., 72.6 for D3V-NS D4-NS D1-NS vs. 54.3 for D1-NS) and reduced variability (STDEV: 9.0 vs. 11.6). When filtering for drug treatment category (R2), stratification improved AUROC and F1 scores, and combined datasets balanced sensitivity and specificity. Stratifying by drug treatment molecule (R3) increased AUROC (78.1 for D3V-NS D4-NS D1-NS) and reduced variability. Similar to R1, stratifying by race (R4) showed wide performance variation, but combined datasets improved balance (AUROC: 78.8, F1: 67.9 for D3V-NS D4-NS D1-NS). 
The combined stratification by race and drug (R5) achieved the highest performance (AUROC: 82.4, F1: 73.8) with combined datasets.\u003c/p\u003e \u003cp\u003eDespite batch variations, combining datasets consistently enhanced AUROC and stability across all stratification groups. For example, in R1, combined datasets improved AUROC by 33.7% compared to D1-NS alone. Similar trends were observed in R2 and R5. Individual datasets often exhibited imbalanced metrics, such as high specificity but low sensitivity (e.g., D3V-NS D4-NS in R1, SPE: 92.6, SEN: 3.0), which were mitigated through dataset combinations.\u003c/p\u003e \u003cp\u003eThe overall stratification trends became particularly evident for the combined dataset D3V-NS D4-NS D1-NS. As stratification became more specific, AUROC improved and variability decreased. For example, without stratification (R1), D3V-NS D4-NS D1-NS achieved an AUROC of 72.6 with noticeable variability (STDEV: 9.0). In contrast, the most specific stratification (R5), which combined race and drug treatment molecule stratification, increased AUROC by 9.8 points (to 82.4) and reduced STDEV to 3.3, highlighting the observed improvements in performance metrics with increased stratification specificity. The few observations that don\u0026rsquo;t follow this trend need further investigation but are likely attributable to a lack of sufficient data in various subgroups to properly train the ML model.\u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e7\u003c/span\u003e displays the AUROC values for each stratification strategy: R1 (no stratification), R2, R3, R4 (single-layer stratification) and R5 (two-layer stratification). R1 (red) appeared inconsistent in AUROC, with values ranging from 47.2\u0026ndash;78.2%. Models with one layer of stratification mostly improved over the unstratified models, with AUROC ranging from 52.6\u0026ndash;83.4%. The 2-layer model appeared to drop in AUROC, with scores from 53.6\u0026ndash;83.3%. 
The D3V, D4, D1 combined model was the only model that displayed a consistent positive correlation between the number of layers of stratification and AUROC.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eCombining R5 with histological subgroups stratification\u003c/b\u003e: After applying a third layer of stratification based on histological receptor statuses (ER, PR, HER2, HR, and TN) to the three-dataset combination (D3V-NS, D4-NS, D1-NS) from R5, we observed notable differences in performance relative to the unstratified R5 baseline (ACC: 74.7%, AUROC: 82.4%) (Supplementary Table\u0026nbsp;4 and Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e8\u003c/span\u003e). Subgroups characterized by positive receptor expression generally showed marked improvements. For instance, the PR\u0026thinsp;+\u0026thinsp;subgroup achieved an ACC of 94.9% and an AUROC of 93.7%, reflecting substantial increases of +\u0026thinsp;20.2% in ACC and +\u0026thinsp;11.3% in AUROC compared to the baseline R5. Similarly, ER\u0026thinsp;+\u0026thinsp;improved by +\u0026thinsp;12.8% in ACC (87.5%) and +\u0026thinsp;6.5% in AUROC (88.9%), and HR\u0026thinsp;+\u0026thinsp;showed gains of +\u0026thinsp;15.7% in ACC (90.4%) and +\u0026thinsp;5.1% in AUROC (87.5%). Even the TN- subgroup (i.e., those not classified as triple-negative) exhibited improvements, reaching 87.0% ACC (+\u0026thinsp;12.3%) and 87.6% AUROC (+\u0026thinsp;5.2%). In contrast, subgroups defined by negative receptor expression saw declines. ER- decreased by 14.4% in ACC (60.3%) and 15.5% in AUROC (66.9%). PR- declined by 7.2% in ACC (67.5%) and 7.6% in AUROC (74.8%), while HR- dropped substantially by 18.3% in ACC (56.4%) and 20.3% in AUROC (62.1%). The TN\u0026thinsp;+\u0026thinsp;subgroup also showed reduced performance (59.4% ACC and 66.0% AUROC), representing a 15.3% drop in ACC and 16.4% drop in AUROC. 
For the HER2- subgroup, a slight increase in ACC (+\u0026thinsp;1.2%) to 75.9% was noted, but with a 4.1% reduction in AUROC (78.3%). HER2\u0026thinsp;+\u0026thinsp;data were unavailable for evaluation.\u003c/p\u003e \u003cp\u003eThese results underscore the interplay between receptor status and model performance. Positive receptor subgroups (ER+, PR+, HR+, TN-) tend to yield higher accuracy and discriminative ability when layered onto the R5 stratification, whereas negative receptor subgroups (ER-, PR-, HR-, TN+) face declines. This trend aligns with the known biological heterogeneity and complexity associated with negative receptor expressions in breast cancer [\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e, \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e, \u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e, \u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e, \u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e]. Addressing these challenges may require further refinements, including additional stratification layers or the incorporation of more targeted molecular features, to enhance the AUROC and improve predictive stability across all subgroups.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eStratification and Number of Subjects: The number of subjects in each analysis had variable impacts on the AUROC, depending on the number of stratification layers (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e9\u003c/span\u003e). The unstratified model (R1) exhibits an insignificant relationship between AUROC and the number of patients (slope\u0026thinsp;=\u0026thinsp;0.03, R\u0026sup2; = 0.06), indicating minimal improvement in performance as the patient count increases. 
For the single-layer stratification models (R2, R3, R4), R3 shows the strongest correlation (R\u0026sup2; = 0.56), followed by R4 (R\u003csup\u003e2\u003c/sup\u003e\u0026thinsp;=\u0026thinsp;0.38) and R2 (R\u0026sup2; = 0.27). The two-layer stratification model (R5) displayed the strongest relationship (R\u0026sup2; = 0.60) suggesting a significant positive trend between AUROC and patient numbers and highlighting the potential to uncover more information about underlying drug response pathways from fewer patients.\u003c/p\u003e \u003cp\u003eIn light of the exhaustive experiments conducted for all possible combinations of stratification settings and datasets, the number of patients decreases as stratification layers are added. This presents a challenge as a low sample count deprives the machine learning models of the ability to learn accurate patterns and produce accurate predictions. As expected, the performance of our model consistently increases as the number of patients increases; however, increasing stratification through the incorporation of available demographic and tumor metadata enhances model performance (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e9\u003c/span\u003e). Moreover, where the number of samples is sufficient, the model accuracy increases as stratification layers are added. This is especially clear from the three-dataset combination D3V-NS, D4-NS, D1-NS, where the performance increases as we progressively go from a no stratification setting to three layers of stratification (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e7\u003c/span\u003e, Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e8\u003c/span\u003eB). The drop in the ER-, HER- and PR- cases indicates the presence of complex subgroups within these groups. 
As can be seen in Fig.\u0026nbsp;\u003cspan refid=\"Fig12\" class=\"InternalRef\"\u003e13\u003c/span\u003e, our model shows increased performance when two layers of histological subgroupings are applied.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e\u003cb\u003eBioNAV\u0026trade;-NS Genes\u003c/b\u003e:\u003c/h2\u003e \u003cp\u003eThe gene lists were mostly unique, with pCR DGE and RD DGE sharing the most (3) genes (Fig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e10\u003c/span\u003eA). The pCR-NS genes and the enrichment for GO aligned with processes related to apoptosis, chromatin accessibility, inflammation, autophagy and cell cycle regulation. Additionally, the majority of biological processes (BP) GO are associated with responding to specific types of molecules, i.e., steroids and organonitrogens (Fig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e10\u003c/span\u003eB). For the pCR-DGE, the enrichment profile is primarily neuronal, and ion channel related (Fig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e10\u003c/span\u003eC). A closer look reveals that the genes are involved in proliferation, differentiation, cell cycle and plasticity. As such, their classification in GO could be an artifact of how these genes have been studied. The RD-NS list also aligns with apoptosis and chromatin accessibility, but it is distinct from pCR-NS by showing enrichment in DNA damage response and cell cycle arrest processes (Fig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e10\u003c/span\u003eD). 
Finally, the RD-DGE list covers a broad range of processes, including apoptosis, chromatin accessibility, neuronal differentiation and proliferation, with Wnt signaling setting it apart (Fig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e10\u003c/span\u003eE).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eRho GTPase signaling pathways are the primary Reactome pathways specifically enriched in the pCR-NS group (Fig.\u0026nbsp;\u003cspan refid=\"Fig10\" class=\"InternalRef\"\u003e11\u003c/span\u003e). This family of signaling proteins regulates many cellular functions, but is mostly known for regulating the cytoskeleton, with direct effects on cellular trafficking and cell cycle progression. Rho GTPases are typically overexpressed in cancer cells, which has been linked to inhibition of apoptotic pathways and increased metastatic activity. Rho GTPase expression and activity are modulated by their localization (nucleus, cytosol, membrane) and several post-translational modifications, including AMPylation, palmitoylation, phosphorylation, prenylation, SUMOylation, and transglutamination. As such, future research should aim at refining whether specific localization as well as post-translational modification of members of the Rho proteins can be specific predictors of pCR [\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e, \u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e, \u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e]. Rho GTPases have so far escaped direct therapeutic targeting. In the small GTPase family, KRAS has been targeted. Sotorasib, a mutation-specific covalent inhibitor of a G12C KRAS variant, was approved in 2021 for non-small-cell lung cancer in combination with PD-1 checkpoint inhibitor [\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e]. Other covalent variant-specific modalities are in development or pending approval. 
Otherwise, successful strategies have targeted the upstream Rho-associated coiled-coil containing protein kinase (ROCK) inhibitor, for which only a few inhibitors have been approved since 2017, as well as downstream inhibitors of MEK-ERK-BRAF and PI3K-AKT-mTOR signaling pathways. The enrichment results are provided in Supplementary Table\u0026nbsp;2.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eEnrichment for Peptide YY in the RD-NS group is of particular interest. This short 36 amino acids protein is primarily known to be expressed in the intestines, but it is also expressed in other organs including the pancreas and the brain stem. It is associated with endocrine signaling and secretion, glucagon response and appetite suppression.\u003c/p\u003e \u003cp\u003eThe gene concept network illustrates the functional enrichment of genes in the molecular function (MF) category, as determined by GO analysis (Fig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e12\u003c/span\u003e). The plot reveals distinct molecular pathways of focus identified by NS and DGE groups. The NS RD nodes highlight pathways such as histone acetyltransferase binding and EC matrix structural constituent, suggesting foundational processes that may contribute to tumor resilience to chemotherapy. Conversely, NS pCR nodes emphasize transcriptional pathways like promoter-specific chromatin binding, supporting baseline tumor activity that does not appear to act as a barrier to chemotherapy sensitivity, aligning with assumptions of a pCR outcome. DGE predominantly identifies pathways such as calcium ion transmembrane transporter activity. This may provide insights into tumor biology; however, these enriched pathway groups do not appear unique to either RD or pCR groups. This overlap indicates that DGE captures broader tumor processes rather than group-specific differences. 
In contrast, NS predominantly captures pathways that are more distinctly associated with RD or pCR, indicating potential for identifying biomarkers that differentiate the two groups.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eBenchmarking BioNAV\u0026trade; against recent ML models\u003c/p\u003e \u003cp\u003eThe performance metrics for comparing BioNAV\u0026trade; against the RNAec, Ipredictor and ICpredictor models (collectively referred to as benchmark models) are shown in Supplementary Table\u0026nbsp;5. The missing values indicate that data was not present in the studies.\u003c/p\u003e \u003cp\u003eER+ \u0026amp; HER2- (2-layer): The 2-layer stratification produced the highest AUROC and accuracies for all the models, except for RNAec. BioNAV\u0026trade; NS outperformed all benchmark models with an AUROC of 89.0%, compared to 80.0% for Ipredictor and 84.0% for ICpredictor. BioNAV\u0026trade; NS also achieved the highest accuracy of 87.7%, while RNAec achieved 85.5%. Additionally, BioNAV\u0026trade; NS reported a PPV of 63.6%, NPV of 93.5%, specificity of 91.5%, and sensitivity of 70.0%. In contrast, RNAec had a slightly lower PPV of 58.8%, but a higher NPV (94.2%) and sensitivity (87.5%), albeit with lower specificity (76.9%). These results suggest that BioNAV\u0026trade; NS excels in correctly identifying true positives and negatives, while RNAec is more sensitive to detecting positives in this stratified group. Metrics aside from AUROC were not provided for Ipredictor and ICpredictor for this stratification.\u003c/p\u003e \u003cp\u003eER+ (1-layer): For the ER+ (1-layer) stratification, BioNAV\u0026trade; NS demonstrated an AUROC of 78.9%, outperforming Ipredictor (69.8%) but closely aligning with ICpredictor (79.4%). The accuracy for BioNAV\u0026trade; NS was 81.7%, although comparison to RNAec was not possible for this group due to lack of available metrics. 
BioNAV\u0026trade; NS also reported a PPV of 54.6% and an NPV of 85.9%, while achieving a high specificity of 92.4% but lower sensitivity at 37.5%. These metrics were also not provided for this stratification for any of the benchmark models.\u003c/p\u003e \u003cp\u003eNo Stratification: In the non-stratified case, BioNAV\u0026trade; NS achieved an accuracy of 69.2% and an AUROC of 71.0%. In comparison, Ipredictor and ICpredictor showed AUROC values of 74.9% and 80.1%, respectively. The RNAec model, although lacking AUROC data, achieved the highest accuracy of 85.6%, followed by ICpredictor (72.1%) and Ipredictor (70.2%).\u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig12\" class=\"InternalRef\"\u003e13\u003c/span\u003e presents the comparison of predictive performance between BioNAV\u0026trade; NS, Ipredictor, ICpredictor, and RNAec models, illustrating the effect of additional layer stratification on AUROC and accuracy. For AUROC (left panel), BioNAV\u0026trade; NS demonstrated a clear, consistent increase in performance across all stratifications, starting from 71.0% in the non-stratified case (D2-NS) to 89.0% in the ER+ \u0026amp; HER2- (2-layer) group. This consistent improvement contrasts with the trends observed in both Ipredictor and ICpredictor, where performance fluctuated across stratifications. Ipredictor's AUROC decreased from 74.9% in the non-stratified group to 69.8% in the ER+ (1-layer) group, followed by a recovery to 80.0% in the ER+ \u0026amp; HER2- (2-layer) group. Similarly, ICpredictor exhibited an initial drop from 80.1% to 79.4% between the non-stratified and ER+ (1-layer) groups, before improving to 84.0% in the 2-layer ER+ \u0026amp; HER2- group. Along similar lines, the right panel, which compares the accuracy between BioNAV\u0026trade; NS and RNAec, shows a marked improvement in BioNAV\u0026trade; NS performance with increasing stratification. 
Starting at 69.2% accuracy in the non-stratified group, BioNAV\u0026trade; NS increased to 81.7% with ER+ (1-layer) stratification and reached a peak of 87.7% in the ER+ \u0026amp; HER2- (2-layer) group. In contrast, RNAec showed little variation across the stratifications, maintaining a high but stable accuracy of around 85.5% in all groups. Overall, BioNAV\u0026trade; NS improved steadily, with substantial gains in both AUROC and accuracy as more stratification layers were applied. Ipredictor and ICpredictor, on the other hand, showed more variable results, with no consistent trend across stratifications, while RNAec\u0026rsquo;s performance remained essentially unchanged.\u003c/p\u003e \u003cp\u003e[Supplementary Table\u0026nbsp;5]\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eIn this paper, we introduce BioNAV\u0026trade; network signatures (BioNAV\u0026trade; NS) that transform highly heterogeneous datasets from treatment-naive biopsies and use them to predict patient response to NAC for individual breast cancer patients. BioNAV\u0026trade; NS outperforms the other methods by up to 18.6% in AUROC. We evaluated BioNAV\u0026trade; NS on individual datasets that were sourced from GEO, as well as all possible dataset combinations. We showed that our approach is not hampered when combining independent datasets from multiple experiments. Additionally, we stratified the data on multiple criteria such as drug class, drug, race and histological groupings and tested our approach on various stratification settings. 
Our approach shows increased performance as stratification layers are added (as high as 95.2% AUROC), something that other approaches are unable to achieve, as seen in Fig.\u0026nbsp;\u003cspan refid=\"Fig12\" class=\"InternalRef\"\u003e13\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eAs a key limitation, we note that when either the number of samples or number of stratification layers is insufficient, it can impede performance. All machine learning models rely on sufficient data to learn accurate patterns. Similarly, multiple unidentified subgroups categorized as a single group can deceive the machine learning models into learning incorrect patterns. While Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e9\u003c/span\u003e projects a trend of improving performance as more data is utilized for analysis, its validation would require additional transcriptome datasets from treatment-naive BRCA tumors. Additional data with a balanced proportion of samples under various clinical and histological criteria would be beneficial in addressing the impediments noted. Another limitation is the inability of our approach to directly interrogate non-RNA mechanisms. While all biological mechanisms will eventually perturb gene expression, some localized mechanisms at the protein or metabolite levels may be more diluted in the analysis. As more proteomics and metabolomics (and other multiomics) datasets become more widely generated, this represents a future opportunity for cross validation.\u003c/p\u003e \u003cp\u003eThe findings of this study highlight the critical role of advanced transcriptomic analysis in enhancing the precision of NAC response predictions for breast cancer patients. The study aligns with the broader vision of advancing the state of the art in personalized medicine, emphasizing the transition towards breast cancer treatments tailored to individual patient profiles. 
By accurately predicting patient response to NAC, clinicians can tailor treatment plans to individual patient profiles, potentially improving outcomes and reducing unnecessary treatments. The application of this technology could lead to a reduction in the physical and psychological burden experienced by patients undergoing NAC, especially for those unlikely to respond to such treatments.\u003c/p\u003e \u003cp\u003eIn this study, we describe a robust process for network-level feature discovery for predicting response to NAC. BioNAV\u0026trade; NS capture a richer stack of information in a condensed form. Specifically, BioNAV\u0026trade; NS encapsulate a holistic view of gene and drug interactions and therefore allow samples to be compared on information that is absent from methods based on a gene panel.\u003c/p\u003e \u003cp\u003eAdditionally, the GO analysis presented earlier supports the idea that the network signature approach is more relevant to underlying BPs in tumors than standard DGE profiles. Furthermore, some processes are interesting candidate mechanisms unique to the pCR-NS and the RD-NS groups and are potentially actionable as novel therapeutic targets or drug response biomarkers. For the former, Rho GTPase signaling and BPs in response to specific types of molecules such as steroid and organonitrogen were highlighted. For the latter, the association of Peptide YY with the GI tract and appetite suppression raises the hypothesis that a particular metabolic predisposition exists in the RD group. 
An interesting question to explore is whether this predisposition is related to diet, gut microbiome, or activity, i.e., something that can be acted on prior to treatment, or whether it is genetic, in which case new therapeutic molecules or gene therapy would be required, a much higher bar to reach to help these patients.\u003c/p\u003e \u003cp\u003eBioNAV\u0026trade; NS was effective at discriminating responders and non-responders to NAC treatment, thereby offering potential for personalized oncology to reduce needless side effects in patients while improving outcomes and identifying patient subgroups lacking effective therapies. Additionally, developing biomarkers from NS genes could enhance the comprehensiveness and accuracy of stratification in clinical practice while enabling higher fidelity efficacy biomarkers in clinical trials. More broadly, BioNAV\u0026trade; NS can facilitate understanding and treatment of diseases beyond breast cancer, with opportunities in rare oncology, drug-resistant tumors, and non-oncology diseases with complex patient populations and divergent treatment outcomes.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003ch2\u003eCompeting Interests\u003c/h2\u003e\u003cp\u003eThe authors are or were employees of Unravel Biosciences, Inc. and hold equity in the company.\u003c/p\u003e\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eR.F. and R. Ni. wrote the manuscript with contributions from all authors. R.F. conceptualized the study and developed the code for data preprocessing, machine learning modeling, and performance evaluation. S.U. provided insights into the machine learning strategies. R.F. and R. Ni. conducted metric evaluations. R.F., R. Ni., and F.V. performed the analysis of BioNAV\u0026trade; network signatures. F.V. and R. No. provided key insights on the clinical relevance of the findings and supervised the integration of clinical metadata. 
All authors contributed to the study design.\u003c/p\u003e\u003ch2\u003eData Availability Statement\u003c/h2\u003e \u003cp\u003eAll datasets were sourced from publicly accessible databases as referenced in the manuscript.\u003c/p\u003e\u003ch2\u003eCode Availability\u003c/h2\u003e \u003cp\u003eThe custom code used for data processing and analysis in this study is available on GitHub at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/romanf24/Drug-Gene-Network-Signature-Modeling-Predicts-BC-Patient-Response-to-NAC\u003c/span\u003e\u003cspan address=\"https://github.com/romanf24/Drug-Gene-Network-Signature-Modeling-Predicts-BC-Patient-Response-to-NAC\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Certain proprietary algorithms used in this study are part of Unravel Biosciences, Inc.\u0026rsquo;s intellectual property and are not publicly available.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eZhang, H., Zhang, X., Jin, L. \u0026amp; Wang, Z. The neoadjuvant chemotherapy responses and survival rates of patients with different molecular subtypes of breast cancer. Am J Transl Res 14, 4648\u0026ndash;4656 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRomeo, V. \u003cem\u003eet al.\u003c/em\u003e Assessment and Prediction of Response to Neoadjuvant Chemotherapy in Breast Cancer: A Comparison of Imaging Modalities and Future Perspectives. \u003cem\u003eCancers\u003c/em\u003e 13, (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSupplitt, S., Karpinski, P., Sasiadek, M. \u0026amp; Laczmanska, I. Current Achievements and Applications of Transcriptomics in Personalized Cancer Medicine. International Journal of Molecular Sciences 22, (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSmolarz, B., Nowak, A. Z. \u0026amp; Romanowicz, H. 
Breast Cancer\u0026mdash;Epidemiology, Classification, Pathogenesis and Treatment (Review of Literature). \u003cem\u003eCancers\u003c/em\u003e 14, (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTrayes, K. P. \u0026amp; Cokenakes, S. E. H. Breast Cancer Treatment. Am Fam Physician 104, 171\u0026ndash;178 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXu, W. \u003cem\u003eet al.\u003c/em\u003e Predictors of Neoadjuvant Chemotherapy Response in Breast Cancer: A Review. Onco Targets Ther 13, 5887\u0026ndash;5899 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCharfare, H., Limongelli, S. \u0026amp; Purushotham, A. D. Neoadjuvant chemotherapy in breast cancer. British Journal of Surgery 92, 14\u0026ndash;23 (2005).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCosta, S. D. \u003cem\u003eet al.\u003c/em\u003e Neoadjuvant Chemotherapy Shows Similar Response in Patients With Inflammatory or Locally Advanced Breast Cancer When Compared With Operable Breast Cancer: A Secondary Analysis of the GeparTrio Trial Data. Journal of Clinical Oncology 28, 83\u0026ndash;91 (2010).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSchott, A. F. \u0026amp; Hayes, D. F. Defining the Benefits of Neoadjuvant Chemotherapy for Breast Cancer. Journal of Clinical Oncology 30, 1747\u0026ndash;1749 (2012).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSchegerin, M., Tosteson, A. N. A., Kaufman, P. A., Paulsen, K. D. \u0026amp; Pogue, B. W. Prognostic imaging in neoadjuvant chemotherapy of locally-advanced breast cancer should be cost-effective. Breast Cancer Research and Treatment 114, 537\u0026ndash;547 (2009).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSinn, B. V. et al. On-treatment biopsies to predict response to neoadjuvant chemotherapy for breast cancer. Breast Cancer Research 26, 138 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCao, J. 
\u003cem\u003eet al.\u003c/em\u003e Chemoresistance and Metastasis in Breast Cancer Molecular Mechanisms and Novel Clinical Strategies. Frontiers in Oncology 11, (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSannachi, L. \u003cem\u003eet al.\u003c/em\u003e Response monitoring of breast cancer patients receiving neoadjuvant chemotherapy using quantitative ultrasound, texture, and molecular features. PLoS One 13, e0189634 (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen, J. W. \u003cem\u003eet al.\u003c/em\u003e RNA expression classifiers from a model of breast epithelial cell organization to predict pathological complete response in triple negative breast cancer. medRxiv (2021) doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1101/2021.02.10.21251517\u003c/span\u003e\u003cspan address=\"10.1101/2021.02.10.21251517\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen, J. \u003cem\u003eet al.\u003c/em\u003e Machine learning models based on immunological genes to predict the response to neoadjuvant therapy in breast cancer patients. Frontiers in Immunology 13, (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChekroud, A. M. \u003cem\u003eet al.\u003c/em\u003e Illusory generalizability of clinical prediction models. Science 383, 164\u0026ndash;167 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSharma, S. \u003cem\u003eet al.\u003c/em\u003e The impact of self-identified race on epidemiologic studies of gene expression. Genetic Epidemiology 35, 93\u0026ndash;101 (2011).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLehmann, B. D., Pietenpol, J. A. \u0026amp; Tan, A. R. Triple-Negative Breast Cancer: Molecular Subtypes and New Targets for Therapy. 
\u003cem\u003eAm Soc Clin Oncol Educ Book\u003c/em\u003e e31\u0026ndash;e39 doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.14694/EdBook_AM.2015.35.e31\u003c/span\u003e\u003cspan address=\"10.14694/EdBook_AM.2015.35.e31\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSperry, M. M. \u003cem\u003eet al.\u003c/em\u003e Target-agnostic drug prediction integrated with medical record analysis uncovers differential associations of statins with increased survival in COVID-19 patients. PLOS Computational Biology 19, 1\u0026ndash;21 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBraal, C. L. \u003cem\u003eet al.\u003c/em\u003e Inhibiting CDK4/6 in Breast Cancer with Palbociclib, Ribociclib, and Abemaciclib: Similarities and Differences. Drugs 81, 317\u0026ndash;331 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNovak, R. \u003cem\u003eet al.\u003c/em\u003e Target-agnostic discovery of Rett Syndrome therapeutics by coupling computational network analysis and CRISPR-enabled in vivo disease modeling. \u003cem\u003ebioRxiv\u003c/em\u003e (2022) doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1101/2022.03.20.485056\u003c/span\u003e\u003cspan address=\"10.1101/2022.03.20.485056\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBarrett, T. \u003cem\u003eet al.\u003c/em\u003e NCBI GEO: archive for functional genomics data sets\u0026mdash;10 years on. Nucleic Acids Research 39, D1005\u0026ndash;D1010 (2010).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKlaus, B. \u0026amp; Reisenauer, S. An end to end workflow for differential gene expression using Affymetrix microarrays. 
\u003cem\u003eF1000Res\u003c/em\u003e 5, 1384 (2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePlaza Oliver, M. \u003cem\u003eet al.\u003c/em\u003e Donepezil Nanoemulsion Induces a Torpor-like State with Reduced Toxicity in Nonhibernating Xenopus laevis Tadpoles. ACS Nano 18, 23991\u0026ndash;24003 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNIH LINCS Program. https://lincsproject.org/\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDavis, A. P. \u003cem\u003eet al.\u003c/em\u003e Comparative Toxicogenomics Database (CTD): update 2023. Nucleic Acids Res 51, D1257\u0026ndash;D1262 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShi, L. \u003cem\u003eet al.\u003c/em\u003e The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nat Biotechnol 28, 827\u0026ndash;838 (2010).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZuo, D. \u003cem\u003eet al.\u003c/em\u003e Machine learning-based models for the prediction of breast cancer recurrence risk. BMC Medical Informatics and Decision Making 23, 276 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKolberg, L. \u003cem\u003eet al.\u003c/em\u003e g:Profiler\u0026mdash;interoperable web service for functional enrichment analysis and gene identifier mapping (2023 update). Nucleic Acids Research 51, W207\u0026ndash;W212 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEfron, B. \u0026amp; Tibshirani, R. J. \u003cem\u003eAn Introduction to the Bootstrap\u003c/em\u003e. (Taylor \u0026amp; Francis, 1994).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKohavi, R. A study of cross-validation and bootstrap for accuracy estimation and model selection. 
in \u003cem\u003eProceedings of the 14th International Joint Conference on Artificial Intelligence - Volume 2\u003c/em\u003e 1137\u0026ndash;1143 (Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1995).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHastie, T., Tibshirani, R. \u0026amp; Friedman, J. H. \u003cem\u003eThe Elements of Statistical Learning: Data Mining, Inference, and Prediction\u003c/em\u003e. (Springer, 2001).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eArlot, S. \u0026amp; Celisse, A. A survey of cross-validation procedures for model selection. Statistics Surveys 4, (2010).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLaan, M. J. van der, Polley, E. C. \u0026amp; Hubbard, A. E. Super Learner. Statistical Applications in Genetics and Molecular Biology 6, (2007).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMolinaro, A. M., Simon, R. \u0026amp; Pfeiffer, R. M. Prediction error estimation: a comparison of resampling methods. Bioinformatics 21, 3301\u0026ndash;3307 (2005).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWeigelt, B., Baehner, F. L. \u0026amp; Reis-Filho, J. S. The contribution of gene expression profiling to breast cancer classification, prognostication and prediction: a retrospective of the last decade. The Journal of Pathology 220, 263\u0026ndash;280 (2010).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eS\u0026oslash;rlie, T. \u003cem\u003eet al.\u003c/em\u003e Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci U S A 98, 10869\u0026ndash;10874 (2001).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDai, X. \u003cem\u003eet al.\u003c/em\u003e Breast cancer intrinsic subtype classification, clinical use and future trends. Am J Cancer Res 5, 2929\u0026ndash;2943 (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRakha, E. 
A., Reis-Filho, J. S. \u0026amp; Ellis, I. O. Basal-Like Breast Cancer: A Critical Review. JCO 26, 2568\u0026ndash;2581 (2008).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWeigelt, B. \u0026amp; Reis-Filho, J. S. Histological and molecular types of breast cancer: is there a unifying taxonomy? Nat Rev Clin Oncol 6, 718\u0026ndash;730 (2009).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCho, H. J., Kim, J.-T., Baek, K. E., Kim, B.-Y. \u0026amp; Lee, H. G. Regulation of Rho GTPases by RhoGDIs in Human Cancers. \u003cem\u003eCells\u003c/em\u003e 8, (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMosaddeghzadeh, N. \u0026amp; Ahmadian, M. R. The RHO Family GTPases: Mechanisms of Regulation and Signaling. \u003cem\u003eCells\u003c/em\u003e 10, (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNavarro-L\u0026eacute;rida, I., S\u0026aacute;nchez-\u0026Aacute;lvarez, M. \u0026amp; Del Pozo, M. \u0026Aacute;. Post-Translational Modification and Subcellular Compartmentalization: Emerging Concepts on the Regulation and Physiopathological Relevance of RhoGTPases. \u003cem\u003eCells\u003c/em\u003e 10, (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBlair, H. A. Sotorasib: First Approval. 
Drugs 81, 1573\u0026ndash;1579 (2021).\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-6130021/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6130021/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eNeoadjuvant chemotherapy (NAC) has been a staple treatment for breast cancer (BRCA) patients regardless of the tumor histological type. While this treatment can be effective on a population level, the pathologic complete response (pCR) rate post-NAC for individual patients varies widely throughout various clinical demographic groups and has not dramatically changed in practice. Improving stratification methods for therapeutic interventions could avoid the physical side effects as well as the psychological stress of undergoing NAC treatment if a patient is unlikely to respond [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. 
Given the rapid advancements in sequencing technologies and the availability of RNA expression data, medical solutions based on transcriptomics data are becoming increasingly prevalent [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. Here, we present a novel method to stratify the prognosis for individual breast cancer patients for NAC therapy using RNA expression data from pre-treatment tumor biopsies by relying on network biology interactions rather than individual gene panels. We processed the datasets through the BioNAV\u0026trade; pipeline to generate BioNAV\u0026trade; network signatures (BioNAV\u0026trade; NS) combined with a random forest machine learning model and incorporating demographic and other metadata, including patient race, specific drugs used in NAC treatment, and tumor histological subtyping. These network signatures offer insights into the gene-gene and drug-gene interactions occurring within each patient\u0026rsquo;s biopsy.\u003c/p\u003e \u003cp\u003eThis study demonstrates the capability of BioNAV\u0026trade; NS to help guide BRCA prognoses through a comprehensive, network-level view of the gene expression data. Using BioNAV\u0026trade; NS, we were able to accurately predict patient response to NAC with a mean area under the receiver operator characteristic (AUROC) of 82.4%. The addition of demographic and tumor receptor type stratification further increased performance to as high as an AUROC of 93.7% for patients who are progesterone receptor positive (PR+). Additionally, classifier performance was maintained when combining datasets from multiple studies and various transcriptomics platforms and heterogeneous preprocessing steps prior to BioNAV\u0026trade; pipeline processing. Stratification by histological subgroups enhanced the predictive accuracy and AUROC of BioNAV\u0026trade;, outperforming two leading models in recent literature by 18.6% and 12.9%, respectively. 
BioNAV\u0026trade; NS significantly enhances the predictive value of transcriptomic data to determine patient response to NAC. This approach offers the integration of multiple biological data and clinical metadata layers to improve clinical outcome prediction, highlighting potentially novel therapeutic mechanisms that have been hidden inside a heterogeneous patient population. A transition towards personalized treatment plans and adjuvant treatments may further enhance efficacy and reduce adverse events.\u003c/p\u003e","manuscriptTitle":"Drug-Gene Network Signature Modeling Predicts Breast Cancer Patient Response to Neoadjuvant Chemotherapy","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-03-18 12:21:28","doi":"10.21203/rs.3.rs-6130021/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"e4048df6-ac75-4085-bdf1-a3b9f5085e8f","owner":[],"postedDate":"March 18th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":45321876,"name":"Biological sciences/Cancer/Breast cancer"},{"id":45321877,"name":"Biological sciences/Cancer/Tumour biomarkers"},{"id":45321878,"name":"Health sciences/Oncology/Cancer/Tumour heterogeneity"}],"tags":[],"updatedAt":"2025-04-11T15:23:38+00:00","versionOfRecord":[],"versionCreatedAt":"2025-03-18 
12:21:28","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6130021","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6130021","identity":"rs-6130021","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
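
The ER+ & HER2- (2-layer) benchmark in the Results reports accuracy, PPV, NPV, specificity and sensitivity for BioNAV NS. As a minimal sketch of how these five figures relate to one another, the Python snippet below computes them from a 2x2 confusion matrix. The counts used (7 true positives, 4 false positives, 3 false negatives, 43 true negatives) are an assumption: they are one set of integers that reproduces the reported percentages after rounding, and they are not stated in the paper. The function name `classification_metrics` is likewise hypothetical, and treating pCR as the positive class is an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return standard binary-classification metrics as percentages."""
    total = tp + fp + fn + tn
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),   # true positive rate
        "specificity": 100.0 * tn / (tn + fp),   # true negative rate
        "ppv": 100.0 * tp / (tp + fp),           # positive predictive value
        "npv": 100.0 * tn / (tn + fn),           # negative predictive value
    }


# ASSUMPTION: illustrative counts chosen to be consistent with the reported
# percentages; the paper does not publish the underlying confusion matrix.
counts = {"tp": 7, "fp": 4, "fn": 3, "tn": 43}

# Percentages reported in the text for BioNAV NS, ER+ & HER2- (2-layer).
reported = {
    "accuracy": 87.7,
    "sensitivity": 70.0,
    "specificity": 91.5,
    "ppv": 63.6,
    "npv": 93.5,
}

metrics = classification_metrics(**counts)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}% (reported {reported[name]:.1f}%)")
```

With a small positive class, PPV can sit well below NPV even when specificity is high, because a handful of false positives weighs heavily against only a few true positives. This is why the text reports all of these metrics rather than accuracy alone.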



License: CC-BY-4.0