Identification and Validation of Novel CAF Markers in Breast Cancer

Xin Zhou, Na Wang, Ling Shi, Dongxin Wei, Xiaoqin Sun, Mingxiu Shao, and 4 more
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6479762/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 14 Jan, 2026 in Scientific Reports. Version 1 posted 11

Abstract
Breast cancer remains a major global health challenge with high incidence and mortality rates among women. Recent studies have highlighted the critical role of the tumor microenvironment, particularly cancer-associated fibroblasts (CAFs), in tumor progression. However, current understanding of CAFs heterogeneity and its implications for breast cancer diagnosis and treatment remains limited.
This study aimed to identify novel CAF marker genes and develop a diagnostic model to improve breast cancer diagnosis and therapeutic strategies. We employed several machine learning algorithms to identify feature genes associated with CAFs and, based on these genes, constructed a high-precision diagnostic model for breast cancer. Through single-cell analysis, we examined the heterogeneity of CAFs and predicted the sensitivity of different CAF subsets to specific drugs. Immunohistochemical experiments were also conducted to validate the expression of the feature genes. Machine learning identified FXYD1, SULF1, and TNXB as novel biomarkers for CAFs in breast cancer. Among the evaluated algorithms, Random Forest performed best, with robust classification accuracy and stability. Single-cell analysis provided insights into the heterogeneity of CAFs between Luminal and non-Luminal breast cancer, enhancing our understanding of the tumor microenvironment. Drug sensitivity predictions indicated that distinct CAF subsets responded differently to specific drugs, laying a foundation for the development of personalized breast cancer treatment strategies. Immunohistochemistry (IHC) verified the expression patterns of the three biomarkers: FXYD1 was expressed in myoepithelial cells and fibroblasts in normal breast tissue but was absent in breast cancer; SULF1 was upregulated in fibroblasts of breast cancer; and the expression of TNXB did not differ notably between normal and cancerous tissues. These findings not only highlight the roles played by FXYD1, SULF1, and TNXB in the development of breast cancer, but also uncover the heterogeneity of CAFs.
Consequently, our research provides a fresh perspective and a theoretical basis for advancing both early and precise diagnostic methods, as well as tailored therapeutic strategies.

Subjects: Biological sciences/Cancer; Biological sciences/Cancer/Breast cancer; Biological sciences/Cancer/Cancer microenvironment
Keywords: cancer-associated fibroblasts, breast cancer, machine learning, immunohistochemistry, diagnostic model

Introduction
Breast cancer, a prevalent malignancy among women globally, continues to exhibit high incidence and mortality rates, posing a significant public health challenge 1,2. Numerous studies support the view that the tumor microenvironment (TME) serves as fertile ground for tumor cell development and progression 3–5. Within this intricate ecosystem, cells can be broadly categorized into immune cells and stromal cells. Among stromal cells, a growing body of evidence underscores the pivotal role of specific subsets, particularly cancer-associated fibroblasts (CAFs), in tumor progression 6–8. CAFs have emerged as central players, with multiple studies elucidating their essential functions in cancer proliferation, advancement, and invasion 9. Existing research demonstrates that CAFs interact intimately with cancer cells and play a crucial role in mediating and facilitating the metastasis of breast cancer 10–12. Emerging evidence indicates that cancer-cell-centric therapeutic paradigms offer limited options in the clinic 13. Consequently, there is a pressing need for a deeper exploration of CAF heterogeneity. Current research on CAF classification and marker identification, though ongoing, remains limited in scope, with minimal translation into clinical practice 14. This study aimed to identify potential novel CAF-related markers by applying machine learning algorithms to both single-cell and bulk datasets.
This approach identified three signature genes: FXYD1 (FXYD domain-containing ion transport regulator 1), SULF1, and TNXB, which have received limited attention in breast cancer research to date. FXYD1, a regulator of ion channel transport, encodes the phospholemman (PLM) protein, which plays a vital role in heart and brain tissue 15. Given its importance in these systems, its potential involvement in breast cancer pathogenesis merits further examination. SULF1, a sulfatase enzyme, modulates tumor development by influencing the binding affinity of cell surface heparan sulfate proteoglycans 16. TNXB, an extracellular matrix protein, contributes to collagen network assembly and tissue integrity 17. This study examines the roles of these genes in breast cancer initiation and progression through analyses of gene expression patterns, copy number alterations (CNAs), functional enrichment, and drug sensitivity predictions. Immunohistochemical validation of these markers in both benign and malignant breast tissue samples provides a theoretical basis for advancing the diagnosis of breast cancer. However, translating these findings into clinical benefit will require further rigorous clinical validation. This study also presents novel insights into CAF heterogeneity, pointing to avenues for the tailored development of therapeutic strategies. We anticipate that our findings will provide a scientific foundation for earlier diagnosis, more accurate prognosis assessment, and the development of personalized therapeutic strategies.
Ultimately, by elucidating the expression patterns of FXYD1, SULF1, and TNXB in breast cancer progression, we aim to support the development of targeted therapies that improve patient outcomes and quality of life.

Results

1. Identification of feature genes of CAFs in breast cancer by machine learning
We integrated three previous single-cell studies 29–31, each providing its own list of CAF-associated genes in breast cancer. We first intersected the three lists to identify the genes consistently reported across all studies. To further refine the candidate list, we then intersected this consensus gene set with differentially expressed genes (DEGs) between breast cancer and normal breast tissues, downloaded from the GEPIA2 website. This dual intersection yielded 28 candidate genes (Figure 1A, Supplementary Table 2). Next, we used the caret package to perform feature selection on the TCGA-Train dataset. After evaluating six built-in feature selection methods, the random forest algorithm emerged as the most suitable owing to its superior classification accuracy and stability, with the optimal number of variables determined to be 3 (Figure 1B). The feature selection process is shown as a Sankey diagram (Figure 1C) and in tables (Supplementary Tables 3-4). Analysis of the TCGA datasets revealed that FXYD1 and TNXB were significantly downregulated in breast cancer tissues, while SULF1 was significantly upregulated (Supplementary Figure 1). This abnormal expression pattern suggests potential key roles for these three genes in the development of breast cancer.
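The dual intersection described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the gene lists are hypothetical placeholders (only FXYD1, SULF1, and TNXB come from the text; GENE_A, GENE_B, and GENE_C are invented), and the function name consensus_candidates is made up for this example.

```python
from functools import reduce


def consensus_candidates(caf_gene_lists, deg_genes):
    """Return genes present in every CAF gene list and also in the DEG list."""
    # Step 1: genes reported by all studies (intersection of the lists).
    consensus = reduce(set.intersection, (set(genes) for genes in caf_gene_lists))
    # Step 2: keep only consensus genes that are differentially expressed.
    return sorted(consensus & set(deg_genes))


# Hypothetical placeholder inputs, for illustration only.
study_a = ["FXYD1", "SULF1", "TNXB", "GENE_A"]
study_b = ["FXYD1", "SULF1", "TNXB", "GENE_B"]
study_c = ["FXYD1", "SULF1", "TNXB", "GENE_A", "GENE_B"]
degs = ["SULF1", "TNXB", "FXYD1", "GENE_C"]

candidates = consensus_candidates([study_a, study_b, study_c], degs)
print(candidates)
```

With real inputs, the three lists would come from the cited single-cell studies and the DEG list from GEPIA2; the order of the two intersection steps does not affect the result.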
To further validate the ability of these feature genes to distinguish cancer from normal breast tissues, we performed PCA and tSNE dimensionality reduction analyses. Cancer tissues were clearly separated from normal breast tissues based on the expression profiles of these genes (Supplementary Figure 1). Additionally, we conducted external validation using two independent datasets, GSE65194 and GSE233242. The results were consistent with those from the TCGA dataset (Figure 1D-E), further reinforcing our findings. Subsequently, we explored the clinical and prognostic implications of these three genes across diverse datasets using the BEST portal. This analysis revealed significant correlations between the expression of FXYD1 and TNXB and breast cancer grade, with both genes exhibiting decreased mRNA levels as the grade increased. However, no significant associations were found between the three genes and patient outcomes, including overall survival (OS), disease-free survival (DFS), relapse-free survival (RFS), disease-specific survival (DSS), and progression-free survival (PFS). To avoid redundancy within this paper, detailed results are available at the BEST website.

2. Model construction and comparison of diagnostic performance
Using FXYD1, SULF1, and TNXB as feature genes, we established diagnostic models with various algorithms to distinguish between breast cancer and normal tissue. The performance of the different models across the datasets is summarized in Supplementary Table 5. Across the internal datasets, all models exhibited robust performance, achieving AUC and accuracy scores exceeding 0.9. The RF model stood out, showing the strongest performance (Supplementary Figure 2A-B and Supplementary Table 5).
Specifically, on the TCGA-test and TCGA-all datasets, the RF model achieved AUC values of 0.9941 and 0.9944, respectively, and accuracy values of 0.9655 and 0.9319. The high true positive and true negative rates in the confusion matrices further support its diagnostic capability (Supplementary Figure 2A). The SVM and XGB models also performed well on the testing datasets, whereas the GLM and NB models lagged slightly behind on the internal validation datasets (Supplementary Figure 2A-B and Supplementary Table 5). Importantly, even in the imbalanced TCGA-all dataset, where cancer samples significantly outnumber normal ones, all models identified minority class samples (i.e., normal tissue) well, as evidenced by the high PR-AUC values (Supplementary Figure 2C). This finding underscores the robustness of our models in handling imbalanced data. When the models were applied to the external validation datasets GSE233242 and GSE65194, results varied. On GSE233242, the AUC and accuracy of the RF model decreased but remained within an acceptable range (AUC=0.8732, accuracy=0.6744). In contrast, the SVM and KNN models declined significantly in performance, almost losing their predictive power (Figure 2A-B, Supplementary Table 5). On GSE65194, the RF model maintained its superior performance (AUC=0.904, accuracy=0.9085), while the GLM and KNN models showed notable improvements. However, the SVM model struggled to maintain its initial performance. Notably, despite achieving a high true positive rate on GSE65194, the NB model had an extremely limited ability to recognize normal samples, correctly identifying only one case (Figure 2A, Supplementary Table 5). Based on the PR-AUC values from both external datasets, the RF model remained the top performer (Figure 2C).
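For readers less familiar with the metrics reported in this section, the following Python sketch shows one common way to compute accuracy, ROC AUC, and a PR-AUC estimate (average precision) from labels and predicted scores. It is an illustration only: the toy labels and scores are invented, the 0.5 decision threshold is an assumption, and the study's own values were presumably produced by its modelling packages rather than by code like this.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive scores above a random negative.

    Tied scores count as half a win (Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve for the class labelled 1."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at each recovered positive
    return total / sum(y_true)


# Invented toy data: 1 = class of interest, 0 = other class.
y_true = [1, 1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
ap = average_precision(y_true, scores)
print(round(acc, 4), round(auc, 4), round(ap, 4))
```

To evaluate the minority class on an imbalanced dataset, as in the text, the minority class would be the one labelled 1 and the scores would be that class's predicted probabilities.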
Taken together, the RF model not only excelled on internal datasets but also generalized well to external validation sets, further supporting the role of FXYD1, SULF1, and TNXB as feature genes in breast cancer diagnosis. Beyond assessing model performance, we analyzed Shapley Additive exPlanations (SHAP) values, which revealed variations in the importance of these feature genes across the datasets. Whereas TNXB was consistently the most prominent feature in both the testing and internal validation sets, FXYD1 ranked first in the two external validation sets (Figure 2D, 2E; Supplementary Figure 2D, 2E). We primarily attribute this discrepancy to differences in data distributions across datasets. Having explored the diagnostic potential of these three feature genes, we further investigate them in the subsequent sections of this paper.

3. CNA analysis
In the TCGA database, we analyzed alterations of the three feature genes: FXYD1, SULF1, and TNXB. FXYD1 alterations were observed in 44 samples (approximately 5% of the cohort), primarily manifesting as amplification (2.5%, n=24) and mRNA high (1.67%, n=16) (Supplementary Figure 3A-B). SULF1 was altered in 114 samples (approximately 12%), with amplification being the most prevalent (9.27%, n=89), reinforcing its potential significance in tumorigenesis and progression (Supplementary Figure 3A-B). The alteration landscape of TNXB was comparatively intricate, with alterations detected in 45 samples (approximately 5%), encompassing mutation (0.94%, n=9), amplification (1.04%, n=10), mRNA high (1.15%, n=12), mRNA low (1.15%, n=11), and multiple alterations (0.31%, n=3) (Supplementary Figure 3A-B).
These diverse alteration patterns may mirror the multifaceted roles played by TNXB in tumor biology. In the METABRIC dataset, we observed similar yet distinct trends. Alterations in FXYD1 were detected in 104 samples (approximately 6%), with amplification (1.93%, n=36) and mRNA high (3.59%, n=67) remaining the predominant forms (Supplementary Figure 4A-B). The frequency of SULF1 alterations was notably higher, observed in 384 samples (approximately 21%), with amplification accounting for the vast majority (15.86%, n=296), further corroborating the high prevalence of SULF1 alterations in breast cancer (Supplementary Figure 4A-B). The alteration pattern of TNXB in the METABRIC dataset mirrored that in TCGA, with different frequencies: amplification was observed in 0.86% of samples (n=16), mRNA high in 2.84% (n=53), and mRNA low in 1.23% (n=23) (Supplementary Figure 4A-B). In terms of survival analysis, no significant associations were observed between genetic alterations in any of the feature genes and either OS or RFS in the TCGA dataset (Supplementary Figure 3C-D). In the METABRIC dataset, however, genetic alterations in FXYD1 were significantly associated with improved OS: the altered group had a significantly better prognosis than the non-altered group, suggesting a potentially favorable prognostic effect of FXYD1 alterations (Supplementary Figure 4C-D). In contrast, SULF1 alterations were associated with worse DFS, with the non-altered group faring better, which may be attributed to the promoting role of SULF1 in tumor progression (Supplementary Figure 4C-D).

4. The expression patterns of feature genes at the single-cell resolution
In this section, we applied a series of systematic strategies for cell classification.
Initially, based on the expression patterns of EPCAM and PTPRC, cells were categorized into three groups: epithelial cells, immune cells, and a non-specific stromal cell population (Supplementary Figure 5). UMAP plots show the expression patterns of the three feature genes within the stromal cells: FXYD1 expression was significantly downregulated in cancerous tissues compared to adjacent normal tissues; conversely, SULF1 expression was markedly upregulated; and TNXB did not display any notable difference in expression between cancerous and normal tissues (Supplementary Figure 5). To characterize the heterogeneity of stromal cells, we conducted a secondary clustering analysis, refining them into three major subpopulations: PECAM1+ endothelial cells, RGS5+ pericytes, and PDGFRA+ fibroblasts (Figure 3A). Although endothelial cells from normal and cancerous tissues overlapped in their expression profiles, making them difficult to distinguish, pericytes and fibroblasts could be clearly separated by tissue type (Figure 3B). Within the stromal cell subpopulations, cells from different types of breast cancer were intermixed, without distinct subpopulation differentiation or heterogeneity (Figure 3C). Through further refined clustering analysis, we segmented the stromal cells into five subpopulations: endothelial cells, normal pericytes, cancer pericytes, normal fibroblasts, and CAFs (Figure 3D). FXYD1 was predominantly expressed in normal fibroblasts and, to a lesser degree, in pericytes; SULF1 was enriched primarily in CAFs; and TNXB was expressed in both normal fibroblasts and CAFs (Figure 3E-F).
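The first classification step, splitting cells by EPCAM and PTPRC expression, can be illustrated with a toy rule in Python. This is a deliberately simplified sketch: single-cell studies typically annotate clusters rather than applying per-cell cut-offs, and the threshold of 0, the precedence of EPCAM over PTPRC, the function name assign_compartment, and the example expression values are all assumptions made for illustration.

```python
from collections import Counter


def assign_compartment(expr, threshold=0.0):
    """Assign a cell to a broad compartment from its marker expression.

    expr maps gene symbols to normalized expression values; a marker
    counts as expressed when its value exceeds the (assumed) threshold.
    """
    if expr.get("EPCAM", 0.0) > threshold:
        return "epithelial"
    if expr.get("PTPRC", 0.0) > threshold:
        return "immune"
    # Neither marker expressed: treated as the non-specific stromal group.
    return "stromal"


# Invented example cells, not data from the study.
cells = [
    {"EPCAM": 2.1, "PTPRC": 0.0},
    {"EPCAM": 0.0, "PTPRC": 1.5},
    {"EPCAM": 0.0, "PTPRC": 0.0, "PDGFRA": 3.0},
    {"EPCAM": 0.0, "PTPRC": 0.0, "RGS5": 2.4},
]

labels = [assign_compartment(cell) for cell in cells]
print(Counter(labels))
```

The stromal group returned here corresponds to the population that the text then re-clusters into endothelial cells, pericytes, and fibroblasts.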
To better understand the molecular mechanisms underlying the transformation of fibroblasts into CAFs, we further subdivided the fibroblast population into three subpopulations: normal fibroblasts; myofibroblastic CAFs (mCAFs), marked by ACTA2 expression; and inflammatory CAFs (iCAFs), characterized by CXCL14 expression. FXYD1 and TNXB were more prominently expressed in iCAFs, suggesting a potential link to inflammatory regulation within the tumor microenvironment. Conversely, SULF1 was preferentially enriched in mCAFs, suggesting a role in the development of myofibroblastic CAFs (Figure 3G). To model the transition from normal fibroblasts to CAFs, we performed pseudotime analysis. Our findings indicate that mCAFs occupy the terminal stage of development. During this transition, FXYD1 expression gradually diminishes, which may correlate with the loss of certain functions as fibroblasts transform into CAFs. TNXB expression exhibits an initial surge followed by a decline, mirroring the dynamic shifts in extracellular matrix remodeling that accompany the transition. SULF1 expression consistently intensifies, pointing to a central role in CAF development and functional maintenance (Figure 3H-I). On further subdivision of CAFs, we observed marked heterogeneity between Luminal and non-Luminal breast cancer CAF populations (Figure 3J), allowing their classification into four subgroups: Luminal iCAFs, Luminal mCAFs, non-Luminal iCAFs, and non-Luminal mCAFs (Figure 3K). Finally, a bubble plot shows the expression profiles of ACTA2, CXCL14, and CAF-specific genes across these subpopulations, revealing their distinct expression signatures within the CAF subgroups (Figure 3L).

5. Functional analysis
Through GO and KEGG analyses, we characterized the biological functions and pathways associated with different types of fibroblasts in breast cancer. For the functional analysis, we present only the top 10 results (Figure 4), with marker genes for each fibroblast type listed in Supplementary Tables 6 and 7. Our analysis underscores the complexity of CAFs' roles within the tumor microenvironment. mCAFs are involved in extracellular matrix remodeling and nutritional support, whereas normal fibroblasts are linked to immune responses and inflammatory processes, potentially maintaining immune homeostasis via signaling pathways such as IL-17 and TNF. iCAFs are involved in regulating inflammation, immune responses, and cellular signaling, which are relevant to both physiological homeostasis and pathological conditions. Luminal mCAFs and non-Luminal mCAFs differed in protein synthesis and immune modulation. Luminal mCAFs exhibit significant enrichment in pathways related to ribosomal function, pointing to a role in protein synthesis. Conversely, non-Luminal mCAFs demonstrate greater enrichment in pathways associated with autoimmune diseases and pathogen infections, suggesting distinct functions in immune regulation and resistance to infections. Luminal iCAFs are prominently associated with inflammation- and tumor-related signaling pathways, indicating pro-inflammatory and pro-tumorigenic effects within the tumor microenvironment. Meanwhile, non-Luminal iCAFs are enriched in pathways linked to complement and coagulation cascades, as well as cytokine-receptor interactions, highlighting their roles in regulating inflammatory responses and blood coagulation.
These results not only underscore the functional diversity of CAFs in the cancer microenvironment but also open novel research avenues and potential therapeutic interventions.

6. Drug sensitivity prediction
Based on BCs, we subdivided Luminal CAFs into four subgroups (TC0 to TC3) and non-Luminal CAFs into three subgroups (TC0 to TC2). However, because TC2 cells were scarce in non-Luminal CAFs, we excluded the analysis results for this subgroup. UMAP dimensionality reduction plots show the distribution of these CAF types in the reduced space (Figures 5A, 5G). Drug sensitivity was then predicted for both the TC and CAF classifications. Detailed information on the top differential high-sensitivity drugs across all classifications is listed in Supplementary Tables 8-11. For the TC classifications, we present the top 5 differential high-sensitivity drugs in each TC cluster as volcano plots (Supplementary Figure 6A and Supplementary Figure 6C). The UMAP plots show the distribution of cells sensitive to the top differential high-sensitivity drug for each TC cluster (Supplementary Figure 6B and Supplementary Figure 6D). In Luminal CAFs, TC0 was most sensitive to GSK-J4, TC1 to SCH-900776, TC2 to TENIPOSIDE, and TC3 to GSK525762A. For non-Luminal CAFs, TC0 was most sensitive to AZD8055 and TC1 to SORAFENIB. Notably, the distribution patterns of these drug-sensitive cells were highly consistent with the TC classifications. These findings deepen our understanding of CAF heterogeneity and provide theoretical support for the development of targeted therapeutic strategies aimed at specific CAF subgroups.
Subsequently, we predicted drug sensitivity within the CAF classifications. The volcano plots show the top five drugs with differential high sensitivity for each CAF classification (Figures 5B, 5H). Intriguingly, drugs to which mCAFs were sensitive tended to be ones to which iCAFs were insensitive, and vice versa (Figures 5C, 5E, 5I, and 5K). This finding offers a novel perspective on CAF heterogeneity. In Luminal CAFs, mCAFs exhibited sensitivity to drugs such as DASATINIB and SKI-II (Figure 5D), whereas iCAFs responded more favorably to ENTINOSTAT and MUBRITINIB (Figure 5F). For non-Luminal CAFs, a similar pattern of distinct drug sensitivity between mCAFs and iCAFs was observed (Figure 5J). Notably, both Luminal and non-Luminal mCAFs were sensitive to DASATINIB and SKI-II (Figures 5D and 5J), whereas high sensitivity to MUBRITINIB was specific to iCAFs (Figures 5F and 5L). Additionally, a previous study has validated the efficacy of DASATINIB in inhibiting CAFs 32, which supports the credibility of our drug sensitivity predictions.

7. Verification of the expression patterns of feature genes by IHC
Analysis of the IHC results yielded several findings. In normal breast tissue, FXYD1 protein is predominantly located in myoepithelial and stromal cells (Supplementary Figure 7A-B). Within carcinoma in situ, however, FXYD1 expression in these cells declines markedly, with occasional expression noted in the peritumoral stromal area (Supplementary Figure 7A-B). In the invasive carcinoma cases, FXYD1 expression is completely absent (Supplementary Figure 7A-B).
In fibroadenomas, the proliferation of fibrous tissue is accompanied by clearly visible, enhanced expression of FXYD1 in fibroblasts (Supplementary Figure 8A). On a pathological slide encompassing normal breast tissue, carcinoma in situ, and invasive carcinoma (Supplementary Figure 9A), we directly observed variations in FXYD1 expression across these pathological stages. The gradual loss of FXYD1 expression suggests a possible suppressive role in breast tumor progression. For SULF1 protein, we observed markedly elevated expression in stromal fibroblasts within cancer tissue compared with normal breast tissue and fibroadenomas (Supplementary Figure 7C and Supplementary Figure 8B). Our analysis, however, did not uncover any significant correlation between SULF1 expression and clinicopathological features (Supplementary Figure 7D). Supplementary Figure 9B presents another pathological slide containing normal breast tissue, carcinoma in situ, and invasive carcinoma, which shows variations in SULF1 expression across these pathological stages. This expression pattern hints that SULF1 may contribute to the formation and modulation of the breast tumor microenvironment. In contrast, no significant difference in TNXB expression was observed between benign and malignant breast tissues (Supplementary Figure 10). This finding highlights the need for further investigation into the mechanisms regulating TNXB expression during breast cancer development. We selected two consecutive histological sections of breast cancer tissue to demonstrate the distinct expression patterns of FXYD1 and SULF1.
Specifically, FXYD1 protein levels are notably higher in normal breast tissue than in cancerous tissue (Figure 6A), whereas SULF1 expression is more pronounced in cancerous tissue (Figure 6B). FXYD1 is expressed not only within myoepithelial cells but also prominently on vascular walls. Based on our single-cell analysis of the cellular distribution of FXYD1, we hypothesize that FXYD1 may also be expressed in perivascular cells (Figure 6A and Figure 3E). To better understand these patterns, we compared the expression of FXYD1 and SULF1 with that of several important protein markers. α-SMA (Figure 7A), P63 (Figure 7B), and Calponin (Figure 7C) are consistent and reliable markers for identifying myoepithelial cells in the daily practice of clinical pathological diagnosis, while α-SMA and Vimentin are known markers of CAFs. In normal breast tissue, FXYD1, like Vimentin (Figure 7D), is expressed in both myoepithelial cells and stromal fibroblasts (Figure 7E), whereas SULF1 expression is scarcely detectable within the stromal compartment (Figure 7F). As the tissue transitions towards malignancy, resulting in carcinoma in situ, markers such as P63, Calponin, α-SMA, and Vimentin continue to be expressed in myoepithelial cells; additionally, Vimentin expression is markedly elevated in fibroblasts (Figure 8A). Conversely, FXYD1 expression is drastically reduced, becoming absent in both myoepithelial cells and fibroblasts. The inverse trend is observed for SULF1, whose expression increases in the stromal compartment, suggesting a potential tumor-suppressive function for FXYD1 and a tumor-promoting role for SULF1 during early tumorigenesis.
In invasive carcinoma, the persistent absence of FXYD1 expression is consistent with a role in inhibiting tumor progression (Figure 8B). Furthermore, the notable upregulation of SULF1 expression in fibroblasts, echoing the pattern seen with α-SMA and Vimentin, points to a potential contribution to tumor progression (Figure 8B).

Discussion
This study applied Recursive Feature Elimination (RFE) with multiple underlying algorithms; RFE is a feature selection technique that improves model performance by iteratively discarding the least important features. This method identified the CAF-associated genes FXYD1, SULF1, and TNXB, which are closely associated with breast cancer. This finding underscores the role of cancer-associated fibroblasts (CAFs) in breast cancer pathology and suggests these genes as promising new therapeutic targets. Among the machine learning algorithms assessed, the Random Forest (RF) model performed best, offering high diagnostic accuracy and a possible basis for new personalized treatment options. Analysis of data from the BEST database revealed noteworthy correlations between the expression of FXYD1 and TNXB and tumor grade. Nonetheless, despite this relationship with tumor grade, none of the three genes showed a strong correlation with patient prognosis, highlighting the complexities inherent in cancer biology. Comprehensive research that includes a wider array of biological and clinical factors is therefore needed. Furthermore, the observed disparities in model performance across external validation datasets illustrate the difficulty of achieving consistent predictive performance, emphasizing the need for future studies to incorporate diverse datasets to enhance robustness and accuracy.
Future research should also explore more sophisticated machine learning algorithms to improve the generalizability of predictive models, thereby advancing our understanding of cancer and treatment strategies.

Although existing literature has examined the significance of these three feature genes in breast cancer, research on them remains relatively limited, so we discuss our findings on each gene in turn. The FXYD family consists of seven members (FXYD1 to FXYD7), which serve as tissue-specific regulators of Na+/K+-ATPase activity in cellular membranes 33 . Given the well-documented roles of FXYD3, FXYD5, and FXYD6 across various cancer types 34–41 , we aimed to investigate the expression patterns of FXYD1 in benign and malignant breast tissues, an area that has not been thoroughly explored. In normal tissues, FXYD1 shows a distinct expression pattern, with significantly higher levels in the heart, kidneys, placenta, skeletal muscle, gastrointestinal tract, and colon, and moderate levels in breast samples 42 . Quantitative real-time PCR of clinical samples has indicated a notable downregulation of FXYD1 in ovarian cancer tissues, while its overexpression was associated with enhanced migratory and invasive characteristics of ovarian cancer cells, unrelated to proliferation 43 . Our study demonstrated that FXYD1 had higher immunohistochemical expression in normal breast tissue and was significantly reduced in breast cancer tissues, particularly in myoepithelial cells and CAFs. We therefore speculate that the downregulation of FXYD1 may be closely related to CAF activation and the modulation of the tumor microenvironment. This points to a potential role for FXYD1 in breast cancer diagnostics and treatment, which warrants further validation.
The sulfatase family, comprising sulfatase 1 (SULF1) and sulfatase 2 (SULF2), controls the sulfation of heparan sulfate proteoglycans (HSPGs). This modification affects various physiological and pathological functions, including cell signaling, proliferation, migration, and differentiation 44,45 . The importance of SULF1 in a range of cancers, including prostate, ovarian, esophageal, hepatocellular, gastric, and colon cancers, has been widely recognized 46–51 . SULF1 expression is nearly absent in normal breast tissue and low in benign and hyperplastic lesions. In contrast, SULF1 expression rises significantly in triple-positive and triple-negative breast cancers, particularly in later-stage tumors, where its short splice variants are the most prevalent 52 . Despite this importance, research on the relationship between SULF1 and cancer-associated fibroblasts (CAFs) has been limited until recently. A recent study showed that SULF1, a signaling molecule secreted by CAFs, promotes metastasis and cisplatin (CDDP) resistance in gastric cancer cells by binding to TGFBR3 on their surfaces, thereby activating the TGF-β signaling pathway 46 . Our study examined the previously overlooked relationship between SULF1 and breast cancer, revealing significant changes in SULF1 expression in breast fibroblasts throughout cancer progression. In normal breast tissues, SULF1 levels were low; however, they increased significantly in breast cancer tissues, especially in aggressive tumors. This change suggests that SULF1 may activate and enhance the functions of CAFs. As tumors transitioned from in situ to invasive stages, SULF1 expression rose in fibroblasts, potentially aligning with the increased pro-tumorigenic activities of CAFs in the tumor microenvironment.
These findings underscore the importance of SULF1 as a marker of functional changes in CAFs and suggest potential molecular targets for developing therapies aimed at CAFs in breast cancer.

The tenascin family comprises four members (tenascin-C (TNC), tenascin-R (TNR), tenascin-X (TNXB), and tenascin-W (TNW)), each playing a pivotal role in diverse biological processes, including tissue regeneration, inflammatory diseases, tumorigenesis, and wound healing 53 . Under physiological conditions, TNXB is a crucial regulator of collagen deposition, fibril spacing, mechanical properties, and fibrillogenesis 54–56 . As early as 2002, researchers described the relationship between TNXB and fibroblasts, observing that B16 melanoma cells showed reduced adhesion and spreading, coupled with increased detachment, when cultured on TNXB-null fibroblasts 57 . Recently, a pan-cancer analysis of TNXB highlighted its reduced expression in breast cancer tissues compared to normal tissues, as determined by IHC 58 . Our bioinformatics analysis, using multiple datasets, showed the downregulation of TNXB mRNA in breast cancer and identified TNXB as a marker for CAFs through single-cell analysis. However, our IHC findings for TNXB were inconclusive. Further validation of the connection between TNXB and CAFs in breast cancer is therefore necessary, particularly to understand how TNXB affects CAF functions and contributes to the progression of breast cancer. Our ongoing research aims to address this issue and provide deeper insight into the interactions between TNXB, CAFs, and breast cancer.

Analysis of genomic alterations in the FXYD1, SULF1, and TNXB genes in the TCGA and METABRIC datasets provided important insights into their roles in breast cancer.
Notably, the significant amplification of SULF1 in many samples points to a role in tumor development and warrants further investigation into the mechanisms behind this amplification. Interestingly, alterations in FXYD1 were associated with improved overall survival in the METABRIC dataset, suggesting a protective effect, while the TCGA dataset did not show a similar association. Conversely, alterations in SULF1 were associated with worse outcomes and shorter disease-free survival. This contrast emphasizes the complex interactions between these genes in tumor progression and opens up opportunities for targeted therapies. Furthermore, the diverse patterns of TNXB alterations indicate varied contributions to the tumor microenvironment, reinforcing the need for a deeper understanding of its biological significance. Given the discrepancies across datasets, future research should focus on validating these findings in various populations and cancer subtypes, enhancing our knowledge of breast cancer biology and guiding the development of more effective treatments.

In this study, we conducted a comparative analysis of normal mammary fibroblasts with iCAFs and mCAFs, revealing differences in their functionalities and signaling pathways. A previous study suggested that immune modulators, including myeloperoxidase (MPO) and inflammatory cytokines such as tumor necrosis factor alpha (TNF-α), may contribute to the development of high breast density by modulating gene expression patterns and collagen production in fibroblasts, ultimately influencing the risk of breast cancer 59 .
Researchers have previously summarized that mCAFs are primarily responsible for the generation and remodeling of the extracellular matrix, providing support and migration pathways for tumor cells; on the other hand, iCAFs influence the tumor immune microenvironment by secreting inflammatory cytokines and immunoregulatory molecules, thereby facilitating tumor immune evasion and further progression 60 . Our study found that normal fibroblasts serve as the "baseline," primarily involved in regulating immune responses and inflammatory processes, while mCAFs are mainly engaged in the remodeling of the extracellular matrix, and iCAFs play a role in inflammation, immune reactions, and cellular signaling. Notably, for the first time, we introduce the distinction between Luminal and non-Luminal CAFs, highlighting the diversity and complexity of CAFs in breast cancer. Luminal CAFs exhibit a pronounced ability to promote protein synthesis and inflammatory responses, potentially accelerating tumor growth and progression. Conversely, non-Luminal CAFs contribute uniquely to immune regulation, anti-infection, and blood coagulation, which may modulate the tumor microenvironment dynamics. These discoveries not only offer novel insights and potential therapeutic targets for precision breast cancer therapy but also pave the way for future research endeavors and therapeutic interventions. By attaining a deeper comprehension of the intricate functions and functional heterogeneity of CAFs within the breast cancer microenvironment, we can devise more targeted treatment strategies, with the ultimate goal of effectively suppressing tumor growth. In our efforts to find new therapies for breast cancer, we have identified cancer-associated fibroblasts (CAFs) as significant therapeutic targets. 
Therapeutic strategies that focus on CAFs can involve targeting their surface markers, secreted factors, metabolic pathways, epigenetic modifications, immunoregulatory roles, and mechanical characteristics, along with specific interventions for different subgroups 60 . However, the heterogeneity of CAFs, evident in their varied functions, phenotypic traits, and drug sensitivities across subgroups, poses challenges for any single treatment method. A clear understanding of CAF classification is therefore vital when designing effective therapeutic strategies. By predicting sensitive drugs for iCAFs and mCAFs in Luminal and non-Luminal breast cancer subtypes, respectively, we have provided a foundation for the development of targeted therapies directed at specific CAF subgroups. We also observed mutual exclusivity in drug sensitivity among CAF subgroups, whereby certain drugs predicted to be effective against iCAFs may be ineffective for mCAFs, and vice versa. This observation offers a fresh perspective on CAF heterogeneity and lays the groundwork for combinatorial therapeutic strategies: combining drugs to which different CAF subgroups are sensitive may inhibit their tumor-promoting effects more effectively while mitigating adverse effects and reducing the risk of drug resistance. Previous research has demonstrated that Dasatinib can substantially inhibit the growth of CAFs in lung cancer, potentially augmenting the effectiveness of anticancer therapies 32 . Our analysis highlights the sensitivity of Luminal mCAFs to Dasatinib. This agreement supports the credibility of our results and emphasizes the potential of Dasatinib as a therapeutic agent targeting CAFs.

Although this study successfully identified CAF markers in breast cancer, several limitations persist.
Firstly, the dependency on specific datasets may limit the universality of the findings and constrain the model's ability to generalize to external datasets. Additionally, whether the expression of these genes correlates significantly with patients' long-term prognosis remains unclear. Furthermore, although this study sheds light on the heterogeneity of CAFs, their multifaceted roles within the tumor microenvironment warrant further exploration. The drug sensitivity predictions require rigorous validation to ensure their reliability, and potential biases in sample selection and statistical methodologies must be recognized and addressed. The immunohistochemical studies are also constrained by small sample sizes, underscoring the need for larger-scale investigations across diverse clinical contexts to substantiate the efficacy of these markers. Integrating multi-omics data will further improve our understanding of the mechanisms underlying the roles of these genes in breast cancer. Finally, translating these research insights into clinical practice is a key future goal, with the aim of devising diagnostic tests and therapeutic interventions tailored to these signature genes, ultimately facilitating early breast cancer detection, precise prognosis estimation, and individualized treatment regimens.

Methods

1. Data Acquisition.

We used the TCGAbiolinks package to access TCGA-BRCA TPM (Transcripts Per Million) data along with the corresponding patient clinical profiles. We applied the following data exclusion criteria: 1) genes with low expression, defined as those having an expression level of zero in more than 10% of the samples; 2) cases with incomplete clinical information; 3) male cases.
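The low-expression criterion above (criterion 1) can be illustrated with a minimal sketch. This is not the authors' code, which was written in R; it is a plain-Python stand-in. The function name `filter_low_expression`, the dictionary layout, and the toy gene names are assumptions made for illustration only.

```python
from typing import Dict, List


def filter_low_expression(
    expr: Dict[str, List[float]], max_zero_fraction: float = 0.10
) -> Dict[str, List[float]]:
    """Keep genes whose share of zero-valued samples does not exceed the cutoff.

    `expr` maps a gene name to its expression values across samples
    (hypothetical layout; the paper works with TCGA-BRCA TPM data in R).
    A gene is dropped when it is zero in MORE than `max_zero_fraction`
    of the samples, mirroring the exclusion rule stated in the text.
    """
    kept = {}
    for gene, values in expr.items():
        if not values:
            continue  # no samples: nothing to evaluate, so skip the gene
        zero_fraction = sum(1 for v in values if v == 0) / len(values)
        if zero_fraction <= max_zero_fraction:
            kept[gene] = values
    return kept


# Toy data with 10 samples per gene (values are invented).
toy = {
    "GENE_A": [5.1, 3.2, 0.0, 4.4, 2.0, 1.1, 6.3, 2.2, 3.3, 4.0],  # 10% zeros
    "GENE_B": [0.0, 0.0, 1.2, 0.0, 3.1, 0.0, 0.0, 2.2, 0.0, 0.0],  # 70% zeros
    "GENE_C": [1.0] * 10,                                          # no zeros
}
filtered = filter_low_expression(toy)
print(sorted(filtered))  # ['GENE_A', 'GENE_C']
```

Note that a gene with exactly 10% zeros is retained here, because the stated rule excludes only genes with zeros in more than 10% of samples.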
We downloaded gene expression profile data (GSE65194 and GSE233242) and corresponding clinical information from the public Gene Expression Omnibus database (GEO, http://www.ncbi.nlm.nih.gov/geo/). Additionally, we obtained a single-cell dataset (GSE161529) from the GEO database, selecting 13 normal breast tissue samples, 6 Her2-positive breast cancer tissue samples, 17 ER-positive breast cancer tissue samples, and 8 triple-negative breast cancer tissue samples.

2. Identification of feature genes of CAFs in breast cancer by machine learning.

We employed the caret package 18 to implement our machine learning pipeline, with code available at https://topepo.github.io/caret/index.html. Initially, we divided the paired samples from the TCGA-BRCA dataset into a training set (TCGA-train, comprising 68 pairs of cancer and normal samples) and a test set (TCGA-test, containing 29 pairs), following a 7:3 ratio. Furthermore, the entire TCGA-BRCA dataset (TCGA-all), encompassing 1076 cancer samples and 99 normal samples, served as the internal validation set. To assess the model's generalizability, we incorporated two external validation sets: GSE65194, a microarray dataset consisting of 153 cancer samples and 11 normal samples, and GSE233242, a high-throughput sequencing dataset composed of 43 pairs of cancer and normal samples. Prior to model training, we preprocessed the data. To improve model performance and reduce redundant information, we eliminated collinear variables using a threshold of r >= 0.7 19 . Continuous variables underwent z-score normalization using the 'center' and 'scale' methods provided by the caret preProcess function. This standardization step ensured that all variables contributed equally to the model, regardless of their original scales. For feature selection, we opted for the Recursive Feature Elimination (RFE) method with 10-fold cross-validation.
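The two preprocessing steps described above, removing collinear variables at r >= 0.7 and z-score normalization, can be sketched as follows. This is a simplified plain-Python illustration and not the authors' caret code: the greedy keep-first rule in `drop_collinear` is an assumption (the paper does not state which member of a correlated pair was removed), the use of the absolute value of r is also an assumption, and all function and feature names are hypothetical.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def drop_collinear(features: Dict[str, List[float]], threshold: float = 0.7) -> List[str]:
    """Greedy filter: keep a feature only if |r| < threshold with every kept one.

    Assumption: features are visited in insertion order and the earlier
    feature of a correlated pair wins; other tie-breaking rules exist.
    """
    kept: List[str] = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def z_score(values: List[float]) -> List[float]:
    """Center to mean 0 and scale to sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if sd == 0:
        return [0.0] * n  # constant feature: centering only
    return [(v - mean) / sd for v in values]


# Toy features (invented values): f2 is almost a multiple of f1.
toy = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.0, 4.0, 6.0, 8.0, 10.5],
    "f3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept = drop_collinear(toy)
z = z_score(toy["f1"])
print(kept)  # ['f1', 'f3']
```

In a real pipeline both steps would be fitted on the training set only and then applied unchanged to the test and validation sets, so that no information leaks from held-out samples.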
Six algorithms built into the caret package guided the RFE process: Random Forest (RF), Generalized Additive Models (GAM), Linear Discriminant Analysis (LDA), Logistic Regression (LR), Naive Bayes (NB), and Bagged Trees (BT). Through this selection, the optimal subset of features was identified.

3. Model construction and comparison of diagnostic performance.

We employed six machine learning methods to develop diagnostic models using the TCGA-train dataset with 10-fold cross-validation: RF, NB, K-Nearest Neighbors (KNN), Generalized Linear Models (GLM), XGBoost (XGB), and Support Vector Machine (SVM). To evaluate the performance of the models, we used confusion matrices (generated using the caret package), Receiver Operating Characteristic (ROC) curves (calculated with the pROC package), and Precision-Recall (PR) curves (computed with the PRROC package) on the test set, as well as on the internal and external validation sets. For global interpretation of the models, we employed the fastshap package, with results visualized using the shapviz package. The code for interfacing fastshap with caret models is available at https://harpomaxx.github.io/post/shap-values/.

4. Copy number alteration (CNA) analysis.

We analyzed genomic data using the cBioPortal platform (https://www.cbioportal.org) 20,21 . Our investigation encompassed two datasets: TCGA-BRCA (Breast Invasive Carcinoma, Firehose Legacy) and METABRIC. The 'OncoPrint' module was used to visualize variants, and the 'Cancer Types Summary' module was used to gain an overview of genomic alterations. We then employed the 'Survival' module to assess the potential correlation between alterations in the feature genes and patient survival time, which gave insight into the prognostic implications of specific genomic alterations in breast cancer.

5. Single-cell Analysis.
In this section, we used Seurat v4 22 while following the data quality control protocols outlined in the 'scCancer' 23 package (code is available at https://github.com/wguo-research/scCancer). We employed the 'DoubletFinder' 24 package to remove potential doublets from our dataset and the 'Harmony' 25 package to integrate data from multiple samples. Using previously validated markers from prior studies 26,27 , we delineated distinct cell subgroups as the basis for our subsequent investigations. Unique markers for each subgroup were identified with the 'FindMarkers' function in Seurat, using min.pct = 0.1 and logfc.threshold = 0.25 and retaining markers with an adjusted p-value < 0.05. Subsequently, we conducted trajectory inference using the monocle3 package (code is available at https://github.com/cole-trapnell-lab/monocle3), which provided insights into the developmental pathways of the cells. For enrichment analysis, we chose the 'ClusterGVis' package (code is available at https://github.com/junjunlab/ClusterGVis), using the 'prepareDataFromscRNA' function to transform the single-cell data. The 'diffData' parameter was configured to include the top 20 marker genes for each identified cell subgroup. We then applied the 'enrichCluster' function to perform gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses. Furthermore, we used the drug sensitivity signature collection (SSC) from the 'beyondcell' 28 package for drug sensitivity prediction (code is available at https://github.com/cnio-bu/beyondcell). The SSC captures transcriptional state changes occurring before and after drug treatment, derived from extensive drug sensitivity databases.
Through differential expression analysis, gene expression patterns indicative of drug sensitivity were screened to construct the SSC gene expression signatures. These signatures were subsequently used to calculate Beyondcell scores for individual cells, quantifying their sensitivity to specific drugs. Lastly, we performed cluster analysis to form treatment clusters (TC), further refining our prioritization of potential drug candidates.

6. Clinical sample collection.

In 2023, we collected surgical samples from the Breast and Thyroid Surgery Department of Zibo Maternal and Child Health Hospital, including breast fibroadenoma, breast carcinoma in situ, and invasive breast cancer. Each cancer sample was accompanied by adjacent normal tissue taken at least 5 cm away from the respective tumor margin (Supplementary Table 1). To ensure the quality of our dataset, we applied strict criteria in selecting samples for inclusion in the study. Specifically, we chose patients who had not received any prior treatment and had undergone a modified radical mastectomy for breast cancer. The study was approved by the Ethics Review Committee of Zibo Maternal and Child Health Hospital, all methods were performed in accordance with the relevant guidelines and regulations, and all participants provided written informed consent.

7. Immunohistochemistry (IHC).

Immunohistochemical experiments followed a standardized protocol. Paraffin-embedded tissue sections were deparaffinized with xylene and rehydrated through a series of ethanol solutions. After antigen retrieval, using an EDTA buffer at pH 9.0 in a DAKO PT Link device heated to 97°C for 20 minutes, the samples were cooled to 65°C and rinsed with Tris-buffered saline. To inactivate endogenous peroxidase activity, 3% hydrogen peroxide was applied.
The primary antibodies employed in this study included rabbit polyclonal antibodies specific to FXYD1, SULF1, and TNXB, all sourced from Abcam and diluted at 1:200, 1:400, and 1:100, respectively. Additionally, ready-to-use primary antibodies for P63, vimentin, α-SMA, and calponin, obtained from Fuzhou Maisen Biotechnology Co., Ltd. and routinely employed in the pathology department for diagnostic purposes, were each applied to the sections for 20 minutes at room temperature. Subsequently, the sections were incubated with a secondary antibody, anti-mouse IgG from Dako, for an additional 20 minutes at room temperature. Color was then developed with a DAB chromogen for one minute. The intensity and distribution of the staining signal were independently assessed by three qualified pathologists. The IHC staining percentage was graded as follows: 1 = 0–25%; 2 = 26–50%; 3 = 51–75%; 4 = 76–100%. The IHC intensity was scored as follows: 0 = none (-); 1 = weak (1+); 2 = moderate (2+); 3 = strong (3+). The IHC score was calculated by multiplying the intensity and percentage scores.

8. Statistical Analysis.

All statistical analyses and graphical presentations were performed in R, version 4.2.2, with supporting R packages. For nonparametric data, we used the Wilcoxon rank-sum test for pairwise comparisons and the Kruskal-Wallis H test when multiple groups were analyzed.
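The IHC scoring scheme described in section 7 (percentage grade multiplied by intensity score) can be written out as a small worked example. This is a hedged sketch rather than the authors' implementation: the function names are hypothetical, and the handling of non-integer percentages between the stated bins (for example 25.5%) is an assumption, since the paper lists integer ranges only.

```python
def percentage_grade(percent_positive: float) -> int:
    """Map the percentage of stained cells to the grade 1-4 used in the text.

    Bins follow the paper: 1 = 0-25%, 2 = 26-50%, 3 = 51-75%, 4 = 76-100%.
    Assumption: a fractional value above a bin's upper bound (e.g. 25.5)
    falls into the next bin.
    """
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must be between 0 and 100")
    if percent_positive <= 25:
        return 1
    if percent_positive <= 50:
        return 2
    if percent_positive <= 75:
        return 3
    return 4


def ihc_score(intensity: int, percent_positive: float) -> int:
    """IHC score = intensity score (0-3) x percentage grade (1-4)."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0 (none), 1 (weak), 2 (moderate), or 3 (strong)")
    return intensity * percentage_grade(percent_positive)


# Invented example readings, not data from the study.
print(ihc_score(3, 80))  # strong staining in 80% of cells -> 12
print(ihc_score(2, 40))  # moderate staining in 40% of cells -> 4
print(ihc_score(0, 80))  # no staining -> 0
```

Under this scheme the score ranges from 0 to 12, and any section with no detectable staining scores 0 regardless of the percentage grade.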
Abbreviations

α-SMA: Alpha Smooth Muscle Actin
BC: Breast Cancer
CAF: Cancer-Associated Fibroblasts
CNA: Copy Number Alteration
DAB: 3,3'-Diaminobenzidine
EDTA: Ethylenediaminetetraacetic Acid
SSC: Sensitivity Signature Collection
ER: Estrogen Receptor
FXYD1: FXYD domain-containing ion transport regulator 1
GAM: Generalized Additive Models
GEO: Gene Expression Omnibus
GLM: Generalized Linear Models
GO: Gene Ontology
HER2: Human Epidermal Growth Factor Receptor 2
HSPG: Heparan Sulfate Proteoglycan
IHC: Immunohistochemistry
iCAFs: Inflammatory CAFs
KEGG: Kyoto Encyclopedia of Genes and Genomes
KNN: K-Nearest Neighbors
LDA: Linear Discriminant Analysis
LR: Logistic Regression
mCAFs: Myofibroblastic CAFs
METABRIC: Molecular Taxonomy of Breast Cancer International Consortium
NB: Naive Bayes
OS: Overall Survival
PCA: Principal Component Analysis
PR: Precision-Recall
ROC: Receiver Operating Characteristic
RFE: Recursive Feature Elimination
RF: Random Forest
RFS: Relapse-Free Survival
SULF1: Sulfatase 1
SULF2: Sulfatase 2
SVM: Support Vector Machine
TCGA: The Cancer Genome Atlas
TC: Treatment Cluster
TME: Tumor Microenvironment
TNC: Tenascin-C
TNR: Tenascin-R
TNXB: Tenascin-X
TNW: Tenascin-W
tSNE: t-Distributed Stochastic Neighbor Embedding
TPM: Transcripts Per Million
XGB: XGBoost

Declarations

Acknowledgements

We express our gratitude to GEO and the TCGA database, along with all contributors who have shared their code online.

Authors' contributions

XZ: Writing – review & editing, Writing – original draft, Validation, Supervision, Project administration, Investigation, Data curation, Conceptualization. Na Wang: Writing – original draft, Resources, Investigation, Formal analysis. Ling Shi: Validation. Xindong Wei: Writing – original draft, Methodology, Investigation, Data curation. Xiaoqin Sun: Resources. Mingxiu Shao: Investigation. Xiaolong Guo: Investigation. Liang Tian: Data curation. Fangyuan Zhang: Data curation.
Hui Lyu: Writing – review & editing, Writing – original draft, Validation, Supervision, Project administration, Investigation, Data curation, Conceptualization.

Data availability statement

The TCGA-BRCA dataset is available from the TCGA database (https://cancergenome.nih.gov/). We obtained the dataset by using the TCGAbiolinks package in R. The datasets analyzed during the current study are available at GEO: GSE65194, GSE233242, and GSE161529. The raw IHC data generated and/or analyzed during the current study are not publicly available due to ethical and privacy concerns regarding patient samples. However, key findings and analyses based on these data are reported in the manuscript, and any additional information necessary to reproduce the results may be obtained upon reasonable request to the corresponding author.

Funding

This work received support from the Zibo City Medical and Health Science Research Projects (No. 2023030926) and Zibo Maternal and Child Health Hospital.

Ethics approval and consent to participate

This study was reviewed and approved by the Ethics Review Committee of Zibo Maternal and Child Health Hospital (approval no. 202106073, date: 2022-06-23). Patient informed consent for scientific research was obtained as part of surgical consent at the time of surgery. Patient information was kept confidential.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. Han, L. et al. LncRNA HOTTIP facilitates the stemness of breast cancer via regulation of miR-148a-3p/WNT1 pathway. J. Cell. Mol. Med. 24, 6242–6252 (2020).
2. Zuo, S., Yu, J., Pan, H. & Lu, L. Novel insights on targeting ferroptosis in cancer therapy. Biomark. Res. 8, 1–11 (2020).
3. Granucci, F. The Family of LPS Signal Transducers Increases: the Arrival of Chanzymes. Immunity 48, 4–6 (2018).
4. Zheng, S. et al. Development and validation of a stromal immune phenotype classifier for predicting immune activity and prognosis in triple-negative breast cancer. Int. J. Cancer 147, 542–553 (2020).
5. Tamborero, D. et al. A pan-cancer landscape of interactions between solid tumors and infiltrating immune cell populations. Clin. Cancer Res. 24, 3717–3728 (2018).
6. Lambrechts, D. et al. Phenotype molding of stromal cells in the lung tumor microenvironment. Nat. Med. 24, 1277–1289 (2018).
7. Auciello, F. R. et al. A stromal lysolipid-autotaxin signaling axis promotes pancreatic tumor progression. 9, 617–627 (2019).
8. Bertero, T. et al. Tumor-stroma mechanics coordinate amino acid availability to sustain tumor growth and malignancy. Cell Metab. 29, 124–140 (2019).
9. Sahai, E. et al. A framework for advancing our understanding of cancer-associated fibroblasts. Nat. Rev. Cancer 20, 174–186 (2020).
10. Gagliano, T. et al. PIK3Cδ expression by fibroblasts promotes triple-negative breast cancer progression. J. Clin. Invest. 130, 3188–3204 (2020).
11. Alcaraz, L. B. et al. A 9-kDa matricellular SPARC fragment released by cathepsin D exhibits pro-tumor activity in the triple-negative breast cancer microenvironment. Theranostics 11, 6173–6192 (2021).
12. Al-Ansari, M. M., Hendrayani, S. F., Shehata, A. I. & Aboussekhra, A. P16 INK4A Represses the paracrine tumor-promoting effects of breast stromal fibroblasts. Oncogene 32, 2356–2364 (2013).
13. Yang, P. et al. CAF-derived exosomal WEE2-AS1 facilitates colorectal cancer progression via promoting degradation of MOB1A to inhibit the Hippo pathway. Cell Death Dis. 13 (2022).
14. Chen, X. & Song, E. Turning foes to friends: targeting cancer-associated fibroblasts. Nat. Rev. Drug Discov. 18, 99–115 (2019).
15. Cuomo, M. et al. Epigenetic remodelling of Fxyd1 promoters in developing heart and brain tissues. Sci. Rep. 12, 1–11 (2022).
16. Zhu, W. et al. SULF1 regulates malignant progression of colorectal cancer by modulating ARSH via FAK/PI3K/AKT/mTOR signaling. Cancer Cell Int. 24, 1–19 (2024).
17. Matsumoto, K. I., Higuchi, T., Umeki, M., Ono, M. & Sakamoto, S. Tenascin-X is increased with decreased expression of miR-378a-5p and miR-486-5p in mice fed a methionine-choline-deficient diet that induces hepatic fibrosis. Biomed. Res. 45, 67–76 (2024).
18. Kuhn, M. Building predictive models in R using the caret package. J. Stat. Softw. 28, 1–26 (2008).
19. López-Delgado, J. & Meirmans, P. G. History or demography? Determining the drivers of genetic variation in North American plants. Mol. Ecol. 31, 1951–1962 (2022).
20. Cerami et al. The cBio Cancer Genomics Portal: An Open Platform for Exploring Multidimensional Cancer Genomics Data. Cancer Discov. 32, 736–740 (2017).
21. Gao, J. et al. Integrative Analysis of Complex Cancer Genomics and Clinical Profiles Using the cBioPortal. Sci. Signal. 6, pl1 (2013).
22. Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573-3587.e29 (2021).
23. Guo, W. et al. ScCancer: A package for automated processing of single-cell RNA-seq data in cancer. Brief. Bioinform. 22, 10–11 (2021).
24. McGinnis, C. S., Murrow, L. M. & Gartner, Z. J. DoubletFinder: Doublet Detection in Single-Cell RNA Sequencing Data Using Artificial Nearest Neighbors. Cell Syst. 8, 329-337.e4 (2019).
25. Korsunsky, I. et al. Fast, sensitive, and accurate integration of single cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).
26. Wu, S. Z. et al. A single-cell and spatially resolved atlas of human breast cancers. Nat. Genet. 53, 1334–1347 (2022).
27. Pal, B. et al. A single‐cell RNA expression atlas of normal, preneoplastic and tumorigenic states in the human breast. EMBO J. 40, 1–23 (2021).
28. Fustero-Torre, C. et al. Beyondcell: targeting cancer therapeutic heterogeneity in single-cell RNA-seq data. Genome Med. 13, 1–15 (2021).
29. Wu, S. Z. et al. A single-cell and spatially resolved atlas of human breast cancers. Nat. Genet. 53, 1334–1347 (2021).
30. Wu, S. Z. et al. Stromal cell diversity associated with immune evasion in human triple‐negative breast cancer. EMBO J. 39, 1–20 (2020).
31. Kieffer, Y. et al. Single-cell analysis reveals fibroblast clusters linked to immunotherapy resistance in cancer. Cancer Discov. 10, 1330–1351 (2020).
32. Haubeiss, S. et al. Dasatinib reverses Cancer-associated Fibroblasts (CAFs) from primary Lung Carcinomas to a Phenotype comparable to that of normal Fibroblasts. Mol. Cancer 9, 1–8 (2010).
33. Geering, K. et al. FXYD proteins: New tissue- and isoform-specific regulators of Na,K-ATPase. Ann. N. Y. Acad. Sci. 986, 388–394 (2003).
34. Gao, Q. et al. FXYD6: A novel therapeutic target toward hepatocellular carcinoma. Protein Cell 5, 532–543 (2014).
35. Zhu, Z. L. et al. Overexpression of FXYD-3 is involved in the tumorigenesis and development of esophageal squamous cell carcinoma. Dis. Markers 35, 195–202 (2013).
36. Liu, J., Zhou, N. & Zhang, X. A monoclonal antibody against human FXYD6. Hybridoma 30, 487–490 (2011).
37. Kayed, H. et al. FXYD3 is overexpressed in pancreatic ductal adenocarcinoma and influences pancreatic cancer cell growth. Int. J. Cancer 118, 43–54 (2006).
38. Bai, Y. et al. A FXYD5/TGF-β/SMAD positive feedback loop drives epithelial-to-mesenchymal transition and promotes tumor growth and metastasis in ovarian cancer. Int. J. Oncol. 56, 301–314 (2020).
39. Loftås, P. et al. Expression of FXYD-3 is an Independent Prognostic Factor in Rectal Cancer Patients With Preoperative Radiotherapy. Int. J. Radiat. Oncol. Biol. Phys. 75, 137–142 (2009).
40. Liu, J. et al. Extracellular vesicles-encapsulated let-7i shed from bone mesenchymal stem cells suppress lung cancer via KDM3A/DCLK1/FXYD3 axis. J. Cell. Mol. Med. 25, 1911–1926 (2021).
41. Wang, L. J. et al. Prognostic significance of sodium-potassium ATPase regulator, FXYD3, in human hepatocellular carcinoma. Oncol. Lett. 15, 3024–3030 (2018).
42. Floyd, R. V., Wray, S., Martín-Vasallo, P. & Mobasheri, A. Differential cellular expression of FXYD1 (phospholemman) and FXYD2 (gamma subunit of Na, K-ATPase) in normal human tissues: A study using high density human tissue microarrays. Ann. Anat. 192, 7–16 (2010).
43. Zhao, E. et al. The roles of FXYD family members in ovarian cancer: an integrated analysis by mining TCGA and GEO databases and functional validations. J. Cancer Res. Clin. Oncol. 149, 17269–17284 (2023).
44. Ai, X. et al. SULF1 and SULF2 regulate heparan sulfate-mediated GDNF signaling for esophageal innervation. Development 134, 3327–3338 (2007).
45. Morimoto-Tomita, M. et al. Sulf-2, a proangiogenic heparan sulfate endosulfatase, is upregulated in breast cancer. Neoplasia 7, 1001–1010 (2005).
46. Fang, X. et al. Cancer associated fibroblasts-derived SULF1 promotes gastric cancer metastasis and CDDP resistance through the TGFBR3-mediated TGF-β signaling pathway. Cell Death Discov. 10, 1–12 (2024).
47. Liu, C. T. et al. SULF1 inhibits proliferation and invasion of esophageal squamous cell carcinoma cells by decreasing heparin-binding growth factor signaling. Dig. Dis. Sci. 58, 1256–1263 (2013).
48. Hur, K. et al. Up-regulated expression of sulfatases (SULF1 and SULF2) as prognostic and metastasis predictive markers in human gastric cancer. J. Pathol. 228, 88–98 (2012).
49. Lai, J. P. et al. SULF1 Inhibits Tumor Growth and Potentiates the Effects of Histone Deacetylase Inhibitors in Hepatocellular Carcinoma. Gastroenterology 130, 2130–2144 (2006).
50. Ouyang, Q. et al. Loss of ZNF587B and SULF1 contributed to cisplatin resistance in ovarian cancer cell lines based on Genome-scale CRISPR/Cas9 screening. Am. J. Cancer Res. 9, 988–998 (2019).
51. Brasil da Costa, F. H., Lewis, M. S., Truong, A., Carson, D. D. & Farach-Carson, M. C. SULF1 suppresses Wnt3A-driven growth of bone metastatic prostate cancer in perlecan-modified 3D cancer-stroma-macrophage triculture models. PLoS One 15, 1–25 (2020).
52. Gill, R. M., Mehra, V., Milford, E. & Dhoot, G. K. Short SULF1/SULF2 splice variants predominate in mammary tumours with a potential to facilitate receptor tyrosine kinase-mediated cell signalling. Histochem. Cell Biol. 146, 431–444 (2016).
53. Tucker, R. P. et al. Phylogenetic analysis of the tenascin gene family: Evidence of origin early in the chordate lineage. BMC Evol. Biol. 6, 1–17 (2006).
54. Okuda-Ashitaka, E. & Matsumoto, K. I. Tenascin-X as a causal gene for classical-like Ehlers-Danlos syndrome. Front. Genet. 14, 1–7 (2023).
55. Valcourt, U., Alcaraz, L. B., Exposito, J. Y., Lethias, C. & Bartholin, L. Tenascin-X: Beyond the architectural function. Cell Adhes. Migr. 9, 154–165 (2015).
56. Matsumoto, K. I. & Aoki, H. The Roles of Tenascins in Cardiovascular, Inflammatory, and Heritable Connective Tissue Diseases. Front. Immunol. 11, 1–10 (2020).
57. Minamitani, T., Ariga, H. & Matsumoto, K. I. Adhesive defect in extracellular matrix tenascin-X-null fibroblasts: A possible mechanism of tumor invasion. Biol. Pharm. Bull. 25, 1472–1475 (2002).
58. Liot, S. et al. Loss of Tenascin-X expression during tumor progression: A new pan-cancer marker. Matrix Biol. Plus 6–7, 6–7 (2020).
59. Archer, M. et al. Immune Regulation of Mammary Fibroblasts and the Impact of Mammographic Density. J. Clin. Med. 11 (2022).
60. Caligiuri, G. & Tuveson, D. A. Activated fibroblasts in cancer: Perspectives and challenges. Cancer Cell 41, 434–449 (2023).

Additional Declarations

No competing interests reported.
Supplementary Files SupplementaryFigures.docx SupplementaryTables.xlsx GraphicalAbstract.tiff Editorial decision: Revision requested 23 Sep, 2025 Reviews received at journal 09 Sep, 2025 Reviews received at journal 26 Aug, 2025 Reviewers agreed at journal 23 Aug, 2025 Reviewers agreed at journal 20 Aug, 2025 Reviewers agreed at journal 28 May, 2025 Reviewers invited by journal 28 May, 2025 Editor assigned by journal 28 May, 2025 Editor invited by journal 30 Apr, 2025 Submission checks completed at journal 28 Apr, 2025 First submitted to journal 18 Apr, 2025
© Research Square 2026 | ISSN 2693-5015 (online) {"props":{"pageProps":{"initialData":{"identity":"rs-6479762","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":463108117,"identity":"46a749e3-bc1c-413b-90f0-8e39d5c5e978","order_by":0,"name":"Xin Zhou","email":"","orcid":"","institution":"Zibo Maternal and Child Health Hospital","correspondingAuthor":false,"prefix":"","firstName":"Xin","middleName":"","lastName":"Zhou","suffix":""},{"id":463108118,"identity":"3fdab342-845c-42dd-8d64-3f17c5f5d09b","order_by":1,"name":"Na Wang","email":"","orcid":"","institution":"Zibo Maternal and Child Health Hospital","correspondingAuthor":false,"prefix":"","firstName":"Na","middleName":"","lastName":"Wang","suffix":""},{"id":463108119,"identity":"94bf22ff-6918-482d-80ef-b33ecdb785dd","order_by":2,"name":"Ling Shi","email":"","orcid":"","institution":"Zibo Maternal and Child Health Hospital","correspondingAuthor":false,"prefix":"","firstName":"Ling","middleName":"","lastName":"Shi","suffix":""},{"id":463108120,"identity":"02d02700-1e4c-4093-84da-2ab5ad227ff5","order_by":3,"name":"Dongxin Wei","email":"","orcid":"","institution":"Zibo Maternal and Child Health Hospital","correspondingAuthor":false,"prefix":"","firstName":"Dongxin","middleName":"","lastName":"Wei","suffix":""},{"id":463108121,"identity":"7b4c2a65-456f-4890-a3f8-d63f7657d587","order_by":4,"name":"Xiaoqin Sun","email":"","orcid":"","institution":"Zibo Maternal and Child Health
Hospital","correspondingAuthor":false,"prefix":"","firstName":"Xiaoqin","middleName":"","lastName":"Sun","suffix":""},{"id":463108122,"identity":"305c4073-ac1d-414d-81c8-34ab00f23c66","order_by":5,"name":"Mingxiu Shao","email":"","orcid":"","institution":"Zibo Maternal and Child Health Hospital","correspondingAuthor":false,"prefix":"","firstName":"Mingxiu","middleName":"","lastName":"Shao","suffix":""},{"id":463108123,"identity":"5b6bff8e-660d-4742-8667-6a3aaaacc562","order_by":6,"name":"Liang Tian","email":"","orcid":"","institution":"Zibo Maternal and Child Health Hospital","correspondingAuthor":false,"prefix":"","firstName":"Liang","middleName":"","lastName":"Tian","suffix":""},{"id":463108124,"identity":"1cd70a87-3f05-40a1-b931-01a52b97c9b0","order_by":7,"name":"Xiaolong Guo","email":"","orcid":"","institution":"Zibo Maternal and Child Health Hospital","correspondingAuthor":false,"prefix":"","firstName":"Xiaolong","middleName":"","lastName":"Guo","suffix":""},{"id":463108125,"identity":"465257cf-92ea-4df5-8b64-7e2dbb213fc3","order_by":8,"name":"Fangyuan Zhang","email":"","orcid":"","institution":"Zibo Maternal and Child Health Hospital","correspondingAuthor":false,"prefix":"","firstName":"Fangyuan","middleName":"","lastName":"Zhang","suffix":""},{"id":463108126,"identity":"1c175927-3335-421d-9bfa-2bf19a5aa473","order_by":9,"name":"Hui Lyu","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAz0lEQVRIiWNgGAWjYBACAwYGNiApIcfG3tj44IOBjR2xWmyM+XkOHzacUZCWTKQWhrTEmTPS0oR5PhxibCCoRSLH7DFPwWFjgzNnzJhtDA4wM7AfPrqBgBZzYx6Dw3IGx3vMHucY3OFj4ElLu0HIFmmgFpAt5sY5Bs+YGSR4zIjSkrjhBpBhYXCYsYFILRDvSzMQpYXnWZnkHFgg9xikJbMR8ot9e/I2iTd/oFH544+NHT/74WN4tTAIJKAJsOFVDgL8BwgqGQWjYBSMgpEOAI0tSDpNq2LiAAAAAElFTkSuQmCC","orcid":"","institution":"Zibo Maternal and Child Health 
Hospital","correspondingAuthor":true,"prefix":"","firstName":"Hui","middleName":"","lastName":"Lyu","suffix":""}],"badges":[],"createdAt":"2025-04-18 14:38:15","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6479762/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6479762/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1038/s41598-025-34923-2","type":"published","date":"2026-01-14T16:29:39+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":83610660,"identity":"88da4e85-6b81-4579-9392-fe0bf2d54fde","added_by":"auto","created_at":"2025-05-29 12:10:02","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":4462676,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eFeature Selection by Machine Learning\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(A) Integration of CAFs-associated gene lists from 3 studies with TCGA breast cancer DEGs revealed 28 candidate CAFs markers. (B) Among six evaluated feature selection methods, random forest emerged as the optimal choice, achieving high accuracy and stability; it also determined the optimal number of variables to be 3. (C) The Sankey diagram presents the optimal variables selected by different algorithms during the feature selection process. (D-E) Boxplots display expressions of feature genes in GSE65194 and GSE233242, respectively. 
Both datasets underwent PCA and tSNE analyses, effectively differentiating cancer from normal breast tissues.\u003c/p\u003e","description":"","filename":"Fig1.png","url":"https://assets-eu.researchsquare.com/files/rs-6479762/v1/c4c7317e6be23f0e2e9d2a53.png"},{"id":83610661,"identity":"08c3e649-0bce-4c54-8355-b17f29eed4f7","added_by":"auto","created_at":"2025-05-29 12:10:02","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":4542160,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eDiagnostic Models' Performance Across Six Machine Learning Algorithms on External Validation Sets.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(A) Confusion matrices for six machine learning algorithms on GSE65194 and GSE233242 datasets. ROC Curves (B) and PR-RPC Curves (C) of Different Models. Mean absolute SHAP values and detailed SHAP values of feature genes in GSE65194 (D) and GSE233242 (E).\u003c/p\u003e","description":"","filename":"fig2.png","url":"https://assets-eu.researchsquare.com/files/rs-6479762/v1/db5ecc2f8abaf5dca94ab9e8.png"},{"id":83610669,"identity":"ee95568e-8aab-47dd-8cd2-6b23ab666da9","added_by":"auto","created_at":"2025-05-29 12:10:03","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":22995594,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eStroma Cell Heterogeneity.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(A) Stroma cells can be categorized into three major subpopulations: endothelial cells, pericytes, and fibroblasts. (B) Distributions of normal and cancerous samples across stroma subpopulations. (C) Distributions of different types of breast cancer across stroma subpopulations. (D) The stroma cell subpopulations were further refined based on the distributions of benign and malignant samples. UMAP plots (E) and violin plots (F) show the expression levels of selected genes. 
(G) Fibroblasts were extracted and subjected to dimensionality reduction and clustering, yielding normal fibroblasts, iCAFs, and mCAFs; UMAP plots and violin plots show the expression levels of selected genes. (H) Trajectory reconstruction illustrates the differentiation process of fibroblasts in breast cancer. (I) Feature genes show varied expression patterns along the reconstructed trajectory. (J-K) CAFs exhibit heterogeneity in breast cancer. (L) Dot plots show the expression levels of selected genes across different CAF subpopulations.\u003c/p\u003e","description":"","filename":"fig3.png","url":"https://assets-eu.researchsquare.com/files/rs-6479762/v1/4f0251f5f94ba8619a9a09ca.png"},{"id":83610957,"identity":"f07cb3ee-fafb-4298-920d-84d696024738","added_by":"auto","created_at":"2025-05-29 12:18:02","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":9667764,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eFunctional Analysis.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(A) GO and KEGG analyses were performed using marker genes of normal fibroblasts, iCAFs, and mCAFs; the top 10 enriched terms are shown for each subpopulation.
(B) GO and KEGG analyses were performed using marker genes of different CAFs; the top 10 enriched terms are shown for each subpopulation.\u003c/p\u003e","description":"","filename":"fig4.png","url":"https://assets-eu.researchsquare.com/files/rs-6479762/v1/1916c949888f4e46af2ebc18.png"},{"id":83610665,"identity":"b67dff7b-8e90-4beb-83f1-03ad13fbd028","added_by":"auto","created_at":"2025-05-29 12:10:02","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":32927406,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eDrug Sensitivity Prediction for Luminal and Non-Luminal CAFs.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eLuminal and non-Luminal CAFs were separately subjected to dimensionality reduction and clustering based on Beyondcell scores of individual cells. (A) UMAP plots present two views: the left depicting TC clusters specific to Luminal CAFs, the right illustrating the distribution patterns of Luminal iCAFs and Luminal mCAFs. (B) Top differential sensitivity drugs of Luminal iCAFs and mCAFs. (C) Intersection of high sensitivity drugs for Luminal mCAFs and low sensitivity drugs for Luminal iCAFs. (D) UMAP plots show top 5 high sensitivity drugs for Luminal mCAFs. (E) Intersection of high sensitivity drugs for Luminal iCAFs and low sensitivity drugs for Luminal mCAFs. (F) UMAP plots show top 5 high sensitivity drugs for Luminal iCAFs. (G) UMAP plots present two views: the left depicting TC clusters specific to non-Luminal CAFs, the right illustrating the distribution patterns of non-Luminal iCAFs and non-Luminal mCAFs. (H) Top differential sensitivity drugs of non-Luminal iCAFs and mCAFs. (I) Intersection of high sensitivity drugs for non-Luminal mCAFs and low sensitivity drugs for non-Luminal iCAFs. (J) UMAP plots show top 5 high sensitivity drugs for non-Luminal mCAFs.
(K) Intersection of high sensitivity drugs for non-Luminal iCAFs and low sensitivity drugs for non-Luminal mCAFs. (L) UMAP plots show top 5 high sensitivity drugs for non-Luminal iCAFs.\u003c/p\u003e","description":"","filename":"fig5.png","url":"https://assets-eu.researchsquare.com/files/rs-6479762/v1/fbfecbd1a3167ec69b82de1e.png"},{"id":83610667,"identity":"61b54879-fee6-4e0d-bf66-7a87c5528927","added_by":"auto","created_at":"2025-05-29 12:10:03","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":27064235,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eTwo Consecutive Pathological Slides Intuitively Demonstrate the Distinct Expression Patterns of FXYD1 and SULF1.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(A) The expression of FXYD1 in myoepithelial cells. (B) The expression of FXYD1 in fibroblasts. (C) The expression of SULF1 in fibroblasts. (D) The relationship between SULF1 and different clinicopathological characteristics, including tissue type, age, T stage, N stage, ER status, PR status, and Her2 status.\u003c/p\u003e","description":"","filename":"fig6.png","url":"https://assets-eu.researchsquare.com/files/rs-6479762/v1/36375a68780c5a10e2fc18ff.png"},{"id":83610958,"identity":"74fd5ecc-4f9b-4d75-894e-e1d848e522f9","added_by":"auto","created_at":"2025-05-29 12:18:03","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":18790521,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eA Comparison of the IHC Results for FXYD1 and SULF1 with Other Markers in Normal Breast Tissues.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eα-SMA (A), P63 (B), and Calponin (C) are markers of myoepithelial cells. α-SMA (A), and Vimentin (D) are markers of CAFs. The expression of FXYD1 (E) is observed in myoepithelial cells and fibroblasts. 
The expression of SULF1 (F) is nearly undetectable in normal mammary tissue.\u003c/p\u003e","description":"","filename":"fig7.png","url":"https://assets-eu.researchsquare.com/files/rs-6479762/v1/25cf302beb352a114541bdde.png"},{"id":83610670,"identity":"c04ac5f5-64c6-48f8-8672-344aee93304c","added_by":"auto","created_at":"2025-05-29 12:10:03","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":28817656,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eA Comparison of the IHC Results for FXYD1 and SULF1 with Other Markers in Malignant Breast Tissues.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIn carcinoma in situ (A), FXYD1 expression is notably absent in myoepithelial cells and markedly downregulated in fibroblasts, whereas SULF1 expression undergoes upregulation in fibroblasts. As the disease progresses to invasive carcinoma (B), the absence of FXYD1 expression remains consistent, and the upward trend in SULF1 expression persists. 
Compared to other markers, these findings provide a more intuitive understanding of the expression patterns.\u003c/p\u003e","description":"","filename":"fig8.png","url":"https://assets-eu.researchsquare.com/files/rs-6479762/v1/285ee6fb3249584665728006.png"},{"id":100614854,"identity":"4aa02419-9202-43a3-9181-fe731fe77ba3","added_by":"auto","created_at":"2026-01-19 17:26:26","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":136714208,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6479762/v1/3aaaf2e4-a25a-4d68-8725-8453eb63cdb5.pdf"},{"id":83610671,"identity":"c50aafd1-d830-4e80-8457-5d4a3f865900","added_by":"auto","created_at":"2025-05-29 12:10:03","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":6156765,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryFigures.docx","url":"https://assets-eu.researchsquare.com/files/rs-6479762/v1/bb991a1a4206d09986fb24b0.docx"},{"id":83610662,"identity":"488ab711-01f3-4cd4-92c1-06c9f25eb679","added_by":"auto","created_at":"2025-05-29 12:10:02","extension":"xlsx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":91932,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryTables.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6479762/v1/19617f4dfd29497ff5293582.xlsx"},{"id":83610663,"identity":"f08071ff-43d4-44fc-83f0-6dad99149586","added_by":"auto","created_at":"2025-05-29 12:10:02","extension":"tiff","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":919257,"visible":true,"origin":"","legend":"","description":"","filename":"GraphicalAbstract.tiff","url":"https://assets-eu.researchsquare.com/files/rs-6479762/v1/13a2b36441f272720abe6f0d.tiff"}],"financialInterests":"No competing interests 
reported.","formattedTitle":"Identification and Validation of Novel CAF Markers in Breast Cancer","fulltext":[{"header":"Introduction ","content":"\u003cp\u003eBreast cancer, a prevalent malignancy among women globally, continues to exhibit high incidence and mortality rates, posing a significant public health challenge\u003csup\u003e1,2\u003c/sup\u003e. Numerous studies indicate that the tumor microenvironment (TME) serves as fertile ground for tumor cell development and progression\u003csup\u003e3–5\u003c/sup\u003e. Within this intricate ecosystem, cells can be broadly categorized into immune cells and stromal cells. Among stromal cells, a growing body of evidence underscores the pivotal role of specific subsets, particularly cancer-associated fibroblasts (CAFs), in tumor progression\u003csup\u003e6–8\u003c/sup\u003e. CAFs have emerged as central players, with multiple studies elucidating their essential functions in cancer proliferation, advancement, and invasion\u003csup\u003e9\u003c/sup\u003e. Existing research demonstrates that CAFs interact intimately with cancer cells and play a crucial role in mediating and facilitating the metastasis of breast cancer\u003csup\u003e10–12\u003c/sup\u003e.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eEmerging evidence indicates that therapeutic paradigms centered on cancer cells alone offer limited options in the clinic\u003csup\u003e13\u003c/sup\u003e. Consequently, there is a pressing need for a deeper exploration of CAF heterogeneity. Current research endeavors aimed at CAF classification and marker identification, though ongoing, remain limited in scope, with minimal translation into clinical practice\u003csup\u003e14\u003c/sup\u003e. This study aimed to identify potential novel CAF-related markers through the application of advanced machine learning algorithms to both single-cell and bulk datasets.
Using this approach, we identified three signature genes: FXYD1 (FXYD domain-containing ion transport regulator 1), SULF1, and TNXB, which have received limited attention in breast cancer research to date.\u003c/p\u003e\n\u003cp\u003eFXYD1, a crucial regulator of ion channel transport, encodes the phospholemman (PLM) protein, which plays a vital role in heart and brain tissue\u003csup\u003e15\u003c/sup\u003e. Given its importance in these critical systems, its potential involvement in breast cancer pathogenesis merits further examination. SULF1, a sulfatase enzyme, modulates tumor development by influencing the binding affinity of cell surface heparan sulfate proteoglycans\u003csup\u003e16\u003c/sup\u003e. Similarly, TNXB, an extracellular matrix protein, contributes to collagen network assembly and tissue integrity\u003csup\u003e17\u003c/sup\u003e. This study examines the roles of these genes in breast cancer initiation and progression through analyses of gene expression patterns, copy number alterations (CNAs), functional evaluations, and drug sensitivity predictions.\u003c/p\u003e\n\u003cp\u003eImmunohistochemical validation of these markers in both benign and malignant breast tissue samples provided a theoretical basis for advancing the diagnosis of breast cancer. However, translating these findings into tangible clinical benefits will require further rigorous clinical validation. Additionally, this study presents novel insights into CAF heterogeneity, uncovering promising avenues for the tailored development of therapeutic strategies.\u003c/p\u003e\n\u003cp\u003eWe anticipate that our findings will establish a solid scientific foundation for earlier diagnosis, more accurate prognosis assessment, and the accelerated development of personalized therapeutic strategies.
Ultimately, by elucidating the expression patterns of FXYD1, SULF1, and TNXB in breast cancer progression, we aim to enhance patient outcomes and quality of life, thereby paving the way for the development of targeted therapies that can more effectively address this devastating disease.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003e\u003cstrong\u003e1.\u0026nbsp;\u0026nbsp;Identification of feature genes of CAFs in breast cancer by machine learning\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThree previous single-cell studies\u003csup\u003e29\u0026ndash;31\u003c/sup\u003e, each providing a unique list of genes associated with CAFs (cancer-associated fibroblasts) in breast cancer, were integrated into our analysis. We first conducted an intersection operation among these three lists, identifying the genes that were consistently reported across all studies. Subsequently, to further refine our candidate gene list, we intersected this consensus gene set with differentially expressed genes (DEGs) between breast cancer and normal breast tissues. Notably, these DEGs were downloaded from the GEPIA2 website. This dual intersection approach allowed us to identify 28 highly promising candidate genes (Figure 1A, Supplementary Table 2).\u003c/p\u003e\n\u003cp\u003eNext, we used the caret package to perform feature selection on the TCGA-Train dataset. After evaluating the performance of six built-in feature selection methods, the random forest algorithm emerged as the most suitable due to its superior classification accuracy and stability, determining the optimal number of variables to be 3 (Figure 1B). 
To illustrate the results, we present them as a Sankey diagram (Figure 1C) and tables (Supplementary Tables 3-4), which show the process of feature selection.\u003c/p\u003e\n\u003cp\u003eIn-depth analysis of the TCGA datasets revealed that FXYD1 and TNXB were significantly downregulated in breast cancer tissues, while SULF1 was significantly upregulated (Supplementary Figure 1). This abnormal expression pattern suggests that these three genes may play key roles in the development of breast cancer.\u003c/p\u003e\n\u003cp\u003eTo further validate the effectiveness of these feature genes in distinguishing cancer from normal breast tissues, we performed PCA and tSNE dimensionality reduction analyses. Cancer tissues were effectively differentiated from normal breast tissues based on the expression profiles of these genes (Supplementary Figure 1).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eAdditionally, we conducted external validations using two independent datasets, GSE65194 and GSE233242. The results were consistent with those from the TCGA dataset (Figure 1D-E), further reinforcing our findings. Subsequently, we explored the clinical and prognostic implications of these three genes across diverse datasets using the BEST portal. This analysis revealed significant correlations between the expression of FXYD1 and TNXB and breast cancer grade, with both genes exhibiting decreased mRNA levels as the grade increased. However, no significant associations were found between the three genes and patient outcomes, including overall survival (OS), disease-free survival (DFS), relapse-free survival (RFS), disease-specific survival (DSS), and progression-free survival (PFS).
To avoid redundancy within this paper, detailed results are available at the BEST website.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e2.\u0026nbsp;\u0026nbsp;Model construction and comparison of diagnostic performance.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eUsing FXYD1, SULF1, and TNXB as feature genes, we established diagnostic models with various algorithms to distinguish between breast cancer and normal tissue. The performance of the different models across the datasets is summarized in Supplementary Table 5. Across the internal datasets, all models exhibited robust performance, achieving AUC and accuracy scores exceeding 0.9. Notably, the RF model stood out with particularly strong performance (Supplementary Figure 2A-B and Supplementary Table 5). Specifically, on the TCGA-test and TCGA-all datasets, the RF model achieved AUC values of 0.9941 and 0.9944, respectively, and accuracy values of 0.9655 and 0.9319. The high true positive and true negative rates in the confusion matrices further support its diagnostic capability (Supplementary Figure 2A). The SVM and XGB models also performed well on the testing datasets. However, the GLM and NB models lagged slightly behind on the internal validation datasets (Supplementary Figure 2A-B and Supplementary Table 5). Importantly, even in the imbalanced TCGA-all dataset, where cancer samples significantly outnumber normal ones, all models were able to identify minority class samples (i.e., normal tissue), as evidenced by the high PR-AUC values (Supplementary Figure 2C). This finding underscores the robustness of our models in handling imbalanced data.\u003c/p\u003e\n\u003cp\u003eWhen these models were applied to the external validation datasets GSE233242 and GSE65194, results varied. On GSE233242, the AUC and accuracy of the RF model decreased but remained within an acceptable range (AUC=0.8732, accuracy=0.6744).
In contrast, the SVM and KNN models showed a marked decline in performance, almost losing their predictive power (Figure 2A-B, Supplementary Table 5). On GSE65194, the RF model maintained its superior performance (AUC=0.904, accuracy=0.9085), while the GLM and KNN models showed notable improvements. However, the SVM model struggled to maintain its initial performance. Notably, despite achieving a high true positive rate on GSE65194, the NB model had an extremely limited ability to recognize normal samples (correctly identifying only one case) (Figure 2A, Supplementary Table 5). Based on the PR-AUC values from both external datasets, the RF model remained the top performer (Figure 2C).\u003c/p\u003e\n\u003cp\u003eTaken together, these results show that the RF model not only excelled on internal datasets but also demonstrated robust generalization ability on external validation sets, further emphasizing the pivotal role of FXYD1, SULF1, and TNXB as feature genes in breast cancer diagnosis. Moving beyond the assessment of the model\u0026apos;s performance, we conducted an in-depth analysis of Shapley Additive exPlanations (SHAP) values, revealing variations in the importance of these feature genes across the diverse datasets. In contrast to TNXB\u0026apos;s consistent prominence in both the testing and internal validation sets, FXYD1 showed a distinct lead in the rankings of the two external validation sets (Figure 2D, 2E; Supplementary Figure 2D, 2E). We primarily attributed this discrepancy to the inherent diversity in data distributions across different datasets.
Having explored the diagnostic potential of these three feature genes to some extent, we intend to further investigate their potential in subsequent sections of this paper.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e3.\u003c/strong\u003e \u003cstrong\u003eCNA analysis\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIn the TCGA database, a comprehensive analysis was conducted on the variations of the three feature genes: FXYD1, SULF1, and TNXB. Specifically, FXYD1 alterations were observed in 44 samples (approximately 5% of the cohort), primarily manifesting as amplification (2.5%, n=24) and mRNA high (1.67%, n=16) (Supplementary Figure 3A-B). Likewise, SULF1 exhibited a broader spectrum of variations across 114 samples (approximately 12%), with amplification being the most prevalent (9.27%, n=89), reinforcing its potential significance in tumorigenesis and progression (Supplementary Figure 3A-B). The variation landscape of TNXB was comparatively intricate, with alterations detected in 45 samples (approximately 5%), encompassing mutation (0.94%, n=9), amplification (1.04%, n=10), mRNA high (1.15%, n=12), mRNA low (1.15%, n=11), and multiple alterations (0.31%, n=3) (Supplementary Figure 3A-B). These diverse variation patterns may mirror the multifaceted roles played by TNXB in tumor biology.\u003c/p\u003e\n\u003cp\u003eShifting attention to the METABRIC dataset, we observed similar yet distinct trends. Variations in FXYD1 were detected in 104 samples (approximately 6%), with amplification (1.93%, n=36) and mRNA high (3.59%, n=67) remaining the predominant forms (Supplementary Figure 4A-B). Notably, the frequency of SULF1 variations significantly increased, observed in 384 samples (approximately 21%), with amplification accounting for the vast majority (15.86%, n=296), further corroborating the high prevalence of SULF1 variations in breast cancer (Supplementary Figure 4A-B). 
The variation pattern of TNXB in the METABRIC dataset mirrored that in TCGA, but with distinct numerical specifics\u0026mdash;specifically, amplification was observed in 0.86% of samples (n=16), mRNA high in 2.84% (n=53), and mRNA low in 1.23% (n=23) (Supplementary Figure 4A-B).\u003c/p\u003e\n\u003cp\u003eIn terms of survival analysis, no significant associations were observed between genetic alterations in all feature genes and either OS or RFS in the TCGA dataset (Supplementary Figure 3C-D). However, interestingly, in the METABRIC dataset, genetic alterations in FXYD1 were significantly associated with improved OS; specifically, the altered group exhibited significantly better prognosis compared to the non-altered group, suggesting a potentially favorable prognostic effect of FXYD1 variations (Supplementary Figure 4C-D). In contrast, SULF1 was negatively correlated with DFS, with the non-altered group faring better, which may be attributed to the promoting role of SULF1 in tumor progression (Supplementary Figure 4C-D).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e4.\u0026nbsp;\u003c/strong\u003e\u003cstrong\u003eThe expression patterns of feature genes at the single-cell resolution\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIn this section, we executed a series of systematic strategies for cell classification. Initially, based on the specific expression patterns of EPCAM and PTPRC, cells were categorized into three distinct groups: epithelial cells, immune cells, and a non-specific stroma cell population (Supplementary Figure 5). 
UMAP plots show the expression patterns of the three feature genes within the stromal cells. Specifically, FXYD1 expression was significantly downregulated in cancerous tissues compared to adjacent normal tissues; conversely, SULF1 expression was markedly upregulated; whereas TNXB did not display any notable difference in expression levels between cancerous and normal tissues, thereby providing essential insights for our subsequent investigations (Supplementary Figure 5).\u003c/p\u003e\n\u003cp\u003eTo comprehensively unravel the heterogeneity of stromal cells, we conducted an extensive secondary clustering analysis, refining them into three major subpopulations: PECAM1+ endothelial cells, RGS5+ pericytes, and PDGFRA+ fibroblasts (Figure 3A). Notably, although endothelial cells from both normal and cancerous tissues exhibited overlap in their expression profiles, posing a challenge for clear distinction, pericytes and fibroblasts could be distinctly categorized based on tissue type (Figure 3B). Within the stromal cell subpopulations, cells from different types of breast cancer were intermixed, lacking distinct subpopulation differentiation or heterogeneity (Figure 3C). Through further refined clustering analysis, we segmented the stromal cells into five specific subpopulations: endothelial cells; normal pericytes; cancer pericytes; normal fibroblasts; and CAFs (Figure 3D). Specifically, FXYD1 was predominantly expressed in normal fibroblasts and, to a lesser degree, in pericytes; SULF1 was enriched primarily in CAFs; and TNXB was expressed in both normal fibroblasts and CAFs (Figure 3E-F).\u003c/p\u003e\n\u003cp\u003eTo gain a deeper understanding of the molecular mechanisms underlying the transformation of fibroblasts into CAFs, we further subdivided the fibroblast population, distinguishing three key subpopulations: normal fibroblasts; myofibroblastic CAFs (mCAFs), marked by ACTA2 expression; and inflammatory CAFs (iCAFs), characterized by CXCL14 expression. 
Notably, we found that FXYD1 and TNXB were more prominently expressed in iCAFs, suggesting a potential link to their inflammatory regulatory roles within the tumor microenvironment. Conversely, SULF1 was preferentially enriched in mCAFs, indicating its pivotal role in the development of myofibroblastic CAFs (Figure 3G).\u003c/p\u003e\n\u003cp\u003eTo dynamically simulate the transition from normal fibroblasts to CAFs, we employed advanced pseudotime analysis techniques. Our findings indicate that mCAFs occupy the terminal stage of development. Notably, during this transition, FXYD1 expression gradually diminishes, which may correlate with the loss of certain functions as fibroblasts transform into CAFs. Conversely, TNXB expression exhibits an initial surge followed by a decline, mirroring the dynamic shifts in extracellular matrix remodeling that accompany the transition. Furthermore, SULF1 expression consistently intensifies, emphasizing its central role in CAF development and functional preservation (Figure 3H-I). Upon deeper exploration of CAF subdivision, we observed marked heterogeneity between Luminal and non-Luminal breast cancer CAF populations (Figure 3J), allowing for their classification into four distinct subgroups: Luminal iCAFs, Luminal mCAFs, non-Luminal iCAFs, and non-Luminal mCAFs (Figure 3K). Finally, a bubble plot visually represents the expression profiles of ACTA2, CXCL14, and CAF-specific genes across these diverse subpopulations, revealing their unique expression signatures within the CAF subgroups (Figure 3L).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e5. Functional analysis\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThrough GO and KEGG analyses, we have unraveled the unique biological functions and pathways associated with different types of fibroblasts in breast cancer. For the functional analysis, we present only the top 10 results (Figure 4), with marker genes for each fibroblast type listed in Supplementary Table 6 and 7. 
Our analysis underscores the complexity of CAFs\u0026apos; roles within the tumor microenvironment. Specifically, mCAFs play a pivotal role in extracellular matrix remodeling and nutritional support, whereas normal fibroblasts are intimately linked to immune responses and inflammatory processes, potentially maintaining immune homeostasis via signaling pathways such as IL-17 and TNF. iCAFs play a pivotal role in regulating inflammation, immune responses, and cellular signaling, crucial for both physiological homeostasis and pathological conditions. Upon further examination, distinct functional characteristics between Luminal mCAFs and non-Luminal mCAFs have been discerned regarding protein synthesis and immune modulation. Notably, Luminal mCAFs exhibit significant enrichment in pathways related to ribosomal function, emphasizing their crucial role in protein synthesis. Conversely, non-Luminal mCAFs demonstrate greater enrichment in pathways associated with autoimmune diseases and pathogen infections, suggesting unique functions in immune regulation and resistance to infections. Regarding iCAFs, Luminal iCAFs are prominently associated with inflammation- and tumor-related signaling pathways, indicating their pro-inflammatory and pro-tumorigenic effects within the tumor microenvironment. Meanwhile, non-Luminal iCAFs are enriched in pathways linked to complement and coagulation cascades, as well as cytokine-receptor interactions, highlighting their significant roles in regulating inflammatory responses and blood coagulation. 
These results underscore not only the functional diversity of CAFs in the cancer microenvironment but also pave the way for novel research avenues and potential therapeutic interventions.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e6.\u0026nbsp;\u0026nbsp;Drug sensitivity prediction\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eBased on BCs, we successfully subdivided Luminal CAFs into four subgroups (TC0 to TC3) and non-Luminal CAFs into three subgroups (TC0 to TC2). However, due to the scarcity of TC2 subgroup cells in non-Luminal CAFs, we excluded the analysis results for this subgroup. To provide an intuitive illustration, we employed UMAP dimensionality reduction plots to showcase the distribution of these distinct CAF types in the reduced space (Figures 5A, 5G). Additionally, regarding drug sensitivity prediction, we conducted detailed calculations for TCs and CAFs classifications. The detailed information of TOP Differential High Sensitivity Drugs across all classifications is listed in Supplementary Tables 8-11.\u003c/p\u003e\n\u003cp\u003eFirstly, concerning the drug sensitivity prediction results for TC classifications, we present the top 5 differential high sensitivity drugs in each TC cluster through volcano plots (Supplementary Figure 6A and Supplementary Figure 6C). Furthermore, the UMAP plots show the distribution of cells sensitive to the respective top differential high sensitivity drug for each TC cluster (Supplementary Figure 6B and Supplementary Figure 6D). Specifically, in Luminal CAFs, TC0 was most sensitive to GSK-J4, TC1 to SCH-900776, TC2 to TENIPOSIDE, and TC3 to GSK525762A. For non-Luminal CAFs, TC0 favored AZD8055, while TC1 preferred SORAFENIB. 
Notably, the distribution patterns of these drug-sensitive cells were highly consistent with the TC classifications.\u003c/p\u003e\n\u003cp\u003eThese findings not only provide theoretical support for the development of targeted therapeutic strategies aimed at specific CAF subgroups but also deepen our understanding of CAF heterogeneity. Subsequently, we focused our efforts on predicting drug sensitivity within various CAF classifications. The volcano plots revealed the top five drugs with differential high sensitivity for each CAF classification (Figures 5B, 5H). Intriguingly, we discovered that drugs to which mCAFs were sensitive tended to be drugs to which iCAFs were insensitive, and vice versa (Figures 5C, 5E, 5I, and 5K). This finding offers a novel perspective on CAF heterogeneity. In Luminal CAFs, mCAFs exhibited sensitivity to drugs such as DASATINIB and SKI-II (Figure 5D), whereas iCAFs responded more favorably to ENTINOSTAT and MUBRITINIB (Figure 5F). For non-Luminal CAFs, a similar pattern of distinct drug sensitivity between mCAFs and iCAFs was observed (Figure 5J). Notably, both Luminal and non-Luminal mCAFs were predicted to be sensitive to DASATINIB and SKI-II (Figures 5D and 5J), whereas high predicted sensitivity to MUBRITINIB was specific to iCAFs (Figures 5F and 5L). Additionally, it is noteworthy that a previous study has validated the efficacy of DASATINIB in inhibiting CAFs\u003csup\u003e32\u003c/sup\u003e, thereby further enhancing the credibility of our drug sensitivity prediction results.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e7.\u0026nbsp;\u0026nbsp;Verification of the expression patterns of feature genes by IHC\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe have thoroughly analyzed the IHC results and made several significant discoveries. 
In normal breast tissue, FXYD1 protein is predominantly located in myoepithelial and stromal cells (Supplementary Figure 7A-B). However, within carcinoma in situ, a marked decline in FXYD1 expression is observed within these cells, with occasional expression noted in the peritumoral stromal area (Supplementary Figure 7A-B). Notably, in contrast to these findings, the expression of FXYD1 is completely absent in the invasive carcinoma cases (Supplementary Figure 7A-B). In fibroadenomas, the proliferation of fibrous tissue is accompanied by enhanced expression of FXYD1 in fibroblasts, which is clearly visible under examination (Supplementary Figure 8A). By analyzing a pathological slide encompassing normal breast tissue, carcinoma in situ, and invasive carcinoma (Supplementary Figure 9A), we directly observe variations in FXYD1 expression across these distinct pathological stages. The gradual loss of FXYD1 expression suggests a possible suppressive role in breast tumor progression. Upon investigation of SULF1 protein, we have observed a markedly elevated expression level in stromal fibroblasts within cancer tissue, in comparison to normal breast tissue and fibroadenomas (Supplementary Figure 7C and Supplementary Figure 8B). Our analysis, however, did not uncover any significant correlation between SULF1 expression and the clinicopathological features examined (Supplementary Figure 7D). Similarly, Supplementary Figure 9B presents another pathological slide showcasing normal breast tissue, carcinoma in situ, and invasive carcinoma. Upon examination, this slide reveals variations in the level of SULF1 expression across these distinct pathological stages. Notably, the distinct expression pattern of SULF1 hints at a potentially significant contribution to the formation and modulation of the breast tumor microenvironment. However, no significant difference in TNXB expression was observed between benign and malignant breast tissues (Supplementary Figure 10). 
This finding highlights the need for further investigation into the potential mechanisms underlying the regulation of TNXB expression in the context of breast cancer development. We have selected two consecutive histological sections of breast cancer tissue to demonstrate the distinct expression patterns of FXYD1 and SULF1. Specifically, the expression levels of FXYD1 protein are notably higher in normal breast tissue compared to those in cancerous tissue (Figure 6A), whereas SULF1 expression levels are more pronounced in cancerous tissue (Figure 6B). Notably, FXYD1 is expressed not only within myoepithelial cells but also prominently on vascular walls. Based on our previous single-cell analysis, which provided insights into the cellular distribution of FXYD1, we hypothesize that FXYD1 may also be expressed in perivascular cells (Figure 6A and Figure 3E). To attain a deeper comprehension of the expression patterns exhibited by FXYD1 and SULF1, we compared the expression patterns of FXYD1 and SULF1 with some important protein markers. The use of \u0026alpha;-SMA (Figure 7A), P63 (Figure 7B), and Calponin (Figure 7C) as markers for identifying myoepithelial cells is consistent and reliable in the daily practice of clinical pathological diagnosis. On the other hand, \u0026alpha;-SMA and Vimentin are known as markers of CAFs. Upon close comparison, it is evident that in normal breast tissue, similar to Vimentin (Figure 7D), FXYD1 is expressed in both myoepithelial cells and stromal fibroblasts (Figure 7E). Notably, however, SULF1 expression is scarcely detectable within the stromal compartment of normal breast tissue (Figure 7F). As the tissue transitions towards malignancy, resulting in carcinoma in situ, markers like P63, Calponin, \u0026alpha;-SMA, and Vimentin continue to be expressed in myoepithelial cells; additionally, Vimentin expression is markedly elevated in fibroblasts (Figure 8A). 
Conversely, FXYD1 expression is drastically reduced, becoming absent in both myoepithelial cells and fibroblasts. This inverse trend is observed with SULF1, whose expression increases in the stromal compartment, suggesting a potential tumor-suppressive function for FXYD1 and a tumor-promoting role for SULF1 during early tumorigenesis. In the context of invasive carcinoma, the persistent absence of FXYD1 expression underscores its vital role in inhibiting tumor progression (Figure 8B). Furthermore, the notable upregulation of SULF1 expression in fibroblasts, echoing the pattern seen with \u0026alpha;-SMA and Vimentin, underscores its potential contribution to tumor progression (Figure 8B).\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eThis study applied multiple approaches to Recursive Feature Elimination (RFE), a statistical technique that enhances model performance by iteratively discarding the least significant features. This method effectively identified the CAF-associated genes FXYD1, SULF1, and TNXB, which are closely associated with breast cancer. This finding underscores the vital role of cancer-associated fibroblasts (CAFs) in breast cancer pathology and suggests these genes as promising new therapeutic targets. Among the machine learning algorithms assessed, the Random Forest (RF) model particularly excelled, greatly improving diagnostic accuracy and paving the way for new personalized treatment options. Data analysis from the BEST database revealed noteworthy correlations between the expression of FXYD1 and TNXB and tumor grade, providing essential insights. Nonetheless, despite the significant relationship of these three genes with tumor grading, they did not show a strong correlation with patient prognosis, highlighting the complexities inherent in cancer biology. This necessitates comprehensive research that includes a wider array of biological and clinical factors. 
Furthermore, the observed disparities in model performance across external validation datasets illustrate the difficulties in achieving consistent predictability, emphasizing the need for future studies to incorporate diverse datasets to enhance robustness and accuracy. Future research should also explore more sophisticated machine learning algorithms to improve the generalizability of predictive models, thereby advancing our understanding of cancer and treatment strategies.\u003c/p\u003e\n\u003cp\u003eAlthough existing literature has examined the significance of these three feature genes in breast cancer, research on them remains relatively limited. This highlights the need to discuss our findings further. The FXYD family consists of seven members (FXYD1 to FXYD7), which serve as tissue-specific regulators of Na+/K+-ATPase activity in cellular membranes, influencing its function based on tissue type\u003csup\u003e33\u003c/sup\u003e. Given the well-documented roles of FXYD3, FXYD5, and FXYD6 across various cancer types\u003csup\u003e34\u0026ndash;41\u003c/sup\u003e, we aimed to investigate the expression patterns of FXYD1 in benign and malignant breast tissues, an area that has not been thoroughly explored. FXYD1 shows a distinct expression pattern in normal tissues, with significantly higher levels in the heart, kidneys, placenta, skeletal muscle, gastrointestinal tract, and colon, while moderate levels are found in breast samples\u003csup\u003e42\u003c/sup\u003e. Research involving quantitative real-time PCR of clinical samples has indicated a notable downregulation of FXYD1 in ovarian cancer tissues, associating its overexpression with enhanced migratory and invasive characteristics of ovarian cancer cells, unrelated to proliferation\u003csup\u003e43\u003c/sup\u003e. 
Our current study specifically demonstrated that FXYD1 had higher immunohistochemical expression in normal breast tissue but was significantly reduced in breast cancer tissues, particularly in myoepithelial cells and CAFs. Therefore, we speculate that the downregulation of FXYD1 may be closely related to CAF activation and the modulation of the tumor microenvironment. This points to a potential role in breast cancer diagnostics and treatment, which warrants further validation.\u003c/p\u003e\n\u003cp\u003eThe sulfatase family, comprising sulfatase 1 (SULF1) and sulfatase 2 (SULF2), is important for controlling the sulfation of heparan sulfate proteoglycans (HSPGs). This modification greatly affects various physiological and pathological functions, including cell signaling, proliferation, migration, and differentiation\u003csup\u003e44,45\u003c/sup\u003e.\u0026nbsp;The importance of SULF1 in a range of cancers, including prostate, ovarian, esophageal, hepatocellular, gastric, and colon cancers, has been widely recognized\u003csup\u003e46\u0026ndash;51\u003c/sup\u003e. SULF1 expression is nearly absent in normal breast tissue and low in benign and hyperplastic lesions. In contrast, SULF1 expression significantly rises in triple-positive and triple-negative breast cancers, particularly during the later stages of tumors, where its short splice variants are the most prevalent\u003csup\u003e52\u003c/sup\u003e. Although SULF1 is important, research on its complex relationship with cancer-associated fibroblasts (CAFs) has been limited until recently. A recent study has shed light on this area, showing that SULF1, a signaling molecule secreted by CAFs, promotes metastasis and cisplatin (CDDP) resistance in gastric cancer cells by binding to TGFBR3 on their surfaces, thereby activating the TGF-\u0026beta;\u0026nbsp;signaling pathway\u003csup\u003e46\u003c/sup\u003e. 
Our study examined the previously overlooked relationship between SULF1 and breast cancer, revealing significant changes in SULF1 expression in breast fibroblasts throughout cancer progression. In normal breast tissues, SULF1 levels were low; however, they increased significantly in breast cancer tissues, especially in aggressive tumors. This change suggested that SULF1 may activate and enhance the functions of cancer-associated fibroblasts (CAFs). As tumors transitioned from in situ to invasive stages, SULF1 expression rose in fibroblasts, potentially aligning with the increased pro-tumorigenic activities of CAFs in the tumor microenvironment. These findings underscored the importance of SULF1 as a marker of functional changes in CAFs and introduced potential molecular targets for developing therapies aimed at CAFs in breast cancer.\u003c/p\u003e\n\u003cp\u003eThe tenascin family comprises four members\u0026mdash;tenascin-C (TNC), tenascin-R (TNR), tenascin-X (TNXB), and tenascin-W (TNW)\u0026mdash;each playing a pivotal role in diverse biological processes, including tissue regeneration, inflammatory diseases, tumorigenesis, and wound healing\u003csup\u003e53\u003c/sup\u003e. Under physiological conditions, TNXB functions as a crucial regulator of collagen deposition, fibril spacing, mechanical properties, and fibrillogenesis in various physiological contexts\u003csup\u003e54\u0026ndash;56\u003c/sup\u003e. As early as 2002, researchers uncovered the intricate relationship between TNXB and fibroblasts, notably observing that B16 melanoma cells demonstrated reduced adhesion and spreading capabilities, coupled with increased detachment, when cultured on TNXB-null fibroblasts\u003csup\u003e57\u003c/sup\u003e. Recently, a pan-cancer analysis of TNXB significantly highlighted its reduced expression in breast cancer tissues compared to normal tissues, as determined by IHC\u003csup\u003e58\u003c/sup\u003e. 
Our bioinformatics analysis, using multiple datasets, clearly showed the downregulation of TNXB mRNA in breast cancer and identified TNXB as a marker for cancer-associated fibroblasts (CAFs) through single-cell analysis. However, the IHC findings did not lead to conclusive results. Therefore, further validation of the connection between TNXB and CAFs in breast cancer is necessary, particularly to understand how TNXB impacts CAF functions and contributes to the progression of breast cancer. Our ongoing research aims to address this issue and provide a deeper insight into the interactions between TNXB, CAFs, and breast cancer.\u003c/p\u003e\n\u003cp\u003eAnalysis of changes in the FXYD1, SULF1, and TNXB genes from the TCGA and METABRIC datasets provided important insights into their roles in breast cancer. Notably, the significant amplification of SULF1 in many samples highlighted its critical role in tumor development, warranting further investigation into the mechanisms behind this amplification. Interestingly, alterations in FXYD1 were associated with improved overall survival in the METABRIC dataset, suggesting a protective effect, while the TCGA dataset did not show similar positive results. Conversely, changes in SULF1 were associated with worse outcomes and shorter disease-free survival. This contrast emphasized the complex interactions between these genes in tumor progression and opened up opportunities for targeted therapies. Furthermore, the diverse patterns of TNXB alterations indicated its various contributions to the tumor microenvironment, reinforcing the need for a deeper understanding of its biological significance. 
Given the discrepancies across different datasets, future research should focus on validating these findings in various populations and cancer subtypes, enhancing our knowledge of breast cancer biology and guiding the development of more effective treatments.\u003c/p\u003e\n\u003cp\u003eIn this study, we conducted a comparative analysis of normal mammary fibroblasts with iCAFs and mCAFs, revealing differences in their functionalities and signaling pathways. A previous study suggested that immune modulators, including myeloperoxidase (MPO) and inflammatory cytokines such as tumor necrosis factor alpha (TNF-\u0026alpha;), may contribute to the development of high breast density by modulating gene expression patterns and collagen production in fibroblasts, ultimately influencing the risk of breast cancer\u003csup\u003e59\u003c/sup\u003e. Researchers have previously summarized that mCAFs are primarily responsible for the generation and remodeling of the extracellular matrix, providing support and migration pathways for tumor cells; on the other hand, iCAFs influence the tumor immune microenvironment by secreting inflammatory cytokines and immunoregulatory molecules, thereby facilitating tumor immune evasion and further progression\u003csup\u003e60\u003c/sup\u003e. Our study found that normal fibroblasts serve as the \u0026quot;baseline,\u0026quot; primarily involved in regulating immune responses and inflammatory processes, while mCAFs are mainly engaged in the remodeling of the extracellular matrix, and iCAFs play a role in inflammation, immune reactions, and cellular signaling. Notably, for the first time, we introduce the distinction between Luminal and non-Luminal CAFs, highlighting the diversity and complexity of CAFs in breast cancer. Luminal CAFs exhibit a pronounced ability to promote protein synthesis and inflammatory responses, potentially accelerating tumor growth and progression. 
Conversely, non-Luminal CAFs contribute uniquely to immune regulation, anti-infection, and blood coagulation, which may modulate the tumor microenvironment dynamics. These discoveries not only offer novel insights and potential therapeutic targets for precision breast cancer therapy but also pave the way for future research endeavors and therapeutic interventions. By attaining a deeper comprehension of the intricate functions and functional heterogeneity of CAFs within the breast cancer microenvironment, we can devise more targeted treatment strategies, with the ultimate goal of effectively suppressing tumor growth.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eIn our efforts to find new therapies for breast cancer, we have identified cancer-associated fibroblasts (CAFs) as significant therapeutic targets. Therapeutic strategies that focus on CAFs can involve targeting their surface markers, secreted factors, metabolic pathways, epigenetic modifications, immunoregulatory roles, and mechanical characteristics, along with specific interventions for different subgroups\u003csup\u003e60\u003c/sup\u003e. However, the heterogeneity of CAFs\u0026mdash;evident in their varied functions, phenotypic traits, and drug sensitivities across subgroups\u0026mdash;poses challenges for any single treatment method. Therefore, it is vital to have a clear understanding of CAF classification when designing effective therapeutic strategies. By predicting sensitive drugs for iCAFs and mCAFs in Luminal and non-Luminal breast cancer subtypes, respectively, we have provided a crucial foundation for the development of targeted therapies directed at specific CAF subgroups. Critically, we have also unveiled an intriguing phenomenon of mutual exclusivity in drug sensitivity among CAF subgroups, whereby certain drugs effective against iCAFs may be ineffective for mCAFs, and vice versa. 
This finding not only offers a fresh perspective on CAF heterogeneity but also lays the groundwork for formulating combinatorial therapeutic strategies. By combining drugs sensitive to different CAF subgroups, we may be able to inhibit their tumor-promoting effects more effectively while mitigating adverse effects and minimizing the risk of drug resistance. Previous research has demonstrated that Dasatinib can substantially inhibit the growth of cancer-associated fibroblasts (CAFs) in lung cancer, potentially augmenting the effectiveness of anticancer therapies\u003csup\u003e32\u003c/sup\u003e. Our analysis highlights the sensitivity of Luminal mCAFs to Dasatinib. This agreement not only reinforces the credibility of our results but also emphasizes Dasatinib\u0026apos;s promising potential as a therapeutic agent targeting CAFs.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eAlthough this study successfully identified CAF markers in breast cancer, several limitations persist. Firstly, the dependency on specific datasets may undermine the universality of the findings, thereby constraining the model\u0026apos;s ability to generalize to external datasets. Additionally, whether gene expression is significantly correlated with patients\u0026apos; long-term prognosis remains unclear. Furthermore, despite shedding light on the heterogeneity of CAFs, a deeper understanding of their multifaceted roles within the tumor microenvironment warrants further exploration. Drug sensitivity predictions, too, require rigorous validation to ensure their reliability, and potential biases in sample selection and statistical methodologies must be recognized and addressed. Immunohistochemical studies, moreover, are constrained by small sample sizes, underscoring the need for larger-scale investigations across diverse clinical contexts to substantiate the efficacy of these markers. 
Concurrently, integrating multi-omics data for nuanced analysis will bolster our comprehension of the intricate mechanisms underpinning the roles of these genes in breast cancer. Critically, translating these research insights into clinical practice represents a pivotal future endeavor, with the aim of devising diagnostic tests and therapeutic interventions tailored to these signature genes, ultimately facilitating early breast cancer detection, precise prognosis estimation, and individualized treatment regimens.\u003c/p\u003e"},{"header":"Methods","content":"\u003cp\u003e\u003cstrong\u003e1.\u0026nbsp;\u0026nbsp;Data Acquisition.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIn our study, we utilized the TCGAbiolinks package to access TCGA-BRCA TPM (Transcripts Per Million) data along with corresponding patient clinical profiles. We applied the following data exclusion criteria: 1) genes with low expression, defined as those having an expression level of zero in more than 10% of the samples; 2) cases with incomplete clinical information; 3) male cases. We downloaded gene expression profile data (GSE65194 and GSE233242) and corresponding clinical information from the public Gene Expression Omnibus database (GEO, http://www.ncbi.nlm.nih.gov/geo/). Additionally, we obtained a single-cell dataset (GSE161529) from the GEO database, selecting 13 normal breast tissue samples, 6 Her2-positive breast cancer tissue samples, 17 ER-positive breast cancer tissue samples, and 8 triple-negative breast cancer tissue samples.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e2.\u0026nbsp;\u0026nbsp;Identification of feature genes of CAFs in breast cancer by machine learning.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe employed the caret package\u003csup\u003e18\u003c/sup\u003e to implement our machine learning pipeline, with the code being available at https://topepo.github.io/caret/index.html. 
Initially, we divided the paired samples from the TCGA-BRCA dataset into a training set (TCGA-train, comprising 68 pairs of cancer and normal samples) and a test set (TCGA-test, containing 29 pairs), following a 7:3 ratio. Furthermore, the entire TCGA-BRCA dataset (TCGA-all), encompassing 1076 cancer samples and 99 normal samples, served as the internal validation set to ensure robustness.\u003c/p\u003e\n\u003cp\u003eTo broaden the model\u0026apos;s applicability and assess its generalizability, we incorporated two external validation sets: GSE65194, a microarray dataset consisting of 153 cancer samples and 11 normal samples, and GSE233242, a high-throughput sequencing dataset composed of 43 pairs of cancer and normal samples.\u003c/p\u003e\n\u003cp\u003ePrior to model training, data preprocessing was crucial. To improve model performance and reduce redundant information, we eliminated collinear variables using a threshold of r \u0026gt;= 0.7\u003csup\u003e19\u003c/sup\u003e. Continuous variables underwent z-scoring normalization using the \u0026apos;center\u0026apos; and \u0026apos;scale\u0026apos; methods provided by the caret preProcess function. This standardization step was essential to ensure that all variables contributed equally to the model, regardless of their original scales.\u003c/p\u003e\n\u003cp\u003eFor feature selection, we opted for the Recursive Feature Elimination (RFE) method with 10-fold cross-validation. Six algorithms\u0026mdash;Random Forest (RF), Generalized Additive Models (GAM), Linear Discriminant Analysis (LDA), Logistic Regression (LR), Naive Bayes (NB), and Bagged Trees (BT)\u0026mdash;built in the caret package were utilized to guide this process. 
Through this rigorous selection, the optimal subset of features was identified.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e3.\u0026nbsp;\u0026nbsp;Model construction and comparison of diagnostic performance.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe employed six machine learning methods to develop diagnostic models using the TCGA-train dataset with 10-fold cross-validation. These methods include RF, NB, K-Nearest Neighbors (KNN), Generalized Linear Models (GLM), XGBoost (XGB), and Support Vector Machine (SVM). To comprehensively evaluate the performance of the models, we utilized confusion matrices (generated using the caret package), Receiver Operating Characteristic (ROC) curves (calculated with the pROC package), and Precision-Recall (PR) curves (computed with the PRROC package) on the test set, as well as the internal and external validation sets. For global interpretation of the models, we employed the fastshap package, with results visualized using the shapviz package. The code for interfacing fastshap with caret models is available at https://harpomaxx.github.io/post/shap-values/.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e4.\u0026nbsp;\u0026nbsp;Copy number alteration (CNA) analysis.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe conducted a comprehensive analysis of genomic data utilizing the cBioPortal platform (https://www.cbioportal.org)\u003csup\u003e20,21\u003c/sup\u003e. Our investigation encompassed two datasets: the TCGA-BRCA (Breast Invasive Carcinoma, Firehose Legacy), and the METABRIC dataset. The \u0026apos;OncoPrint\u0026apos; module was used to visualize variants. The \u0026apos;Cancer Types Summary\u0026apos; module was used to gain an overview of genomic alterations. To further advance our analysis, we employed the \u0026lsquo;Survival\u0026rsquo; module to assess the potential correlation between alterations in feature genes and patient survival time. 
This enabled us to gain deeper insights into the prognostic implications of specific genomic alterations in breast cancer.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e5.\u0026nbsp;\u0026nbsp;Single-cell Analysis.\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIn this section, we used Seurat v4\u003csup\u003e22\u003c/sup\u003e while referencing the data quality control protocols outlined in the \u0026apos;scCancer\u0026apos;\u003csup\u003e23\u003c/sup\u003e package (code is available at https://github.com/wguo-research/scCancer). To ensure data integrity, we employed the \u0026quot;DoubletFinder\u0026quot;\u003csup\u003e24\u003c/sup\u003e package to remove potential doublets from our dataset. Furthermore, we used the \u0026quot;Harmony\u0026quot;\u003csup\u003e25\u003c/sup\u003e package to integrate data from multiple samples. Using markers validated in prior studies\u003csup\u003e26,27\u003c/sup\u003e, we delineated distinct cell subgroups as the basis for our subsequent investigations. By using the\u0026nbsp;\u0026ldquo;FindMarkers\u0026rdquo;\u0026nbsp;function in Seurat, unique markers for each subgroup were identified, applying parameters as follows: min.pct = 0.1, logfc.threshold = 0.25, and adjusted p-value \u0026lt; 0.05. Subsequently, we conducted trajectory inference using the monocle3 package (code is available at https://github.com/cole-trapnell-lab/monocle3), which provided us with insights into the developmental pathways of the cells.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eMoving on to enrichment analysis, we chose the \u0026lsquo;ClusterGVis\u0026rsquo; package (code is available at https://github.com/junjunlab/ClusterGVis), specifically utilizing the \u0026lsquo;prepareDataFromscRNA\u0026rsquo; function to transform the single-cell data. 
Notably, the \u0026lsquo;diffData\u0026rsquo; parameter was configured to encompass the top 20 marker genes for each identified cell subgroup. Following this, we applied the \u0026lsquo;enrichCluster\u0026rsquo; function to perform gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses.\u003c/p\u003e\n\u003cp\u003eFurthermore, we used the drug sensitivity signature collection (SSC) from the \u0026lsquo;beyondcell\u0026rsquo;\u003csup\u003e28\u003c/sup\u003e package for drug sensitivity prediction (code is available at https://github.com/cnio-bu/beyondcell). SSC identified transcriptional state changes occurring before and after drug treatment, by collecting and analyzing data from extensive drug sensitivity databases. Through differential expression analysis, SSC screened for gene expression patterns indicative of drug sensitivity and constructed gene expression signatures. These signatures were subsequently utilized to calculate Beyondcell scores for individual cells, quantifying their sensitivity to specific drugs. Lastly, we performed cluster analysis to form treatment clusters (TC), further refining our prioritization of potential drug candidates.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e6.\u0026nbsp;\u0026nbsp;Clinical sample collection.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIn 2023, we collected surgical samples from the Breast and Thyroid Surgery Department of Zibo Maternal and Child Health Hospital, including breast fibroadenoma, breast carcinoma in situ, and invasive breast cancer. Each cancer sample was accompanied by adjacent normal tissue taken at least 5 cm away from the respective tumor margin (Supplementary Table 1). After collecting the samples, to ensure the quality of our dataset, we applied rigorous criteria in selecting the samples for inclusion in the study. 
Specifically, we chose patients who had not received any prior treatment and had undergone a modified radical mastectomy for breast cancer. The study was approved by the Ethics Review Committee of Zibo Maternal and Child Health Hospital, all methods were performed in accordance with the relevant guidelines and regulations, and all participants provided written informed consent.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e7.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;Immunohistochemistry (IHC).\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eImmunohistochemical experiments were conducted adhering to a standardized protocol. Initially, paraffin-embedded tissue sections underwent deparaffinization with xylene, followed by a series of ethanol solutions for rehydration. After antigen retrieval, using an EDTA buffer at a pH of 9.0 in a DAKO PT Link device heated to 97\u0026deg;C for 20 minutes, the samples were cooled to 65\u0026deg;C and rinsed with Tris-buffered saline. To inactivate endogenous peroxidase activity, 3% hydrogen peroxide was applied. The primary antibodies employed in this study included rabbit polyclonal antibodies specific to FXYD1, SULF1, and TNXB, sourced exclusively from Abcam, and respectively diluted at ratios of 1:200, 1:400, and 1:100. Additionally, as part of routine pathology practice, ready-to-use primary antibodies for P63, vimentin, \u0026alpha;-SMA, and calponin, obtained from Fuzhou Maisen Biotechnology Co., Ltd., were also utilized. These latter antibodies, which are routinely employed in the pathology department for diagnostic purposes, were each applied to the sections for 20 minutes at room temperature. Subsequently, the sections were incubated with a secondary antibody, anti-mouse IgG from Dako, for an additional 20 minutes at room temperature. The final stage involved color development utilizing a DAB chromogen for a duration of one minute. 
The intensity and distribution of the staining signal were independently assessed by three qualified pathologists. IHC staining percentage was graded as follows: 1 = 0\u0026ndash;25%; 2 = 26\u0026ndash;50%; 3 = 51\u0026ndash;75%; 4 = 76\u0026ndash;100%. The IHC intensity was scored as follows: 0 = none (-); 1 = weak (1+); 2 = moderate (2+); 3 = strong (3+). The IHC score was calculated by multiplying the intensity and percentage scores.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e8.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;Statistical Analysis.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIn our study, all statistical analyses and graphical presentations were executed using the R software, version 4.2.2, aided by a selection of tailored R packages that met our analytical requirements. For scenarios involving nonparametric data, we employed the Wilcoxon rank-sum test for pairwise comparisons and the Kruskal-Wallis H test when multiple groups were analyzed. This rigorous methodology allowed us to accurately gauge statistical significance within our dataset, ensuring comprehensive evaluation of our results.\u0026nbsp;\u003c/p\u003e"},{"header":"Abbreviations","content":"\u003cp\u003eα-SMA: Alpha Smooth Muscle Actin\u003c/p\u003e\n\u003cp\u003eBC: Breast Cancer\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eCAF: Cancer-Associated Fibroblasts\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eCNA: Copy Number Alteration\u003c/p\u003e\n\u003cp\u003eDAB: 3,3'-Diaminobenzidine\u003c/p\u003e\n\u003cp\u003eEDTA: Ethylenediaminetetraacetic Acid\u003c/p\u003e\n\u003cp\u003eSSC: sensitivity signature collection\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eER: Estrogen Receptor\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eFXYD1: fxyd domain-containing transport regulator 1\u003c/p\u003e\n\u003cp\u003eGAM: Generalized Additive Models\u003c/p\u003e\n\u003cp\u003eGEO: Gene Expression Omnibus\u003c/p\u003e\n\u003cp\u003eGLM: Generalized Linear Models\u003c/p\u003e\n\u003cp\u003eGO: Gene 
Ontology\u003c/p\u003e\n\u003cp\u003eHER2: Human Epidermal Growth Factor Receptor 2\u003c/p\u003e\n\u003cp\u003eHSPG: Heparan Sulfate Proteoglycan\u003c/p\u003e\n\u003cp\u003eIHC: Immunohistochemistry\u003c/p\u003e\n\u003cp\u003eiCAFs: Inflammatory CAFs\u003c/p\u003e\n\u003cp\u003eKEGG: Kyoto Encyclopedia of Genes and Genomes\u003c/p\u003e\n\u003cp\u003eKNN: K-Nearest Neighbors\u003c/p\u003e\n\u003cp\u003eLDA: Linear Discriminant Analysis\u003c/p\u003e\n\u003cp\u003eLR: Logistic Regression\u003c/p\u003e\n\u003cp\u003emCAFs: Myofibroblastic CAFs\u003c/p\u003e\n\u003cp\u003eMETABRIC: Molecular Taxonomy of Breast Cancer International Consortium\u003c/p\u003e\n\u003cp\u003eNB: Naive Bayes\u003c/p\u003e\n\u003cp\u003eOS: Overall Survival\u003c/p\u003e\n\u003cp\u003ePCA: Principal Component Analysis\u003c/p\u003e\n\u003cp\u003ePR: Precision-Recall\u003c/p\u003e\n\u003cp\u003eROC: Receiver Operating Characteristic\u003c/p\u003e\n\u003cp\u003eRFE: Recursive Feature Elimination\u003c/p\u003e\n\u003cp\u003eRF: Random Forest\u003c/p\u003e\n\u003cp\u003eRFS: Relapse-Free Survival\u003c/p\u003e\n\u003cp\u003eSULF1: Sulfatase 1\u003c/p\u003e\n\u003cp\u003eSULF2: Sulfatase 2\u003c/p\u003e\n\u003cp\u003eSVM: Support Vector Machine\u003c/p\u003e\n\u003cp\u003eTCGA: The Cancer Genome Atlas\u003c/p\u003e\n\u003cp\u003eTC: Treatment Cluster\u003c/p\u003e\n\u003cp\u003eTME: Tumor Microenvironment\u003c/p\u003e\n\u003cp\u003eTNC: Tenascin-C\u003c/p\u003e\n\u003cp\u003eTNR: Tenascin-R\u003c/p\u003e\n\u003cp\u003eTNXB: Tenascin-X\u003c/p\u003e\n\u003cp\u003eTNW: Tenascin-W\u003c/p\u003e\n\u003cp\u003etSNE: t-Distributed Stochastic Neighbor Embedding\u003c/p\u003e\n\u003cp\u003eTPM: Transcripts Per Million\u003c/p\u003e\n\u003cp\u003eXGB: XGBoost\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe express our gratitude to GEO and the TCGA 
database, along with all contributors who have shared their codes online.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthors\u0026apos; contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eXZ: Writing\u0026nbsp;\u0026ndash;\u0026nbsp;review \u0026amp; editing, Writing\u0026nbsp;\u0026ndash;\u0026nbsp;original draft, Validation, Supervision, Project administration, Investigation, Data curation, Conceptualization. Na Wang: Writing\u0026nbsp;\u0026ndash;\u0026nbsp;original draft, Resources, Investigation, Formal analysis. Ling Shi: Validation. Xindong Wei: Writing\u0026nbsp;\u0026ndash;\u0026nbsp;original draft, Methodology, Investigation, Data curation. Xiaoqin Sun: Resources. Mingxiu Shao: Investigation. Xiaolong Guo: Investigation. Liang Tian: Data curation. Fangyuan Zhang: Data curation. Hui Lyu: Writing\u0026nbsp;\u0026ndash;\u0026nbsp;review \u0026amp; editing, Writing\u0026nbsp;\u0026ndash;\u0026nbsp;original draft, Validation, Supervision, Project administration, Investigation, Data curation, Conceptualization.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData availability statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe dataset of TCGA-BRCA is available at the TCGA database (https://cancergenome.nih.gov/). We obtained the dataset by using the TCGAbiolinks package in R.\u003c/p\u003e\n\u003cp\u003eThe datasets analyzed during the current study are available at GEO: GSE65194, GSE233242, and GSE161529.\u003c/p\u003e\n\u003cp\u003eThe raw data generated and/or analyzed during the current study are not publicly available for the IHC results due to ethical and privacy concerns regarding patient samples. 
However, key findings and analyses based on these data are reported in the manuscript, and any additional information necessary to reproduce the results may be obtained upon reasonable request to the corresponding author.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis work received support from the Zibo City Medical and Health Science Research Projects (No. 2023030926), and Zibo Maternal and Child Health Hospital.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study was reviewed and approved by the Ethics Review Committee of Zibo Maternal and Child Health Hospital (approval no. 202106073, date: 2022-06-23). Informed consent for scientific research was obtained from patients as part of surgical consent at the time of surgery. Patient information was kept confidential.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDeclaration of competing interest\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eHan, L. \u003cem\u003eet al.\u003c/em\u003e LncRNA HOTTIP facilitates the stemness of breast cancer via regulation of miR-148a-3p/WNT1 pathway. \u003cem\u003eJ. Cell. Mol. Med.\u003c/em\u003e \u003cstrong\u003e24\u003c/strong\u003e, 6242\u0026ndash;6252 (2020).\u003c/li\u003e\n\u003cli\u003eZuo, S., Yu, J., Pan, H. \u0026amp; Lu, L. Novel insights on targeting ferroptosis in cancer therapy. \u003cem\u003eBiomark. Res.\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, 1\u0026ndash;11 (2020).\u003c/li\u003e\n\u003cli\u003eGranucci, F. The Family of LPS Signal Transducers Increases: the Arrival of Chanzymes. 
\u003cem\u003eImmunity\u003c/em\u003e \u003cstrong\u003e48\u003c/strong\u003e, 4\u0026ndash;6 (2018).\u003c/li\u003e\n\u003cli\u003eZheng, S. \u003cem\u003eet al.\u003c/em\u003e Development and validation of a stromal immune phenotype classifier for predicting immune activity and prognosis in triple-negative breast cancer. \u003cem\u003eInt. J. Cancer\u003c/em\u003e \u003cstrong\u003e147\u003c/strong\u003e, 542\u0026ndash;553 (2020).\u003c/li\u003e\n\u003cli\u003eTamborero, D. \u003cem\u003eet al.\u003c/em\u003e A pan-cancer landscape of interactions between solid tumors and infiltrating immune cell populations. \u003cem\u003eClin. Cancer Res.\u003c/em\u003e \u003cstrong\u003e24\u003c/strong\u003e, 3717\u0026ndash;3728 (2018).\u003c/li\u003e\n\u003cli\u003eLambrechts, D. \u003cem\u003eet al.\u003c/em\u003e Phenotype molding of stromal cells in the lung tumor microenvironment. \u003cem\u003eNat. Med.\u003c/em\u003e \u003cstrong\u003e24\u003c/strong\u003e, 1277\u0026ndash;1289 (2018).\u003c/li\u003e\n\u003cli\u003eAuciello, F. R. \u003cem\u003eet al.\u003c/em\u003e A stromal lysolipid-autotaxin signaling axis promotes pancreatic tumor progression. \u003cem\u003eCancer Discov.\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 617\u0026ndash;627 (2019).\u003c/li\u003e\n\u003cli\u003eBertero, T. \u003cem\u003eet al.\u003c/em\u003e Tumor-stroma mechanics coordinate amino acid availability to sustain tumor growth and malignancy. \u003cem\u003eCell Metab.\u003c/em\u003e \u003cstrong\u003e29\u003c/strong\u003e, 124\u0026ndash;140 (2019).\u003c/li\u003e\n\u003cli\u003eSahai, E. \u003cem\u003eet al.\u003c/em\u003e A framework for advancing our understanding of cancer-associated fibroblasts. \u003cem\u003eNat. Rev. Cancer\u003c/em\u003e \u003cstrong\u003e20\u003c/strong\u003e, 174\u0026ndash;186 (2020).\u003c/li\u003e\n\u003cli\u003eGagliano, T. \u003cem\u003eet al.\u003c/em\u003e PIK3C\u0026delta; expression by fibroblasts promotes triple-negative breast cancer progression. \u003cem\u003eJ. Clin. 
Invest.\u003c/em\u003e \u003cstrong\u003e130\u003c/strong\u003e, 3188\u0026ndash;3204 (2020).\u003c/li\u003e\n\u003cli\u003eAlcaraz, L. B. \u003cem\u003eet al.\u003c/em\u003e A 9-kDa matricellular SPARC fragment released by cathepsin D exhibits pro-tumor activity in the triple-negative breast cancer microenvironment. \u003cem\u003eTheranostics\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 6173\u0026ndash;6192 (2021).\u003c/li\u003e\n\u003cli\u003eAl-Ansari, M. M., Hendrayani, S. F., Shehata, A. I. \u0026amp; Aboussekhra, A. P16 INK4A Represses the paracrine tumor-promoting effects of breast stromal fibroblasts. \u003cem\u003eOncogene\u003c/em\u003e \u003cstrong\u003e32\u003c/strong\u003e, 2356\u0026ndash;2364 (2013).\u003c/li\u003e\n\u003cli\u003eYang, P. \u003cem\u003eet al.\u003c/em\u003e CAF-derived exosomal WEE2-AS1 facilitates colorectal cancer progression via promoting degradation of MOB1A to inhibit the Hippo pathway. \u003cem\u003eCell Death Dis.\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, (2022).\u003c/li\u003e\n\u003cli\u003eChen, X. \u0026amp; Song, E. Turning foes to friends: targeting cancer-associated fibroblasts. \u003cem\u003eNat. Rev. Drug Discov.\u003c/em\u003e \u003cstrong\u003e18\u003c/strong\u003e, 99\u0026ndash;115 (2019).\u003c/li\u003e\n\u003cli\u003eCuomo, M. \u003cem\u003eet al.\u003c/em\u003e Epigenetic remodelling of Fxyd1 promoters in developing heart and brain tissues. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 1\u0026ndash;11 (2022).\u003c/li\u003e\n\u003cli\u003eZhu, W. \u003cem\u003eet al.\u003c/em\u003e SULF1 regulates malignant progression of colorectal cancer by modulating ARSH via FAK/PI3K/AKT/mTOR signaling. \u003cem\u003eCancer Cell Int.\u003c/em\u003e \u003cstrong\u003e24\u003c/strong\u003e, 1\u0026ndash;19 (2024).\u003c/li\u003e\n\u003cli\u003eMatsumoto, K. I., Higuchi, T., Umeki, M., Ono, M. \u0026amp; Sakamoto, S. 
Tenascin-X is increased with decreased expression of miR-378a-5p and miR-486-5p in mice fed a methionine-choline-deficient diet that induces hepatic fibrosis. \u003cem\u003eBiomed. Res.\u003c/em\u003e \u003cstrong\u003e45\u003c/strong\u003e, 67\u0026ndash;76 (2024).\u003c/li\u003e\n\u003cli\u003eKuhn, M. Building predictive models in R using the caret package. \u003cem\u003eJ. Stat. Softw.\u003c/em\u003e \u003cstrong\u003e28\u003c/strong\u003e, 1\u0026ndash;26 (2008).\u003c/li\u003e\n\u003cli\u003eL\u0026oacute;pez-Delgado, J. \u0026amp; Meirmans, P. G. History or demography? Determining the drivers of genetic variation in North American plants. \u003cem\u003eMol. Ecol.\u003c/em\u003e \u003cstrong\u003e31\u003c/strong\u003e, 1951\u0026ndash;1962 (2022).\u003c/li\u003e\n\u003cli\u003eCerami, E. \u003cem\u003eet al.\u003c/em\u003e The cBio Cancer Genomics Portal: An Open Platform for Exploring Multidimensional Cancer Genomics Data. \u003cem\u003eCancer Discov.\u003c/em\u003e \u003cstrong\u003e2\u003c/strong\u003e, 401\u0026ndash;404 (2012).\u003c/li\u003e\n\u003cli\u003eGao, J. \u003cem\u003eet al.\u003c/em\u003e Integrative Analysis of Complex Cancer Genomics and Clinical Profiles Using the cBioPortal. \u003cem\u003eSci. Signal.\u003c/em\u003e \u003cstrong\u003e6\u003c/strong\u003e, pl1 (2013).\u003c/li\u003e\n\u003cli\u003eHao, Y. \u003cem\u003eet al.\u003c/em\u003e Integrated analysis of multimodal single-cell data. \u003cem\u003eCell\u003c/em\u003e \u003cstrong\u003e184\u003c/strong\u003e, 3573\u0026ndash;3587.e29 (2021).\u003c/li\u003e\n\u003cli\u003eGuo, W. \u003cem\u003eet al.\u003c/em\u003e ScCancer: A package for automated processing of single-cell RNA-seq data in cancer. \u003cem\u003eBrief. Bioinform.\u003c/em\u003e \u003cstrong\u003e22\u003c/strong\u003e, 10\u0026ndash;11 (2021).\u003c/li\u003e\n\u003cli\u003eMcGinnis, C. S., Murrow, L. M. \u0026amp; Gartner, Z. J. DoubletFinder: Doublet Detection in Single-Cell RNA Sequencing Data Using Artificial Nearest Neighbors. 
\u003cem\u003eCell Syst.\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, 329\u0026ndash;337.e4 (2019).\u003c/li\u003e\n\u003cli\u003eKorsunsky, I. \u003cem\u003eet al.\u003c/em\u003e Fast, sensitive, and accurate integration of single cell data with Harmony. \u003cem\u003eNat. Methods\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, 1289\u0026ndash;1296 (2019).\u003c/li\u003e\n\u003cli\u003eWu, S. Z. \u003cem\u003eet al.\u003c/em\u003e A single-cell and spatially resolved atlas of human breast cancers. \u003cem\u003eNat. Genet.\u003c/em\u003e \u003cstrong\u003e53\u003c/strong\u003e, 1334\u0026ndash;1347 (2021).\u003c/li\u003e\n\u003cli\u003ePal, B. \u003cem\u003eet al.\u003c/em\u003e A single-cell RNA expression atlas of normal, preneoplastic and tumorigenic states in the human breast. \u003cem\u003eEMBO J.\u003c/em\u003e \u003cstrong\u003e40\u003c/strong\u003e, 1\u0026ndash;23 (2021).\u003c/li\u003e\n\u003cli\u003eFustero-Torre, C. \u003cem\u003eet al.\u003c/em\u003e Beyondcell: targeting cancer therapeutic heterogeneity in single-cell RNA-seq data. \u003cem\u003eGenome Med.\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, 1\u0026ndash;15 (2021).\u003c/li\u003e\n\u003cli\u003eWu, S. Z. \u003cem\u003eet al.\u003c/em\u003e A single-cell and spatially resolved atlas of human breast cancers. \u003cem\u003eNat. Genet.\u003c/em\u003e \u003cstrong\u003e53\u003c/strong\u003e, 1334\u0026ndash;1347 (2021).\u003c/li\u003e\n\u003cli\u003eWu, S. Z. \u003cem\u003eet al.\u003c/em\u003e Stromal cell diversity associated with immune evasion in human triple-negative breast cancer. \u003cem\u003eEMBO J.\u003c/em\u003e \u003cstrong\u003e39\u003c/strong\u003e, 1\u0026ndash;20 (2020).\u003c/li\u003e\n\u003cli\u003eKieffer, Y. \u003cem\u003eet al.\u003c/em\u003e Single-cell analysis reveals fibroblast clusters linked to immunotherapy resistance in cancer. 
\u003cem\u003eCancer Discov.\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, 1330\u0026ndash;1351 (2020).\u003c/li\u003e\n\u003cli\u003eHaubeiss, S. \u003cem\u003eet al.\u003c/em\u003e Dasatinib reverses Cancer-associated Fibroblasts (CAFs) from primary Lung Carcinomas to a Phenotype comparable to that of normal Fibroblasts. \u003cem\u003eMol. Cancer\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 1\u0026ndash;8 (2010).\u003c/li\u003e\n\u003cli\u003eGeering, K. \u003cem\u003eet al.\u003c/em\u003e FXYD proteins: New tissue- and isoform-specific regulators of Na,K-ATPase. \u003cem\u003eAnn. N. Y. Acad. Sci.\u003c/em\u003e \u003cstrong\u003e986\u003c/strong\u003e, 388\u0026ndash;394 (2003).\u003c/li\u003e\n\u003cli\u003eGao, Q. \u003cem\u003eet al.\u003c/em\u003e FXYD6: A novel therapeutic target toward hepatocellular carcinoma. \u003cem\u003eProtein Cell\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, 532\u0026ndash;543 (2014).\u003c/li\u003e\n\u003cli\u003eZhu, Z. L. \u003cem\u003eet al.\u003c/em\u003e Overexpression of FXYD-3 is involved in the tumorigenesis and development of esophageal squamous cell carcinoma. \u003cem\u003eDis. Markers\u003c/em\u003e \u003cstrong\u003e35\u003c/strong\u003e, 195\u0026ndash;202 (2013).\u003c/li\u003e\n\u003cli\u003eLiu, J., Zhou, N. \u0026amp; Zhang, X. A monoclonal antibody against human FXYD6. \u003cem\u003eHybridoma\u003c/em\u003e \u003cstrong\u003e30\u003c/strong\u003e, 487\u0026ndash;490 (2011).\u003c/li\u003e\n\u003cli\u003eKayed, H. \u003cem\u003eet al.\u003c/em\u003e FXYD3 is overexpressed in pancreatic ductal adenocarcinoma and influences pancreatic cancer cell growth. \u003cem\u003eInt. J. Cancer\u003c/em\u003e \u003cstrong\u003e118\u003c/strong\u003e, 43\u0026ndash;54 (2006).\u003c/li\u003e\n\u003cli\u003eBai, Y. 
\u003cem\u003eet al.\u003c/em\u003e A FXYD5/TGF-\u0026beta;/SMAD positive feedback loop drives epithelial-to-mesenchymal transition and promotes tumor growth and metastasis in ovarian cancer. \u003cem\u003eInt. J. Oncol.\u003c/em\u003e \u003cstrong\u003e56\u003c/strong\u003e, 301\u0026ndash;314 (2020).\u003c/li\u003e\n\u003cli\u003eLoft\u0026aring;s, P. \u003cem\u003eet al.\u003c/em\u003e Expression of FXYD-3 is an Independent Prognostic Factor in Rectal Cancer Patients With Preoperative Radiotherapy. \u003cem\u003eInt. J. Radiat. Oncol. Biol. Phys.\u003c/em\u003e \u003cstrong\u003e75\u003c/strong\u003e, 137\u0026ndash;142 (2009).\u003c/li\u003e\n\u003cli\u003eLiu, J. \u003cem\u003eet al.\u003c/em\u003e Extracellular vesicles-encapsulated let-7i shed from bone mesenchymal stem cells suppress lung cancer via KDM3A/DCLK1/FXYD3 axis. \u003cem\u003eJ. Cell. Mol. Med.\u003c/em\u003e \u003cstrong\u003e25\u003c/strong\u003e, 1911\u0026ndash;1926 (2021).\u003c/li\u003e\n\u003cli\u003eWang, L. J. \u003cem\u003eet al.\u003c/em\u003e Prognostic significance of sodium-potassium ATPase regulator, FXYD3, in human hepatocellular carcinoma. \u003cem\u003eOncol. Lett.\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, 3024\u0026ndash;3030 (2018).\u003c/li\u003e\n\u003cli\u003eFloyd, R. V., Wray, S., Mart\u0026iacute;n-Vasallo, P. \u0026amp; Mobasheri, A. Differential cellular expression of FXYD1 (phospholemman) and FXYD2 (gamma subunit of Na, K-ATPase) in normal human tissues: A study using high density human tissue microarrays. \u003cem\u003eAnn. Anat.\u003c/em\u003e \u003cstrong\u003e192\u003c/strong\u003e, 7\u0026ndash;16 (2010).\u003c/li\u003e\n\u003cli\u003eZhao, E. \u003cem\u003eet al.\u003c/em\u003e The roles of FXYD family members in ovarian cancer: an integrated analysis by mining TCGA and GEO databases and functional validations. \u003cem\u003eJ. Cancer Res. Clin. 
Oncol.\u003c/em\u003e \u003cstrong\u003e149\u003c/strong\u003e, 17269\u0026ndash;17284 (2023).\u003c/li\u003e\n\u003cli\u003eAi, X. \u003cem\u003eet al.\u003c/em\u003e SULF1 and SULF2 regulate heparan sulfate-mediated GDNF signaling for esophageal innervation. \u003cem\u003eDevelopment\u003c/em\u003e \u003cstrong\u003e134\u003c/strong\u003e, 3327\u0026ndash;3338 (2007).\u003c/li\u003e\n\u003cli\u003eMorimoto-Tomita, M. \u003cem\u003eet al.\u003c/em\u003e Sulf-2, a proangiogenic heparan sulfate endosulfatase, is upregulated in breast cancer. \u003cem\u003eNeoplasia\u003c/em\u003e \u003cstrong\u003e7\u003c/strong\u003e, 1001\u0026ndash;1010 (2005).\u003c/li\u003e\n\u003cli\u003eFang, X. \u003cem\u003eet al.\u003c/em\u003e Cancer associated fibroblasts-derived SULF1 promotes gastric cancer metastasis and CDDP resistance through the TGFBR3-mediated TGF-\u0026beta; signaling pathway. \u003cem\u003eCell Death Discov.\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, 1\u0026ndash;12 (2024).\u003c/li\u003e\n\u003cli\u003eLiu, C. T. \u003cem\u003eet al.\u003c/em\u003e SULF1 inhibits proliferation and invasion of esophageal squamous cell carcinoma cells by decreasing heparin-binding growth factor signaling. \u003cem\u003eDig. Dis. Sci.\u003c/em\u003e \u003cstrong\u003e58\u003c/strong\u003e, 1256\u0026ndash;1263 (2013).\u003c/li\u003e\n\u003cli\u003eHur, K. \u003cem\u003eet al.\u003c/em\u003e Up-regulated expression of sulfatases (SULF1 and SULF2) as prognostic and metastasis predictive markers in human gastric cancer. \u003cem\u003eJ. Pathol.\u003c/em\u003e \u003cstrong\u003e228\u003c/strong\u003e, 88\u0026ndash;98 (2012).\u003c/li\u003e\n\u003cli\u003eLai, J. P. \u003cem\u003eet al.\u003c/em\u003e SULF1 Inhibits Tumor Growth and Potentiates the Effects of Histone Deacetylase Inhibitors in Hepatocellular Carcinoma. 
\u003cem\u003eGastroenterology\u003c/em\u003e \u003cstrong\u003e130\u003c/strong\u003e, 2130\u0026ndash;2144 (2006).\u003c/li\u003e\n\u003cli\u003eOuyang, Q. \u003cem\u003eet al.\u003c/em\u003e Loss of ZNF587B and SULF1 contributed to cisplatin resistance in ovarian cancer cell lines based on Genome-scale CRISPR/Cas9 screening. \u003cem\u003eAm. J. Cancer Res.\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 988\u0026ndash;998 (2019).\u003c/li\u003e\n\u003cli\u003eBrasil da Costa, F. H., Lewis, M. S., Truong, A., Carson, D. D. \u0026amp; Farach-Carson, M. C. SULF1 suppresses Wnt3A-driven growth of bone metastatic prostate cancer in perlecan-modified 3D cancer-stroma-macrophage triculture models. \u003cem\u003ePLoS One\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, 1\u0026ndash;25 (2020).\u003c/li\u003e\n\u003cli\u003eGill, R. M., Mehra, V., Milford, E. \u0026amp; Dhoot, G. K. Short SULF1/SULF2 splice variants predominate in mammary tumours with a potential to facilitate receptor tyrosine kinase-mediated cell signalling. \u003cem\u003eHistochem. Cell Biol.\u003c/em\u003e \u003cstrong\u003e146\u003c/strong\u003e, 431\u0026ndash;444 (2016).\u003c/li\u003e\n\u003cli\u003eTucker, R. P. \u003cem\u003eet al.\u003c/em\u003e Phylogenetic analysis of the tenascin gene family: Evidence of origin early in the chordate lineage. \u003cem\u003eBMC Evol. Biol.\u003c/em\u003e \u003cstrong\u003e6\u003c/strong\u003e, 1\u0026ndash;17 (2006).\u003c/li\u003e\n\u003cli\u003eOkuda-Ashitaka, E. \u0026amp; Matsumoto, K. I. Tenascin-X as a causal gene for classical-like Ehlers-Danlos syndrome. \u003cem\u003eFront. Genet.\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 1\u0026ndash;7 (2023).\u003c/li\u003e\n\u003cli\u003eValcourt, U., Alcaraz, L. B., Exposito, J. Y., Lethias, C. \u0026amp; Bartholin, L. Tenascin-X: Beyond the architectural function. \u003cem\u003eCell Adhes. 
