Evaluating Temporal Dynamics in Breast Cancer Survival Predictions with Machine Learning and Cox Regression Analysis

doi:10.21203/rs.3.rs-5515692/v1


2025 · doi:10.21203/rs.3.rs-5515692/v1

preprint OA: closed CC-BY-4.0


Research Article

Braden Woodhouse, Annette Lasham, Nicholas Knowlton

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-5515692/v1

This work is licensed under a CC BY 4.0 License.

Status: Posted (Version 1)

Abstract

Accurate prediction of breast cancer-specific survival is crucial for guiding personalized treatment decisions and improving patient outcomes. This study evaluated the performance of machine learning approaches (Random Survival Forest, RSF, and Generalized Boosted Model, GBM) alongside traditional Cox proportional hazards models for predicting survival in 21,574 women diagnosed with stage I-IV breast cancer in New Zealand between 2000 and 2019.
Performance comparisons using time-dependent Area Under the Curve and Brier score metrics demonstrated that RSF consistently outperformed both Cox regression variants and GBM across all time points. Distinct differences emerged in survival predictions between modelling approaches: RSF captured a sharper initial decline in survival for most tumour receptor subtypes and better differentiated the favourable prognosis of ER+/HER2- tumours compared to other subtypes. Notably, variable importance analysis revealed fundamentally different prognostic emphases between modelling approaches: disease stage dominated Cox model predictions, while tumour receptor subtype most strongly influenced RSF predictions. These findings highlight how machine learning approaches can capture complex, nonlinear relationships between clinical variables and survival outcomes that may be missed by traditional statistical models. The complementary insights provided by different modelling approaches suggest potential value in their combined use for enhanced risk stratification and more tailored treatment planning in breast cancer management, particularly when accounting for tumour biological characteristics alongside conventional staging factors.

Keywords: Statistical Epidemiology, Breast cancer, Survival prediction, Random Survival Forest, Cox proportional hazards, Machine learning, Tumour receptor subtype

Introduction

Breast cancer remains a global health priority, with one of the highest incidence and mortality rates among cancers affecting women (Amato 2023, Newman 2023). In New Zealand (NZ), breast cancer incidence is particularly high, with one in nine women receiving a diagnosis in their lifetime and significant disparities in outcomes persisting, especially among Māori and Pacific women (Te Aho o Te Kahu 2020, Kim 2025).
While the average five-year survival rates in NZ have improved to 91%, there remains a pressing need to deepen our understanding of the factors influencing long-term survival and recurrence (Gautier 2022). Addressing this knowledge gap could drive more precise risk models and guide personalized follow-up and intervention strategies. Traditionally, the Cox proportional hazards regression model has been instrumental in survival analysis, including in breast cancer research, by estimating hazard ratios (HRs) and highlighting risk factors (Lawrenson 2016, Elwood 2018). However, the Cox proportional hazards model relies on predefined predictor variables, which may constrain the discovery of complex, time-varying relationships. Machine learning (ML) approaches, particularly Random Survival Forests (RSF), offer an adaptable, data-driven alternative, by dynamically identifying important features and patterns within high-dimensional data. By allowing the data itself to determine critical prognostic factors, ML can identify interactions and nonlinear associations that may otherwise remain hidden (Cygu 2023, Mihaylov 2019). Unlike traditional parametric models, ML approaches like RSF can flexibly handle nonlinear relationships, high-dimensional data, and variable interactions, making fewer assumptions about the data structure. For example, RSF techniques do not rely on a fixed baseline hazard, allowing for variable effects over time without requiring the proportional hazards assumption inherent to the Cox model (Xu 2022, Alafchi 2019). This flexibility is especially valuable when exploring nuanced relationships, such as those between receptor subtype and survival, that may not conform to proportional hazards. In addition, ML methods like RSF incorporate automatic variable selection, reducing the manual effort required to select predictors. 
They often handle missing data more effectively than traditional methods, typically through ensemble techniques that enable predictions even when some data points are incomplete. In this study, we chose not to employ extensions of the Cox proportional hazards model, such as those incorporating time-varying effects, due to the stringent assumptions required and their potential to limit applicability in long-term survival analysis (Bellera et al., 2010). Instead, we utilize RSF as an exemplar machine learning approach to explore survival and recurrence patterns among breast cancer patients in NZ. RSF serves as a complementary method, offering insights into complex prognostic factors while highlighting patterns not readily captured by traditional approaches. By leveraging both traditional statistical models and modern machine learning methods, this work provides a more comprehensive understanding of cancer outcomes and establishes a foundation for future advancements in personalized breast cancer care.

Results

Clinicopathological characteristics

Data for this study were sourced from Te Rēhita Mate Ūtaetae (Breast Cancer Foundation NZ National Register, subsequently referred to as Te Rēhita). Of 26,463 women diagnosed with stage 1–4 breast cancers in NZ between 2000 and 2019, 4,889 had missing information and were omitted (Fig. 1). This resulted in a study cohort of 21,574 women. The majority of women were aged 45–69 years when their invasive breast cancer was diagnosed, were European, and had the ER+/HER2- tumour receptor subtype, grade 2 tumours and early stage (1–2) disease (Table 1). Slightly more women had breast cancer detected after presenting with symptoms (approx. 55%) than through breast screening. Approximately half of these women underwent breast-conserving surgery (BCS), while the other half had a mastectomy. The majority (65%) also received adjuvant radiotherapy (Table 1).
When analysed by tumour subtype, ER+/HER2- tumours were predominantly low-grade, whereas triple-negative tumours were predominantly high-grade. A higher proportion of women with ER+/HER2- breast tumours were over 44 years old and had their cancer diagnosed through screening (Table 1). These distinct differences in clinical and pathological features across receptor subtypes, which are already integral to clinical decision-making and treatment planning, highlight the need for accurate, subtype-specific survival prediction models to further refine personalised clinical decision-making and treatment.

Table 1. Clinicopathological characteristics of the study cohort. A total of n = 21,574 women with invasive breast cancer diagnosed between 2000 and 2019 were included in this study. Columns represent specific tumour receptor subtypes, which influence clinical decision making, and cohort characteristics are presented in rows.

| Characteristic | ER+/HER2- (n = 16,177) | ER+/HER2+ (n = 2,182) | ER-/HER2+ (n = 1,087) | Triple Negative (n = 2,128) | Overall (n = 21,574) |
|---|---|---|---|---|---|
| Age at Diagnosis (years) | | | | | |
| Mean (SD) | 58.5 (12.3) | 54.5 (13.1) | 54.9 (13.0) | 57.2 (14.1) | 57.8 (12.7) |
| Median [Min, Max] | 58.0 [20.0, 97.0] | 53.0 [21.0, 94.0] | 54.0 [22.0, 96.0] | 57.0 [20.0, 98.0] | 57.0 [20.0, 98.0] |
| Age at Diagnosis group (years) | | | | | |
| 45–69 | 11,405 (70.5%) | 1,396 (64.0%) | 702 (64.6%) | 1,276 (60.0%) | 14,779 (68.5%) |
| ≤ 44 | 1,863 (11.5%) | 507 (23.2%) | 237 (21.8%) | 433 (20.3%) | 3,040 (14.1%) |
| ≥ 70 | 2,909 (18.0%) | 279 (12.8%) | 148 (13.6%) | 419 (19.7%) | 3,755 (17.4%) |
| Ethnicity | | | | | |
| European | 12,026 (74.3%) | 1,521 (69.7%) | 714 (65.7%) | 1,656 (77.8%) | 15,917 (73.8%) |
| Māori | 1,582 (9.8%) | 237 (10.9%) | 117 (10.8%) | 158 (7.4%) | 2,094 (9.7%) |
| Pacific Peoples | 908 (5.6%) | 177 (8.1%) | 124 (11.4%) | 82 (3.9%) | 1,291 (6.0%) |
| Asian | 1,390 (8.6%) | 204 (9.3%) | 104 (9.6%) | 185 (8.7%) | 1,883 (8.7%) |
| Other/Unknown | 271 (1.7%) | 43 (2.0%) | 28 (2.6%) | 47 (2.2%) | 389 (1.8%) |
| Detection Method | | | | | |
| Screened | 8,016 (49.6%) | 795 (36.4%) | 322 (29.6%) | 594 (27.9%) | 9,727 (45.1%) |
| Not Screened | 8,161 (50.4%) | 1,387 (63.6%) | 765 (70.4%) | 1,534 (72.1%) | 11,847 (54.9%) |
| Tumour Grade | | | | | |
| 1 | 4,657 (28.8%) | 106 (4.9%) | 12 (1.1%) | 35 (1.6%) | 4,810 (22.3%) |
| 2 | 8,621 (53.3%) | 980 (44.9%) | 233 (21.4%) | 383 (18.0%) | 10,217 (47.4%) |
| 3 | 2,899 (17.9%) | 1,096 (50.2%) | 842 (77.5%) | 1,710 (80.4%) | 6,547 (30.3%) |
| Disease Stage (AJCC 7) | | | | | |
| 1 | 9,868 (61.0%) | 1,076 (49.3%) | 466 (42.9%) | 960 (45.1%) | 12,370 (57.3%) |
| 2 | 5,115 (31.6%) | 827 (37.9%) | 450 (41.4%) | 944 (44.4%) | 7,336 (34.0%) |
| 3 | 877 (5.4%) | 165 (7.6%) | 89 (8.2%) | 127 (6.0%) | 1,258 (5.8%) |
| 4 | 317 (2.0%) | 114 (5.2%) | 82 (7.5%) | 97 (4.6%) | 610 (2.8%) |
| Adjuvant Radiotherapy | | | | | |
| No | 5,565 (34.4%) | 763 (35.0%) | 427 (39.3%) | 749 (35.2%) | 7,504 (34.8%) |
| Yes | 10,612 (65.6%) | 1,419 (65.0%) | 660 (60.7%) | 1,379 (64.8%) | 14,070 (65.2%) |
| Surgery Type | | | | | |
| BCS | 8,772 (54.2%) | 853 (39.1%) | 302 (27.8%) | 980 (46.1%) | 10,907 (50.6%) |
| Mastectomy | 7,405 (45.8%) | 1,329 (60.9%) | 785 (72.2%) | 1,148 (53.9%) | 10,667 (49.4%) |
| Diagnosis Year Cluster | | | | | |
| 2000–2004 | 853 (5.3%) | 187 (8.6%) | 139 (12.8%) | 261 (12.3%) | 1,440 (6.7%) |
| 2005–2008 | 2,261 (14.0%) | 298 (13.7%) | 231 (21.3%) | 414 (19.5%) | 3,204 (14.9%) |
| 2009–2019 | 13,063 (80.8%) | 1,697 (77.8%) | 717 (66.0%) | 1,453 (68.3%) | 16,930 (78.5%) |

Machine learning model performance

To address this need, accurate survival models can increase our understanding of factors that affect breast cancer outcomes, inform treatment decision-making, and enable the stratification of patients into different risk groups, which is essential for personalised medicine. To generate an accurate and useful model, both machine learning and traditional models were explored for each receptor subtype. The survival prediction curves generated by the RSF model were contrasted with those generated by the traditional adjusted Cox proportional hazards (CPH) model, revealing notable differences in the shapes of the survival curves (Fig. 2). The RSF model predicted a sharper initial decline in survival for all tumour receptor subtypes except ER+/HER2-, compared to the more gradual decline shown by the adjusted Kaplan-Meier (KM) curves.
The RSF model predicted comparatively better survival for women with ER+/HER2- breast cancers relative to other receptor subtypes, whereas the adjusted KM curves are more condensed, potentially obscuring these distinctions.

Next, the time-dependent survival prediction performance was evaluated via AUC and Brier score for five different models (see Methods). A high AUC and a low Brier score indicate a better performing model, that is, survival predictions closer to the true values. This analysis showed that RSF was the best-performing model for survival prediction across all time points (Fig. 3). The other models analysed had similar survival prediction performance. Since the different Cox proportional hazards models studied had roughly equivalent prediction accuracy, this study proceeded with the simplest Cox model as the main comparison model representing a traditional tool. The Cox model with interactions and the regularised Cox model were not explored further in this study. The traditional CPH model and RSF were then compared to evaluate the risk factors associated with breast cancer-specific survival (BCSS). To further understand the factors driving these predictions, the influential predictors identified through both models were analysed.

Influential predictors identified through traditional Cox regression and RSF models

The RSF model (which demonstrated the most accurate survival predictions) and the Cox proportional hazards model were analysed to uncover their underlying mechanisms and identify the key variables influencing their predictions. In the Cox proportional hazards model, worse BCSS was significantly associated with diagnosis from age 70, diagnosis after presenting with symptoms (i.e. not through breast screening), having the triple negative receptor subtype, higher tumour grade and disease stage, and requiring a mastectomy instead of BCS (Table 2).
In contrast, Asian ethnicity and the ER+/HER2+ receptor subtype were associated with improved BCSS relative to their reference categories (European ethnicity and the ER+/HER2- subtype, respectively) (Table 2).

Table 2. Cox proportional hazards model. Hazard ratios for breast cancer-specific survival are presented for each covariable, with 95% confidence intervals (CI) in parentheses.

| Covariable | Hazard Ratio (95% CI) | P value |
|---|---|---|
| Age at diagnosis (years) | | |
| 45–69 | Reference | |
| ≤ 44 | 0.96 (0.86–1.07) | 0.42 |
| ≥ 70 | 1.41 (1.26–1.57) | 0.00 |
| Ethnicity | | |
| European | Reference | |
| Māori | 1.13 (0.99–1.29) | 0.07 |
| Pacific Peoples | 0.98 (0.83–1.14) | 0.76 |
| Asian | 0.6 (0.5–0.71) | 0.00 |
| Other/Unknown | 1.38 (1.08–1.75) | 0.01 |
| Detection method | | |
| Screened | Reference | |
| Not Screened | 1.52 (1.35–1.71) | 0.00 |
| Receptor Subtype | | |
| ER+/HER2- | Reference | |
| ER+/HER2+ | 0.87 (0.76–0.99) | 0.04 |
| ER-/HER2+ | 1.11 (0.96–1.3) | 0.16 |
| Triple Negative | 1.48 (1.31–1.66) | 0.00 |
| Tumour Grade | | |
| 1 | Reference | |
| 2 | 2.48 (2.05–3.01) | 0.00 |
| 3 | 4 (3.27–4.88) | 0.00 |
| Disease stage | | |
| 1 | Reference | |
| 2 | 2.26 (2.02–2.52) | 0.00 |
| 3 | 4.17 (3.57–4.87) | 0.00 |
| 4 | 11.33 (9.77–13.15) | 0.00 |
| Radiotherapy | | |
| No | Reference | |
| Yes | 0.97 (0.88–1.06) | 0.47 |
| Most invasive surgery | | |
| BCS | Reference | |
| Mastectomy | 1.45 (1.3–1.62) | 0.00 |
| Diagnosis year cluster | | |
| 2000–2004 | Reference | |
| 2005–2008 | 0.72 (0.63–0.82) | 0.00 |
| 2009–2019 | 0.47 (0.42–0.53) | 0.00 |

While the Cox proportional hazards model generates hazard ratios that can be used to assess the impact of variables on survival, the RSF model does not, although it does provide a measure of “variable importance”. RSF variable importance differs from statistical model coefficients; however, it provides a metric for comparison and an alternative tool for assessing which variables influence survival and survival prediction. This analysis showed that stage, grade, receptor subtype and surgery type were the four most influential risk factors in the RSF model when considering permutation variable importance (Fig. 4A). These results were consistent with the statistically significant coefficients observed in the Cox proportional hazards model.
Ethnicity was the next most important variable, which also demonstrated statistical significance in the Cox proportional hazards model.

Having identified the key predictors in each model, the next step was to explore how these variables contribute to the models' overall predictive performance. To achieve this, Brier score loss was compared after variable permutation. This analysis revealed that disease stage was the most important variable for the performance of the Cox proportional hazards model, whereas tumour receptor subtype was the most important variable for the performance of the RSF survival prediction model (Fig. 4B and C).

Discussion

This study demonstrates the utility of machine learning methods, particularly Random Survival Forests (RSF), in analysing BCSS alongside traditional statistical approaches such as the Cox proportional hazards model. Our findings indicate that the RSF model provided the most accurate survival predictions across all time points, outperforming traditional models in this dataset. While RSF does not yield the familiar hazard ratios (HRs) associated with the Cox proportional hazards model, it offers alternative metrics such as variable importance and Brier score loss after permutation to identify influential predictors of survival. Analysis of the importance of the individual variables in the RSF model largely aligned with the significant HRs identified by the Cox model. Both models highlighted disease stage, tumour grade, receptor subtype, and surgery type as key predictors of BCSS. Notably, in the RSF model, receptor subtype emerged as the most influential predictor, contrasting with the Cox model where disease stage held the greatest influence. This difference highlights the potential of RSF to capture the complex, nonlinear relationships between receptor subtype and survival outcomes, which may be less apparent in traditional Cox regression analysis.
This enhanced ability to detect intricate patterns underscores the potential of machine learning methods to uncover nuanced prognostic factors. This finding has important implications for personalised medicine in breast cancer. While disease stage remains a critical factor in treatment decisions, the RSF model's emphasis on receptor subtype provides a complementary perspective. By accurately predicting survival based on subtype-specific factors, this model could lead to more tailored treatment decisions, particularly within specific disease stages. For instance, the model may identify specific subtypes with particularly favourable or unfavourable prognoses within a given stage, allowing for more informed treatment selection or closer surveillance. Overall, the RSF model's emphasis on receptor subtype, in contrast to the Cox model's focus on disease stage, highlights its potential to enhance prognostication and treatment decision-making in breast cancer, ultimately leading to improved patient outcomes. An interesting observation from this study was that the performance of survival prediction for all models decreased as time from diagnosis increased, indicated by declining AUC and increasing Brier scores. Importantly, diagnosis year cluster was included in these models, which adjusts for some of the expected change over time, such as changes in treatment method. This trend may reflect model underfitting due to limited long-term data (11% of women in this cohort had follow-up times of 15 years or greater) or fewer events (deaths) occurring at extended follow-up times. However, performance metrics appeared to stabilize or even improve approaching the 20-year follow-up mark. This could suggest that the models are better at predicting long-term survivors, possibly due to distinct characteristics among women who survive beyond 20 years after diagnosis (1% of women in this cohort). 
These survivors may have unique clinical or biological features that are more readily captured by the models at extended time points. This pattern may also be influenced by survivorship bias, meaning the individuals who remain in the cohort at extended time points could represent a selective group with inherently better prognostic factors, potentially inflating model performance at those longer time points. The strengths and limitations of both traditional and machine learning methods are evident in this study. Traditional methods like Cox proportional hazards model rely on strong assumptions about the data, such as the proportional hazards assumption, which can be violated in long-term survival data (Kurt Omurlu et al., 2009; Wang & Li, 2017). In contrast, machine learning models like RSF can model complex nonlinear relationships and interactions between covariates without stringent assumptions, providing flexibility in handling diverse data structures. This flexibility is demonstrated in our findings, where RSF consistently outperformed traditional models in prediction accuracy across all time points (Fig. 3 ). However, increased model complexity can reduce interpretability. In this study we began to explore the underlying mechanism of how the models generated their predictions (Fig. 4 ). Future studies could include techniques like Local Interpretable Model-Agnostic Explanations and Shapley Additive Explanation values that can explore the influence of variables in complex models even further (Alabi et al., 2023; Lundberg & Lee, 2017; Moncada-Torres et al., 2021). Machine learning models also handle missing data more effectively and can produce predictions even with incomplete data through ensemble methods and the use of weak learners (Fanizzi et al., 2023; Steele et al., 2018). Women with missing data were excluded from this study (with the exception of ethnicity) to ensure comparability with the Cox proportional hazards model, which requires complete datasets. 
Future studies could consider retaining these records to prevent information loss and explore statistical methods such as multiple imputation or other machine learning approaches for missing data.

Our findings align with previous studies comparing traditional and machine learning methods for survival analysis. Spooner et al. (2020) found similar performance across various machine learning algorithms and the traditional Cox proportional hazards model, depending on whether model assumptions are met and on the complexity of covariate relationships. In cases where the assumptions are violated or relationships are more complex, machine learning models may outperform traditional methods. Even when traditional model assumptions hold, the ability of machine learning models to detect more complex relationships provides an opportunity to enhance traditional models by identifying additional variables or interactions to include. Our study found better survival prediction performance for RSF compared to Cox proportional hazards, which aligns with the findings from Jia et al. (2025), although their study cohort included only inflammatory breast cancer patients. Studies exploring breast cancer survival or recurrence as a binary analysis (ignoring censoring) are common (Hamedi et al., 2024; Kamble et al., 2025; Noman et al., 2025). Noman et al. (2025) utilised a Cox proportional hazards model alongside other machine learning models; however, their aim was to predict recurrence, whereas our study analysed BCSS, so the results cannot be compared directly.

It is important to note that there is no universally 'best' model across all datasets (Manikandan et al., 2023). The choice of model should be guided by the specific context, data characteristics, and research objectives. While machine learning offers advantages in modelling complex relationships, traditional models remain valuable, especially when their assumptions are appropriate for the data.
The utility and interpretation of survival prediction models should be approached cautiously, with rigorous validation and calibration studies, and in conjunction with clinical expertise. Individual variability among patients necessitates careful consideration when applying these models for personalized prognostication. However, both traditional statistical models and machine learning methods are valuable tools for exploring survival patterns and identifying influential predictors in patient populations.

Conclusion

This study underscores the potential of machine learning methods like RSF in enhancing survival analysis for breast cancer patients. The RSF model's ability to capture complex relationships without strict assumptions makes it a powerful complement to traditional methods. Key predictors such as tumour receptor subtype and disease stage were identified as influential for BCSS, highlighting the need for models that can accurately capture these complexities. Our findings advocate for the integration of machine learning approaches in survival analysis to improve risk stratification and support the development of personalised care strategies in breast cancer management.

Methods

Ethics

This study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Auckland Health Research Ethics Committee (AH2800). This study used data from Te Rēhita Mate Ūtaetae, the Breast Cancer Foundation NZ National Register. Te Rēhita is an opt-out register, which operates under the NZ Health and Disability Ethics Committee approval (16/NTA/139/AM03), privacy and health legislation, and Treaty of Waitangi principles. Patients receive an information sheet explaining that their de-identified data may be used for research purposes, subject to approval by Te Rēhita governance group. Those who do not opt out implicitly consent to their data being included in this study.
Study cohort

This study analysed data from 26,463 women diagnosed with stage 1–4 breast cancers in NZ. The 2000 to 2019 time span was selected to ensure a follow-up period of at least 3 years, and to mitigate bias introduced by the inclusion of data from new regions in 2020. Patient follow-up within Te Rēhita was confirmed as up to date as of the data extraction on March 2, 2023. BCSS was calculated from the date of diagnosis to the date of death from breast cancer, if it occurred. Otherwise, survival time was censored at the date of death from other causes or at the latest follow-up date within the study period. Women in the study cohort had a median follow-up time of 8 years.

Predictors

This analysis incorporated several key patient factors as predictors, including age group at diagnosis, detection method, ethnicity, surgical intervention, and radiotherapy. Age groups were categorized as < 45, 45–69, and ≥ 70 years to correspond with the eligibility criteria (women aged 45–69 years) for New Zealand’s national breast screening program, BreastScreen Aotearoa (BSA) (Breast Cancer Aotearoa Coalition, 2020). Detection method was recorded as either “screened” (invasive breast cancer detected by screening mammography) or “not screened” (detection after presentation with symptoms). Te Rēhita collects up to three ethnicities per person, and sources these from New Zealand’s Ministry of Health through an interactive link with each person’s unique health identifier (Gautier et al., 2022). Ethnicity was recorded as prioritised ethnicity, level 1 per HISO 10001:2017 ethnicity data protocols (Wellington: Ministry of Health, 2017), with Middle Eastern, Latin American, African (MELAA), “Other,” and missing ethnicity information combined into an “Other/Unknown” category. The most invasive surgery performed on each patient was recorded, ensuring that cases where a mastectomy followed an initial BCS were accurately captured.
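The outcome and predictor definitions given so far can be illustrated with a short sketch. This is not the authors' code (the study was implemented in R); it is a minimal Python illustration, and the function names and argument names (`age_group`, `most_invasive_surgery`, `bcss_outcome`, `death_from_breast_cancer`) are hypothetical, as is the conversion of days to years by dividing by 365.25.

```python
from datetime import date


def age_group(age_years):
    """Map age at diagnosis to the three groups described in the text."""
    if age_years < 45:
        return "<45"
    if age_years < 70:
        return "45-69"
    return ">=70"


def most_invasive_surgery(procedures):
    """Return 'Mastectomy' if any recorded procedure was a mastectomy, else 'BCS'.

    This captures cases where a mastectomy followed an initial BCS.
    """
    return "Mastectomy" if "Mastectomy" in procedures else "BCS"


def bcss_outcome(diagnosis, last_follow_up, death=None, death_from_breast_cancer=False):
    """Return (time_in_years, event) for breast cancer-specific survival.

    event = 1 only for death from breast cancer; death from other causes and
    end of follow-up are both treated as censoring (event = 0).
    """
    end = death if death is not None else last_follow_up
    event = 1 if (death is not None and death_from_breast_cancer) else 0
    years = (end - diagnosis).days / 365.25  # assumed day-to-year conversion
    return years, event


# Hypothetical patient: died of another cause five years after diagnosis.
time_years, event = bcss_outcome(
    diagnosis=date(2010, 1, 1),
    last_follow_up=date(2023, 3, 2),
    death=date(2015, 1, 1),
    death_from_breast_cancer=False,
)
```

In this sketch the hypothetical patient contributes about five years of follow-up and is censored rather than counted as an event, which is the distinction that separates BCSS from overall survival.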
Disease stage was assessed using the AJCC 7 TNM staging system to ensure consistency in staging across the entire cohort. Year of diagnosis was grouped into intervals (2000–2004, 2005–2008, and 2009–2019) based on the results of k-medians survival clustering from our previous study ( reference clustering paper ).

Models

In this analysis, we evaluated several models for survival prediction, including standard Cox regression, regularized Cox regression, Cox regression with interactions, Random Survival Forests (RSF), and Generalized Boosted Models (GBM). The Cox proportional hazards model (Cox, 1972) is a semi-parametric model (Kalbfleisch & Schaubel, 2023), containing both parametric components (regression coefficients $\beta_i$ with a known distribution) and non-parametric components (an unknown baseline hazard function $h_0$). The Cox model takes the form

$$h(t) = h_0(t) \cdot \exp(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p)$$

where $h(t)$ is the hazard function at time $t$ and $h_0(t)$ is the baseline hazard, that is, the hazard when $x_i = 0$ for all $p$ predictors. The hazard function is the instantaneous rate at which a person experiences an event at some time point, for example breast cancer-specific death, given that the person has been event-free up until that time. Hazard ratios can be calculated by taking the exponential of the coefficients, $\exp(\beta_i)$, and are measures of an instantaneous relative risk (Sashegyi & Ferry, 2017). Note that the log of the hazard rate is a linear combination of covariates.
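As a worked illustration of the formula above, the following sketch combines hazard ratios on the log scale to obtain the hazard of a patient profile relative to a patient at all reference levels. The two hazard ratios are copied from Table 2; the function name `relative_hazard` and the dictionary layout are hypothetical, and treating any level not listed as a reference level (hazard ratio 1) is an assumption made for brevity.

```python
import math

# Hazard ratios taken from Table 2 (reference levels have HR = 1 by definition).
hazard_ratios = {
    ("stage", "3"): 4.17,
    ("subtype", "Triple Negative"): 1.48,
}


def relative_hazard(profile, hazard_ratios):
    """Return exp(sum of beta_i * x_i) for a profile of categorical levels.

    Each coefficient is beta_i = log(HR_i); levels missing from the table are
    treated as reference levels (an assumption of this sketch).
    """
    linear_predictor = 0.0
    for key in profile.items():
        hr = hazard_ratios.get(key, 1.0)
        linear_predictor += math.log(hr)  # coefficients add on the log-hazard scale
    return math.exp(linear_predictor)


# Stage 3, triple negative, all other covariates at their reference levels.
profile = {"stage": "3", "subtype": "Triple Negative"}
rel = relative_hazard(profile, hazard_ratios)  # 4.17 * 1.48, about 6.17
```

Because the baseline hazard $h_0(t)$ cancels in the ratio, this relative hazard is the same at every time point, which is exactly the proportionality that the model assumes.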
The other assumption for the Cox model is that the effect of covariates on the hazard function is proportional over time, since the term $\exp(\beta_i x_i)$ does not include time $t$. That is, the ratio of the hazard in one group to the hazard in another group is the same at every time point (Bewick et al., 2004; Kuitunen et al., 2021). In non-mathematical terms, the Cox model helps to quantify how each variable impacts the person’s risk of experiencing the event.

Machine learning provides some additional useful tools to examine survival that do not share the same rigorous statistical assumptions as those just described for Cox regression. Random Survival Forests (RSF) (Ishwaran et al., 2008) are extensions of random forests (Breiman, 2001), ensemble tree methods that combine and average the survival predictions from many decision trees (Breiman et al., 1984). RSFs reduce estimation variance by drawing an independent bootstrap sample before constructing each tree, with each split chosen from a random subset of the features rather than all features. In the context of survival analysis, RSF extends this approach by using a splitting criterion optimized for survival differences, such as the log-rank test, to construct each tree (Ishwaran et al., 2008). Each node within an RSF tree divides the data to maximize survival contrast between groups, enabling the detection of complex, nonlinear relationships among covariates. Unlike Cox regression, RSF does not produce hazard ratios but instead provides aggregate survival predictions derived from many trees. To determine variable importance in RSF, we used an out-of-bag (OOB) approach, where the OOB data (data not included in the bootstrap sample) were permuted for each variable.
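The permutation idea can be sketched in a few lines. This is a simplified illustration rather than the procedure in randomForestSRC: it permutes a toy dataset as a whole instead of per-tree OOB samples, uses a fixed hypothetical risk function (`toy_risk`) in place of a fitted forest, and takes prediction error to be one minus Harrell's concordance index, which is an assumption made for this example.

```python
import random


def concordance_error(times, events, risks):
    """Return 1 - Harrell's C-index; tied risks count as half concordant."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i has an observed event before j's time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return 1.0 - concordant / comparable


def permutation_importance(rows, times, events, predict_risk, n_repeats=50, seed=0):
    """Mean increase in error after shuffling each variable's column."""
    rng = random.Random(seed)
    base = concordance_error(times, events, [predict_risk(r) for r in rows])
    importance = {}
    for name in rows[0]:
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[name] for r in rows]
            rng.shuffle(shuffled)
            permuted = [dict(r, **{name: v}) for r, v in zip(rows, shuffled)]
            err = concordance_error(times, events, [predict_risk(r) for r in permuted])
            increases.append(err - base)
        importance[name] = sum(increases) / n_repeats
    return importance


# Toy data: higher stage means shorter survival; "noise" is unrelated to outcome.
rows = [{"stage": s, "noise": z}
        for s, z in zip([1, 1, 2, 2, 3, 3, 4, 4], [0, 1, 0, 1, 0, 1, 0, 1])]
times = [20.0, 18.0, 15.0, 12.0, 9.0, 7.0, 4.0, 2.0]
events = [1] * 8


def toy_risk(row):
    """Hypothetical stand-in for a fitted model: risk depends on stage only."""
    return float(row["stage"])


importance = permutation_importance(rows, times, events, toy_risk)
```

In this toy setting, shuffling `stage` degrades the ranking of patients while shuffling `noise` leaves predictions unchanged, so only `stage` receives a positive importance.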
Comparing prediction error before and after permutation indicated each variable’s importance, with larger discrepancies in error signifying greater influence on survival. Variables appearing higher in the tree structure influence more downstream nodes, and therefore, permutations of high-importance variables typically result in larger prediction errors. Generalized Boosted Models (GBM) work by consecutively building decision trees, each predicting the residuals of the previous tree (Ridgeway, 2020), similar to Gradient Boosting (Chen & Guestrin, 2016).

Model implementation

Cox regression was implemented in R using survival::coxph(). Regularized Cox regression used riskRegression::GLMnet(), with alpha values tuned over a grid from 0 to 1 in 0.1 increments, resulting in an optimal alpha of 0, which effectively applied ridge regularization. Lambda was selected based on the minimum prediction error using the default regularization path. By incorporating regularization, we anticipated improved model generalizability and enhanced survival prediction accuracy on unseen test data, as regularization mitigates overfitting. Additionally, a Cox model with interaction terms was developed to account for the interaction between disease stage and ethnicity, which was found to be significant during exploratory analysis. RSF was tuned on all covariates from the multivariable model, along with the diagnosis year, using randomForestSRC::tune.rfsrc(). The optimal configuration was achieved with nodesize = 15 and mtry = 6. Feature importance scores and their confidence intervals were computed using delete-d jackknife procedures (randomForestSRC::subsample()), while survival curves from RSF predictions were visualized through ggRandomForests::gg_rfsrc(). The GBM model was fit using gbm::gbm(), and employed 10-fold cross-validation over a parameter grid to optimize the AUC for 5-year BCSS.
Optimal GBM parameters were identified as n.trees = 300, interaction.depth = 3, and shrinkage = 0.1, with covariate influence on survival assessed through relative variable importance scores.

To evaluate survival prediction performance across models, we used the Area Under the Curve (AUC) and Brier score at 5-year follow-up and at time-dependent intervals. The riskRegression::Score() function, which employs inverse probability of censoring weights (IPCW), was used for Brier score estimation. AUC, which quantifies a model’s discriminatory power, reflects the probability that a randomly selected positive instance (breast cancer-specific death) is ranked higher than a negative one (survival) (Fawcett, 2006). Time-dependent AUC was calculated using the Blanche et al. method, a modification of the Uno method (Uno et al., 2007), wherein each AUC point reflects the probability that a patient who died from breast cancer had a higher predicted risk than a patient who survived (Wu & Li, 2018). Confidence intervals for these metrics were generated through 10-fold cross-validation.

To enhance interpretability of the Cox regression and RSF models, we employed model-agnostic explanations using the survex package (Spytek et al., 2023), which provides time-dependent permutation-based feature importance. This approach gave insight into how the influence of each covariate on survival prediction changes over time.

Declarations

Competing interests

The author(s) declare no competing interests.

Author Contributions

BW conducted the initial data analysis, contributed to study design, and participated in the interpretation of results. NK conceptualized and designed the study, interpreted the findings, and contributed to manuscript drafting. AL assisted in the interpretation of results and manuscript preparation. All authors reviewed and approved the final manuscript.
Acknowledgements

We gratefully acknowledge Te Rēhita Mate Ūtaetae - Breast Cancer Foundation National Register for providing the data used in this study and Te Rēhita Clinical Advisory Group for reviewing this manuscript. Our thanks also extend to all the individuals who consented to participate in Te Rēhita, contributing invaluable information towards breast cancer research and management. We would also like to thank our funders, the New Zealand Breast Cancer Foundation via the Helena McAlpine Young Women’s Breast Cancer Study and the Breast Cancer Cure via the Not A One-Size-Fits-All Service grant.

Data availability

The dataset presented in this article is not readily available because access requires approval by the custodians of the data. Requests to access the dataset should be directed to Te Rēhita Mate Ūtaetae.

References

Amato, O., Guarneri, V., & Girardi, F. (2023). Epidemiology trends and progress in breast cancer survival: earlier diagnosis, new therapeutics. Current Opinion in Oncology, 35(6), 612. https://doi.org/10.1097/CCO.0000000000000991
Arnold, M., Morgan, E., Rumgay, H., Mafra, A., Singh, D., Laversanne, M., Vignat, J., Gralow, J. R., Cardoso, F., Siesling, S., & Soerjomataram, I. (2022). Current and future burden of breast cancer: Global statistics for 2020 and 2040. Breast (Edinburgh, Scotland), 66, 15–23. https://doi.org/10.1016/J.BREAST.2022.08.010
Aye, P. S., Win, S. S., Tin Tin, S., & Elwood, J. M. (2023). Comparison of Cancer Mortality and Incidence Between New Zealand and Australia and Reflection on Differences in Cancer Care: An Ecological Cross-Sectional Study of 2014-2018. Cancer Control: Journal of the Moffitt Cancer Center, 30. https://doi.org/10.1177/10732748231152330
Bewick, V., Cheek, L., & Ball, J. (2004). Statistics review 12: Survival analysis. Critical Care, 8(5), 389. https://doi.org/10.1186/CC2955
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
https://doi.org/10.1023/A:1010933404324
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. CRC Press. https://doi.org/10.1201/9781315139470
Chi, C.-L., Street, W. N., & Wolberg, W. H. (2007). Application of artificial neural network-based survival analysis on two breast cancer datasets. AMIA Annual Symposium Proceedings. https://pubmed.ncbi.nlm.nih.gov/18693812/
Cox, D. R. (1972). Regression Models and Life-Tables. Journal of the Royal Statistical Society: Series B (Methodological), 34(2), 187–202. https://doi.org/10.1111/J.2517-6161.1972.TB00899.X
Cygu, S., Seow, H., Dushoff, J., & Bolker, B. M. (2023). Comparing machine learning approaches to incorporate time-varying covariates in predicting cancer survival time. Scientific Reports, 13(1), 1–10. https://doi.org/10.1038/s41598-023-28393-7
Elwood, J. M., Tawfiq, E., TinTin, S., Marshall, R. J., Phung, T. M., Campbell, I., Harvey, V., & Lawrenson, R. (2018). Development and validation of a new predictive model for breast cancer survival in New Zealand and comparison to the Nottingham prognostic index. BMC Cancer, 18(1), 897. https://doi.org/10.1186/s12885-018-4791-x
Gautier, A., Harvey, V., Kleinsman, S., Knowlton, N., Lasham, A., & Ramsaroop, R. (2022). 30,000 voices: Informing a better future for breast cancer in Aotearoa New Zealand. Breast Cancer Foundation NZ. https://doi.org/10.17608/k6.auckland.19679019
Gupta, S., Tran, T., Luo, W., Phung, D., Kennedy, R. L., Broad, A., Campbell, D., Kipp, D., Singh, M., Khasraw, M., Matheson, L., Ashley, D. M., & Venkatesh, S. (2014). Machine-learning prediction of cancer survival: a retrospective study using electronic administrative records and a cancer registry. BMJ Open, 4(3), e004007. https://doi.org/10.1136/BMJOPEN-2013-004007
Hamedi, S. Z., Emami, H., Khayamzadeh, M., Rabiei, R., Aria, M., Akrami, M., & Zangouri, V.
(2024). Application of machine learning in breast cancer survival prediction using a multimethod approach. Scientific Reports, 14(1), 1–18. https://doi.org/10.1038/s41598-024-81734-y
Hao, J., Kim, Y., Mallavarapu, T., Oh, J. H., & Kang, M. (2019). Interpretable deep neural network for cancer survival analysis by integrating genomic and clinical data. BMC Medical Genomics, 12(Suppl 10). https://doi.org/10.1186/S12920-019-0624-2
Ishwaran, H., Kogalur, U. B., Blackstone, E. H., & Lauer, M. S. (2008). Random survival forests. The Annals of Applied Statistics, 2(3), 841–860. https://doi.org/10.1214/08-AOAS169
Jia, Y., Li, C., Feng, C., Sun, S., Cai, Y., Yao, P., Wei, X., Feng, Z., Liu, Y., Lv, W., Wu, H., Wu, F., Zhang, L., Zhang, S., & Ma, X. (2025). Prognostic prediction for inflammatory breast cancer patients using random survival forest modelling. Translational Oncology, 52, 102246. https://doi.org/10.1016/J.TRANON.2024.102246
Kalbfleisch, J. D., & Schaubel, D. E. (2023). Fifty Years of the Cox Model. Annual Review of Statistics and Its Application, 10, 1–23. https://doi.org/10.1146/ANNUREV-STATISTICS-033021-014043
Kamble, T. S., Wang, H., Myers, N., Littlefield, N., Reid, L., McCarthy, C. S., Lee, Y. J., Liu, H., Pantanowitz, L., Amirian, S., Rashidi, H. H., & Tafti, A. P. (2025). Predicting cancer survival at different stages: Insights from fair and explainable machine learning approaches. International Journal of Medical Informatics, 197, 105822. https://doi.org/10.1016/J.IJMEDINF.2025.105822
Kaplan, E. L., & Meier, P. (1958). Nonparametric Estimation from Incomplete Observations. Journal of the American Statistical Association, 53(282), 457–481. https://doi.org/10.1080/01621459.1958.10501452
Kim, J., Harper, A., McCormack, V., et al. (2025). Global patterns and trends in breast cancer incidence and mortality across 185 countries. Nature Medicine. https://doi.org/10.1038/s41591-025-03502-3
Kuitunen, I., Ponkilainen, V. T., Uimonen, M.
M., Eskelinen, A., & Reito, A. (2021). Testing the proportional hazards assumption in Cox regression and dealing with possible non-proportionality in total joint arthroplasty research: methodological perspectives and review. BMC Musculoskeletal Disorders, 22(1), 1–7. https://doi.org/10.1186/S12891-021-04379-2
Lawrenson, R., Lao, C., Elwood, M., Brown, C., Sarfati, D., & Campbell, I. (2016). Urban Rural Differences in Breast Cancer in New Zealand. International Journal of Environmental Research and Public Health, 13(10). https://doi.org/10.3390/ijerph13101000
Mihaylov, I., Nisheva, M., & Vassilev, D. (2019). Application of Machine Learning Models for Survival Prognosis in Breast Cancer Studies. Information, 10(3), 93. https://doi.org/10.3390/INFO10030093
Montazeri, M., Montazeri, M., Montazeri, M., & Beigzadeh, A. (2016). Machine learning models in breast cancer survival prediction. Technology and Health Care: Official Journal of the European Society for Engineering and Medicine, 24(1), 31–42. https://doi.org/10.3233/THC-151071
Najafi-Vosough, R., Faradmal, J., Tapak, L., Alafchi, B., Najafi-Ghobadi, K., & Mohammadi, T. (2022). Prediction the survival of patients with breast cancer using random survival forests for competing risks. Journal of Preventive Medicine and Hygiene, 63(2), E298. https://doi.org/10.15167/2421-4248/JPMH2022.63.2.2405
New Zealand. Te Aho o te Kahu. (2020). The state of cancer in New Zealand 2020.
Newman, L. (2023). Oncologic anthropology: Global variations in breast cancer risk, biology, and outcome. Journal of Surgical Oncology, 128(6), 959–966. https://doi.org/10.1002/JSO.27459
Noman, S. M., Fadel, Y. M., Henedak, M. T., Attia, N. A., Essam, M., Elmaasarawii, S., Fouad, F. A., Eltasawi, E. G., & Al-Atabany, W. (2025). Leveraging survival analysis and machine learning for accurate prediction of breast cancer recurrence and metastasis. Scientific Reports, 15(1), 1–16.
https://doi.org/10.1038/s41598-025-87622-3
Pölsterl, S., Sarasua, I., Gutiérrez-Becker, B., & Wachinger, C. (2020). A wide and deep neural network for survival analysis from anatomical shape and tabular clinical data. Communications in Computer and Information Science, 1167 CCIS, 453–464. https://doi.org/10.1007/978-3-030-43823-4_37
Sashegyi, A., & Ferry, D. (2017). On the Interpretation of the Hazard Ratio and Communication of Survival Benefit. The Oncologist, 22(4), 484. https://doi.org/10.1634/THEONCOLOGIST.2016-0198
Wellington: Ministry of Health. (2017). HISO 10001:2017 Ethnicity Data Protocols. Ministry of Health. https://www.tewhatuora.govt.nz/assets/Our-health-system/Digital-health/Health-information-standards/HISO-10001-2017-Ethnicity-Data-Protocols.pdf

Additional Declarations

The authors declare no competing interests.
\u003cp\u003e765 (70.4%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1,534 (72.1%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e11,847 (54.9%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eTumour Grade\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e4,657 (28.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e106 (4.9%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e12 (1.1%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e35 (1.6%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e4,810 (22.3%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e8,621 (53.3%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e980 (44.9%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e233 (21.4%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e383 (18.0%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e10,217 
(47.4%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2,899 (17.9%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1,096 (50.2%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e842 (77.5%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1,710 (80.4%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e6,547 (30.3%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eDisease Stage (AJCC 7)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e9,868 (61.0%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1,076 (49.3%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e466 (42.9%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e960 (45.1%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e12,370 (57.3%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003e5,115 (31.6%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e827 (37.9%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e450 (41.4%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e944 (44.4%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e7,336 (34.0%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e877 (5.4%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e165 (7.6%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e89 (8.2%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e127 (6.0%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1,258 (5.8%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e317 (2.0%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e114 (5.2%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e82 (7.5%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e97 (4.6%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e610 (2.8%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eAdjuvant Radiotherapy\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e5,565 (34.4%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e763 (35.0%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e427 (39.3%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e749 (35.2%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e7,504 (34.8%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e10,612 (65.6%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1,419 (65.0%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e660 (60.7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1,379 (64.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e14,070 (65.2%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eSurgery Type\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e 
\u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBCS\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e8,772 (54.2%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e853 (39.1%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e302 (27.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e980 (46.1%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e10,907 (50.6%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMastectomy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e7,405 (45.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1,329 (60.9%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e785 (72.2%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1,148 (53.9%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e10,667 (49.4%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eDiagnosis Year Cluster\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e2000\u0026ndash;2004\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e853 
(5.3%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e187 (8.6%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e139 (12.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e261 (12.3%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1,440 (6.7%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e2005\u0026ndash;2008\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2,261 (14.0%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e298 (13.7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e231 (21.3%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e414 (19.5%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e3,204 (14.9%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e2009\u0026ndash;2019\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e13,063 (80.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1,697 (77.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e717 (66.0%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1,453 (68.3%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e16,930 (78.5%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eMachine learning model performance\u003c/h3\u003e\n\u003cp\u003eTo address this need, accurate survival models can increase our 
understanding of factors that affect breast cancer outcomes, inform treatment decision-making, and enable the stratification of patients into different risk groups, which is essential for personalised medicine. To generate an accurate and useful model, both machine learning and traditional models were explored for each receptor subtype. The survival prediction curves generated by the RSF model were contrasted with those generated by the traditional adjusted CPH model, revealing notable differences in the shapes of the survival curves (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). The RSF model predicted a sharper initial decline in survival for every tumour receptor subtype except ER+/HER2-, compared to the more gradual decline shown by the adjusted Kaplan-Meier (KM) curves. The RSF model predicted comparatively better survival for women with ER+/HER2- breast cancers relative to other receptor subtypes, whereas the adjusted KM curves were more condensed, potentially obscuring these distinctions.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eNext, time-dependent survival prediction performance was evaluated via AUC and Brier score for five different models (see Methods). A high AUC and a low Brier score indicate a better-performing model, that is, predictions closer to the true outcomes. This analysis showed that RSF was the best-performing model for survival prediction across all time points (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). The other models analysed performed similarly to one another.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eSince the different Cox proportional hazards models studied had roughly equivalent prediction accuracy, this study proceeded with the simplest Cox model as the main comparison model representing a traditional tool. 
The Cox model with interactions and the regularised Cox model were not explored further in this study. The traditional CPH model and the RSF model were then compared to evaluate the risk factors associated with breast cancer-specific survival (BCSS). To further understand the factors driving these predictions, the influential predictors identified through both models were analysed.\u003c/p\u003e\n\u003ch3\u003eInfluential predictors identified through traditional Cox regression and RSF models\u003c/h3\u003e\n\u003cp\u003eThe RSF model (which demonstrated the most accurate survival predictions) and the Cox proportional hazards model were analysed to uncover their underlying mechanisms and identify the key variables influencing their predictions. In the Cox proportional hazards model, worse BCSS was significantly associated with diagnosis at age 70 or older, diagnosis after presenting with symptoms (i.e. not through breast screening), having the triple negative receptor subtype, higher tumour grade and disease stage, and requiring a mastectomy instead of BCS (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). 
In contrast, Asian women and those with the ER+/HER2+ receptor subtype had improved BCSS relative to European women and those with the ER+/HER2- subtype, respectively (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003e\u003cb\u003eCox proportional hazards model.\u003c/b\u003e Hazard ratios for breast cancer-specific survival are presented for each covariable, with 95% confidence intervals (CI) in parentheses.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eHazard Ratio (95% CI)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eP value\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eAge at diagnosis (years)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e45\u0026ndash;69\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e 
\u003cp\u003e\u003cem\u003eReference\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u0026le;\u0026thinsp;44\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.96 (0.86\u0026ndash;1.07)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.42\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u0026ge;\u0026thinsp;70\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.41 (1.26\u0026ndash;1.57)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eEthnicity\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eEuropean\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cem\u003eReference\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMāori\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.13 (0.99\u0026ndash;1.29)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.07\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePacific Peoples\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.98 (0.83\u0026ndash;1.14)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.76\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAsian\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.6 (0.5\u0026ndash;0.71)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eOther/Unknown\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.38 (1.08\u0026ndash;1.75)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.01\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eDetection method\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eScreened\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cem\u003eReference\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNot Screened\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.52 (1.35\u0026ndash;1.71)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e 
\u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eReceptor Subtype\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eER+/HER2-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cem\u003eReference\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eER+/HER2+\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.87 (0.76\u0026ndash;0.99)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.04\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eER-/HER2+\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.11 (0.96\u0026ndash;1.3)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.16\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTriple Negative\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.48 (1.31\u0026ndash;1.66)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eTumour Grade\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cem\u003eReference\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2.48 (2.05\u0026ndash;3.01)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e4 (3.27\u0026ndash;4.88)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eDisease stage\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cem\u003eReference\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2.26 (2.02\u0026ndash;2.52)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e4.17 (3.57\u0026ndash;4.87)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e11.33 (9.77\u0026ndash;13.15)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eRadiotherapy\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cem\u003eReference\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.97 (0.88\u0026ndash;1.06)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.47\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eMost invasive surgery\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBCS\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e 
\u003cp\u003e\u003cem\u003eReference\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMastectomy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.45 (1.3\u0026ndash;1.62)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eDiagnosis year cluster\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2000\u0026ndash;2004\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cem\u003eReference\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2005\u0026ndash;2008\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.72 (0.63\u0026ndash;0.82)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2009\u0026ndash;2019\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.47 (0.42\u0026ndash;0.53)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e 
\u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eWhile the Cox proportional hazards model generates hazard ratios, which can be used to assess the impact of variables on survival, the RSF model does not; instead, it provides a measure of \u0026ldquo;variable importance\u0026rdquo;. RSF variable importance differs from statistical model coefficients; however, it provides a metric for comparison and an alternative tool when assessing which variables influence survival and survival prediction. This analysis showed that stage, grade, receptor subtype and surgery type were the four most influential risk factors in the RSF model when considering permutation variable importance (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eA). These results were consistent with the statistically significant coefficients observed in the Cox proportional hazards model. Ethnicity was the next most important variable, and it also demonstrated statistical significance in the Cox proportional hazards model.\u003c/p\u003e \u003cp\u003eHaving identified the key predictors in each model, the next step was to explore how these variables contribute to the models' overall predictive performance. To achieve this, Brier score loss was compared after variable permutation. This analysis revealed that disease stage was the most important variable for the performance of the Cox proportional hazards model, whereas tumour receptor subtype was the most important variable for the performance of the RSF survival prediction model (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eB and C).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eThis study demonstrates the utility of machine learning methods, particularly Random Survival Forests (RSF), in analysing BCSS alongside traditional statistical approaches such as the Cox proportional hazards model. 
Our findings indicate that the RSF model provided the most accurate survival predictions across all time points, outperforming traditional models in this dataset. While RSF does not yield the familiar hazard ratios (HRs) associated with Cox proportional hazards model, it offers alternative metrics such as variable importance and Brier score loss after permutations to identify influential predictors of survival.\u003c/p\u003e \u003cp\u003eAnalysis of the importance of the individual variables in the RSF model largely aligned with the significant HRs identified by the Cox model. Both models highlighted disease stage, tumour grade, receptor subtype, and surgery type as key predictors of BCSS. Notably, in the RSF model, receptor subtype emerged as the most influential predictor, contrasting with the Cox model where disease stage held the greatest influence. This difference highlights the potential of RSF to capture the complex, nonlinear relationships between receptor subtype and survival outcomes, which may be less apparent in traditional Cox regression analysis. This enhanced ability to detect intricate patterns underscores the potential of machine learning methods to uncover nuanced prognostic factors. This finding has important implications for personalised medicine in breast cancer. While disease stage remains a critical factor in treatment decisions, the RSF model's emphasis on receptor subtype provides a complementary perspective. By accurately predicting survival based on subtype-specific factors, this model could lead to more tailored treatment decisions, particularly within specific disease stages. For instance, the model may identify specific subtypes with particularly favourable or unfavourable prognoses within a given stage, allowing for more informed treatment selection or closer surveillance. 
Overall, the RSF model's emphasis on receptor subtype, in contrast to the Cox model's focus on disease stage, highlights its potential to enhance prognostication and treatment decision-making in breast cancer, ultimately leading to improved patient outcomes.\u003c/p\u003e \u003cp\u003eAn interesting observation from this study was that the performance of survival prediction for all models decreased as time from diagnosis increased, indicated by declining AUC and increasing Brier scores. Importantly, diagnosis year cluster was included in these models, which adjusts for some of the expected change over time, such as changes in treatment method. This trend may reflect model underfitting due to limited long-term data (11% of women in this cohort had follow-up times of 15 years or greater) or fewer events (deaths) occurring at extended follow-up times. However, performance metrics appeared to stabilize or even improve approaching the 20-year follow-up mark. This could suggest that the models are better at predicting long-term survivors, possibly due to distinct characteristics among women who survive beyond 20 years after diagnosis (1% of women in this cohort). These survivors may have unique clinical or biological features that are more readily captured by the models at extended time points. This pattern may also be influenced by survivorship bias, meaning the individuals who remain in the cohort at extended time points could represent a selective group with inherently better prognostic factors, potentially inflating model performance at those longer time points.\u003c/p\u003e \u003cp\u003eThe strengths and limitations of both traditional and machine learning methods are evident in this study. Traditional methods like Cox proportional hazards model rely on strong assumptions about the data, such as the proportional hazards assumption, which can be violated in long-term survival data (Kurt Omurlu et al., 2009; Wang \u0026amp; Li, 2017). 
In contrast, machine learning models like RSF can model complex nonlinear relationships and interactions between covariates without stringent assumptions, providing flexibility in handling diverse data structures. This flexibility is demonstrated in our findings, where RSF consistently outperformed traditional models in prediction accuracy across all time points (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). However, increased model complexity can reduce interpretability. In this study we began to explore the underlying mechanism of how the models generated their predictions (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e). Future studies could include techniques like Local Interpretable Model-Agnostic Explanations and Shapley Additive Explanation values that can explore the influence of variables in complex models even further (Alabi et al., 2023; Lundberg \u0026amp; Lee, 2017; Moncada-Torres et al., 2021).\u003c/p\u003e \u003cp\u003eMachine learning models also handle missing data more effectively and can produce predictions even with incomplete data through ensemble methods and the use of weak learners (Fanizzi et al., 2023; Steele et al., 2018). Women with missing data were excluded from this study (with the exception of ethnicity) to ensure comparability with the Cox proportional hazards model, which requires complete datasets. Future studies could consider retaining these records to prevent information loss and explore statistical methods such as multiple imputation or other machine learning approaches for missing data.\u003c/p\u003e \u003cp\u003eOur findings align with previous studies comparing traditional and machine learning methods for survival analysis. Spooner et al. 
(2020) found similar performance across various machine learning algorithms and traditional Cox proportional hazards model, depending on whether model assumptions are met and the complexity of covariate relationships. In cases where the assumptions are violated or relationships are more complex, machine learning models may outperform traditional methods. Even when traditional model assumptions hold, the ability of machine learning models to detect more complex relationships provides an opportunity to enhance traditional models by identifying additional variables or interactions to include. Our study found better survival prediction performance for RSF compared to Cox proportional hazards, which aligns with the findings from Jia et al. (\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2025\u003c/span\u003e), although their study cohort included only inflammatory breast cancer patients. Studies exploring breast cancer survival or recurrence as a binary analysis (ignoring censoring) are common (Hamedi et al., \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Kamble et al., \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2025\u003c/span\u003e; Noman et al., \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). Noman et al. (\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2025\u003c/span\u003e) nicely utilised a Cox proportional hazards model alongside other machine learning models, however, this was to predict recurrence, whereas our study analysed BCSS, so we are unable to compare results.\u003c/p\u003e \u003cp\u003eIt is important to note that there is no universally 'best' model across all datasets (Manikandan et al., 2023). The choice of model should be guided by the specific context, data characteristics, and research objectives. 
While machine learning offers advantages in modelling complex relationships, traditional models remain valuable, especially when their assumptions are appropriate for the data.\u003c/p\u003e \u003cp\u003eThe utility and interpretation of survival prediction models should be approached cautiously, with rigorous validation and calibration studies, and in conjunction with clinical expertise. Individual variability among patients necessitates careful consideration when applying these models for personalized prognostication. However, both traditional statistical models and machine learning methods are valuable tools for exploring survival patterns and identifying influential predictors in patient populations.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eThis study underscores the potential of machine learning methods like RSF in enhancing survival analysis for breast cancer patients. The RSF model's ability to capture complex relationships without strict assumptions makes it a powerful complement to traditional methods. Key predictors such as tumour receptor subtype and disease stage were identified as influential for BCSS, highlighting the need for models that can accurately capture these complexities. Our findings advocate for the integration of machine learning approaches in survival analysis to improve risk stratification and support the development of personalised care strategies in breast cancer management.\u003c/p\u003e \u003c/div\u003e"},{"header":"Methods","content":"\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003cdiv id=\"Sec9\" class=\"Section3\"\u003e \u003ch2\u003eEthics\u003c/h2\u003e \u003cp\u003eThis study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Auckland Health Research Ethics Committee (AH2800). This study used data from Te Rēhita Mate Ūtaetae- the Breast Cancer Foundation NZ National Register. 
Te Rēhita is an opt-out register, which operates under the NZ Health and Disability Ethics Committee approval (16/NTA/139/AM03), privacy, and health legislation and Treaty of Waitangi principles. Patients receive an information sheet explaining that their de-identified data may be used for research purposes, subject to approval by the Te Rēhita governance group. Those who choose not to opt out implicitly consent to their data being included in this study.\u003c/p\u003e \u003c/div\u003e \n\u003ch3\u003eStudy cohort\u003c/h3\u003e\n\u003cp\u003eThis study analysed data from 26,463 women diagnosed with stage 1\u0026ndash;4 breast cancers in NZ. The 2000 to 2019 time span was selected to ensure a follow-up period of at least 3 years, and to mitigate bias introduced by the inclusion of data from new regions in 2020. Patient follow-up within Te Rēhita was confirmed as up to date as of the data extraction on March 2, 2023. BCSS was calculated from the date of diagnosis to the date of death from breast cancer, if it occurred. Otherwise, survival time was censored at the date of death from other causes or at the latest follow-up date within the study period. Women in the study cohort had a median follow-up time of 8 years.\u003c/p\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003ePredictors\u003c/h2\u003e \u003cp\u003eThis analysis incorporated several key patient factors as predictors, including age group at diagnosis, detection method, ethnicity, surgical intervention, and radiotherapy. Age groups were categorized as \u0026lt;\u0026thinsp;45, 45\u0026ndash;69, and \u0026ge;\u0026thinsp;70 years to correspond with the eligibility criteria (women aged 45\u0026ndash;69 years) of New Zealand\u0026rsquo;s national breast screening program, BreastScreen Aotearoa (BSA) (Breast Cancer Aotearoa Coalition, 2020). 
Detection method was recorded as invasive breast cancer detected by screening mammography- \u0026ldquo;screened\u0026rdquo;, or detection after presentation with symptoms- \u0026ldquo;not screened\u0026rdquo;. Te Rēhita collects up to three ethnicities per person, and sources these from New Zealand\u0026rsquo;s Ministry of Health through an interactive link with each person\u0026rsquo;s unique health identifier (Gautier et al., \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Ethnicity was recorded as prioritised ethnicity, level 1 per HISO 10001:2017 ethnicity data protocols (Wellington: Ministry of Health, 2017), with Middle Eastern, Latin American, African (MELAA), \u0026ldquo;Other,\u0026rdquo; and missing ethnicity information combined into an \u0026ldquo;Other/Unknown\u0026rdquo; category. The most invasive surgery performed on each patient was recorded, ensuring that cases where a mastectomy followed an initial BCS were accurately captured. Disease stage was assessed using the AJCC 7 TNM staging system to ensure consistency in staging across the entire cohort. 
Year of diagnosis was grouped into intervals (2000\u0026ndash;2004, 2005\u0026ndash;2008, and 2009\u0026ndash;2019) based on the results of k-medians survival clustering from our previous study (\u003cem\u003ereference clustering paper\u003c/em\u003e).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eModels\u003c/h2\u003e \u003cp\u003eIn this analysis, we evaluated several models for survival prediction, including standard Cox regression, regularized Cox regression, Cox regression with interactions, Random Survival Forests (RSF), and Generalized Boosted Models (GBM).\u003c/p\u003e \u003cp\u003eThe Cox proportional hazards model (Cox, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e1972\u003c/span\u003e) is a semi-parametric model (Kalbfleisch \u0026amp; Schaubel, \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2023\u003c/span\u003e), containing components of both parametric (known distribution of regression coefficients \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{{\\beta\\:}}_{\\text{i}}\$\u003c/span\u003e\u003c/span\u003e) and non-parametric (unknown baseline hazard function \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{\\text{h}}_{0}\$\u003c/span\u003e\u003c/span\u003e). 
The Cox model takes the formula\u003cdiv id=\"Equa\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equa\" name=\"EquationSource\"\u003e\n$$\\:\\text{h}\\left(\\text{t}\\right)={\\text{h}}_{0}\\left(\\text{t}\\right)\\bullet\\:\\text{e}\\text{x}\\text{p}({{\\beta\\:}}_{1}{\\text{x}}_{1}+{{\\beta\\:}}_{2}{\\text{x}}_{2}+...+{{\\beta\\:}}_{\\text{p}}{\\text{x}}_{\\text{p}})$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{h}\\left(\\text{t}\\right)\$\u003c/span\u003e\u003c/span\u003e is the hazard function at time \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{t}\$\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{\\text{h}}_{0}\$\u003c/span\u003e\u003c/span\u003e is the baseline hazard with \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\:{\\text{x}}_{\\text{i}}=0\$\u003c/span\u003e\u003c/span\u003e for all \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{p}\$\u003c/span\u003e\u003c/span\u003e predictors. The hazard function is the instantaneous rate at which a person experiences an event at some time point, for example breast cancer-specific death, given that the person has been event-free up until that time. Hazard ratios can be calculated by taking the exponential of the coefficients \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{e}\\text{x}\\text{p}\\left({{\\beta\\:}}_{\\text{i}}\\right)\$\u003c/span\u003e\u003c/span\u003e, and are measures of an instantaneous relative risk (Sashegyi \u0026amp; Ferry, \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). Note that the log of the hazard rate is a linear combination of covariates. 
The other assumption for the Cox model is that the effect of covariates on the hazard function is proportional over time, since the exponential term \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{e}\\text{x}\\text{p}\\left({{\\beta\\:}}_{\\text{i}}{\\text{x}}_{\\text{i}}\\right)\$\u003c/span\u003e\u003c/span\u003e does not include time \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{t}\$\u003c/span\u003e\u003c/span\u003e. That is, the ratio of the hazards of any two groups at one time point is the same as the ratio at any other time point (Bewick et al., \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2004\u003c/span\u003e; Kuitunen et al., \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). In non-mathematical terms, the Cox model helps to quantify how each variable impacts a person\u0026rsquo;s risk of experiencing the event.\u003c/p\u003e \u003cp\u003eMachine learning provides some additional useful tools to examine survival that do not share the same rigorous statistical assumptions as those just described for Cox regression. Random Survival Forests (RSF) (Ishwaran et al., \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2008\u003c/span\u003e) are extensions of random forests (Breiman, \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2001\u003c/span\u003e), ensemble tree methods that combine and average the survival predictions from many decision trees (Breiman et al., \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e1984\u003c/span\u003e). RSFs reduce estimation variance by using independent bootstrap sampling before constructing each tree, with each split chosen from a random subset of the features rather than all features. 
In the context of survival analysis, RSF extends this approach by using a splitting criterion optimized for survival differences, such as the log-rank test, to construct each tree (Ishwaran et al., \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2008\u003c/span\u003e). Each node within an RSF tree divides the data to maximize survival contrast between groups, enabling the detection of complex, nonlinear relationships among covariates. Unlike Cox regression, RSF does not produce hazard ratios but instead provides aggregate survival predictions derived from many trees. To determine variable importance in RSF, we used an out-of-bag (OOB) approach, where the OOB data \u0026mdash; data not included in the bootstrap sample \u0026mdash; were permuted for each variable. Comparing prediction error before and after permutation indicated each variable\u0026rsquo;s importance, with larger discrepancies in error signifying greater influence on survival. Variables appearing higher in the tree structure influence more downstream nodes, and therefore, permutations of high-importance variables typically result in larger prediction errors.\u003c/p\u003e \u003cp\u003eGeneralized Boosted Models (GBM) work by sequentially building decision trees, each predicting the residuals of the previous tree (Ridgeway, 2020), similar to Gradient Boosting (Chen \u0026amp; Guestrin, 2016).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eModel implementation\u003c/h2\u003e \u003cp\u003eCox regression was implemented in R using survival::coxph(). Regularized Cox regression used riskRegression::GLMnet(), with alpha values tuned over a grid from 0 to 1 in 0.1 increments, resulting in an optimal alpha of 0, which effectively applied ridge regularization. Lambda was selected based on the minimum prediction error using the default regularization path. 
By incorporating regularization, we anticipated improved model generalizability and enhanced survival prediction accuracy on unseen test data, as regularization mitigates overfitting. Additionally, a Cox model with interaction terms was developed to account for the interaction between disease stage and ethnicity, which was found to be significant during exploratory analysis.\u003c/p\u003e \u003cp\u003eRSF was tuned on all covariates from the multivariable model, along with the diagnosis year, using randomForestSRC::tune.rfsrc. The optimal configuration was achieved with nodesize\u0026thinsp;=\u0026thinsp;15 and mtry\u0026thinsp;=\u0026thinsp;6. Feature importance scores and their confidence intervals were computed using delete-d jackknife procedures (randomForestSRC::subsample), while survival curves from RSF predictions were visualized through ggRandomForests::gg_rfsrc().\u003c/p\u003e \u003cp\u003eThe GBM model was fit in this study using gbm::gbm(), and employed 10-fold cross-validation over a parameter grid to maximize the AUC for 5-year BCSS. Optimal GBM parameters were identified as n.trees\u0026thinsp;=\u0026thinsp;300, interaction.depth\u0026thinsp;=\u0026thinsp;3, and shrinkage\u0026thinsp;=\u0026thinsp;0.1, with covariate influence on survival assessed through relative variable importance scores.\u003c/p\u003e \u003cp\u003eTo evaluate the survival prediction performance across models, we used the Area Under the Curve (AUC) and Brier score at 5-year follow-up and time-dependent intervals. The riskRegression::Score() function, which employs inverse probability of censoring weights (IPCW), was utilized for Brier score estimation. AUC, which quantifies a model\u0026rsquo;s discriminatory power, reflects the probability that a randomly selected positive instance (breast-cancer-specific death) is ranked higher than a negative one (survival) (Fawcett, 2006). Time-dependent AUC was calculated using the Blanche et al. 
method, a modification of the Uno method (Uno et al., 2007), wherein each AUC point reflects the probability that a patient who died from breast cancer had a higher predicted risk than a patient who survived (Wu \u0026amp; Li, 2018). Confidence intervals for these metrics were generated through 10-fold cross-validation.\u003c/p\u003e \u003cp\u003eTo enhance interpretability of the Cox regression and RSF models, we employed model-agnostic explanations using the survex package (Spytek et al., 2023), which allowed for time-dependent permutation-based feature importance. This approach provided insights into the dynamic influence of covariates over time and facilitated a deeper understanding of variable impacts on survival prediction in a model-agnostic context.\u003c/p\u003e \u003c/div\u003e"},{"header":"Declarations","content":"\u003ch2\u003eCompeting interests\u003c/h2\u003e \u003cp\u003eThe author(s) declare no competing interests.\u003c/p\u003e \u003ch2\u003eAuthor Contributions\u003c/h2\u003e \u003cp\u003eBW conducted the initial data analysis, contributed to study design, and participated in the interpretation of results. NK conceptualized and designed the study, interpreted the findings, and contributed to manuscript drafting. AL assisted in the interpretation of results and manuscript preparation. All authors reviewed and approved the final manuscript.\u003c/p\u003e\u003ch2\u003eAcknowledgements\u003c/h2\u003e \u003cp\u003eWe gratefully acknowledge Te Rēhita Mate Ūtaetae - Breast Cancer Foundation National Register for providing the data used in this study and Te Rēhita Clinical Advisory Group for reviewing this manuscript. Our thanks also extend to all the individuals who consented to participate in Te Rēhita, contributing invaluable information towards breast cancer research and management. 
We would also like to thank our funders, the New Zealand Breast Cancer Foundation via the Helena McAlpine Young Women\u0026rsquo;s Breast Cancer Study and the Breast Cancer Cure via the Not A One-Size-Fits-All Service grant.\u003c/p\u003e\u003ch2\u003eData availability\u003c/h2\u003e \u003cp\u003eThe dataset presented in this article is not readily available because this requires approval by the custodians of the data. Requests to access the dataset should be directed to Te Rēhita Mate Ūtaetae.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAmato, O., Guarneri, V., \u0026amp; Girardi, F. (2023). Epidemiology trends and progress in breast cancer survival: earlier diagnosis, new therapeutics. \u003cem\u003eCurrent Opinion in Oncology\u003c/em\u003e, \u003cem\u003e35\u003c/em\u003e(6), 612. https://doi.org/10.1097/CCO.0000000000000991\u003c/li\u003e\n\u003cli\u003eArnold, M., Morgan, E., Rumgay, H., Mafra, A., Singh, D., Laversanne, M., Vignat, J., Gralow, J. R., Cardoso, F., Siesling, S., \u0026amp; Soerjomataram, I. (2022). Current and future burden of breast cancer: Global statistics for 2020 and 2040. \u003cem\u003eBreast (Edinburgh, Scotland)\u003c/em\u003e, \u003cem\u003e66\u003c/em\u003e, 15\u0026ndash;23. https://doi.org/10.1016/J.BREAST.2022.08.010\u003c/li\u003e\n\u003cli\u003eAye, P. S., Win, S. S., Tin Tin, S., \u0026amp; Elwood, J. M. (2023). Comparison of Cancer Mortality and Incidence Between New Zealand and Australia and Reflection on Differences in Cancer Care: An Ecological Cross-Sectional Study of 2014-2018. \u003cem\u003eCancer Control : Journal of the Moffitt Cancer Center\u003c/em\u003e, \u003cem\u003e30\u003c/em\u003e. https://doi.org/10.1177/10732748231152330\u003c/li\u003e\n\u003cli\u003eBewick, V., Cheek, L., \u0026amp; Ball, J. (2004). Statistics review 12: Survival analysis. \u003cem\u003eCritical Care\u003c/em\u003e, \u003cem\u003e8\u003c/em\u003e(5), 389. 
https://doi.org/10.1186/CC2955\u003c/li\u003e\n\u003cli\u003eBreiman, L. (2001). Random forests. \u003cem\u003eMachine Learning\u003c/em\u003e, \u003cem\u003e45\u003c/em\u003e(1), 5\u0026ndash;32. https://doi.org/10.1023/A:1010933404324\u003c/li\u003e\n\u003cli\u003eBreiman, L., Friedman, J. H., Olshen, R. A., \u0026amp; Stone, C. J. (1984). \u003cem\u003eClassification and Regression Trees\u003c/em\u003e. CRC Press. https://doi.org/10.1201/9781315139470\u003c/li\u003e\n\u003cli\u003eChi, C.-L., Street, W. N., \u0026amp; Wolberg, W. H. (2007). Application of artificial neural network-based survival analysis on two breast cancer datasets. \u003cem\u003eAMIA Annu Symp Proc\u003c/em\u003e. https://pubmed.ncbi.nlm.nih.gov/18693812/\u003c/li\u003e\n\u003cli\u003eCox, D. R. (1972). Regression Models and Life-Tables. \u003cem\u003eJournal of the Royal Statistical Society: Series B (Methodological)\u003c/em\u003e, \u003cem\u003e34\u003c/em\u003e(2), 187\u0026ndash;202. https://doi.org/10.1111/J.2517-6161.1972.TB00899.X\u003c/li\u003e\n\u003cli\u003eCygu, S., Seow, H., Dushoff, J., \u0026amp; Bolker, B. M. (2023). Comparing machine learning approaches to incorporate time-varying covariates in predicting cancer survival time. \u003cem\u003eScientific Reports\u003c/em\u003e, \u003cem\u003e13\u003c/em\u003e(1), 1\u0026ndash;10. https://doi.org/10.1038/s41598-023-28393-7\u003c/li\u003e\n\u003cli\u003eElwood, J. M., Tawfiq, E., TinTin, S., Marshall, R. J., Phung, T. M., Campbell, I., Harvey, V., \u0026amp; Lawrenson, R. (2018). Development and validation of a new predictive model for breast cancer survival in New Zealand and comparison to the Nottingham prognostic index. \u003cem\u003eBMC Cancer\u003c/em\u003e, \u003cem\u003e18\u003c/em\u003e(1), 897. 
https://doi.org/10.1186/s12885-018-4791-x\u003c/li\u003e\n\u003cli\u003eGautier, A., Harvey, V., Kleinsman, S., Knowlton, N., Lasham, A., \u0026amp; Ramsaroop, R. (2022). \u003cem\u003e30,000 voices: Informing a better future for breast cancer in Aotearoa New Zealand\u003c/em\u003e. Breast Cancer Foundation NZ. https://doi.org/10.17608/k6.auckland.19679019\u003c/li\u003e\n\u003cli\u003eGupta, S., Tran, T., Luo, W., Phung, D., Kennedy, R. L., Broad, A., Campbell, D., Kipp, D., Singh, M., Khasraw, M., Matheson, L., Ashley, D. M., \u0026amp; Venkatesh, S. (2014). Machine-learning prediction of cancer survival: a retrospective study using electronic administrative records and a cancer registry. \u003cem\u003eBMJ Open\u003c/em\u003e, \u003cem\u003e4\u003c/em\u003e(3), e004007. https://doi.org/10.1136/BMJOPEN-2013-004007\u003c/li\u003e\n\u003cli\u003eHamedi, S. Z., Emami, H., Khayamzadeh, M., Rabiei, R., Aria, M., Akrami, M., \u0026amp; Zangouri, V. (2024). Application of machine learning in breast cancer survival prediction using a multimethod approach. \u003cem\u003eScientific Reports\u003c/em\u003e, \u003cem\u003e14\u003c/em\u003e(1), 1\u0026ndash;18. https://doi.org/10.1038/s41598-024-81734-y\u003c/li\u003e\n\u003cli\u003eHao, J., Kim, Y., Mallavarapu, T., Oh, J. H., \u0026amp; Kang, M. (2019). Interpretable deep neural network for cancer survival analysis by integrating genomic and clinical data. \u003cem\u003eBMC Medical Genomics\u003c/em\u003e, \u003cem\u003e12\u003c/em\u003e(Suppl 10). https://doi.org/10.1186/S12920-019-0624-2\u003c/li\u003e\n\u003cli\u003eIshwaran, H., Kogalur, U. B., Blackstone, E. H., \u0026amp; Lauer, M. S. (2008). Random survival forests. \u003cem\u003eThe Annals of Applied Statistics\u003c/em\u003e, \u003cem\u003e2\u003c/em\u003e(3), 841\u0026ndash;860. 
https://doi.org/10.1214/08-AOAS169\u003c/li\u003e\n\u003cli\u003eJia, Y., Li, C., Feng, C., Sun, S., Cai, Y., Yao, P., Wei, X., Feng, Z., Liu, Y., Lv, W., Wu, H., Wu, F., Zhang, L., Zhang, S., \u0026amp; Ma, X. (2025). Prognostic prediction for inflammatory breast cancer patients using random survival forest modelling. \u003cem\u003eTranslational Oncology\u003c/em\u003e, \u003cem\u003e52\u003c/em\u003e, 102246. https://doi.org/10.1016/J.TRANON.2024.102246\u003c/li\u003e\n\u003cli\u003eKalbfleisch, J. D., \u0026amp; Schaubel, D. E. (2023). Fifty Years of the Cox Model. \u003cem\u003eAnnual Review of Statistics and Its Application\u003c/em\u003e, \u003cem\u003e10\u003c/em\u003e, 1\u0026ndash;23. https://doi.org/10.1146/ANNUREV-STATISTICS-033021-014043\u003c/li\u003e\n\u003cli\u003eKamble, T. S., Wang, H., Myers, N., Littlefield, N., Reid, L., McCarthy, C. S., Lee, Y. J., Liu, H., Pantanowitz, L., Amirian, S., Rashidi, H. H., \u0026amp; Tafti, A. P. (2025). Predicting cancer survival at different stages: Insights from fair and explainable machine learning approaches. \u003cem\u003eInternational Journal of Medical Informatics\u003c/em\u003e, \u003cem\u003e197\u003c/em\u003e, 105822. https://doi.org/10.1016/J.IJMEDINF.2025.105822\u003c/li\u003e\n\u003cli\u003eKaplan, E. L., \u0026amp; Meier, P. (1958). Nonparametric Estimation from Incomplete Observations. \u003cem\u003eJournal of the American Statistical Association\u003c/em\u003e, \u003cem\u003e53\u003c/em\u003e(282), 457\u0026ndash;481. https://doi.org/10.1080/01621459.1958.10501452\u003c/li\u003e\n\u003cli\u003eKim, J., Harper, A., McCormack, V. \u003cem\u003eet al.\u003c/em\u003e Global patterns and trends in breast cancer incidence and mortality across 185 countries. \u003cem\u003eNat Med\u003c/em\u003e (2025). https://doi.org/10.1038/s41591-025-03502-3\u003c/li\u003e\n\u003cli\u003eKuitunen, I., Ponkilainen, V. T., Uimonen, M. M., Eskelinen, A., \u0026amp; Reito, A. (2021). 
Testing the proportional hazards assumption in cox regression and dealing with possible non-proportionality in total joint arthroplasty research: methodological perspectives and review. \u003cem\u003eBMC Musculoskeletal Disorders\u003c/em\u003e, \u003cem\u003e22\u003c/em\u003e(1), 1\u0026ndash;7. https://doi.org/10.1186/S12891-021-04379-2\u003c/li\u003e\n\u003cli\u003eLawrenson, R., Lao, C., Elwood, M., Brown, C., Sarfati, D., \u0026amp; Campbell, I. (2016). Urban Rural Differences in Breast Cancer in New Zealand. \u003cem\u003eInternational Journal of Environmental Research and Public Health\u003c/em\u003e, \u003cem\u003e13\u003c/em\u003e(10). https://doi.org/10.3390/ijerph13101000\u003c/li\u003e\n\u003cli\u003eMihaylov, I., Nisheva, M., \u0026amp; Vassilev, D. (2019). Application of Machine Learning Models for Survival Prognosis in Breast Cancer Studies. \u003cem\u003eInformation\u003c/em\u003e, \u003cem\u003e10\u003c/em\u003e(3), 93. https://doi.org/10.3390/INFO10030093\u003c/li\u003e\n\u003cli\u003eMontazeri, M., Montazeri, M., Montazeri, M., \u0026amp; Beigzadeh, A. (2016). Machine learning models in breast cancer survival prediction. \u003cem\u003eTechnology and Health Care : Official Journal of the European Society for Engineering and Medicine\u003c/em\u003e, \u003cem\u003e24\u003c/em\u003e(1), 31\u0026ndash;42. https://doi.org/10.3233/THC-151071\u003c/li\u003e\n\u003cli\u003eNajafi-Vosough, R., Faradmal, J., Tapak, L., Alafchi, B., Najafi-Ghobadi, K., \u0026amp; Mohammadi, T. (2022). Prediction the survival of patients with breast cancer using random survival forests for competing risks. \u003cem\u003eJournal of Preventive Medicine and Hygiene\u003c/em\u003e, \u003cem\u003e63\u003c/em\u003e(2), E298. https://doi.org/10.15167/2421-4248/JPMH2022.63.2.2405\u003c/li\u003e\n\u003cli\u003eNew Zealand. Te Aho o te Kahu. (2020). 
\u003cem\u003eThe state of cancer in New Zealand 2020.\u003c/em\u003e\u003c/li\u003e\n\u003cli\u003eNewman, L. (2023). Oncologic anthropology: Global variations in breast cancer risk, biology, and outcome. \u003cem\u003eJournal of Surgical Oncology\u003c/em\u003e, \u003cem\u003e128\u003c/em\u003e(6), 959\u0026ndash;966. https://doi.org/10.1002/JSO.27459\u003c/li\u003e\n\u003cli\u003eNoman, S. M., Fadel, Y. M., Henedak, M. T., Attia, N. A., Essam, M., Elmaasarawii, S., Fouad, F. A., Eltasawi, E. G., \u0026amp; Al-Atabany, W. (2025). Leveraging survival analysis and machine learning for accurate prediction of breast cancer recurrence and metastasis. \u003cem\u003eScientific Reports 2025 15:1\u003c/em\u003e, \u003cem\u003e15\u003c/em\u003e(1), 1\u0026ndash;16. https://doi.org/10.1038/s41598-025-87622-3\u003c/li\u003e\n\u003cli\u003eP\u0026ouml;lsterl, S., Sarasua, I., Guti\u0026eacute;rrez-Becker, B., \u0026amp; Wachinger, C. (2020). A wide and deep neural network for survival analysis from anatomical shape and tabular clinical data. \u003cem\u003eCommunications in Computer and Information Science\u003c/em\u003e, \u003cem\u003e1167 CCIS\u003c/em\u003e, 453\u0026ndash;464. https://doi.org/10.1007/978-3-030-43823-4_37/COVER\u003c/li\u003e\n\u003cli\u003eSashegyi, A., \u0026amp; Ferry, D. (2017). On the Interpretation of the Hazard Ratio and Communication of Survival Benefit. \u003cem\u003eThe Oncologist\u003c/em\u003e, \u003cem\u003e22\u003c/em\u003e(4), 484. https://doi.org/10.1634/THEONCOLOGIST.2016-0198\u003c/li\u003e\n\u003cli\u003eWellington: Ministry of Health. (2017). HISO 10001:2017 Ethnicity Data Protocols. In \u003cem\u003eEthnicity Data Protocols\u003c/em\u003e. Ministry of Health. 
chrome-extension://efaidnbmnnnibpcajpcglclefindmkaj/https://www.tewhatuora.govt.nz/assets/Our-health-system/Digital-health/Health-information-standards/HISO-10001-2017-Ethnicity-Data-Protocols.pdf\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[{"identity":"4c914d55-666b-4cba-bd0a-a275e8f9a0fd","identifier":"10.13039/501100001559","name":"Breast Cancer Foundation New Zealand","awardNumber":"Helena McAlpine Young Women’s Breast Cancer Study ","order_by":0}],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"University of Auckland","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Breast cancer, Survival prediction, Random Survival Forest, Cox proportional hazards, Machine learning, Tumour receptor subtype","lastPublishedDoi":"10.21203/rs.3.rs-5515692/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-5515692/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eAccurate prediction of breast cancer-specific survival is crucial for guiding personalized treatment decisions and improving patient outcomes. 
This study evaluated the performance of machine learning approaches (Random Survival Forest, RSF and Generalized Boosted Model, GBM) alongside traditional Cox proportional hazards models for predicting survival in 21,574 women diagnosed with stage I-IV breast cancer in New Zealand between 2000-2019. Performance comparisons using time-dependent Area Under the Curve and Brier score metrics demonstrated that RSF consistently outperformed both Cox regression variants and GBM across all time points. Distinct differences emerged in survival predictions between modelling approaches: RSF captured a sharper initial decline in survival for most tumour receptor subtypes and better differentiated the favourable prognosis of ER+/HER2- tumours compared to other subtypes. Notably, variable importance analysis revealed fundamentally different prognostic emphases between modelling approaches—disease stage dominated Cox model predictions while tumour receptor subtype most strongly influenced RSF predictions. These findings highlight how machine learning approaches can capture complex, nonlinear relationships between clinical variables and survival outcomes that may be missed by traditional statistical models. 
The complementary insights provided by different modelling approaches suggest potential value in their combined use for enhanced risk stratification and more tailored treatment planning in breast cancer management, particularly when accounting for tumour biological characteristics alongside conventional staging factors.\u003c/p\u003e","manuscriptTitle":"Evaluating Temporal Dynamics in Breast Cancer Survival Predictions with Machine Learning and Cox Regression Analysis","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-03-04 09:03:52","doi":"10.21203/rs.3.rs-5515692/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"9a421ba8-afba-42c3-8353-81079bf77aa7","owner":[],"postedDate":"March 4th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":45152585,"name":"Statistical Epidemiology"}],"tags":[],"updatedAt":"2025-03-04T09:03:52+00:00","versionOfRecord":[],"versionCreatedAt":"2025-03-04 09:03:52","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-5515692","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-5515692","identity":"rs-5515692","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}



License: CC-BY-4.0