Machine learning identifies prognosticators of intracranial metastatic disease in patients with breast or lung cancer

Sunit Das, Marco Istasy, Amol Verma, Katarzyna Jerzak

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6247605/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review. Version 1 (latest version).

Abstract

Intracranial metastatic disease (IMD) is a devastating complication of cancer associated with high morbidity and mortality. Patients with breast and lung cancer have a particularly high risk of developing IMD. Early identification of individuals with breast or lung cancer at high risk for IMD development would enable targeted surveillance and timely intervention.
In this study, we leverage machine learning (ML) algorithms to develop and validate predictive models for IMD risk using a population-based dataset of 143,341 patients with breast or lung cancer from Ontario, Canada, collected from 2010 to 2023. Our ML models outperform traditional statistical paradigms, demonstrating strong discriminative ability in predicting both global and five-year risk of IMD, with area under the precision-recall curve values ranging from 0.75 to 0.85. We further employed SHapley Additive exPlanations (SHAP) analysis to elucidate the key predictors of IMD: histology, laterality and age emerged as significant factors for patients with breast cancer, while tumour site, histology and sex predicted IMD among patients with lung cancer. These findings underscore the potential of ML algorithms to bolster personalised risk stratification and enable targeted surveillance for IMD in patients with metastatic cancer.

Subject areas: Health sciences/Oncology/Cancer/Cancer models; Health sciences/Oncology/Cancer/Metastasis; Biological sciences/Computational biology and bioinformatics/Machine learning

Keywords: intracranial metastatic disease, brain metastasis, machine learning, breast cancer, lung cancer, risk prediction

Introduction

Intracranial metastatic disease (IMD) is estimated to occur in nearly 30% of patients with metastatic cancer [1–10]. The incidence of IMD is forecast to increase as the population ages and advances in cancer therapy extend patient survival [7, 8, 10]. Furthermore, IMD is tumour-dependent: certain primary cancers have a higher propensity for metastatic spread to the brain than others. Breast and lung cancers are the leading sources of brain metastases, accounting for approximately 11% and 60% of all IMD cases, respectively [2, 7, 11–16]. IMD imposes a significant burden on patient survival and quality of life (QoL) [17–20].
Without treatment, the average survival following development of IMD is less than two months [21, 22], while currently available treatments, including surgical resection, radiotherapy, and systemic therapies, on average extend survival to less than one year [16, 21, 23–30]. The complexity of treatment is further compounded by delays in identification: most patients with systemic cancer do not undergo brain imaging unless they manifest signs of neurological compromise, and more than 80% of patients have multiple brain metastases at the time of detection [31]. Late diagnosis is problematic because treatment of IMD with stereotactic radiosurgery, which has been shown to be effective in achieving intracranial disease control and improving QoL, depends on identifying brain metastases before they have grown large [3, 32, 33]. Routine surveillance brain imaging of all patients with systemic cancer, however, is neither clinically sensible nor fiscally feasible [34]. Consequently, there is a significant need to identify prognosticators of IMD to enable guided surveillance and early detection in patients at high risk.

Several risk factors for IMD have been reported previously for patients with breast and lung cancers. These include demographic factors, such as younger age at diagnosis [35–38], and tumour-specific features, such as high tumour grade and advanced tumour stage [36, 39–42]. Certain characteristics specific to breast and lung cancers also significantly influence IMD risk. In breast cancer, ER-negative, HER2-positive and triple-negative subtypes [40, 43, 44], as well as the presence of BRCA1/2 mutations [45, 46], are associated with an increased propensity for brain metastases; in lung cancer, sex [41, 47], histological subtype [37, 47] and specific genetic alterations, such as EGFR mutations and ALK and ROS1 rearrangements [41], are also strongly associated with increased IMD risk.
Despite the identification of these numerous risk factors, no set of predictors has been reliably correlated with the development of IMD. Indeed, traditional statistical methods have proven inadequate in offering a cogent stratification algorithm that can predict IMD risk as a function of patient- and tumour-specific features [1, 40, 41, 48–50]. Consequently, there is a critical need for more sophisticated predictive models to identify patients at high risk of developing IMD. Machine learning (ML) offers a promising alternative to traditional statistical methods, yet successful integration of ML models into clinical practice will require not only predictive accuracy but also model interpretability. Indeed, clinicians have been found to require a clear understanding of the factors driving a model's predictions before they can confidently incorporate such tools into patient care [51, 52]. In this study, we hypothesise that interpretable ML approaches can significantly improve the prediction of IMD in patients with breast or lung cancer. We present a comprehensive, cancer-agnostic ML paradigm with excellent discriminative ability to identify patients with breast or lung cancer at risk for IMD. Furthermore, by interrogating the decision-making process of our models, we gain insight into the unique factors associated with metastatic spread to the brain.

Results

Patient Cohorts and Time to IMD

Using the Ontario Cancer Registry, we identified 143,341 Ontario patients diagnosed with breast or lung cancer from 2010 to 2023. The breast cancer cohort consisted of 86,082 female patients, of whom 2,666 (3%) developed IMD. The lung cancer cohort included 57,259 patients (male, n = 28,558 [50%]; female, n = 28,701 [50%]), of whom 6,427 (11%) developed IMD. A chi-square test of independence revealed a significant difference in the incidence of IMD between patients with breast and lung cancer (χ²(1, N = 134,428) = 4464.58, p < .001).
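As an illustration of the cohort-level comparison above, the sketch below runs a Pearson chi-square test of independence on a 2 × 2 table built from the reported counts (2,666 of 86,082 breast and 6,427 of 57,259 lung patients with IMD) using SciPy's `chi2_contingency`. It is not the authors' code. Because the reported test lists N = 134,428 rather than the full 143,341 patients, and because the use of a continuity correction is not stated (the sketch assumes none), the statistic it produces is not expected to match the published value.

```python
from scipy.stats import chi2_contingency


def imd_contingency_table(breast_imd, breast_total, lung_imd, lung_total):
    """2 x 2 table: rows are primary cancers, columns are (IMD, no IMD)."""
    return [
        [breast_imd, breast_total - breast_imd],
        [lung_imd, lung_total - lung_imd],
    ]


table = imd_contingency_table(2666, 86082, 6427, 57259)
# correction=False gives the uncorrected Pearson statistic; SciPy applies
# Yates' continuity correction to 2 x 2 tables by default.
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2({dof}) = {chi2:.1f}, p = {p:.3g}")
print(f"IMD incidence: breast {2666 / 86082:.1%}, lung {6427 / 57259:.1%}")
```

The expected counts returned by SciPy are the row total times the column total divided by N, which is what the test compares against the observed counts.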
Whereas patients who initially presented with non-metastatic (Stages 1–3) lung cancer were more likely to develop IMD than patients who initially presented with non-metastatic breast cancer, the opposite held for patients diagnosed with metastatic (Stage 4) disease (Fig. 1). The mean time from initial diagnosis to IMD was 1,147 ± 894 days in the breast cancer cohort, compared with 446 ± 519 days in the lung cancer cohort. An independent-samples t-test confirmed a significant difference in time to IMD between the two cohorts (t(9091) = −46.70, p < .001). Detailed demographic characteristics of the breast and lung cancer cohorts are provided in the supplemental material.

Univariate Predictors of IMD

To establish a foundation for model development, we first assessed the independent associations between previously reported covariates and the development of IMD in patients with breast or lung primary cancer.

Breast Cancer Cohort and IMD

To assess the association between several covariates and IMD in patients with breast cancer, we performed chi-square tests of independence. Significant associations were found for tumour site, histology, stage, tumour grade, and tumour marker status (p < .01; Table 1). Laterality showed no significant association. Notably, IMD incidence varied across tumour sites, ranging from 3.5% in nipple lesions (ICD-10 C50.0) to 5.8% in axillary tail lesions (ICD-10 C50.6). Patients with lobular carcinoma had the lowest IMD rate (2.2%), followed by those with infiltrating duct carcinoma and infiltrating duct carcinoma with lobular features (3.1% each) and mixed subtypes (3.7%). Patients with Grade 1 tumours had a low IMD risk (0.5%), with a stepwise increase in risk at higher grades. Similarly, IMD incidence increased progressively with advancing stage, culminating in an 18.3% risk for patients with Stage 4 tumours.
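The reported degrees of freedom, t(9091) = 2,666 + 6,427 − 2, are consistent with a pooled-variance (Student's) independent-samples t-test. Assuming the ± values above are standard deviations, the test can be reproduced from the summary statistics alone with SciPy's `ttest_ind_from_stats`; this is a check on the arithmetic, not the authors' pipeline.

```python
from scipy.stats import ttest_ind_from_stats

# Reported summary statistics for time from diagnosis to IMD (days).
# Assumption: the +/- values in the text are sample standard deviations.
lung = {"mean": 446.0, "std": 519.0, "n": 6427}
breast = {"mean": 1147.0, "std": 894.0, "n": 2666}

# equal_var=True pools the two variances (Student's t-test), giving
# df = n_lung + n_breast - 2.
result = ttest_ind_from_stats(
    lung["mean"], lung["std"], lung["n"],
    breast["mean"], breast["std"], breast["n"],
    equal_var=True,
)
df = lung["n"] + breast["n"] - 2
print(f"t({df}) = {result.statistic:.2f}, p = {result.pvalue:.3g}")
```

With the lung cohort entered first, the statistic comes out at approximately −46.7, in line with the reported value; the small residual difference reflects rounding of the published means and standard deviations.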
Cancer subtype was also associated with IMD, with HR-positive tumours exhibiting the lowest IMD rate (2.0%), compared with HER2-positive (5.0%) and triple-negative (7.0%) tumours. Detailed statistical comparisons between specific categories within each predictor are presented in Fig. 1. Logistic regression revealed an inverse association of IMD risk with age at diagnosis (OR = 0.97, 95% CI [0.97–0.98], p < .001) and a direct association with primary tumour size (OR = 1.01, 95% CI [1.01–1.02], p < .001).

Lung Cancer Cohort and IMD

Chi-square tests of independence showed all categorical covariates examined to be significantly associated with IMD: sex, site, histology, laterality, and stage (p < .01; Table 1). Males had a slightly lower IMD rate (10.7%) than females (11.8%). Overlapping lesions of the bronchus and lung (ICD-10 C34.8) had the highest IMD incidence (11.8%), while bronchus or lung, NOS (ICD-10 C34.9) had the lowest (6.7%). Histological subtypes also showed varying propensities for brain metastasis, with adenocarcinoma exhibiting the highest risk (13.8%) and squamous cell carcinoma the lowest (5.9%). Laterality showed a minor difference, with right-sided tumours having a slightly lower rate of brain metastasis (10.9%) than left-sided tumours (11.8%). As in breast cancer, the likelihood of brain metastasis increased with advancing stage, from 4.2% for Stage 1 to 13.0% for Stage 4. Detailed statistical comparisons between specific categories within each predictor are presented in Fig. 1. Lastly, as in patients with breast cancer, logistic regression revealed an inverse association of IMD risk with age at diagnosis (OR = 0.96, 95% CI [0.95–0.96], p < .001) and a direct association with tumour size (OR = 1.01, 95% CI [1.01–1.02], p < .001).

Machine Learning Models

We derived four distinct machine learning models, two per primary cancer type, to predict the development of IMD in patients with breast or lung cancer.
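The univariable odds ratios reported above are per one-unit change in the covariate, which makes values such as 0.97 look negligible. Assuming age is measured in years and the model is linear in the log-odds (as standard logistic regression is), a per-unit OR can be rescaled to a k-unit change as OR^k. The sketch below applies this to the reported, rounded age ORs; the ten-year interval and the function name are illustrative choices, and the results inherit the rounding of the published values.

```python
import math


def rescale_odds_ratio(or_per_unit: float, units: float) -> float:
    """OR for a change of `units`, assuming log-odds are linear in the covariate."""
    return math.exp(units * math.log(or_per_unit))


# Reported per-year odds ratios for age at diagnosis (rounded in the text).
or_breast_decade = rescale_odds_ratio(0.97, 10)
or_lung_decade = rescale_odds_ratio(0.96, 10)
print(f"OR per 10 years: breast {or_breast_decade:.2f}, lung {or_lung_decade:.2f}")
```

Under these assumptions, each additional decade of age corresponds to roughly a quarter lower odds of IMD in the breast cohort and roughly a third lower odds in the lung cohort. The same rescaling applies to the tumour size OR of 1.01, but the unit of tumour size is not stated in this section.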
The models were tailored to address different aspects of risk prediction: 1) global prediction of IMD at any point during the course of disease across the entire respective patient population; and 2) prediction of IMD within five years of initial diagnosis (Fig. 2).

Global Classification Models

Our initial approach involved developing global classification models, separately for each primary cancer, to predict IMD across the full spectrum of the patient population. These models aimed to provide a comprehensive risk assessment by considering all available data up until IMD identification. For breast cancer, the model achieved an F1 score of 0.82 (cross-validation [CV] 0.81 ± 0.01), precision of 0.95 (CV 0.94 ± 0.01), recall of 0.72 (CV 0.72 ± 0.01), AUC-PR of 0.85 (CV 0.83 ± 0.01), and AUC-ROC of 0.96 (CV 0.95 ± 0.01). The lung cancer model also performed robustly, with an F1 score of 0.74 (CV 0.72 ± 0.01), precision of 0.94 (CV 0.89 ± 0.01), recall of 0.60 (CV 0.61 ± 0.01), AUC-PR of 0.75 (CV 0.81 ± 0.01), and AUC-ROC of 0.87 (CV 0.81 ± 0.01).

Five-Year Classification Models

Recognising the importance of time-specific predictions, we developed five-year classification models, again separately for each primary cancer, to predict IMD within a clinically relevant timeframe. This approach excluded patients who died within the five-year window without a diagnosis of IMD, focusing instead on longer-term survivors who could potentially develop IMD. The five-year classification model for breast cancer achieved an F1 score of 0.73 (CV 0.68 ± 0.02), precision of 0.82 (CV 0.74 ± 0.02), recall of 0.65 (CV 0.63 ± 0.02), AUC-PR of 0.76 (CV 0.66 ± 0.01), and AUC-ROC of 0.96 (CV 0.96 ± 0.01).
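The global and five-year targets differ only in how outcomes are labelled. The sketch below encodes the rule stated above for the five-year models: IMD within five years is positive, death within five years without IMD leads to exclusion, and everything else is negative. The `Patient` record, its day-based fields, and the 365.25-day year are hypothetical choices for illustration. The text does not say how patients alive without IMD but followed for less than five years were handled; this sketch labels them negative, which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

FIVE_YEARS_DAYS = 5 * 365.25  # assumption: five years measured in days


@dataclass
class Patient:
    """Hypothetical per-patient record; times are days from primary diagnosis."""
    days_to_imd: Optional[float] = None    # None if IMD was never recorded
    days_to_death: Optional[float] = None  # None if no death was recorded


def global_label(p: Patient) -> int:
    """Global model target: 1 if IMD was recorded at any time, else 0."""
    return int(p.days_to_imd is not None)


def five_year_label(p: Patient, horizon: float = FIVE_YEARS_DAYS) -> Optional[int]:
    """Five-year model target; None means the patient is excluded from the cohort."""
    if p.days_to_imd is not None and p.days_to_imd <= horizon:
        return 1  # IMD within the window
    if p.days_to_death is not None and p.days_to_death <= horizon:
        return None  # died within the window without IMD: excluded
    return 0  # assumption: everyone else, including shorter follow-up, is negative


cohort = [Patient(days_to_imd=400), Patient(days_to_death=300),
          Patient(days_to_imd=3000), Patient()]
labels = [five_year_label(p) for p in cohort]
print(labels)  # [1, None, 0, 0]
```

Checking IMD before death means a patient who developed IMD and then died within the window is kept as a positive case, which matches the exclusion applying only to deaths without IMD.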
For patients with lung cancer, the five-year model achieved an F1 score of 0.71 (CV 0.70 ± 0.01), precision of 0.91 (CV 0.89 ± 0.01), recall of 0.58 (CV 0.58 ± 0.01), AUC-PR of 0.73 (CV 0.72 ± 0.01), and AUC-ROC of 0.87 (CV 0.86 ± 0.01).

Feature Importance of Machine Learning Models

To identify the most influential predictors of IMD and their consistency across prediction models, we assessed feature importance using SHapley Additive exPlanations (SHAP) values [53], which quantify the contribution of each feature to the model's output. For patients with breast cancer, the global and five-year models both identified the same three top features: histology, laterality and age at diagnosis (Fig. 3). Histology emerged as a significant predictor in both breast cancer models (H(3) = 41.00 for the global model and H(3) = 28.44 for the five-year model, Kruskal–Wallis p < .001 for both), with post-hoc analyses associating the mixed histological subtypes with lower IMD risk (global model: mixed vs. infiltrating ductal carcinoma, mean difference [MD] = 0.53; mixed vs. infiltrating ductal carcinoma with lobular features, MD = 0.70; mixed vs. lobular carcinoma, MD = 0.39; five-year model: mixed vs. infiltrating ductal carcinoma, MD = 0.29; all p < .01, Tukey’s HSD with Bonferroni correction). Age at diagnosis demonstrated a linear relationship with SHAP values, with higher age correlating with increased risk in both the global (β = 0.11, R² = 0.62, p < .001) and five-year models (β = 0.04, R² = 0.61, p < .001). Notably, although laterality ranked as the second most important feature in both models, no significant differences in mean SHAP values were observed between right- and left-sided tumours. For patients with lung cancer, tumour site, histology and sex ranked as the top three features in both models (Fig. 4).
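The per-feature comparisons reported in this section are Kruskal–Wallis tests on SHAP values grouped by the categories of a single feature (for example, each histological subtype). A minimal sketch with SciPy's `kruskal` follows; the helper name is hypothetical and the SHAP values in the usage lines are made-up toy numbers, not study output. In CatBoost, per-sample SHAP values for a fitted model can be obtained with `get_feature_importance(..., type="ShapValues")`, which returns one column per feature plus a final column holding the expected value; that call is not exercised here, and the post-hoc pairwise tests are omitted.

```python
from collections import defaultdict

from scipy.stats import kruskal


def kruskal_by_category(categories, shap_values):
    """Kruskal-Wallis H-test of one feature's SHAP values across its categories.

    Returns (H statistic, p-value, degrees of freedom = number of categories - 1).
    """
    groups = defaultdict(list)
    for category, value in zip(categories, shap_values):
        groups[category].append(value)
    if len(groups) < 2:
        raise ValueError("need at least two categories to compare")
    stat, p = kruskal(*groups.values())
    return stat, p, len(groups) - 1


# Toy example: hypothetical SHAP values for a three-level histology feature.
histology = ["ductal"] * 5 + ["lobular"] * 5 + ["mixed"] * 5
shap_vals = [0.10, 0.20, 0.15, 0.05, 0.12,
             -0.20, -0.10, -0.15, -0.05, -0.12,
             0.50, 0.60, 0.55, 0.45, 0.52]
H, p, dof = kruskal_by_category(histology, shap_vals)
print(f"H({dof}) = {H:.2f}, p = {p:.4f}")
```

Because the test works on ranks, it asks only whether one category's SHAP values tend to sit above another's; it does not assume the SHAP values are normally distributed.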
Tumour site influenced IMD risk prediction (H(4) = 19.38 for the global model and H(4) = 23.61 for the five-year model, Kruskal–Wallis p < .001 for both), with notable differences between specific sites (global model: lower lobe, bronchus or lung vs. overlapping lesion of bronchus and lung, MD = 0.46; lower lobe, bronchus or lung vs. bronchus or lung, NOS, MD = 0.47; five-year model: lower lobe, bronchus or lung vs. overlapping lesion of bronchus and lung, MD = 0.66; upper lobe, bronchus or lung vs. lower lobe, bronchus or lung, MD = 0.26; middle lobe, bronchus or lung vs. bronchus or lung, NOS, MD = 0.08; all p < .01, Tukey’s HSD with Bonferroni correction). Histology was also significant (H(3) = 136.81 for the global model and H(3) = 39.64 for the five-year model, Kruskal–Wallis p < .001 for both; Fig. 4), with post-hoc testing showing significant differences between all histology pairs in both models (all p < .001, Tukey’s HSD with Bonferroni correction). Interestingly, while the overall pattern of feature importance remained consistent, the differences between histologies were less pronounced in the five-year model. Lastly, sex also emerged as a predictor (H(1) = 87.19 for the global model and H(1) = 81.20 for the five-year model, Kruskal–Wallis p < .001 for both), with females displaying slightly higher mean SHAP values (−0.12 ± 1.15 and −0.12 ± 1.06 for the global and five-year models, respectively) than males (−0.20 ± 1.31 and −0.22 ± 0.98, respectively).

Discussion

In this study, we used ML to develop predictive models for IMD from a population-based administrative dataset, with the goal of identifying patients at high risk.
To achieve this, we developed two sets of models for patients with breast or lung cancer: one for global IMD risk prediction and one for IMD within five years of initial diagnosis. Both sets of models demonstrated strong predictive performance. The global models achieved positive predictive values of 0.95 and 0.94 and sensitivities of 0.72 and 0.60 for patients with breast and lung cancer, respectively. The five-year models largely maintained this performance, with positive predictive values of 0.82 and 0.91 and sensitivities of 0.65 and 0.58, respectively. The robust performance of our models in predicting the global and five-year risk of IMD adds to a growing body of evidence underscoring the advantages of ML algorithms in predictive oncology [54–58]. Furthermore, these models identify key clinical indicators of IMD risk in patients with breast or lung cancer that align with, and expand upon, risk factors previously identified in the literature.

In patients with breast cancer, mixed histology (i.e., tumours not otherwise classified as invasive ductal carcinoma, invasive lobular carcinoma, or invasive ductal carcinoma with lobular features) was a significant predictor of IMD, consistent with previous reports linking histological subtype to metastatic potential [39, 40, 42]. The models also identified older age at diagnosis and right-sided tumour laterality as important predictors. Interestingly, our results revealed a discrepancy in the effect of age at diagnosis on IMD risk that mirrors the current literature, in which some studies identify younger age as a risk factor [35–38] while others suggest the opposite [59, 60]. Our logistic regression analysis indicated a small but statistically significant inverse association of age with IMD risk, whereas analysis of the ML models identified older age as contributing to increased risk.
This discrepancy likely reflects the limitations of univariable logistic regression, which assumes a linear relationship between each covariate and the log-odds of IMD and does not account for other covariates. The ML models suggest that, within specific subgroups defined by other characteristics, older age may indeed increase IMD risk. Lastly, the role of breast cancer laterality in IMD development has not been well characterised. Our findings indicate that right-sided tumours may be associated with increased IMD risk. This observation warrants further investigation, particularly in light of emerging evidence regarding the distinct biological characteristics of right- and left-sided breast cancers [61–63].

In patients with lung cancer, tumour site and histological subtype were identified as key predictors of IMD, consistent with previous research demonstrating the influence of tumour site [32, 64] and histology [16, 18, 64] on metastatic patterns. Additionally, the identification of sex as a significant predictor aligns with previous studies suggesting sex-related differences in the development of IMD [32, 41, 50, 64].

While IMD is a relatively rare event in patients with metastatic cancer, its consequences are severe. Furthermore, the fiscal and practical costs of default surveillance make universal IMD surveillance infeasible [65]. As such, there is a significant clinical need for algorithms that can prospectively identify patients at increased risk for IMD who might benefit from clinical surveillance [7, 17–20]. Our work describes the development and validation of ML models that can accurately predict the risk of IMD in patients with breast or lung cancer and potentially enable a more proactive, targeted and cost-effective approach to surveillance and management. Indeed, these models are designed for use in a prospective clinical setting: they are based on features that treating clinicians can readily collect, and they require minimal computational power to deploy.
This focus on practical application, together with model interpretability and an open-box framework, aims to build clinician trust and facilitate the integration of these predictive tools into routine practice [51, 52].

While our study presents promising results, it has limitations. The retrospective nature of the dataset and reliance on administrative health data may introduce pitfalls related to data completeness and availability. This is particularly relevant to the lung cancer cohort, for which data on biomarkers such as ALK, ROS1, and EGFR status were unavailable. In addition, our imputation assumed that missing data were missing completely at random; while, to our knowledge, this assumption holds, spurious correlations may be introduced if it does not. Furthermore, the generalisability of our models to populations beyond Ontario, Canada, requires validation in diverse clinical settings. Lastly, future research should explore incorporating additional dynamic variables, such as treatment responses and longitudinal clinical data, as well as biological and genetic data previously implicated in IMD risk [40, 41, 43, 45–47].

In conclusion, this study provides compelling evidence for the utility of ML to predict IMD in patients with breast or lung cancer. By providing accurate risk assessment and identifying key prognostic factors, our work can inform more effective and personalised cancer management strategies. Further validation across diverse populations and the integration of a broader range of data sources are needed to enhance model robustness and clinical utility. Nevertheless, integrating such models into clinical practice holds significant potential for improving patient outcomes by facilitating early detection of IMD and reducing the overall burden of this serious complication.
Methods

Dataset Curation

This study used a population-based dataset obtained from the Ontario Cancer Registry, the provincial database of information about all Ontario residents diagnosed with cancer. It is held at ICES, formerly the Institute for Clinical Evaluative Sciences. ICES is an independent, non-profit research institute whose legal status under Ontario’s health information privacy law allows it to collect and analyse health care and demographic data, without consent, for health system evaluation and improvement. The dataset included all patients in the Canadian province of Ontario diagnosed between January 1, 2010 and August 30, 2023 with breast cancer (ICD-10 C50.0–C50.9) or lung cancer (ICD-10 C34.0–C34.9). Patients were excluded if they 1) were under the age of 18 or over the age of 105, or 2) had more than one primary cancer diagnosis.

To comprehensively assess patient comorbidities, this study employed the Johns Hopkins ACG® System, with a focus on Aggregated Diagnosis Groups (ADGs). As in previous work [66], all diagnoses associated with hospital admissions and physician billing claims within two years before the primary cancer diagnosis were identified for each patient. These diagnoses were then mapped to their corresponding ICD codes and categorised into 32 distinct ADGs. An aggregate ADG score and the probability of mortality at one year were then calculated for each patient according to a published framework [66, 67]. The dataset was linked using unique encoded identifiers and analysed at ICES.

Data Preprocessing

The dataset was first split into training (49%), validation (21%), and test (30%) sets using random sampling stratified by IMD status, to ensure proportional representation of patients with IMD across sets. Missing values were then imputed in each set using statistics computed on the training set alone, to prevent data leakage.
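The 49/21/30 proportions follow from two successive stratified 70/30 splits (0.7 × 0.7 = 0.49 and 0.7 × 0.3 = 0.21). The sketch below shows one way to implement that split and the training-set-only mean/mode imputation with pandas and scikit-learn. The function names, the fixed random seed, and the toy columns are hypothetical, and the authors' actual tooling for this step is not described in the text.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def split_49_21_30(df: pd.DataFrame, label_col: str, seed: int = 0):
    """Stratified train/validation/test split in 49/21/30 proportions.

    First hold out 30% as the test set, then split the remaining 70% into
    70% training and 30% validation, stratifying on the label each time.
    """
    train_val, test = train_test_split(
        df, test_size=0.30, stratify=df[label_col], random_state=seed)
    train, val = train_test_split(
        train_val, test_size=0.30, stratify=train_val[label_col], random_state=seed)
    return train, val, test


def impute_from_train(train, *others, numeric_cols, categorical_cols):
    """Impute every frame using training-set statistics only (mean or mode)."""
    fill_values = {col: train[col].mean() for col in numeric_cols}
    fill_values.update({col: train[col].mode().iloc[0] for col in categorical_cols})
    return [frame.fillna(value=fill_values) for frame in (train, *others)]


# Synthetic data with 10% positives; not study data.
rng = np.random.default_rng(0)
data = pd.DataFrame({"x": rng.normal(size=1000), "imd": [1] * 100 + [0] * 900})
train, val, test = split_49_21_30(data, "imd")
print(len(train), len(val), len(test))

# Toy imputation: the test row receives the training mean (60.0) and mode.
toy_train = pd.DataFrame({"age": [50.0, np.nan, 70.0],
                          "histology": ["ductal", "ductal", None]})
toy_test = pd.DataFrame({"age": [np.nan], "histology": [None]})
toy_train, toy_test = impute_from_train(
    toy_train, toy_test, numeric_cols=["age"], categorical_cols=["histology"])
```

Computing the fill values once from the training frame and reusing them for the other frames is what keeps validation and test information out of the imputation.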
For numeric features, missing values were imputed with the mean of the respective feature in the training set; for categorical features, with the mode of the respective feature in the training set. These training-set statistics (the mean or mode of each feature) were then applied to impute missing values in the corresponding features of the validation and test sets. Fewer than 30% of values in each feature required imputation. Outlier detection was not performed, to preserve the full spectrum of clinical and demographic variability inherent in the respective patient population.

Feature Space

For both the breast and lung cancer cohorts, 34 comorbidity-based features were included: the 32 distinct ADGs of the Johns Hopkins ACG® System, the aggregate ADG score, and the probability of mortality at one year, as previously reported in the literature [66, 67]. Both cohorts also included age at diagnosis and tumour size, site (according to the ICD-10 classification), histology, laterality, and stage. Hormone receptor, HER2, and triple-negative status and tumour grade were additionally included for the breast cancer cohort, and sex for the lung cancer cohort. Feature selection was guided by two considerations: first, a thorough review of the existing literature to identify established and potential risk factors for IMD; and second, a deliberate effort to maximise compatibility with standard clinical data by prioritising readily accessible variables and avoiding specialised measurements.

Model Development

We employed classification models based on gradient-boosted decision trees implemented in the CatBoost framework [68]. Hyperparameter optimisation was conducted using Optuna [69], a Bayesian optimisation library that searches the hyperparameter space for optimal model configurations.
To ensure robust model evaluation and mitigate overfitting, a five-fold cross-validation (CV) strategy was adopted. To account for class imbalance in the dataset (specifically, the small proportion of patients who developed IMD), the Synthetic Minority Over-sampling Technique for Nominal and Continuous features (SMOTE-NC) [70] was used, and the up-sampling ratio and other relevant parameters were further optimised using Optuna [69]. All model training and hyperparameter tuning were performed exclusively on the training and validation sets, with final performance assessed on the held-out test set. Models were optimised to maximise the F1 score, the harmonic mean of precision and recall.

Model Evaluation

Model performance was evaluated using a suite of established metrics. The F1 score, which balances false positives against false negatives, served as the primary optimisation metric. Precision, recall, area under the precision-recall curve (AUC-PR), and area under the receiver operating characteristic curve (AUC-ROC) were also calculated to provide a comprehensive assessment of model performance. Following model training within each CV fold, the decision threshold was selected on the validation data to maximise the F-beta score with β = 0.75. This value of β deliberately prioritises precision over recall, which we deemed justified in this clinical context. Specifically, a false positive prediction of IMD could lead to unnecessary diagnostic procedures, increased healthcare costs, and patient anxiety.
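The threshold-selection step can be sketched as follows: for each candidate cut-off on the predicted probability, compute precision P and recall R, then keep the cut-off that maximises F-beta = (1 + β²)PR / (β²P + R). With β = 0.75 < 1, precision carries more weight than recall. The function names and the toy labels and scores are hypothetical, and in the study this selection was made within each CV fold rather than on a single array.

```python
import numpy as np


def fbeta(precision: float, recall: float, beta: float) -> float:
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom > 0 else 0.0


def best_threshold(y_true, scores, beta=0.75):
    """Return (threshold, F-beta) maximising F-beta; positive when score >= threshold."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    best_t, best_f = None, -1.0
    for t in np.unique(scores):  # each observed score is a candidate cut-off
        pred = scores >= t
        tp = int(np.sum(pred & y_true))
        fp = int(np.sum(pred & ~y_true))
        fn = int(np.sum(~pred & y_true))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = fbeta(precision, recall, beta)
        if f > best_f:
            best_t, best_f = float(t), f
    return best_t, best_f


# Toy validation labels and predicted probabilities (hypothetical).
y_val = [0, 0, 1, 1, 1, 0, 1]
p_val = [0.10, 0.40, 0.35, 0.80, 0.70, 0.65, 0.90]
threshold, score = best_threshold(y_val, p_val, beta=0.75)
print(threshold, round(score, 3))
```

As a sanity check on the reported metrics, `fbeta(0.95, 0.72, 1.0)` gives approximately 0.82, matching the F1 score reported for the global breast cancer model from its precision and recall.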
While a false negative is also undesirable, such patients would already be receiving the standard of care and would be monitored for their primary cancer and its potential complications. Following cross-validation, final model performance was evaluated on the held-out test set, comprising 30% of the original dataset, which remained untouched during model development and hyperparameter tuning.

Ethical Considerations

ICES is a prescribed entity under Ontario’s Personal Health Information Protection Act (PHIPA). Section 45 of PHIPA authorizes ICES to collect personal health information, without consent, for the purpose of analysis or compiling statistical information with respect to the management of, evaluation or monitoring of, the allocation of resources to or planning for all or part of the health system. Projects that use data collected by ICES under section 45 of PHIPA, and use no other data, are exempt from REB review. The use of the data in this project is authorized under section 45 and approved by ICES’ Privacy and Legal Office.

Declarations

Data Availability

The dataset from this study is held securely in coded form at ICES. While legal data sharing agreements between ICES and data providers (e.g., healthcare organizations and government) prohibit ICES from making the dataset publicly available, access may be granted to those who meet pre-specified criteria for confidential access, available at www.ices.on.ca/DAS (email: [email protected]). The full dataset creation plan and underlying analytic code are available from the authors upon request, understanding that the computer programs may rely upon coding templates or macros that are unique to ICES and are therefore either inaccessible or may require modification.

Acknowledgments

This study was supported by ICES, which is funded by an annual grant from the Ontario Ministry of Health (MOH) and the Ministry of Long-Term Care (MLTC).
MVI is also supported by the Graduate Diploma in Health Research at the University of Toronto. SD is supported by the Canadian Institutes of Health Research. This document used data adapted from the Statistics Canada Postal Code OM Conversion File, which is based on data licensed from Canada Post Corporation, and/or data adapted from the Ontario Ministry of Health Postal Code Conversion File, which contains data copied under license from ©Canada Post Corporation and Statistics Canada. Parts of this material are based on data and/or information compiled and provided by Ontario Health, the Canadian Institute for Health Information, the Ontario Ministry of Health, and The Johns Hopkins ACG® System Version 11. The analyses, conclusions, opinions and statements expressed herein are solely those of the authors and do not reflect those of the funding or data sources; no endorsement is intended or should be inferred.

Author Contributions
MVI and SD conceived and designed the study and acquired the data. MVI did the statistical analyses and developed, trained, and applied the artificial neural network. MVI, SD, and AV implemented quality control of the algorithms. All authors interpreted the analyzed data and contributed to drawing conclusions. MVI prepared the first draft of the manuscript. SD revised the manuscript. All authors contributed to manuscript preparation.

Competing Interests
The authors declare no competing interests.

References
1. Habbous S et al (2020) Incidence and real-world burden of brain metastases from solid tumors and hematologic malignancies in Ontario: a population-based study. Neuro-Oncol Adv 3:vdaa178
2. Kuksis M et al (2020) The incidence of brain metastases among patients with metastatic breast cancer: a systematic review and meta-analysis. Neuro-Oncol 23:noaa285
3. Li AY et al (2022) Intracranial Metastatic Disease: Present Challenges, Future Opportunities. Front Oncol 12:855182
4. Liu JL, Walker EV, Paudel YR, Davis FG, Yuan Y (2022) Brain Metastases among Cancer Patients Diagnosed from 2010–2017 in Canada: Incidence Proportion at Diagnosis and Estimated Lifetime Incidence. Curr Oncol 29:2091–2105
5. Smedby KE, Brandt L, Bäcklund ML, Blomqvist P (2009) Brain metastases admissions in Sweden between 1987 and 2006. Br J Cancer 101:1919–1924
6. Li AY et al (2022) Brain metastases in the setting of stable extracranial disease: A systematic review and meta-analysis. J Clin Oncol 40:2022
7. Nayak L, Lee EQ, Wen PY (2012) Epidemiology of Brain Metastases. Curr Oncol Rep 14:48–54
8. Cagney DN et al (2017) Incidence and prognosis of patients with brain metastases at diagnosis of systemic malignancy: a population-based study. Neuro-Oncol 19:1511–1521
9. Barnholtz-Sloan JS et al (2004) Incidence Proportions of Brain Metastases in Patients Diagnosed (1973 to 2001) in the Metropolitan Detroit Cancer Surveillance System. J Clin Oncol 22:2865–2872
10. Achrol AS et al (2019) Brain metastases. Nat Rev Dis Primers 5:5
11. Boogerd W, Vos VW, Hart AAM, Baris G (1993) Brain metastases in breast cancer; natural history, prognostic factors and outcome. J Neurooncol 15:165–174
12. Witzel I, Oliveira-Ferrer L, Pantel K, Müller V, Wikman H (2016) Breast cancer brain metastases: biology and new clinical perspectives. Breast Cancer Res 18:8
13. Leone JP, Leone BA (2015) Breast cancer brain metastases: the last frontier. Exp Hematol Oncol 4:33
14. D'Antonio C et al (2014) Bone and brain metastasis in lung cancer: recent advances in therapeutic strategies. Ther Adv Med Oncol 6:101–114
15. Goldberg SB, Contessa JN, Omay SB, Chiang V (2015) Lung Cancer Brain Metastases. Cancer J 21:398–403
16. Rittberg R, Banerji S, Kim JO, Rathod S, Dawe DE (2021) Treatment and Prevention of Brain Metastases in Small Cell Lung Cancer. Am J Clin Oncol 44:629–638
17. Saria MG et al (2017) The Hidden Morbidity of Cancer: Burden in Caregivers of Patients with Brain Metastases. Nurs Clin North Am 52:159–178
18. Sacks P, Rahman M (2020) Epidemiology of Brain Metastases. Neurosurg Clin North Am 31:481–488
19. Mukand JA, Blackinton DD, Crincoli MG, Lee JJ, Santos BB (2001) Incidence of Neurologic Deficits and Rehabilitation of Patients with Brain Tumors. Am J Phys Med Rehabil 80:346–350
20. Marciniak CM, Sliwa JA, Heinemann AW, Semik PE (2001) Functional outcomes of persons with brain tumors after inpatient rehabilitation. Arch Phys Med Rehabil 82:457–463
21. Soffietti R, Rudà R, Mutani R (2002) Management of brain metastases. J Neurol 249:1357–1369
22. Arvold ND et al (2016) Updates in the management of brain metastases. Neuro-Oncol 18:1043–1065
23. Erickson AW, Das S (2019) The Impact of Targeted Therapy on Intracranial Metastatic Disease Incidence and Survival. Front Oncol 9:797
24. Erickson AW et al (2021) Assessing the Association of Targeted Therapy and Intracranial Metastatic Disease. JAMA Oncol 7:1220–1224
25. Erickson AW et al (2020) HER2-targeted therapy prolongs survival in patients with HER2-positive breast cancer and intracranial metastatic disease: a systematic review and meta-analysis. Neuro-Oncol Adv 2:vdaa136
26. Tsao MN et al (2018) Whole brain radiotherapy for the treatment of newly diagnosed multiple brain metastases. Cochrane Database Syst Rev 1:CD003869
27. Scoccianti S, Ricardi U (2012) Treatment of brain metastases: Review of phase III randomized controlled trials. Radiother Oncol 102:168–179
28. Khuntia D, Brown P, Li J, Mehta MP (2006) Whole-Brain Radiotherapy in the Management of Brain Metastasis. J Clin Oncol 24:1295–1304
29. Langer CJ, Mehta MP (2005) Current Management of Brain Metastases, With a Focus on Systemic Options. J Clin Oncol 23:6207–6219
30. Lin NU (2013) Targeted Therapies in Brain Metastases. Curr Treat Options Neurol 16:276
31. Markesbery WR, Brooks WH, Gupta GD, Young AB (1978) Treatment for Patients With Cerebral Metastases. Arch Neurol 35:754–756
32. Myall NJ, Yu H, Soltys SG, Wakelee HA, Pollom E (2021) Management of brain metastases in lung cancer: evolving roles for radiation and systemic treatment in the era of targeted and immune therapies. Neuro-Oncol Adv 3:v52–v62
33. Gaebe K et al (2024) A Population-Based Analysis of Brain Metastasis Burden and Management in 8705 Small Cell Lung Cancer Patients. Neurosurgery 70:204
34. Stevens SP et al (2018) The utility of routine surveillance screening with magnetic resonance imaging (MRI) to detect tumour recurrence in children with low-grade central nervous system (CNS) tumours: a systematic review. J Neurooncol 139:507–522
35. Graesslin O et al (2010) Nomogram to Predict Subsequent Brain Metastasis in Patients With Metastatic Breast Cancer. J Clin Oncol 28:2032–2037
36. Aversa C et al (2014) Metastatic breast cancer subtypes and central nervous system metastases. Breast 23:623–628
37. Gaspar LE et al (2005) Time From Treatment to Subsequent Diagnosis of Brain Metastases in Stage III Non–Small-Cell Lung Cancer: A Retrospective Review by the Southwest Oncology Group. J Clin Oncol 23:2955–2961
38. Carolan H et al (2005) Does the incidence and outcome of brain metastases in locally advanced non-small cell lung cancer justify prophylactic cranial irradiation or early detection? Lung Cancer 49:109–115
39. Hung M-H et al (2014) Effect of Age and Biological Subtype on the Risk and Timing of Brain Metastasis in Breast Cancer Patients. PLoS ONE 9:e89389
40. Koniali L et al (2020) Risk factors for breast cancer brain metastases: a systematic review. Oncotarget 11:650–669
41. Ding X et al (2012) Risk factors of brain metastases in completely resected pathological stage IIIA-N2 non-small cell lung cancer. Radiat Oncol 7:119
42. Bajard A et al (2004) Multivariate analysis of factors predictive of brain metastases in localised non-small cell lung carcinoma. Lung Cancer 45:317–323
43. Tonyali O et al (2016) Risk factors for brain metastasis as a first site of disease recurrence in patients with HER2 positive early stage breast cancer treated with adjuvant trastuzumab. Breast 25:22–26
44. Ahmed KA et al (2025) Phase II trial of brain MRI surveillance in stage IV breast cancer. Neuro-Oncol noaf018. https://doi.org/10.1093/neuonc/noaf018
45. Peshkin BN, Alabek ML, Isaacs C (2011) BRCA1/2 mutations and triple negative breast cancers. Breast Dis 32:25–33
46. Zavitsanos PJ et al (2018) BRCA1 Mutations Associated With Increased Risk of Brain Metastases in Breast Cancer. Am J Clin Oncol 41:1252–1256
47. An N et al (2018) Risk factors for brain metastases in patients with non–small-cell lung cancer. Cancer Med 7:6357–6364
48. Heitz F et al (2011) Cerebral metastases in metastatic breast cancer: disease-specific risk factors and survival. Ann Oncol 22:1571–1581
49. Minisini AM et al (2013) Risk factors and survival outcomes in patients with brain metastases from breast cancer. Clin Exp Metastasis 30:951–956
50. He J et al (2021) Risk factors for brain metastases from non-small-cell lung cancer. Medicine 100:e24724
51. Istasy P et al (2021) The Impact of Artificial Intelligence on Health Equity in Oncology: A Scoping Review. Blood 138:4934
52. Istasy P et al (2022) The Impact of Artificial Intelligence on Health Equity in Oncology: Scoping Review. J Med Internet Res 24:e39748
53. Lundberg SM et al (2020) From local explanations to global understanding with explainable AI for trees. Nat Mach Intell 2:56–67
54. Zhang B, Shi H, Wang H (2023) Machine Learning and AI in Cancer Prognosis, Prediction, and Treatment Selection: A Critical Approach. J Multidiscip Healthc 16:1779–1791
55. Zhang H et al (2023) Application of Deep Learning in Cancer Prognosis Prediction Model. Technol Cancer Res Treat 22:15330338231199288
56. Kourou K, Exarchos TP, Exarchos KP, Karamouzis MV, Fotiadis DI (2015) Machine learning applications in cancer prognosis and prediction. Comput Struct Biotechnol J 13:8–17
57. Zhu W, Xie L, Han J, Guo X (2020) The Application of Deep Learning in Cancer Prognosis Prediction. Cancers 12:603
58. Jiang Y et al (2023) Biology-guided deep learning predicts prognosis and cancer immunotherapy response. Nat Commun 14:5135
59. Brogi E et al (2011) Breast carcinoma with brain metastases: clinical analysis and immunoprofile on tissue microarrays. Ann Oncol 22:2597–2603
60. Shen Q et al (2015) Breast Cancer With Brain Metastases: Clinicopathologic Features, Survival, and Paired Biomarker Analysis. Oncologist 20:466–473
61. Abdou Y et al (2022) Left sided breast cancer is associated with aggressive biology and worse outcomes than right sided breast cancer. Sci Rep 12:13377
62. Barbara R-C et al (2020) Divergent Impact of Breast Cancer Laterality on Clinicopathological, Angiogenic, and Hemostatic Profiles: A Potential Role of Tumor Localization in Future Outcomes. J Clin Med 9:1708
63. Wilting J, Hagedorn M (2011) Left-Right Asymmetry in Embryonic Development and Breast Cancer: Common Molecular Determinants? Curr Med Chem 18:5519–5527
64. Hao Y, Li G (2023) Risk and prognostic factors of brain metastasis in lung cancer patients: a Surveillance, Epidemiology, and End Results population–based cohort study. Eur J Cancer Prev 32:498–511
65. Crooks J et al (2024) Cost of Treatment for Brain Metastases Using Data From a National Health Insurance. Adv Radiat Oncol 9:101438
66. Austin PC, van Walraven C, Wodchis WP, Newman A, Anderson GM (2011) Using the Johns Hopkins Aggregated Diagnosis Groups (ADGs) to Predict Mortality in a General Adult Population Cohort in Ontario, Canada. Med Care 49:932–939
67. Austin PC, van Walraven C (2011) The Mortality Risk Score and the ADG Score. Med Care 49:940–947
68. Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A (2017) CatBoost: unbiased boosting with categorical features. arXiv
69. Akiba T, Sano S, Yanase T, Ohta T, Koyama M (2019) Optuna: A Next-generation Hyperparameter Optimization Framework. arXiv
70. Lemaître G, Nogueira F, Aridas CK (2016) Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning. arXiv

Tables
Table 1: Chi-square test results for association between covariates and intracranial metastatic disease in breast and lung cancer patients.

Covariate               Chi-square (χ²)   Degrees of freedom   p-value
Breast cancer
  Tumour site                     59.98                    8   <.01
  Histology                       32.35                    3   <.01
  Stage                         4817.09                    3   <.01
  Tumour grade                   608.43                    3   <.01
  Tumour marker status           672                       2   <.01
  Laterality                       0.48                    1   0.49
Lung cancer
  Sex                             17.97                    1   <.01
  Site                           161.04                    4   <.01
  Histology                      457.36                    2   <.01
  Laterality                      12.43                    1   <.01
  Stage                         1042.71                    3   <.01

Additional Declarations
There is no competing interest.

Supplementary Files
SupplementaryTables.docx
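Table 1 reports Pearson chi-square tests of independence between each covariate and IMD status. Each test has (r - 1)(c - 1) degrees of freedom. For example, the 3 degrees of freedom for breast histology correspond to four histology categories against a binary IMD outcome. The sketch below uses only the Python standard library. It assumes the uncorrected Pearson statistic, because the text does not state whether a continuity correction was applied. It also assumes integer degrees of freedom, which allows closed-form upper-tail p-values. The function names `pearson_chi_square` and `chi2_sf` and the 2 × 2 counts are hypothetical and do not come from the study data.

```python
import math
from typing import Sequence


def pearson_chi_square(table: Sequence[Sequence[float]]) -> tuple[float, int]:
    """Return (statistic, degrees of freedom) for a Pearson chi-square test of
    independence on an r x c table of observed counts, without a continuity
    correction."""
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(n_cols)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / N
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    return statistic, (n_rows - 1) * (n_cols - 1)


def chi2_sf(x: float, dof: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square distribution with an
    integer number of degrees of freedom, via the closed-form series."""
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{i=0}^{dof/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_r x^(r-1/2) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)  # standard normal pdf at sqrt(x)
    p_value = math.erfc(math.sqrt(half))
    term = root  # r = 1 term: sqrt(x) / 1
    for r in range(1, (dof - 1) // 2 + 1):
        p_value += 2.0 * density * term
        term *= x / (2 * r + 1)
    return p_value


if __name__ == "__main__":
    # Hypothetical 2 x 2 counts (IMD yes/no by a binary covariate), not study data
    stat, dof = pearson_chi_square([[10, 20], [30, 40]])
    print(f"chi2 = {stat:.3f}, df = {dof}, p = {chi2_sf(stat, dof):.3f}")
    # Sanity check against Table 1 (breast laterality: chi2 = 0.48, df = 1)
    print(round(chi2_sf(0.48, 1), 2))  # 0.49
```

In practice, a library routine such as SciPy's `chi2_contingency` would usually be used instead. By default it applies Yates' continuity correction when there is one degree of freedom. The standard-library version above only keeps the sketch self-contained. As a consistency check, it reproduces the p-value of 0.49 reported for breast cancer laterality from χ² = 0.48 with 1 degree of freedom.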
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-6247605","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":432088377,"identity":"4d894b42-15fd-471c-8226-ed75e9e16d93","order_by":0,"name":"Sunit Das","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA20lEQVRIiWNgGAWjYJCCAwwMzAwM7A2kaDkA0sJzgBGsiYdIa4BaJBKI1CI/I/fg4Q8M1vLyM9+YP/i4xybPnr2B8cMPPFoMbuQlAB2Wbrjhdo5h44xnacU8PAeYJXvwaZHIMQBqOcy4QTrHsJnnwOHEHokENryuk58B0WI/f+YZhBbGP/g8cwOiJbHhBg9CCzM+WwzOvDE4cMYgPXnDmbTCmTMOpCX2nDnYLC2Dz2HtOcYfKiqsbee3H97w4cMBm8T29uaDH9/gcxjELhQeJH5GwSgYBaNgFFAAAL80UrUz2VF/AAAAAElFTkSuQmCC","orcid":"https://orcid.org/0000-0002-2146-4168","institution":"University of Toronto","correspondingAuthor":true,"prefix":"","firstName":"Sunit","middleName":"","lastName":"Das","suffix":""},{"id":432088378,"identity":"7f9da33f-1379-41ff-af38-1df2ac423729","order_by":1,"name":"Marco Istasy","email":"","orcid":"","institution":"University of Toronto","correspondingAuthor":false,"prefix":"","firstName":"Marco","middleName":"","lastName":"Istasy","suffix":""},{"id":432088379,"identity":"9b85d416-1b06-4f43-b6e0-519ce6012d66","order_by":2,"name":"Amol Verma","email":"","orcid":"","institution":"University of Toronto","correspondingAuthor":false,"prefix":"","firstName":"Amol","middleName":"","lastName":"Verma","suffix":""},{"id":432088380,"identity":"1d78174f-c629-4413-914d-7852f097b173","order_by":3,"name":"Katarzyna Jerzak","email":"","orcid":"","institution":"University of 
Toronto","correspondingAuthor":false,"prefix":"","firstName":"Katarzyna","middleName":"","lastName":"Jerzak","suffix":""}],"badges":[],"createdAt":"2025-03-17 20:55:39","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6247605/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6247605/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":82120847,"identity":"ae8dcbe8-f38c-4c1b-a0b4-2c5ff13fc49f","added_by":"auto","created_at":"2025-05-07 03:20:53","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":433872,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eAssociation of various putative predictors with the development of intracranial metastatic disease (IMD).\u003c/strong\u003ePercentages represent the proportion of each group within each category which ultimately developed IMD. Markers represent statistical significance under chi-squared examination (p\u0026lt;.05). Histology) IDC = Infiltrating Ductal Carcinoma. Site) Coded as per the ICD-10 classification. 
Morphological mappings for each code are presented in Tables 1 and 2 for breast and lung cancer, respectively.\u003c/p\u003e","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-6247605/v1/e9edffb01e7d9cae9df3f9fe.png"},{"id":82120851,"identity":"526c75ac-6735-43bd-9cca-29bb3a35ea4f","added_by":"auto","created_at":"2025-05-07 03:20:53","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":537851,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eVarious quantifications of the performance of the four classification models presented in this study: two predicting the global risk of developing intracranial metastatic disease in a breast and lung cancer cohort, respectively, and two predicting the risk thereof within a five-year time frame.\u003c/strong\u003e A) Area under the precision recall curve. B) Area under the receiver operating characteristic. C) Calibration curves, in which the grey faded diagonal line represents perfect calibration. D) Classification matrices on the held-out test set.\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-6247605/v1/93f9d8b9cac867febd29a1a3.png"},{"id":82120018,"identity":"708e82b4-9ca2-4b7f-bbbb-fd8ccda50e2a","added_by":"auto","created_at":"2025-05-07 03:12:53","extension":"jpeg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":401712,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSHAP value analysis for global and five-year breast cancer model on the held-out validation set.\u003c/strong\u003e A) The top features identified by SHAP values in the global (top) and five-year (bottom) breast cancer models. Histology, laterality, and age at diagnosis consistently rank as the top three features in both models. Blue text denotes variables from the Johns Hopkins ACG® system. 
B) Violin plots of the top three features denoted by each model. In the global model (top row), histology shows significant variability with mixed histological subtypes associated with decreased IMD risk. Age at diagnosis demonstrates a linear relationship with higher age correlating with increased risk. Laterality does not show significant differences between right- or left-sided tumours. Similar patterns are observed in the five-year model (bottom row). Bars denote statistical significance under chi-squared examination (p\u0026lt;.05). Distributions are plotted for both the total set (grey) and the subset that developed intracranial metastatic diseases (IMD).\u003c/p\u003e","description":"","filename":"floatimage3.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6247605/v1/0d8c88e82ca73686152a7bc5.jpeg"},{"id":82120020,"identity":"7278e30e-1070-4b02-b5f0-25a2f192b932","added_by":"auto","created_at":"2025-05-07 03:12:53","extension":"jpeg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":335150,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSHAP value analysis for global and five-year lung cancer model on the held-out validation set. \u003c/strong\u003eA) The top features identified by SHAP values in the global (top) and five-year (bottom) lung cancer models. Site, histology, and sex consistently rank as the top three features in both models. Blue text denotes variables from the Johns Hopkins ACG® system. B) Violin plots of the top three features denoted by each model. In the global model (top row), tumour sites shows significant variability, with notable differences between specific site comparisons. Histology also wields substantial influence, with significant variations between all histology pairs. Sex differences are evident, with females displaying slightly higher mean SHAP values compared to males. Similar patterns are observed in the five-year model (bottom row). 
Bars denote statistical significance under chi-squared examination (p\u0026lt;.05). Distributions are plotted for both the total set (grey) and the subset that developed intracranial metastatic diseases (IMD).\u003c/p\u003e","description":"","filename":"floatimage4.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6247605/v1/788e58207efa9af9334a9502.jpeg"},{"id":82123938,"identity":"6320fb61-9513-4c75-939b-a79fdbd131b3","added_by":"auto","created_at":"2025-05-07 03:36:58","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2117711,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6247605/v1/3b792e00-d038-47a1-baba-d1131773181f.pdf"},{"id":82120016,"identity":"9b30ad23-36e0-4760-a238-f42c7536c37b","added_by":"auto","created_at":"2025-05-07 03:12:53","extension":"docx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":17559,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryTables.docx","url":"https://assets-eu.researchsquare.com/files/rs-6247605/v1/27370dc35d62a6ea33775b2a.docx"}],"financialInterests":"There is \u003cb\u003eNO\u003c/b\u003e Competing Interest.","formattedTitle":"Machine learning identifies prognosticators of intracranial metastatic disease in patients with breast or lung cancer","fulltext":[{"header":"Introduction","content":"\u003cp\u003eIntracranial metastatic disease (IMD) is estimated to occur in nearly 30% of patients with metastatic cancer\u003csup\u003e\u003cspan additionalcitationids=\"CR2 CR3 CR4 CR5 CR6 CR7 CR8 CR9\" citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u003c/sup\u003e. 
The incidence of IMD is forecasted to increase as the population ages and advancements in cancer therapy extend patient survival\u003csup\u003e\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e,\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e,\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u003c/sup\u003e. Furthermore, IMD displays a tumour-dependent nature, with certain primary cancers exhibiting a higher propensity for metastatic spread to the brain than others. Breast and lung cancers are the leading sources of brain metastases and account for approximately 11% and 60% of all IMD cases, respectively\u003csup\u003e\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e,\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e,\u003cspan additionalcitationids=\"CR12 CR13 CR14 CR15\" citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eIMD poses a significant burden on patient survival and quality-of-life (QoL)\u003csup\u003e\u003cspan additionalcitationids=\"CR18 CR19\" citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u003c/sup\u003e. 
Without treatment, the average survival following development of IMD is less than two months\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e,\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e, while currently available treatments, including surgical resection, radiotherapy, and systemic therapies, on average extend survival to less than one year\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e,\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e,\u003cspan additionalcitationids=\"CR24 CR25 CR26 CR27 CR28 CR29\" citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e\u003c/sup\u003e. The complexity of treatment is further compounded by delays in identification: most patients with systemic cancer do not undergo brain imaging unless they manifest signs of neurological compromise, and more than 80% of patients have multiple brain metastases at the time of detection\u003csup\u003e\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e\u003c/sup\u003e. Late diagnosis is problematic as treatment of IMD with stereotactic radiosurgery, which has been shown to be effective in achieving intracranial disease control and improving QoL, depends on identification of brain metastases before they have grown to a large size\u003csup\u003e\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e,\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e,\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e\u003c/sup\u003e. 
Routine surveillance brain imaging of all patients with systemic cancer, however, is neither clinically sensible nor fiscally feasible\u003csup\u003e\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e\u003c/sup\u003e Consequently, there is a significant need to identify prognosticators of IMD to enable guided surveillance and early detection in patients at high risk.\u003c/p\u003e \u003cp\u003eSeveral risk factors for IMD have been previously reported for patients with breast and lung cancers. These include demographic factors, such as younger age at diagnosis\u003csup\u003e\u003cspan additionalcitationids=\"CR36 CR37\" citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e\u003c/sup\u003e, and tumour-specific features, such as high tumour grade and advanced tumour stage\u003csup\u003e\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e,\u003cspan additionalcitationids=\"CR40 CR41\" citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e\u003c/sup\u003e. Certain characteristics specific to breast and lung cancers also significantly influence IMD risk. 
In breast cancer, ER-negative, HER2-positive and triple-negative subtypes\u003csup\u003e\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e,\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e,\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e\u003c/sup\u003e, as well as the presence of BRCA1/2\u003csup\u003e45,46\u003c/sup\u003e mutations, are associated with increased propensity for brain metastases; in lung cancer, sex\u003csup\u003e\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e,\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e\u003c/sup\u003e, histological subtype\u003csup\u003e\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e,\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e\u003c/sup\u003e and the presence of specific genetic alterations, such as EGFR, ROS1 ,and ALK rearrangements\u003csup\u003e\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e\u003c/sup\u003e, are also strongly associated with increased IMD risk.\u003c/p\u003e \u003cp\u003eDespite the identification of these numerous risk factors, no series of predictors have been reliably correlated with the development of IMD. Indeed, traditional statistical methods have proven inadequate in offering a cogent stratification algorithm that can predict IMD risk as a function of patient- and tumour-specific features\u003csup\u003e\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e,\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e,\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e,\u003cspan additionalcitationids=\"CR49\" citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e\u003c/sup\u003e. 
Consequently, there is a critical need for more sophisticated predictive models to identify patients at high risk of developing IMD. Machine Learning (ML) offers a promising alternative to traditional statistical methods, yet successful integration of ML models into clinical practice will require not only predictive accuracy but also model interpretability. Indeed, clinicians have been found to require a clear understanding of the underlying factors driving a model's predictions to confidently incorporate such tools into patient care\u003csup\u003e\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e,\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eIn this study, we hypothesise that interpretable ML approaches can significantly improve the prediction of IMD in patients with breast or lung cancer. We present a comprehensive and cancer-agnostic ML paradigm with excellent discriminative ability to identify patients with breast or lung cancer at risk for IMD. Furthermore, by interrogating the decision-making process of our model, we demonstrate insight into the unique factors associated with metastatic spread.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003ePatient Cohorts and Time-To-IMD\u003c/p\u003e \u003cp\u003eUsing the Ontario Cancer Registry, we identified 143,341 Ontario patients diagnosed with breast or lung cancer from 2010\u0026ndash;2023. The breast cancer cohort consisted of 86,082 female patients, of whom 2,666 (3%) developed intracranial metastatic disease (IMD). The lung cancer cohort included 57,259 patients (Male\u0026thinsp;=\u0026thinsp;28,558 [50%]: Female 28,701 [50%]), of whom 6,427 (11%) developed IMD. 
A chi-square test of independence revealed significant differences in the incidence of cranial metastasis between the patients with breast and lung cancer (X\u003csup\u003e\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/sup\u003e (1, N\u0026thinsp;=\u0026thinsp;134,428)\u0026thinsp;=\u0026thinsp;4464.58, p\u0026thinsp;\u0026lt;\u0026thinsp;.001). While patients who initially present with non-metastatic (Stages 1\u0026ndash;3) lung cancer were more likely to develop IMD than patients who initially present with non-metastatic breast cancer, the opposite held true for patients with breast or lung cancer diagnosed with metastatic (Stage 4) disease (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). The mean time from initial diagnosis to IMD was 1,147\u0026thinsp;\u0026plusmn;\u0026thinsp;894 days in the breast cancer cohort, compared to 446\u0026thinsp;\u0026plusmn;\u0026thinsp;519 days in the lung cancer cohort. An independent-samples t-test confirmed a significant difference in the time to IMD between the two cohorts (t(9091) = -46.70, p\u0026thinsp;\u0026lt;\u0026thinsp;.001). Detailed demographic characteristics of the breast and lung cancer cohorts can be found in the supplemental material.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eUnivariate Predictors of IMD\u003c/p\u003e \u003cp\u003eTo establish a foundation for model development, we first sought to assess the independent associations between previously reported covariates and the development of IMD in the context of breast or lung cancer primary cancer.\u003c/p\u003e \u003cp\u003eBreast Cancer Cohort and IMD\u003c/p\u003e \u003cp\u003eTo assess the association between several covariates and IMD in patients with breast cancer, we performed a chi-square test of independence. 
Significant associations were found for tumour site, histology, stage, tumour grade, and tumour marker status (p\u0026thinsp;\u0026lt;\u0026thinsp;.01; Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Laterality showed no significant association.\u003c/p\u003e \u003cp\u003eNotably, IMD incidence varied across tumour sites, ranging from 3.5% in nipple lesions (ICD-10 C50.0) to 5.8% in axillary tail lesions (ICD-10 C50.6). Patients with lobular carcinoma had the lowest IMD rate (2.2%), followed by patients with infiltrating duct carcinoma (3.1%), infiltrating duct carcinoma with lobular features (each at 3.1%) and mixed subtypes (3.7%). Patients with Grade 1 tumours displayed low IMD risk (0.5%), with a stepwise increase in risk observed with higher grades. Similarly, a progressive increase in IMD incidence was observed with advancing stage, culminating in an 18.3% risk for patients with Stage 4 tumours. Cancer subtype was found also to be predictive of IMD, with HR-positive tumours exhibiting the lowest IMD rate (2.0%), compared to HER2-positive (5.0%) and triple-negative (7.0%) tumours. 
Detailed statistical comparisons between specific categories within each predictor are presented in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eLogistic regression revealed an inverse association of IMD risk with age at diagnosis (OR\u0026thinsp;=\u0026thinsp;0.97, 95% CI [0.97\u0026ndash;0.98], p\u0026thinsp;\u0026lt;\u0026thinsp;.001) and direct association with primary tumour size (OR\u0026thinsp;=\u0026thinsp;1.01, 95% CI [1.01\u0026ndash;1.02], p\u0026thinsp;\u0026lt;\u0026thinsp;.001).\u003c/p\u003e \u003cp\u003eLung Cancer Cohort and IMD\u003c/p\u003e \u003cp\u003eChi-square test of independence analysis showed all categorical covariates examined to be significantly associated with IMD: sex, site, histology, laterality, and stage (p\u0026thinsp;\u0026lt;\u0026thinsp;.01, Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eMales displayed a slightly lower IMD rate (10.7%) compared to females (11.8%). Overlapping lesion of the bronchus and lung (ICD-10 34.8) had the highest IMD incidence (11.8%), while lesion NOS (ICD-10 34.9) had the lowest (6.7%). Different histological subtypes also showed varying propensities for brain metastasis, with adenocarcinomas exhibiting the highest risk (13.8%) and squamous cell carcinoma the lowest (5.9%). Laterality showed a minor difference, with right-sided tumours having a slightly higher rate of brain metastasis (10.9%), compared to left-sided tumours (11.8%). Similar to breast cancer, the likelihood of brain metastasis increased with advancing stage, from 4.2% for Stage 1 to 13.0% for Stage 4. 
Detailed statistical comparisons between specific categories within each predictor are presented in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eLastly, as in the breast cancer cohort, logistic regression showed an inverse association of IMD risk with age at diagnosis (OR\u0026thinsp;=\u0026thinsp;0.96, 95% CI [0.95\u0026ndash;0.96], p\u0026thinsp;\u0026lt;\u0026thinsp;.001) and a direct association with tumour size (OR\u0026thinsp;=\u0026thinsp;1.01, 95% CI [1.01\u0026ndash;1.02], p\u0026thinsp;\u0026lt;\u0026thinsp;.001).\u003c/p\u003e \u003cp\u003eMachine Learning Models\u003c/p\u003e \u003cp\u003eWe derived four machine learning models, two per primary cancer type, to predict the development of IMD in patients with breast or lung cancer. The models addressed two aspects of risk prediction: 1) global prediction of IMD developing at any point during the course of disease, across the entire respective patient population; and 2) prediction of IMD within five years of initial diagnosis (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eGlobal Classification Models\u003c/p\u003e \u003cp\u003eWe first developed global classification models, one per primary cancer type, to predict IMD across the full patient population. 
These models aimed to provide a comprehensive risk assessment by considering all available data up until IMD identification.\u003c/p\u003e \u003cp\u003eFor breast cancer, the global model achieved an F1 score of 0.82 (cross-validation [CV] 0.81\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01), precision of 0.95 (CV 0.94\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01), recall of 0.72 (CV 0.72\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01), AUC-PR of 0.85 (CV 0.83\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01), and AUC-ROC of 0.96 (CV 0.95\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01). The lung cancer global model achieved an F1 score of 0.74 (CV 0.72\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01), precision of 0.94 (CV 0.89\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01), recall of 0.60 (CV 0.61\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01), AUC-PR of 0.75 (CV 0.81\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01), and AUC-ROC of 0.87 (CV 0.81\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01).\u003c/p\u003e \u003cp\u003eFive-Year Classification Models\u003c/p\u003e \u003cp\u003eRecognizing the importance of time-specific predictions, we developed five-year classification models, again one per primary cancer type, to predict IMD within a clinically relevant timeframe. Patients who died within the five-year window without a diagnosis of IMD were excluded, so that these models focused on patients who survived long enough to potentially develop IMD.\u003c/p\u003e \u003cp\u003eFor breast cancer, the five-year model achieved an F1 score of 0.73 (CV 0.68\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02), precision of 0.82 (CV 0.74\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02), recall of 0.65 (CV 0.63\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02), AUC-PR of 0.76 (CV 0.66\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01), and AUC-ROC of 0.96 (CV 0.96\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01). 
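\u003c/p\u003e \u003cp\u003eF1 is the harmonic mean of precision and recall, so the reported test-set F1 scores can be checked against the reported precision and recall. The Methods use the same F-beta family, with β\u0026thinsp;=\u0026thinsp;0.75, to choose each model\u0026rsquo;s decision threshold. The sketch below was written for illustration and is not the study\u0026rsquo;s code. It defines F-beta, recomputes F1 for the two global models, and selects the cut-off that maximises F-beta (β\u0026thinsp;=\u0026thinsp;0.75) over a small, entirely hypothetical set of validation scores.\u003c/p\u003e

```python
def f_beta(precision, recall, beta=1.0):
    # Weighted harmonic mean of precision and recall; beta < 1 favours precision
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_threshold(scores, labels, beta=0.75):
    # Try each distinct score as a cut-off (predict IMD when score >= threshold)
    best_score, best_t = 0.0, None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        score = f_beta(precision, recall, beta)
        if score > best_score:
            best_score, best_t = score, t
    return best_t, best_score


# Reported global-model precision and recall (breast, lung)
print(round(f_beta(0.95, 0.72), 2), round(f_beta(0.94, 0.60), 2))  # 0.82 0.73

# Hypothetical validation scores and IMD labels
scores = [0.05, 0.20, 0.35, 0.40, 0.60, 0.70, 0.85, 0.90]
labels = [0, 0, 1, 0, 0, 1, 1, 1]
t, f = best_threshold(scores, labels)
print(t, round(f, 3))  # 0.7 0.893
```

\u003cp\u003eThe recomputed values (0.82 and 0.73) match the reported F1 scores of 0.82 and 0.74, within the rounding of the reported precision and recall.\u003c/p\u003e \u003cp\u003e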
For lung cancer, the five-year model achieved an F1 score of 0.71 (CV 0.70\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01), precision of 0.91 (CV 0.89\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01), recall of 0.58 (CV 0.58\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01), AUC-PR of 0.73 (CV 0.72\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01), and AUC-ROC of 0.87 (CV 0.86\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01).\u003c/p\u003e \u003cp\u003eFeature Importance of Machine-Learning Models\u003c/p\u003e \u003cp\u003eTo identify the most influential predictors of IMD and assess their consistency across prediction models, we quantified the contribution of each feature to the model output using SHapley Additive exPlanations (SHAP) values\u003csup\u003e\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eFor patients with breast cancer, the global and five-year models identified the same three top features: histology, laterality and age at diagnosis (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). Histology was a significant predictor in both breast cancer models (H(3)\u0026thinsp;=\u0026thinsp;41.00 for the global model, H(3)\u0026thinsp;=\u0026thinsp;28.44 for the five-year model; Kruskal\u0026ndash;Wallis p\u0026thinsp;\u0026lt;\u0026thinsp;.001 for both), with post-hoc analyses associating mixed histological subtypes with lower IMD risk (global model: mixed vs. infiltrating ductal carcinoma (mean difference [MD]\u0026thinsp;=\u0026thinsp;0.53), mixed vs. infiltrating ductal carcinoma with lobular features (MD\u0026thinsp;=\u0026thinsp;0.70), and mixed vs. lobular carcinoma (MD\u0026thinsp;=\u0026thinsp;0.39), all p\u0026thinsp;\u0026lt;\u0026thinsp;.01 using Tukey\u0026rsquo;s HSD with Bonferroni correction; five-year model: mixed vs. 
infiltrating ductal carcinoma (MD\u0026thinsp;=\u0026thinsp;0.29), p\u0026thinsp;\u0026lt;\u0026thinsp;.01 using Tukey\u0026rsquo;s HSD with Bonferroni correction). Age at diagnosis showed a linear relationship with its SHAP values, with older age contributing to increased predicted risk in both the global (β\u0026thinsp;=\u0026thinsp;0.11, R\u003csup\u003e2\u003c/sup\u003e\u0026thinsp;=\u0026thinsp;0.62, p\u0026thinsp;\u0026lt;\u0026thinsp;.001) and five-year models (β\u0026thinsp;=\u0026thinsp;0.04, R\u003csup\u003e2\u003c/sup\u003e\u0026thinsp;=\u0026thinsp;0.61, p\u0026thinsp;\u0026lt;\u0026thinsp;.001). Notably, although laterality ranked as the second most important feature in both models, mean SHAP values did not differ significantly between right- and left-sided tumours.\u003c/p\u003e \u003cp\u003eFor patients with lung cancer, tumour site, histology and sex were the three most influential features in both models (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e). Tumour site significantly influenced predicted IMD risk (H(4)\u0026thinsp;=\u0026thinsp;19.38 for the global model, H(4)\u0026thinsp;=\u0026thinsp;23.61 for the five-year model; Kruskal\u0026ndash;Wallis p\u0026thinsp;\u0026lt;\u0026thinsp;.001 for both), with significant pairwise differences between specific sites (global model: lower lobe, bronchus or lung vs. overlapping lesion of bronchus and lung (MD\u0026thinsp;=\u0026thinsp;0.46) and lower lobe, bronchus or lung vs. bronchus or lung, NOS (MD\u0026thinsp;=\u0026thinsp;0.47), all p\u0026thinsp;\u0026lt;\u0026thinsp;.01 using Tukey\u0026rsquo;s HSD with Bonferroni correction; five-year model: lower lobe, bronchus or lung vs. overlapping lesion of bronchus and lung (MD\u0026thinsp;=\u0026thinsp;0.66), upper lobe, bronchus or lung vs. lower lobe, bronchus or lung (MD\u0026thinsp;=\u0026thinsp;0.26) and middle lobe, bronchus or lung vs. 
bronchus or lung, NOS (MD\u0026thinsp;=\u0026thinsp;0.08), all p\u0026thinsp;\u0026lt;\u0026thinsp;.01 using Tukey\u0026rsquo;s HSD with Bonferroni correction). Histology was also a significant predictor (H(3)\u0026thinsp;=\u0026thinsp;136.81 for the global model, H(3)\u0026thinsp;=\u0026thinsp;39.64 for the five-year model; Kruskal\u0026ndash;Wallis p\u0026thinsp;\u0026lt;\u0026thinsp;.001 for both; Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e), with post-hoc comparisons showing significant differences between all histology pairs in both models (all p\u0026thinsp;\u0026lt;\u0026thinsp;.001, Tukey\u0026rsquo;s HSD with Bonferroni correction). Interestingly, while the overall pattern of feature importance remained consistent, the differences between histologies were smaller in the five-year model. Lastly, sex also emerged as a predictor (H(1)\u0026thinsp;=\u0026thinsp;87.19 for the global model, H(1)\u0026thinsp;=\u0026thinsp;81.20 for the five-year model; Kruskal\u0026ndash;Wallis p\u0026thinsp;\u0026lt;\u0026thinsp;.001 for both), with females showing slightly higher mean SHAP values (\u0026minus;0.12\u0026thinsp;\u0026plusmn;\u0026thinsp;1.15 and \u0026minus;0.12\u0026thinsp;\u0026plusmn;\u0026thinsp;1.06 for the global and five-year models, respectively) than males (\u0026minus;0.20\u0026thinsp;\u0026plusmn;\u0026thinsp;1.31 and \u0026minus;0.22\u0026thinsp;\u0026plusmn;\u0026thinsp;0.98, respectively).\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eIn this study, we leveraged machine learning (ML) to develop predictive models for intracranial metastatic disease (IMD) using a population-based administrative dataset, with the goal of identifying patients at high risk. 
To this end, we developed two sets of models for patients with breast or lung cancer: one for global IMD risk prediction and one for prediction of IMD within five years of initial diagnosis.\u003c/p\u003e \u003cp\u003eBoth sets of models demonstrated strong predictive performance. The global models achieved positive predictive values of 0.95 and 0.94 and sensitivities of 0.72 and 0.60 for patients with breast and lung cancer, respectively. The five-year models achieved positive predictive values of 0.82 and 0.91 and sensitivities of 0.65 and 0.58, respectively. The robust performance of our models in predicting the global and five-year risk of IMD development adds to a growing body of evidence underscoring the advantages of ML algorithms in predictive oncology\u003csup\u003e\u003cspan additionalcitationids=\"CR55 CR56 CR57\" citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e\u003c/sup\u003e. Furthermore, these models identify key clinical indicators of IMD risk in patients with breast or lung cancer that align with, and expand upon, previously identified risk factors in the literature.\u003c/p\u003e \u003cp\u003eIn patients with breast cancer, mixed histology (i.e., tumours not classified as invasive ductal carcinoma, invasive lobular carcinoma, or invasive ductal carcinoma with lobular features) was a significant predictor of IMD, consistent with previous reports linking histological subtype to metastatic potential\u003csup\u003e\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e,\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e,\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e\u003c/sup\u003e. The models also identified older age at diagnosis and right-sided tumour laterality as important predictors. 
Interestingly, our results revealed a discrepancy in the effect of age at diagnosis on IMD risk that mirrors the current literature, in which some studies identify younger age as a risk factor\u003csup\u003e\u003cspan additionalcitationids=\"CR36 CR37\" citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e\u003c/sup\u003e, while others suggest the opposite\u003csup\u003e\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e,\u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e\u003c/sup\u003e. Our logistic regression analysis indicated a small but statistically significant inverse association of age with IMD risk, whereas analysis of the ML models identified older age as contributing to increased risk. This discrepancy likely reflects the limitations of standard logistic regression, which here assumed a linear, univariable relationship between age and the log-odds of IMD. The ML models suggest that, within specific subgroups defined by other patient and tumour characteristics, older age may indeed increase IMD risk. Lastly, the role of breast cancer laterality in IMD development has not been well characterized. Our findings suggest that right-sided tumours may be associated with increased IMD risk. 
This observation warrants further investigation, particularly in light of emerging evidence regarding the distinct biological characteristics of right- and left-sided breast cancers\u003csup\u003e\u003cspan additionalcitationids=\"CR62\" citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eIn patients with lung cancer, tumour site and histological subtype were identified as key predictors of IMD, consistent with previous research demonstrating the influence of tumour site\u003csup\u003e\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e,\u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e64\u003c/span\u003e\u003c/sup\u003e and histology\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e,\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e,\u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e64\u003c/span\u003e\u003c/sup\u003e on metastatic patterns. Additionally, the identification of sex as a significant predictor aligns with previous studies suggesting potential sex-related differences in the development of IMD\u003csup\u003e\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e,\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e,\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e,\u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e64\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eWhile IMD is a relatively rare event in patients with metastatic cancer, its consequences are severe. Furthermore, the financial and practical costs of surveillance make a universal approach to IMD surveillance infeasible\u003csup\u003e\u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e65\u003c/span\u003e\u003c/sup\u003e. 
As such, there is a significant clinical need for algorithms that can prospectively identify patients at increased risk of IMD who might benefit from clinical surveillance\u003csup\u003e\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e,\u003cspan additionalcitationids=\"CR18 CR19\" citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u003c/sup\u003e. Our work describes the development and validation of ML models that can accurately predict the risk of IMD in patients with breast or lung cancer and could enable a more proactive, targeted and cost-effective approach to surveillance and management. Indeed, these models are designed for use in a prospective clinical setting: they rely on features that can be readily collected by treating clinicians and require minimal computational power to deploy. This focus on practical application, together with model interpretability and an open-box framework, aims to build clinician trust and facilitate the integration of these predictive tools into routine practice\u003csup\u003e\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e,\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eWhile our study presents promising results, it is not without limitations. The retrospective nature of the dataset and reliance on administrative health data may introduce problems related to data completeness and availability. This is particularly relevant to the lung cancer cohort, for which data on biomarkers such as ALK, ROS1, and EGFR status were unavailable. In addition, our study assumed that imputed data were missing completely at random; while, to our knowledge, this assumption holds, spurious correlations could be introduced if it does not. 
Furthermore, the generalizability of our models to populations beyond Ontario, Canada, requires validation in diverse clinical settings. Lastly, future research should explore incorporating additional dynamic variables, such as treatment response and longitudinal clinical data, as well as biological and genetic data previously implicated in IMD risk\u003csup\u003e\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e,\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e,\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e,\u003cspan additionalcitationids=\"CR46\" citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eIn conclusion, this study provides compelling evidence for the utility of ML in predicting IMD in patients with breast or lung cancer. By providing accurate risk assessment and identifying key prognostic factors, our work can inform more effective and personalised cancer management strategies. Further validation across diverse populations and the integration of a broader range of data sources are needed to enhance model robustness and clinical utility. Nonetheless, integrating such models into clinical practice holds significant potential for improving patient outcomes by facilitating early detection of IMD and reducing the overall burden of this serious complication.\u003c/p\u003e"},{"header":"Methods","content":"\u003cp\u003eDataset Curation\u003c/p\u003e \u003cp\u003eThis study used a population-based dataset obtained from the Ontario Cancer Registry, the provincial database of information about all Ontario residents diagnosed with cancer. It is held at ICES, formerly the Institute for Clinical Evaluative Sciences. 
ICES is an independent, non-profit research institute whose legal status under Ontario\u0026rsquo;s health information privacy law allows it to collect and analyse health care and demographic data, without consent, for health system evaluation and improvement. The dataset included all patients in the Canadian province of Ontario from January 1, 2010 to August 30, 2023 with a diagnosis of breast cancer (ICD-10 C50.0\u0026ndash;C50.9) or lung cancer (ICD-10 C34.0\u0026ndash;C34.9), excluding those who 1) were under 18 or over 105 years of age or 2) had more than one primary cancer diagnosis. To comprehensively assess patient comorbidities, this study employed the Johns Hopkins ACG\u0026reg; system, with a focus on Aggregated Diagnosis Groups (ADGs). As in previous work\u003csup\u003e\u003cspan citationid=\"CR66\" class=\"CitationRef\"\u003e66\u003c/span\u003e\u003c/sup\u003e, all diagnoses associated with hospital admissions and physician billing claims within two years prior to primary cancer diagnosis were identified for each patient. These diagnoses were then mapped to their corresponding ICD codes and categorized into 32 distinct ADGs. An aggregate ADG score and the probability of mortality at one year were then calculated for each patient using a previously published framework\u003csup\u003e\u003cspan citationid=\"CR66\" class=\"CitationRef\"\u003e66\u003c/span\u003e,\u003cspan citationid=\"CR67\" class=\"CitationRef\"\u003e67\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eThe dataset was linked using unique encoded identifiers and analysed at ICES.\u003c/p\u003e \u003cp\u003eData Preprocessing\u003c/p\u003e \u003cp\u003eThe dataset was first split into training (49%), validation (21%), and test (30%) sets using random sampling stratified by IMD status, to ensure that patients who developed IMD were proportionally represented in each set. 
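\u003c/p\u003e \u003cp\u003eThe preprocessing described in this subsection can be sketched with the Python standard library. The first step splits the data into training, validation and test sets in the stated 49/21/30 proportions, stratified on IMD status. The second step imputes missing values with the mean (numeric features) or mode (categorical features), using statistics computed on the training set only. The sketch assumes each patient is a dict of feature values, with None marking a missing value. The function names, field names, per-class rounding of set sizes and toy cohort are hypothetical and do not come from the study\u0026rsquo;s code.\u003c/p\u003e

```python
import random
from collections import Counter
from statistics import mean


def stratified_split(records, label_key, fractions=(0.49, 0.21, 0.30), seed=0):
    # Split each outcome class separately so every set keeps roughly the IMD proportion
    rng = random.Random(seed)
    splits = ([], [], [])
    by_class = {}
    for r in records:
        by_class.setdefault(r[label_key], []).append(r)
    for group in by_class.values():
        rng.shuffle(group)
        n_train = round(fractions[0] * len(group))
        n_val = round(fractions[1] * len(group))
        splits[0].extend(group[:n_train])
        splits[1].extend(group[n_train:n_train + n_val])
        splits[2].extend(group[n_train + n_val:])
    return splits


def fit_imputer(train, numeric, categorical):
    # Fill values are learned from the training set only, avoiding leakage
    fill = {}
    for f in numeric:
        fill[f] = mean(r[f] for r in train if r[f] is not None)
    for f in categorical:
        fill[f] = Counter(r[f] for r in train if r[f] is not None).most_common(1)[0][0]
    return fill


def impute(records, fill):
    # Apply the training-set fill values to any split
    return [{k: (fill[k] if v is None and k in fill else v) for k, v in r.items()} for r in records]


# Hypothetical cohort: 100 patients, 10 with IMD, some missing values
patients = [
    {'imd': int(i < 10),
     'age': None if i % 7 == 0 else 40 + i % 30,
     'laterality': None if i % 11 == 0 else ('left' if i % 2 else 'right')}
    for i in range(100)
]
train, val, test = stratified_split(patients, 'imd')
fill = fit_imputer(train, numeric=['age'], categorical=['laterality'])
train, val, test = [impute(s, fill) for s in (train, val, test)]
print(len(train), len(val), len(test))  # 49 21 30
```

\u003cp\u003e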
Missing values were then imputed using statistics computed on the training set only, to prevent data leakage. For numeric features, missing values were imputed with the training-set mean of the respective feature; for categorical features, with the training-set mode. These training-set statistics were applied to impute missing values in the training, validation and test sets alike. Fewer than 30% of values in any feature required imputation. Outlier detection was not performed, in order to preserve the full spectrum of clinical and demographic variability in the respective patient population.\u003c/p\u003e \u003cp\u003eFeature Space\u003c/p\u003e \u003cp\u003eFor both the breast and lung cancer cohorts, 34 comorbidity-based features were included: the 32 distinct ADGs employed by the Johns Hopkins ACG\u0026reg; system, the aggregate ADG score, and the probability of mortality at one year, as previously reported in the literature\u003csup\u003e\u003cspan citationid=\"CR66\" class=\"CitationRef\"\u003e66\u003c/span\u003e,\u003cspan citationid=\"CR67\" class=\"CitationRef\"\u003e67\u003c/span\u003e\u003c/sup\u003e. Both cohorts also included age at diagnosis and tumour size, site (according to the ICD-10 classification), histology, laterality, and stage. Hormone receptor, HER2, and triple-negative status and tumour grade were additionally included for the breast cancer cohort, and sex for the lung cancer cohort. 
The selection of these features was guided by two primary considerations: first, a thorough review of the existing literature to identify established and potential risk factors for IMD; and second, a deliberate effort to maximize compatibility with standard clinical data, prioritizing readily accessible variables and avoiding specialized measurements.\u003c/p\u003e \u003cp\u003eModel Development\u003c/p\u003e \u003cp\u003eWe employed classification models based on gradient-boosted random forests (GBRF) implemented in the CatBoost\u003csup\u003e\u003cspan citationid=\"CR68\" class=\"CitationRef\"\u003e68\u003c/span\u003e\u003c/sup\u003e framework. Hyperparameter optimization was conducted using Optuna\u003csup\u003e\u003cspan citationid=\"CR69\" class=\"CitationRef\"\u003e69\u003c/span\u003e\u003c/sup\u003e, a Bayesian optimization library that searches the hyperparameter space for optimal model configurations. To ensure robust model evaluation and mitigate overfitting, five-fold cross-validation (CV) was used.\u003c/p\u003e \u003cp\u003eTo account for class imbalance in the dataset (i.e., the small proportion of patients who developed IMD), the Synthetic Minority Over-sampling Technique for Nominal and Continuous features (SMOTE-NC)\u003csup\u003e\u003cspan citationid=\"CR70\" class=\"CitationRef\"\u003e70\u003c/span\u003e\u003c/sup\u003e was used, and the up-sampling ratio and other relevant parameters were further optimised using Optuna\u003csup\u003e\u003cspan citationid=\"CR69\" class=\"CitationRef\"\u003e69\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eAll model training and hyperparameter tuning were performed exclusively on the training and validation data, with final performance assessed on the held-out test set. 
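\u003c/p\u003e \u003cp\u003eThe sketch below implements the core of SMOTE-NC as originally described, to make the oversampling step concrete. Distances between minority-class (IMD) patients combine Euclidean distance on continuous features with a fixed penalty for each mismatched categorical feature; the penalty is the median standard deviation of the continuous features. A synthetic patient takes each continuous feature from a random point between a minority patient and one of its k nearest minority neighbours. Each categorical feature takes the most common category among those neighbours. This is a simplified, hypothetical illustration: the function name, k, the random choice of base patients and the toy data are assumptions. The study does not state which implementation it used. In practice, a maintained library implementation, such as the SMOTENC class in imbalanced-learn, would normally be preferred.\u003c/p\u003e

```python
import math
import random
from collections import Counter
from statistics import median, pstdev


def smote_nc(minority, cont_idx, cat_idx, n_new, k=3, seed=0):
    # minority: feature lists for minority-class (IMD) patients only
    rng = random.Random(seed)
    # Penalty per categorical mismatch: median std. dev. of the continuous features
    med = median(pstdev([row[i] for row in minority]) for i in cont_idx)

    def distance(a, b):
        d2 = sum((a[i] - b[i]) ** 2 for i in cont_idx)
        d2 += sum(med ** 2 for i in cat_idx if a[i] != b[i])
        return math.sqrt(d2)

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base patient (excluding itself)
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: distance(base, row))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()
        new = list(base)
        for i in cont_idx:
            # Continuous features: random point on the segment from base to partner
            new[i] = base[i] + gap * (partner[i] - base[i])
        for i in cat_idx:
            # Categorical features: most common value among the k neighbours
            new[i] = Counter(row[i] for row in neighbours).most_common(1)[0][0]
        synthetic.append(new)
    return synthetic


# Hypothetical IMD patients: [age, tumour size (mm), histology]
imd_patients = [
    [45, 22.0, 'ductal'], [52, 30.0, 'ductal'], [61, 18.0, 'lobular'],
    [38, 41.0, 'ductal'], [57, 25.0, 'mixed'], [49, 35.0, 'ductal'],
]
new_rows = smote_nc(imd_patients, cont_idx=[0, 1], cat_idx=[2], n_new=4)
print(len(new_rows))  # 4
```

\u003cp\u003eIn a pipeline like the one described, such synthetic rows would normally be added to the training folds only, so that validation and test data contain real patients alone.\u003c/p\u003e \u003cp\u003e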
Models were optimised to maximise the F1 score.\u003c/p\u003e \u003cp\u003eModel Evaluation\u003c/p\u003e \u003cp\u003eModel performance was evaluated using a suite of established metrics. The F1 score served as the primary optimization metric. It is the harmonic mean of precision and recall and balances false positives against false negatives. Precision, recall, area under the precision-recall curve (AUC-PR), and area under the receiver operating characteristic curve (AUC-ROC) were also calculated to provide a comprehensive assessment of model performance. Within each CV fold, after model training, the decision threshold was chosen on the validation data to maximise the F-beta score with β\u0026thinsp;=\u0026thinsp;0.75. This value of β deliberately weights precision more heavily than recall, which was deemed justified in this clinical context: a false positive prediction of IMD could lead to unnecessary diagnostic procedures, increased healthcare costs, and patient anxiety. While a false negative is also undesirable, such patients would already be receiving the standard of care and would be monitored for their primary cancer and its potential complications.\u003c/p\u003e \u003cp\u003eFollowing cross-validation, final model performance was evaluated on the held-out test set, comprising 30% of the original dataset, which remained untouched during model development and hyperparameter tuning.\u003c/p\u003e \u003cp\u003eEthical Consideration\u003c/p\u003e \u003cp\u003eICES is a prescribed entity under Ontario\u0026rsquo;s Personal Health Information Protection Act (PHIPA). 
Section 45 of PHIPA authorizes ICES to collect personal health information, without consent, for the purpose of analysis or compiling statistical information with respect to the management of, evaluation or monitoring of, the allocation of resources to or planning for all or part of the health system. Projects that use data collected by ICES under section 45 of PHIPA, and use no other data, are exempt from research ethics board (REB) review. The use of the data in this project is authorized under section 45 and approved by ICES\u0026rsquo; Privacy and Legal Office.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eData Availability\u003c/h2\u003e\n\u003cp\u003eThe dataset from this study is held securely in coded form at ICES. While legal data sharing agreements between ICES and data providers (e.g., healthcare organizations and government) prohibit ICES from making the dataset publicly available, access may be granted to those who meet pre-specified criteria for confidential access, available at www.ices.on.ca/DAS (email: [email protected]). The full dataset creation plan and underlying analytic code are available from the authors upon request, understanding that the computer programs may rely upon coding templates or macros that are unique to ICES and are therefore either inaccessible or may require modification.\u003c/p\u003e\n\u003ch2\u003eAcknowledgments\u003c/h2\u003e\n\u003cp\u003eThis study was supported by ICES, which is funded by an annual grant from the Ontario Ministry of Health (MOH) and the Ministry of Long-Term Care (MLTC). MVI is also supported by the Graduate Diploma in Health Research at the University of Toronto. SD is supported by the Canadian Institutes of Health Research. 
This document used data adapted from the Statistics Canada Postal Code\u003csup\u003eOM\u003c/sup\u003e Conversion File, which is based on data licensed from Canada Post Corporation, and/or data adapted from the Ontario Ministry of Health Postal Code Conversion File, which contains data copied under license from \u0026copy;Canada Post Corporation and Statistics Canada. Parts of this material are based on data and/or information compiled and provided by: Ontario Health, Canadian Institute for Health Information, Ontario Ministry of Health, and The Johns Hopkins ACG\u0026reg; System Version 11. The analyses, conclusions, opinions and statements expressed herein are solely those of the authors and do not reflect those of the funding or data sources; no endorsement is intended or should be inferred.\u003c/p\u003e\n\u003ch2\u003eAuthor Contributions\u003c/h2\u003e\n\u003cp\u003eMVI and SD conceived and designed the study and acquired the data. MVI performed the statistical analyses and developed, trained, and applied the machine learning models. MVI, SD, and AV implemented quality control of the algorithms. All authors interpreted the analyzed data and contributed to drawing conclusions. MVI prepared the first draft of the manuscript. SD revised the manuscript. All authors contributed to manuscript preparation.\u003c/p\u003e\n\u003ch2\u003eCompeting Interests\u003c/h2\u003e\n\u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eHabbous S et al (2020) Incidence and real-world burden of brain metastases from solid tumors and hematologic malignancies in Ontario: a population-based study. Neuro-Oncol Adv 3:vdaa178\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKuksis M et al (2020) The incidence of brain metastases among patients with metastatic breast cancer: a systematic review and meta-analysis. 
Neuro-Oncol 23:noaa285\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi AY et al (2022) Intracranial Metastatic Disease: Present Challenges, Future Opportunities. Front Oncol 12:855182\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu JL, Walker EV, Paudel YR, Davis FG, Yuan Y (2022) Brain Metastases among Cancer Patients Diagnosed from 2010\u0026ndash;2017 in Canada: Incidence Proportion at Diagnosis and Estimated Lifetime Incidence. Curr Oncol 29:2091\u0026ndash;2105\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSmedby KE, Brandt L, B\u0026auml;cklund ML, Blomqvist P (2009) Brain metastases admissions in Sweden between 1987 and 2006. Br J Cancer 101:1919\u0026ndash;1924\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi AY et al (2022) Brain metastases in the setting of stable extracranial disease: A systematic review and meta-analysis. J Clin Oncol 40:2022\u0026ndash;2022\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNayak L, Lee EQ, Wen PY (2012) Epidemiology of Brain Metastases. Curr Oncol Rep 14:48\u0026ndash;54\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCagney DN et al (2017) Incidence and prognosis of patients with brain metastases at diagnosis of systemic malignancy: a population-based study. Neuro-Oncol 19:1511\u0026ndash;1521\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBarnholtz-Sloan JS et al (2004) Incidence Proportions of Brain Metastases in Patients Diagnosed (1973 to 2001) in the Metropolitan Detroit Cancer Surveillance System. J Clin Oncol 22:2865\u0026ndash;2872\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAchrol AS et al (2019) Brain metastases. Nat Rev Dis Primers 5:5\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBoogerd W, Vos VW, Hart AAM, Baris G (1993) Brain metastases in breast cancer; natural history, prognostic factors and outcome. 
J Neuro-Oncol 15:165\u0026ndash;174\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWitzel I, Oliveira-Ferrer L, Pantel K, M\u0026uuml;ller V, Wikman H (2016) Breast cancer brain metastases: biology and new clinical perspectives. Breast Cancer Res 18:8\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLeone JP, Leone BA (2015) Breast cancer brain metastases: the last frontier. Exp Hematol Oncol 4:33\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eD\u0026rsquo;Antonio C et al (2014) Bone and brain metastasis in lung cancer: recent advances in therapeutic strategies. Ther Adv Med Oncol 6:101\u0026ndash;114\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGoldberg SB, Contessa JN, Omay SB, Chiang V (2015) Lung Cancer Brain Metastases. Cancer J 21:398\u0026ndash;403\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRittberg R, Banerji S, Kim JO, Rathod S, Dawe DE (2021) Treatment and Prevention of Brain Metastases in Small Cell Lung Cancer. Am J Clin Oncol 44:629\u0026ndash;638\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSaria MG et al (2017) The Hidden Morbidity of Cancer Burden in Caregivers of Patients with Brain Metastases. Nurs Clin N Am 52:159\u0026ndash;178\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSacks P, Rahman M (2020) Epidemiology of Brain Metastases. Neurosurg Clin N Am 31:481\u0026ndash;488\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMukand JA, Blackinton DD, Crincoli MG, Lee JJ, Santos BB (2001) Incidence of Neurologic Deficits and Rehabilitation of Patients with Brain Tumors. Am J Phys Med Rehabil 80:346\u0026ndash;350\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMarciniak CM, Sliwa JA, Heinemann AW, Semik PE (2001) Functional outcomes of persons with brain tumors after inpatient rehabilitation. 
Arch Phys Med Rehabilitation 82:457\u0026ndash;463\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSoffietti R, Rudā R, Mutani R (2002) Management of brain metastases. J Neurol 249:1357\u0026ndash;1369\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eArvold ND et al (2016) Updates in the management of brain metastases. Neuro-Oncol 18:1043\u0026ndash;1065\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eErickson AW, Das S (2019) The Impact of Targeted Therapy on Intracranial Metastatic Disease Incidence and Survival. Front Oncol 9:797\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eErickson AW et al (2021) Assessing the Association of Targeted Therapy and Intracranial Metastatic Disease. Jama Oncol 7:1220\u0026ndash;1224\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eErickson AW et al (2020) HER2-targeted therapy prolongs survival in patients with HER2-positive breast cancer and intracranial metastatic disease: a systematic review and meta-analysis. Neuro-oncology Adv 2:vdaa136\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTsao MN et al (2018) Whole brain radiotherapy for the treatment of newly diagnosed multiple brain metastases. Cochrane Database Syst Rev 1:CD003869\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eScoccianti S, Ricardi U (2012) Treatment of brain metastases: Review of phase III randomized controlled trials. Radiother Oncol 102:168\u0026ndash;179\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKhuntia D, Brown P, Li J, Mehta MP (2006) Whole-Brain Radiotherapy in the Management of Brain Metastasis. J Clin Oncol 24:1295\u0026ndash;1304\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLanger CJ, Mehta MP (2005) Current Management of Brain Metastases, With a Focus on Systemic Options. 
J Clin Oncol 23:6207\u0026ndash;6219\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLin NU (2013) Targeted Therapies in Brain Metastases. Curr Treat Options Neurol 16:276\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMarkesbery WR, Brooks WH, Gupta GD, Young AB (1978) Treatment for Patients With Cerebral Metastases. Arch Neurol 35:754\u0026ndash;756\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMyall NJ, Yu H, Soltys SG, Wakelee HA, Pollom E (2021) Management of brain metastases in lung cancer: evolving roles for radiation and systemic treatment in the era of targeted and immune therapies. Neuro-Oncol Adv 3:v52\u0026ndash;v62\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGaebe K et al (2024) A Population-Based Analysis of Brain Metastasis Burden and Management in 8705 Small Cell Lung Cancer Patients. Neurosurgery 70:204\u0026ndash;204\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStevens SP et al (2018) The utility of routine surveillance screening with magnetic resonance imaging (MRI) to detect tumour recurrence in children with low-grade central nervous system (CNS) tumours: a systematic review. J Neuro-Oncol 139:507\u0026ndash;522\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGraesslin O et al (2010) Nomogram to Predict Subsequent Brain Metastasis in Patients With Metastatic Breast Cancer. J Clin Oncol 28:2032\u0026ndash;2037\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAversa C et al (2014) Metastatic breast cancer subtypes and central nervous system metastases. Breast 23:623\u0026ndash;628\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGaspar LE et al (2005) Time From Treatment to Subsequent Diagnosis of Brain Metastases in Stage III Non\u0026ndash;Small-Cell Lung Cancer: A Retrospective Review by the Southwest Oncology Group. 
J Clin Oncol 23:2955\u0026ndash;2961\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCarolan H et al (2005) Does the incidence and outcome of brain metastases in locally advanced non-small cell lung cancer justify prophylactic cranial irradiation or early detection? Lung Cancer 49:109\u0026ndash;115\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHung M-H et al (2014) Effect of Age and Biological Subtype on the Risk and Timing of Brain Metastasis in Breast Cancer Patients. PLoS ONE 9:e89389\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKoniali L et al (2020) Risk factors for breast cancer brain metastases: a systematic review. Oncotarget 11:650\u0026ndash;669\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDing X et al (2012) Risk factors of brain metastases in completely resected pathological stage IIIA-N2 non-small cell lung cancer. Radiat Oncol 7:119\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBajard A et al (2004) Multivariate analysis of factors predictive of brain metastases in localised non-small cell lung carcinoma. Lung Cancer 45:317\u0026ndash;323\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTonyali O et al (2016) Risk factors for brain metastasis as a first site of disease recurrence in patients with HER2 positive early stage breast cancer treated with adjuvant trastuzumab. Breast 25:22\u0026ndash;26\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAhmed KA et al (2025) Phase II trial of brain MRI surveillance in stage IV breast cancer. 
\u003cem\u003eNeuro-Oncol.\u003c/em\u003e noaf018 \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/neuonc/noaf018\u003c/span\u003e\u003cspan address=\"10.1093/neuonc/noaf018\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePeshkin BN, Alabek ML, Isaacs C (2011) BRCA1/2 mutations and triple negative breast cancers. Breast Dis 32:25\u0026ndash;33\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZavitsanos PJ et al (2018) BRCA1 Mutations Associated With Increased Risk of Brain Metastases in Breast Cancer. Am J Clin Oncol 41:1252\u0026ndash;1256\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAn N et al (2018) Risk factors for brain metastases in patients with non\u0026ndash;small-cell lung cancer. Cancer Med 7:6357\u0026ndash;6364\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHeitz F et al (2011) Cerebral metastases in metastatic breast cancer: disease-specific risk factors and survival. Ann Oncol 22:1571\u0026ndash;1581\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMinisini AM et al (2013) Risk factors and survival outcomes in patients with brain metastases from breast cancer. Clin Exp Metastasis 30:951\u0026ndash;956\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHe J et al (2021) Risk factors for brain metastases from non-small-cell lung cancer. Medicine 100:e24724\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eIstasy P et al (2021) The Impact of Artificial Intelligence on Health Equity in Oncology: A Scoping Review. Blood 138:4934\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eIstasy P et al (2022) The Impact of Artificial Intelligence on Health Equity in Oncology: Scoping Review. 
J Med Internet Res 24:e39748\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLundberg SM et al (2020) From local explanations to global understanding with explainable AI for trees. Nat Mach Intell 2:56\u0026ndash;67\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang B, Shi H, Wang H (2023) Machine Learning and AI in Cancer Prognosis, Prediction, and Treatment Selection: A Critical Approach. J Multidiscip Healthc 16:1779\u0026ndash;1791\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang H et al (2023) Application of Deep Learning in Cancer Prognosis Prediction Model. Technol Cancer Res Treat 22:15330338231199288\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKourou K, Exarchos TP, Exarchos KP, Karamouzis MV, Fotiadis DI (2015) Machine learning applications in cancer prognosis and prediction. Comput Struct Biotechnol J 13:8\u0026ndash;17\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhu W, Xie L, Han J, Guo X (2020) The Application of Deep Learning in Cancer Prognosis Prediction. Cancers 12:603\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJiang Y et al (2023) Biology-guided deep learning predicts prognosis and cancer immunotherapy response. Nat Commun 14:5135\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBrogi E et al (2011) Breast carcinoma with brain metastases: clinical analysis and immunoprofile on tissue microarrays. Ann Oncol 22:2597\u0026ndash;2603\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShen Q et al (2015) Breast Cancer With Brain Metastases: Clinicopathologic Features, Survival, and Paired Biomarker Analysis. Oncologist 20:466\u0026ndash;473\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAbdou Y et al (2022) Left sided breast cancer is associated with aggressive biology and worse outcomes than right sided breast cancer. 
Sci Rep 12:13377\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBarbara R-C et al (2020) Divergent Impact of Breast Cancer Laterality on Clinicopathological, Angiogenic, and Hemostatic Profiles: A Potential Role of Tumor Localization in Future Outcomes. J Clin Med 9:1708\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWilting J, Hagedorn M (2011) Left-Right Asymmetry in Embryonic Development and Breast Cancer: Common Molecular Determinants? Curr Med Chem 18:5519\u0026ndash;5527\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHao Y, Li G (2023) Risk and prognostic factors of brain metastasis in lung cancer patients: a Surveillance, Epidemiology, and End Results population-based cohort study. Eur J Cancer Prev 32:498\u0026ndash;511\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCrooks J et al (2024) Cost of Treatment for Brain Metastases Using Data From a National Health Insurance. Adv Radiat Oncol 9:101438\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAustin PC, van Walraven C, Wodchis WP, Newman A, Anderson GM (2011) Using the Johns Hopkins Aggregated Diagnosis Groups (ADGs) to Predict Mortality in a General Adult Population Cohort in Ontario, Canada. Med Care 49:932\u0026ndash;939\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAustin PC, van Walraven C (2011) The Mortality Risk Score and the ADG Score. Med Care 49:940\u0026ndash;947\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eProkhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A (2017) CatBoost: unbiased boosting with categorical features. arXiv\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAkiba T, Sano S, Yanase T, Ohta T, Koyama M (2019) Optuna: A Next-generation Hyperparameter Optimization Framework. 
\u003cem\u003earXiv\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLemaitre G, Nogueira F, Aridas CK (2016) Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning. \u003cem\u003earXiv\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"},{"header":"Tables","content":"\u003cp\u003eTable 1: Chi-square test results for association between covariates and intracranial metastatic disease in breast and lung cancer patients.\u003c/p\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 26.6026%;\"\u003e\n \u003cp\u003eCovariate\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.9936%;\"\u003e\n \u003cp\u003eChi-Square (χ\u003csup\u003e2\u003c/sup\u003e)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 29.1667%;\"\u003e\n \u003cp\u003eDegrees of Freedom\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 23.2372%;\"\u003e\n \u003cp\u003eP-Value\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"4\" valign=\"top\" style=\"width: 100%;\"\u003e\n \u003cp\u003eBreast Cancer\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 26.6026%;\"\u003e\n \u003cp\u003eTumour Site\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.9936%;\"\u003e\n \u003cp\u003e59.98\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 29.1667%;\"\u003e\n \u003cp\u003e8\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 23.2372%;\"\u003e\n \u003cp\u003e\u0026lt; 0.01\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 26.6026%;\"\u003e\n \u003cp\u003eHistology\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.9936%;\"\u003e\n 
\u003cp\u003e32.35\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 29.1667%;\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 23.2372%;\"\u003e\n \u003cp\u003e\u0026lt; 0.01\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 26.6026%;\"\u003e\n \u003cp\u003eStage\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.9936%;\"\u003e\n \u003cp\u003e4817.09\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 29.1667%;\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 23.2372%;\"\u003e\n \u003cp\u003e\u0026lt; 0.01\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 26.6026%;\"\u003e\n \u003cp\u003eTumour Grade\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.9936%;\"\u003e\n \u003cp\u003e608.43\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 29.1667%;\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 23.2372%;\"\u003e\n \u003cp\u003e\u0026lt; 0.01\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 26.6026%;\"\u003e\n \u003cp\u003eTumour Marker Status\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.9936%;\"\u003e\n \u003cp\u003e672\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 29.1667%;\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 23.2372%;\"\u003e\n \u003cp\u003e\u0026lt; 0.01\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 26.6026%;\"\u003e\n \u003cp\u003eLaterality\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.9936%;\"\u003e\n 
\u003cp\u003e0.48\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 29.1667%;\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 23.2372%;\"\u003e\n \u003cp\u003e0.49\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"4\" valign=\"top\" style=\"width: 100%;\"\u003e\n \u003cp\u003eLung Cancer\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 26.6026%;\"\u003e\n \u003cp\u003eSex\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.9936%;\"\u003e\n \u003cp\u003e17.97\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 29.1667%;\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 23.2372%;\"\u003e\n \u003cp\u003e\u0026lt; 0.01\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 26.6026%;\"\u003e\n \u003cp\u003eTumour Site\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.9936%;\"\u003e\n \u003cp\u003e161.04\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 29.1667%;\"\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 23.2372%;\"\u003e\n \u003cp\u003e\u0026lt; 0.01\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 26.6026%;\"\u003e\n \u003cp\u003eHistology\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.9936%;\"\u003e\n \u003cp\u003e457.36\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 29.1667%;\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 23.2372%;\"\u003e\n \u003cp\u003e\u0026lt; 0.01\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 26.6026%;\"\u003e\n 
\u003cp\u003eLaterality\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.9936%;\"\u003e\n \u003cp\u003e12.43\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 29.1667%;\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 23.2372%;\"\u003e\n \u003cp\u003e\u0026lt; 0.01\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 26.6026%;\"\u003e\n \u003cp\u003eStage\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.9936%;\"\u003e\n \u003cp\u003e1042.71\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 29.1667%;\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 23.2372%;\"\u003e\n \u003cp\u003e\u0026lt; 0.01\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"nature-portfolio","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"","title":"Nature Portfolio","twitterHandle":"","acdcEnabled":false,"dfaEnabled":false,"editorialSystem":"ejp","reportingPortfolio":"","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"intracranial metastatic disease, brain metastasis, machine learning, breast cancer, lung cancer, risk 
prediction","lastPublishedDoi":"10.21203/rs.3.rs-6247605/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6247605/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eIntracranial metastatic disease (IMD) is a devastating complication of cancer associated with high morbidity and mortality. Patients with breast and lung cancer have a particularly high risk of developing IMD. Early identification of individuals with breast or lung cancer at high risk for IMD development would enable targeted surveillance and timely intervention. In this study, we leverage machine learning (ML) algorithms to develop and validate predictive models for IMD risk using a population-based dataset of 143,341 patients with breast or lung cancer from Ontario, Canada, collected from 2010 to 2023. Our ML models outperform traditional statistical paradigms, demonstrating strong discriminative ability in predicting both global and five-year risk of IMD with area under the precision-recall curve values ranging from 0.75 to 0.85. We further employed Shapley Additive exPlanations analysis to elucidate the key predictors of IMD; histology, laterality and age emerged as significant factors for patients with breast cancer while tumour site, histology and sex predicted IMD among patients with lung cancer. 
These findings underscore the potential of ML algorithms to bolster personalised risk stratification and enable targeted surveillance for IMD in patients with metastatic cancer.\u003c/p\u003e","manuscriptTitle":"Machine learning identifies prognosticators of intracranial metastatic disease in patients with breast or lung cancer","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-05-07 03:12:48","doi":"10.21203/rs.3.rs-6247605/v1","editorialEvents":[],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"communications-medicine","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"commsmed","sideBox":"Learn more about [Communications Medicine](http://www.nature.com/commsmed)","snPcode":"43856","submissionUrl":"https://mts-commsmed.nature.com/cgi-bin/main.plex","title":"Communications Medicine","twitterHandle":"@commsmedicine","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"ejp","reportingPortfolio":"Communications Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"a3f2cd41-3aed-4190-a954-4d62e77cb1ed","owner":[],"postedDate":"May 7th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[{"id":46024986,"name":"Health sciences/Oncology/Cancer/Cancer models"},{"id":46024987,"name":"Health sciences/Oncology/Cancer/Metastasis"},{"id":46024988,"name":"Biological sciences/Computational biology and bioinformatics/Machine learning"}],"tags":[],"updatedAt":"2025-05-07T03:12:48+00:00","versionOfRecord":[],"versionCreatedAt":"2025-05-07 
03:12:48","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6247605","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6247605","identity":"rs-6247605","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
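Table 1 reports a chi-square statistic, its degrees of freedom and a p-value for each covariate. As a quick consistency check on those values, the sketch below recomputes the upper-tail p-value from each reported χ² and df using only the Python standard library. In intent it matches `scipy.stats.chi2.sf`. It relies on the standard closed forms of the chi-square survival function for integer df. The printed category count additionally assumes that IMD was coded as a binary outcome, so that df = (k − 1)(2 − 1) = k − 1 for a covariate with k levels. The table does not state this coding. `chi2_sf` and `TABLE_1` are hypothetical names introduced here, and the check maps statistic to p-value only: it cannot reveal whether a continuity correction was applied when the statistics were computed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: Q = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: Q = erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_{r=1}^{(df-1)/2} x^(r-1/2) / (1*3*...*(2r-1))
    phi = math.exp(-half) / math.sqrt(2.0 * math.pi)  # standard normal density at sqrt(x)
    term, total = math.sqrt(x), 0.0  # r = 1 term is x^(1/2) / 1
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + 2.0 * phi * total


# (cancer, covariate, chi-square statistic, degrees of freedom) as reported in Table 1.
TABLE_1 = [
    ("breast", "Tumour Site", 59.98, 8),
    ("breast", "Histology", 32.35, 3),
    ("breast", "Stage", 4817.09, 3),
    ("breast", "Tumour Grade", 608.43, 3),
    ("breast", "Tumour Marker Status", 672.0, 2),
    ("breast", "Laterality", 0.48, 1),
    ("lung", "Sex", 17.97, 1),
    ("lung", "Tumour Site", 161.04, 4),
    ("lung", "Histology", 457.36, 2),
    ("lung", "Laterality", 12.43, 1),
    ("lung", "Stage", 1042.71, 3),
]

if __name__ == "__main__":
    for cancer, covariate, stat, df in TABLE_1:
        p = chi2_sf(stat, df)
        # Assuming a binary IMD outcome, df = k - 1, so the covariate has k = df + 1 levels.
        print(f"{cancer:6} {covariate:22} chi2={stat:>8} df={df} levels={df + 1} p={p:.3g}")
```

Run as a script, it reproduces the pattern in Table 1. Breast cancer laterality (χ² = 0.48, df = 1) gives p ≈ 0.49, and every other covariate falls below 0.01. The breast cancer stage statistic is large enough that its p-value underflows to 0.0 in double precision, which only means it is far below any conventional threshold.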

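The abstract summarises model discrimination as the area under the precision-recall curve (AUPRC), with values from 0.75 to 0.85. It does not specify which AUPRC estimator was used. The sketch below assumes the common step-wise average precision, AP = Σₙ (Rₙ − Rₙ₋₁) Pₙ, which is the definition behind scikit-learn's `average_precision_score`. In this sum, Rₙ and Pₙ are recall and precision at the n-th distinct score threshold. `average_precision` is a hypothetical standard-library helper, and the labels and scores are toy values, not study data.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 for the positive class (e.g. IMD), 0 otherwise.
    scores: higher values mean higher predicted risk.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    positives = sum(labels)
    if positives == 0:
        raise ValueError("average precision is undefined without positive labels")
    # Sort by descending score; each distinct score acts as one decision threshold.
    pairs = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    ap, tp, fp, prev_recall = 0.0, 0, 0, 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Count every example tied at this score before evaluating the threshold.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / positives
        precision = tp / (tp + fp)
        # Add a rectangle whose width is the gain in recall and whose height is the precision.
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # Toy values for illustration only; they are not data from the study.
    print(average_precision([1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.3, 0.1]))  # about 0.833 (5/6)
    print(average_precision([1, 0, 0, 0], [0.5, 0.5, 0.5, 0.5]))  # 0.25, the prevalence
```

The second toy call shows why AUPRC is best read against outcome prevalence. When every patient receives the same score, AP collapses to the fraction of positives (0.25 here), whereas the uninformative baseline for ROC AUC is 0.5. An AUPRC of 0.75 to 0.85 is therefore most informative when compared with the IMD prevalence of the corresponding cohort.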