Hybrid Machine Learning Models Integrating VI-RADS and Clinical Metrics for Bladder Cancer Staging | Research Square

Research Article

Batu Akalin, Akif Erbin, Merve Sam Ozdemir, Halil Lutfi Canat

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9081158/v1
This work is licensed under a CC BY 4.0 License
Status: Under Review
Version 1 posted 7

Abstract

Objective
In bladder cancer, integrating imaging and non-imaging parameters may enhance diagnostic performance beyond the Vesical Imaging-Reporting and Data System (VI-RADS). This study aimed to develop and validate machine learning models incorporating VI-RADS scores with clinical and laboratory variables to predict muscle invasion and support individualized treatment decisions.

Materials and Methods
A total of 372 patients who underwent transurethral resection of bladder tumor between 2019 and 2024 and had preoperative mpMRI performed according to the VI-RADS protocol were retrospectively evaluated. VI-RADS scores were combined with demographic data, hematological indices, biochemical markers, and urinalysis findings to construct predictive models. Machine learning algorithms, including logistic regression, random forest, support vector machines, extreme gradient boosting, light gradient boosting machine, and deep neural networks, were developed and optimized. Model performance was assessed using receiver operating characteristic area under the curve (AUC), sensitivity, specificity, Brier score, and decision curve analysis (DCA) and compared with VI-RADS alone.

Results
Pathological muscle invasion (≥T2) was identified in 103 (27.8%) of the 372 patients. VI-RADS alone yielded an AUC of 0.89. Models supported with clinical and laboratory parameters demonstrated significant improvement, particularly random forest (AUC = 0.95), support vector machines (AUC = 0.95), and logistic regression (AUC = 0.94). Calibration analysis of the isotonic regression–adjusted random forest model yielded a slope of 1.16 and an intercept of −0.039, indicating probability estimates closely aligned with clinical reality. In DCA, the RF model outperformed both the “treat-all” and “treat-none” strategies, demonstrating clear net clinical benefit.

Conclusion
Integrating VI-RADS with clinical and laboratory parameters improves discrimination and calibration in predicting muscle invasion compared with imaging alone. The random forest model, in particular, may reduce misclassification at critical decision points, such as early radical cystectomy or neoadjuvant chemotherapy, and provide more reliable information for patient counseling.

Keywords: Bladder Neoplasms, Vesical Imaging-Reporting and Data System (VI-RADS), Machine Learning, Magnetic Resonance Imaging, Random Forest Model, Artificial Intelligence, Muscle Invasion

Introduction

Bladder cancer is a common malignancy worldwide, with approximately 614,000 new cases diagnosed annually in both men and women. According to GLOBOCAN 2022, it ranks as the ninth most frequently diagnosed cancer globally (1). The age-standardized incidence rate (ASIR) is estimated at 5.6 per 100,000 for men and 1.9 for women. Despite advances in diagnosis and treatment, both the incidence and mortality of bladder cancer continue to rise globally (2).

The development of bladder cancer is influenced by both environmental and genetic factors. Tobacco smoking remains the predominant risk factor, responsible for nearly half of all cases and increasing the risk approximately threefold compared to non-smokers (3). Occupational exposure to aromatic amines and polycyclic aromatic hydrocarbons, particularly in the dye, textile, and leather industries, is also a significant contributor (4).

Accurate staging plays a central role in determining prognosis and guiding management. The TNM classification system, developed by the American Joint Committee on Cancer (AJCC) and Union Internationale Contre le Cancer (UICC), is the most widely adopted method for assessing the depth of bladder wall invasion, regional lymph node involvement, and distant metastasis (5). Tumors below stage T2 are classified as non–muscle-invasive bladder cancer (NMIBC), while T2 or higher tumors are considered muscle-invasive bladder cancer (MIBC). This distinction is clinically crucial, as NMIBC can often be managed with transurethral resection (TURBT), whereas MIBC typically requires radical cystectomy and/or multimodal treatment (4).

Cystoscopy remains the gold standard for diagnosis, offering direct visualization of bladder lesions (5). However, it is invasive, may cause patient discomfort, and provides limited information about the depth of invasion. Ultrasonography and computed tomography (CT) are useful for upper urinary tract evaluation but have limited accuracy in assessing the extent of muscle invasion (6). Multiparametric magnetic resonance imaging (mpMRI), standardized through the Vesical Imaging-Reporting and Data System (VI-RADS), has shown promising diagnostic accuracy in predicting muscle invasion (7). Nevertheless, interobserver variability and differences in implementation across centers remain significant limitations.

Accurate preoperative staging is critical for optimizing treatment strategy. Misclassification of tumor stage can result in unnecessary morbidity or suboptimal therapy, directly affecting patient outcomes. While VI-RADS has improved the radiologic assessment of bladder cancer, it relies solely on imaging data without incorporating clinical or biochemical variables that may also contribute to stage prediction.

This study hypothesizes that integrating imaging findings with clinical and biochemical data through machine learning algorithms can enhance the predictive performance for muscle invasion in bladder cancer. Therefore, we aimed to develop machine learning–based models combining VI-RADS scores with relevant clinical and biochemical parameters to predict muscle invasion preoperatively. Furthermore, we evaluated these models against VI-RADS alone and discussed their potential clinical applicability as preoperative decision-support tools.

Materials and Methods

Compliance with ethical standards
This research was conducted at the Urology Clinic of the University of Health Sciences Basaksehir Cam and Sakura City Hospital, a tertiary center in Istanbul, Turkey. This article presents a thesis study by Dr. Batu Akalin, a urology resident.
The thesis adviser was Prof. Dr. Halil Lutfi Canat from the urology department. The presented study was conducted in accordance with the principles of the Declaration of Helsinki (2013 revision) and Good Clinical Practice guidelines. Ethical approval was obtained from the Scientific Research Ethics Committee (Basaksehir Cam and Sakura City Hospital, decision No: 256; date: October 16, 2024). The study was retrospective, utilizing data acquired during clinical care without any supplementary interventions or sampling. Given that the study involved minimal risk and all personally identifiable information was removed, the committee granted a waiver of informed consent.

Study design
Data from 924 patients who underwent TURBT between November 2020 and June 2024 were retrospectively reviewed. Patients who underwent TURBT for recurrent tumors, those with histopathological diagnoses indicating benign lesions or metastases from other primary malignancies, and those who did not undergo preoperative MRI in accordance with the VI-RADS protocol were excluded from the study. After applying these exclusion criteria, a total of 372 patients were included in the final analysis (Figure 1).

A total of 8 clinical parameters (sex and age; history of smoking, macroscopic hematuria, microscopic hematuria, hypertension, coronary artery disease, antiplatelet medication use, and chemical exposure) and 14 biochemical parameters (hemoglobin, white blood cell count (WBC), platelet count, serum creatinine, activated partial thromboplastin time (aPTT), international normalized ratio (INR), serum glucose, sodium, potassium, alanine aminotransferase (ALT), aspartate aminotransferase (AST), and urinalysis findings including erythrocyte count, leukocyte count, and urine specific gravity), selected by the investigators, were collected along with each patient’s pathological findings and VI-RADS scores.
Complete urinalysis (CUA) parameters were obtained from urine samples provided at the time of the patients’ initial outpatient evaluation, whereas biochemical parameters were derived from routine preoperative blood tests performed at the time the surgical decision was made. Imaging and pathological assessments were conducted in a mutually blinded fashion. Radiologists were blinded to the histopathological outcomes, while pathologists performed their evaluations independently of the VI-RADS findings.

Pathological Assessment
The pathological evaluation was based on specimens obtained from the initial TURBT. In cases where a repeat TURBT (re-TURBT) was performed, the pathology report indicating the more advanced stage between the two procedures was considered for analysis. All pathological assessments were performed and reported by the department of pathology at our institution. For the purposes of this study, pathological stages were categorized into two groups: <T2 (non-muscle-invasive) and ≥T2 (muscle-invasive) disease.

Multiparametric magnetic resonance imaging protocol
Magnetic resonance imaging (MRI) was performed in accordance with the VI-RADS protocol using a 3.0 T scanner (Verio; Siemens, Erlangen, Germany). The multiplanar imaging capability of MRI minimized partial volume effects and optimized visualization for assessing the extent of muscular layer invasion. For contrast-enhanced T1-weighted sequences, a gadolinium-based, hydrophilic, nonionic macrocyclic contrast agent, gadobutrol (Gd-BT-DO3A; Gadovist, Bayer Schering Pharma AG, Berlin, Germany), was administered as an intravenous bolus at a dose of 0.1 mL/kg using an MRI-compatible injector. To ensure adequate bladder distention, patients were instructed to drink 500–1000 mL of water approximately 30 minutes before imaging. All multiparametric MRI studies were interpreted by a single experienced urogenital radiologist.

Data Processing
Patients with more than 10% missing data were excluded from the study.
For the remaining 16 patients with partially missing values, numerical variables were imputed using the median of the corresponding feature, whereas categorical variables were imputed using the mode. The dataset was randomly divided into a training set (80%) and a validation set (20%). In the training set, 214 cases were classified as <T2 and 82 cases as ≥T2 based on pathological staging. To address class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was applied exclusively to the training data after the train–validation split.

Feature Selection and Machine Learning
Feature selection was performed using a Mutual Information (MI)–based filter approach implemented in Python (version 3.13). The relationship between each independent variable and the target variable (pathological outcome) was calculated individually using the scikit-learn library. The resulting MI scores were ranked in descending order, and the nine most informative variables were selected for model construction.

Subsequently, multiple supervised learning algorithms were developed, including logistic regression (LogReg), decision tree (DT), random forest (RF), Naïve Bayes, support vector machine (SVM), k-nearest neighbors (KNN), Light Gradient Boosting Machine (LightGBM), and Extreme Gradient Boosting (XGBoost). These models were implemented using the NumPy, Pandas, scikit-learn, LightGBM, and XGBoost libraries. In addition, a deep neural network (DNN) model was constructed using the TensorFlow framework. All models were tuned using a fivefold cross-validation approach implemented through the GridSearchCV function in scikit-learn. The model exhibiting the highest predictive power was considered optimal for subsequent analyses.

For the DT model, tree depth was varied from 1 to 10, and both Gini and entropy criteria were evaluated. For RF, the number of trees ranged from 300 to 900, with Gini as the splitting criterion.
Additional parameters explored included maximum depth (None, 10, 20, 30), minimum samples for split (2, 5, 10), minimum samples per leaf (1, 2, 4), and maximum features (sqrt, log2). For SVM, linear and radial basis function (RBF) kernels were assessed, with C ranging from 10⁻³ to 10³ and gamma tuned between scale and auto. For KNN, the number of neighbors was varied between 3 and 21, with weighting schemes (uniform, distance) and distance metrics (Minkowski, p = 1 or 2, corresponding to Manhattan and Euclidean distances). For XGBoost, the learning rate (eta) ranged from 0.01 to 0.2, maximum depth values of 3, 5, and 7 were tested, and subsample ratios of 0.8 and 1.0 were evaluated. For LightGBM, the number of leaves (15, 31, 63), maximum depth (−1, 5, 10), and subsample ratios (0.8, 1.0) were examined. The DNN architecture was optimized by varying the number of hidden layers (1–3), neurons per layer (32, 64, 128), dropout rates (0.0, 0.2, 0.4), learning rates (0.001, 0.0003), number of epochs (50, 100), and batch sizes (32, 64). The ReLU activation function was used for hidden layers and sigmoid for the output layer, with optimization performed using the Adam algorithm.

Statistical Analysis
Descriptive analyses were performed using the Python programming language (version 3.13). The Pandas, NumPy, and SciPy libraries were utilized for statistical analysis. Continuous variables were presented as mean ± standard deviation and median values, whereas categorical variables were expressed as counts and percentages. The normality of continuous variables was assessed using the Kolmogorov-Smirnov test. Continuous variables were evaluated with the independent-samples t test or the Mann–Whitney U test according to their distributional properties, whereas categorical variables were assessed using the chi-square test.
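
As an illustration of the categorical comparison described above, the following is a minimal pure-Python sketch of a chi-square test on a 2×2 table. It assumes Yates' continuity correction, which is the default behaviour of SciPy's chi2_contingency for 2×2 tables; the source does not state whether a correction was applied. The function name chi_square_2x2 is hypothetical, and the counts are the hypertension and chemical-exposure rows as printed in Table 2.

```python
import math


def chi_square_2x2(table, yates=True):
    """Chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). With one degree of freedom the
    p-value equals erfc(sqrt(statistic / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                # Continuity correction (assumed): shrink |O - E| by at most 0.5.
                diff = max(diff - 0.5, 0.0)
            statistic += diff ** 2 / expected
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Rows: absent / present; columns: <T2 / >=T2 (counts as printed in Table 2).
hypertension = [[145, 44], [123, 59]]
chemical_exposure = [[251, 97], [17, 6]]

stat_ht, p_ht = chi_square_2x2(hypertension)
stat_ce, p_ce = chi_square_2x2(chemical_exposure)
print(f"hypertension: chi2={stat_ht:.3f}, p={p_ht:.4f}")
print(f"chemical exposure: chi2={stat_ce:.3f}, p={p_ce:.4f}")
```

Under the continuity-correction assumption, the sketch gives p of approximately 0.0645 for hypertension and p = 1.0 for chemical exposure, in line with the values listed in Table 2.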

Subsequently, to establish a baseline for comparison with machine learning models, a logistic regression model was constructed using only VI-RADS scores and pathological outcomes. A receiver operating characteristic (ROC) curve was generated, and the corresponding area under the curve (AUC) was calculated. All the models were then calibrated with isotonic regression. Differences in predictive performance between each model and the VI-RADS-only model were evaluated using the bootstrap test and DeLong’s test. The model with the best predictive performance was selected by the researchers for subsequent analysis. For this model, a calibration curve was drawn, the calibration slope and intercept were estimated, and the Brier score was calculated. The Brier scores of the best predictive model and the VI-RADS–based logistic regression model were statistically compared using a paired bootstrap test. Finally, the calibrated and uncalibrated versions of the model with the highest AUC were evaluated using decision curve analysis (DCA), with 95% confidence intervals presented. The maximum net benefit (NB), threshold probability–dependent net benefits, and net reduction in interventions per 100 patients (NAI/100) were computed, and a Clinical Impact Curve was generated for the calibrated model. A p-value of <0.05 was considered statistically significant.

Results

Differences in demographic characteristics, clinical features, and laboratory parameters between the two groups are summarized in Tables 1 and 2. Only patient age exhibited a normal distribution. Patient age, VI-RADS score, erythrocyte and leukocyte counts in urinalysis, history of macroscopic hematuria, as well as serum hemoglobin, creatinine, sodium, platelet, and ALT levels were found to be significantly associated with tumor stage.

Table 1: Continuous Variables
Variable (unit) | Non–muscle invasive group (n = 268) | Muscle invasive group (n = 104) | p-value
Age (years) | 64.66 ± 10.73 | 69.82 ± 9.88 | <0.001 †
Hemoglobin (g/dL) | 13.80 [2.70] | 12.80 [3.35] | <0.001 ‡
White Blood Cell (×10⁹/L) | 8.18 [3.16] | 8.40 [3.06] | 0.122 ‡
Platelet (×10⁹/L) | 258.00 [104.00] | 279.00 [115.50] | 0.002 ‡
Creatinine (mg/dL) | 0.93 [0.29] | 1.12 [0.52] | <0.001 ‡
Activated Partial Thromboplastin Time (s) | 29.30 [5.30] | 29.80 [4.35] | 0.278 ‡
International Normalized Ratio (INR) | 1.00 [0.09] | 1.00 [0.07] | 0.520 ‡
Erythrocyte count (×10⁶/µL) | 28.00 [260.75] | 409.00 [310.50] | <0.001 ‡
Urinary Leukocyte (/HPF) | 4.00 [13.00] | 16.00 [40.00] | <0.001 ‡
Urine Specific Gravity | 1017.00 [10.25] | 1015.00 [9.00] | 0.058 ‡
Smoking (pack-years) | 30.00 [45.50] | 25.00 [45.00] | 0.168 ‡
Glucose (mg/dL) | 101.00 [38.50] | 106.00 [39.50] | 0.290 ‡
Sodium (mmol/L) | 141.00 [3.00] | 139.00 [3.00] | 0.002 ‡
Potassium (mmol/L) | 4.48 [0.53] | 4.51 [0.52] | 0.694 ‡
Alanine Aminotransferase (U/L) | 16.00 [9.00] | 15.00 [9.50] | 0.005 ‡
Aspartate Aminotransferase (U/L) | 17.00 [7.00] | 16.00 [6.00] | 0.251 ‡
†: Independent-samples t-test; ‡: Mann–Whitney U test

Table 2: Categorical Variables
Variable | Category | Non–muscle invasive group (n = 268) | Muscle invasive group (n = 104) | p-value
VI-RADS score | No Visible Lesion | 28 (10.4%) | 0 (0.0%) | <0.001 §
VI-RADS score | 1 | 37 (13.8%) | 1 (1.0%) |
VI-RADS score | 2 | 122 (45.5%) | 5 (4.9%) |
VI-RADS score | 3 | 73 (27.2%) | 29 (28.2%) |
VI-RADS score | 4 | 5 (1.9%) | 23 (22.3%) |
VI-RADS score | 5 | 3 (1.1%) | 45 (43.7%) |
Macroscopic hematuria | Absent | 232 (86.6%) | 59 (57.3%) | <0.001 §
Macroscopic hematuria | Present | 36 (13.4%) | 44 (42.7%) |
Hypertension (HT) | Absent | 145 (54.1%) | 44 (42.7%) | 0.0645 §
Hypertension (HT) | Present | 123 (45.9%) | 59 (57.3%) |
Coronary artery disease (CAD) | Absent | 202 (75.4%) | 69 (67.0%) | 0.1339 §
Coronary artery disease (CAD) | Present | 66 (24.6%) | 34 (33.0%) |
Antiplatelet use | No | 181 (67.5%) | 60 (58.3%) | 0.1194 §
Antiplatelet use | Yes | 87 (32.5%) | 43 (41.7%) |
 | Yes | 28 (10.4%) | 14 (13.6%) |
Chemical exposure | No | 251 (93.7%) | 97 (94.2%) | 1.0000 §
Chemical exposure | Yes | 17 (6.3%) | 6 (5.8%) |
Sex | Female | 40 (14.9%) | 22 (21.4%) | 0.1828 §
Sex | Male | 228 (85.1%) | 81 (78.6%) |
§: Chi-square test

After splitting the dataset
into training and validation subsets, SMOTE was applied to the training data to address class imbalance. To reduce data noise and improve model performance, an MI-based filter approach was then used for feature ranking. The nine highest-ranked features were selected and used for model construction and validation (Table 3).

For baseline comparison, a logistic regression model was developed using only the VI-RADS score (Figure 2). The accuracy of this model was calculated as 85.3%. The VI-RADS-based model correctly classified non–muscle-invasive cases (pathology <T2) with a recall of 96.3%, whereas the recall for muscle-invasive cases (pathology ≥T2) was 57.1%. The F1 score was 68.6%. Despite its high specificity, the model showed limited sensitivity in predicting muscle invasion. The ROC curve was plotted, yielding an AUC of 0.89, and the corresponding confusion matrix was generated (Figure 3).

Table 3: Parameters Used in the Models and Their MI Scores
Feature | MI Score
VI-RADS | 0.275
Erythrocyte (urinalysis) | 0.090
Leukocyte (urinalysis) | 0.067
Age | 0.048
Platelet count | 0.030
History of macroscopic hematuria | 0.024
Urine specific gravity | 0.018
White blood cell count (WBC) | 0.017
Alanine aminotransferase (ALT) | 0.014

Machine learning models were constructed and subsequently calibrated using isotonic regression. ROC curves were generated for both calibrated and uncalibrated models (Figures 4 and 5). Among the calibrated models, logistic regression achieved an accuracy of 87%, precision of 79%, F1 score of 75%, recall of 71%, and an AUC of 0.94 when trained with the top nine features. The decision tree model reached 84% accuracy, 85% precision, 65% F1 score, 52% recall, and an AUC of 0.91. The random forest model demonstrated the strongest overall performance, with 89% accuracy, 84% precision, 80% F1 score, 76.2% recall, and an AUC of 0.95.
The support vector machine (SVM) achieved 83% accuracy, 67% precision, 71% F1 score, 76% recall, and an AUC of 0.95. The k-nearest neighbors (KNN) model yielded 81% accuracy, 65% precision, 68% F1 score, 71% recall, and an AUC of 0.90. The Naïve Bayes model achieved 80% accuracy, 64% precision, 65% F1 score, 67% recall, and an AUC of 0.93. The XGBoost model reached 85% accuracy, 73% precision, 74% F1 score, 76% recall, and an AUC of 0.92. The LightGBM model showed 87% accuracy, 79% precision, 75% F1 score, 71% recall, and an AUC of 0.90. The deep neural network (DNN) model achieved an accuracy of 86.7%, precision of 82.4%, F1 score of 73.7%, recall of 66.7%, specificity of 94.4%, and an AUC of 0.90 (Table 4). Calibration curves were plotted for all models, and their Brier scores were calculated.

Table 4: Characteristics of the Models
Model | AUC | Brier Score | Accuracy | Precision | Recall | F1 Score
Logistic Regression (VI-RADS only) | 0.887 | 0.1074 | 0.853 | 0.857 | 0.571 | 0.686
Support Vector Machine (SVM) | 0.947 | 0.0927 | 0.827 | 0.667 | 0.762 | 0.711
Random Forest | 0.946 | 0.0790 | 0.893 | 0.842 | 0.762 | 0.800
Logistic Regression | 0.944 | 0.0832 | 0.867 | 0.789 | 0.714 | 0.750
Naïve Bayes | 0.930 | 0.1034 | 0.800 | 0.636 | 0.667 | 0.651
XGBoost | 0.918 | 0.1012 | 0.853 | 0.727 | 0.762 | 0.744
Decision Tree | 0.906 | 0.0982 | 0.840 | 0.846 | 0.524 | 0.647
k-Nearest Neighbors (KNN) | 0.903 | 0.1223 | 0.813 | 0.652 | 0.714 | 0.682
LightGBM | 0.896 | 0.1062 | 0.867 | 0.789 | 0.714 | 0.750
Deep Neural Network (DNN) | 0.900 | 0.1152 | 0.867 | 0.824 | 0.667 | 0.737

Among the calibrated models, the highest AUC value was observed in the Support Vector Machine (SVM) model (0.947), whereas the lowest belonged to the LightGBM model (0.896). Overall, the Random Forest model demonstrated superior performance, characterized by high accuracy, strong F1 score, high specificity, and balanced sensitivity, while also achieving the lowest Brier score among all models. The remaining algorithms exhibited moderate to high ranges of accuracy and F1 scores.
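
The accuracy, precision, recall, and F1 values in Table 4 all derive from the four cells of a confusion matrix. The sketch below shows the arithmetic. The counts are hypothetical: the source does not report a confusion matrix for the random forest model, and these values were chosen only to be consistent with its reported metrics under the assumption of a 75-patient validation set containing 21 muscle-invasive cases. The function name classification_metrics is likewise illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive common metrics from confusion-matrix counts (positive class: >=T2)."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),       # also called sensitivity
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical counts, not taken from the source.
metrics = classification_metrics(tp=16, fp=3, fn=5, tn=51)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Written this way, F1 is the harmonic mean of precision and recall expressed directly in counts, which makes clear that true negatives do not enter it. This is one reason accuracy and F1 can rank models differently on an imbalanced validation set.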

The predictive performances of all machine learning models were compared with that of the model constructed using only the VI-RADS score, employing both the Bootstrap and DeLong tests (Table 5). According to the Bootstrap analysis, the predictive performances of the Random Forest, SVM, and Logistic Regression models were statistically superior to the VI-RADS–only model, with p-values of 0.005, 0.030, and 0.0345, respectively. In contrast, the DeLong test indicated that only the Random Forest model demonstrated a statistically significant improvement in predictive performance compared with the VI-RADS–only model (p = 0.030).

Table 5: Comparison of the Models with the VI-RADS–Only Model
Model | Bootstrap p-value | DeLong p-value
Support Vector Machine | 0.0300 | 0.075
Random Forest | 0.0050 | 0.030
Logistic Regression | 0.0345 | 0.091
Naïve Bayes | 0.1220 | 0.274
XGBoost | 0.1605 | 0.317
Decision Tree | 0.2175 | 0.444
k-Nearest Neighbors | 0.3445 | 0.702
LightGBM | 0.3845 | 0.756
Deep Neural Network | 0.1805 | 0.354

Precision–recall (Figure 6) and calibration curves (Figure 7) were generated for the Random Forest model, which exhibited the highest predictive performance. The calibration slope and intercept were calculated as 1.16 and −0.039, respectively. The Brier scores for the logistic regression model based solely on the VI-RADS score and for the calibrated Random Forest model were 0.107 and 0.079, corresponding to Brier Skill Scores (BSS) of 37.6 and 54, respectively. A paired bootstrap test was conducted to compare these two models, yielding a p-value of 0.027 (95% CI: −0.0573 to −0.0031), indicating a statistically significant difference.

For the same Random Forest model, DCA and clinical impact curves were plotted with 95% confidence intervals (Figure 8). The maximum net benefit of the model was calculated as 0.26. Based on different threshold probabilities, the corresponding net benefits and the net reduction in interventions per 100 patients (NAI/100) were calculated and summarized in Table 6.
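
For reference, net benefit in decision curve analysis is conventionally defined as NB = TP/n − (FP/n) × pₜ/(1 − pₜ), and the net reduction in interventions per 100 patients as (NB_model − NB_treat-all) / (pₜ/(1 − pₜ)) × 100. The sketch below applies these standard formulas to the rates reported for pₜ = 0.05 in Table 6 (57.3% predicted positive, 28.0% true positive). It assumes that all muscle-invasive cases are flagged at that threshold, so that prevalence equals 28.0%. The function names are hypothetical, and the authors' own computation may differ in detail, so the output need not reproduce every table entry.

```python
def net_benefit(tp_rate, fp_rate, threshold):
    """Net benefit: true positives minus odds-weighted false positives.

    tp_rate and fp_rate are proportions of the whole cohort (TP/n, FP/n).
    """
    odds = threshold / (1.0 - threshold)
    return tp_rate - fp_rate * odds


def net_benefit_treat_all(prevalence, threshold):
    """Net benefit when every patient is treated as positive."""
    return net_benefit(prevalence, 1.0 - prevalence, threshold)


def interventions_avoided_per_100(nb_model, nb_all, threshold):
    """Net reduction in interventions per 100 patients versus treat-all."""
    odds = threshold / (1.0 - threshold)
    return (nb_model - nb_all) / odds * 100.0


threshold = 0.05
positive_rate = 0.573  # predicted positive at this threshold (Table 6)
tp_rate = 0.280        # true positive at this threshold (Table 6)
prevalence = 0.280     # assumption: every >=T2 case is flagged at this threshold

nb_model = net_benefit(tp_rate, positive_rate - tp_rate, threshold)
nb_all = net_benefit_treat_all(prevalence, threshold)
avoided = interventions_avoided_per_100(nb_model, nb_all, threshold)
print(f"NB model={nb_model:.3f}, NB treat-all={nb_all:.3f}, avoided/100={avoided:.1f}")
```

With these inputs the sketch yields a net benefit of about 0.265 for the model and 0.242 for the treat-all strategy, and 42.7 interventions avoided per 100 patients. The last figure agrees with the NAI/100 entry at pₜ = 0.05.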

Table 6: Net Benefit and NAI/100 Values of the Random Forest Model
Threshold Probability (pₜ) | Net Benefit (Model) | Net Benefit (Treat All) | NAI / 100 Patients | Positive (%) | True Positive (%)
0.05 | 263 | 239 | 42.7 | 57.3 | 28.0
0.10 | 234 | 195 | 33.4 | 41.3 | 25.3
0.20 | 212 | 90 | 46.2 | 34.7 | 24.0
0.30 | 184 | -46 | 50.9 | 32.0 | 22.7
0.40 | 184 | -178 | 56.9 | 29.3 | 22.7
0.50 | 174 | -416 | 61.1 | 22.7 | 20.0
0.60 | 134 | -775 | 62.1 | 20.0 | 17.3

Discussion

Since its introduction by Panebianco et al. in 2018, the VI-RADS system has been increasingly utilized to predict the stage of bladder cancer (7). Initially, small-scale studies aimed at testing its accuracy and applicability were conducted; over time, these were followed by multicenter prospective studies evaluating interobserver agreement, sensitivity, and specificity of the system (8, 9). Different versions and adaptations of VI-RADS have subsequently emerged, demonstrating that the scoring system remains reliable despite variations in image quality, MRI protocols, and observer experience across centers (10). The system has now been incorporated into both European Association of Urology (EAU) and American Urological Association (AUA) guidelines (4, 11) and has become progressively integrated into routine clinical practice. One notable example of this integration is its use as an important reference point during multidisciplinary tumor board discussions of cases with suspected muscle invasion.

However, the increasing clinical adoption of VI-RADS has also prompted new research directions. The observer-dependent nature of radiologic scoring and the variability in image quality have highlighted the need for more objective and automated approaches. In this context, artificial intelligence and machine learning–based methods hold promise for enhancing diagnostic accuracy by supporting the interpretation of VI-RADS scores.
Numerous studies in the literature have investigated the integration of VI-RADS with machine learning, most of which have focused on augmenting the score with radiomic features (12, 13). Machine learning has shown substantial progress over the past decade across multiple disciplines. Applications have been developed using diverse data types, including genomic, transcriptomic, and proteomic data, imaging data, electronic health records, and clinical laboratory data (14, 15). One of the important applications of machine learning, as in the present study, is the improvement of diagnostic accuracy. Obermeyer et al. suggested that applying machine learning to clinical laboratory data could markedly enhance both prognostic and diagnostic precision (16). Accordingly, a decision-support system based on VI-RADS scores and clinical laboratory parameters may provide a rapid and cost-effective solution for improving diagnostic accuracy. In our study, clinical laboratory data were integrated with machine learning algorithms to improve the preoperative prediction of bladder cancer stage. One of the primary challenges encountered was the imbalance of the dataset. An unequal distribution between classes is referred to as imbalanced data (17). Numerous studies have demonstrated that imbalanced datasets can cause classification problems during the training of machine learning algorithms (18). Similarly, in our dataset, the class imbalance was expected to adversely affect the classification performance of the models. To mitigate this issue, we applied SMOTE, one of the most widely used oversampling methods. After splitting the dataset into training and test subsets, SMOTE was applied exclusively to the training data to synthetically generate new samples for the minority class, thereby balancing the class distribution. This approach is widely recognized to improve classification performance and enhance the generalizability of machine learning models (19). 
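
To make the oversampling step concrete, the following is a simplified, pure-Python sketch of the interpolation idea behind SMOTE. It is not the implementation used in the study, which the source does not name. The function smote_sketch, the toy points, and the choice of k are all illustrative assumptions. With the 214 versus 82 training cases reported in the Methods, balancing the classes would require 132 synthetic minority samples.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the base point, excluding itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority-class feature vectors (illustrative only).
minority_train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
n_needed = 214 - 82  # samples needed to balance the reported training split
synthetic = smote_sketch(minority_train, n_needed, k=2)
print(len(synthetic), synthetic[0])
```

Because each synthetic point lies on a segment between two real minority samples, the procedure must see only training data. Applying it before the split would let information from validation cases shape the synthetic samples.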
Another challenge in this study was the presence of missing data in some patients. Patients without VI-RADS scores were excluded at the beginning of the study. Since the remaining variables were routinely collected during preoperative evaluation, missing data were limited to only 16 patients. To ensure data integrity and model reliability, patients with more than 10% missing data were excluded from the analysis. The literature reports several effective methods to address missing data, including collecting additional samples, excluding incomplete cases, imputing missing values using mean or median substitution, and applying data imputation techniques (20). In our study, missing continuous variables were imputed with the median value of the corresponding variable group, whereas missing categorical variables were replaced with the mode of their respective groups. The associations between biochemical parameters and bladder cancer stage and prognosis have been investigated in numerous prior studies. Although anemia has not been consistently associated with stage, it has been linked to prognosis. Additionally, studies have reported that hematologic indices incorporating WBC and platelet counts correlate with higher tumor stage. Advanced-stage bladder cancer, due to its higher tumor burden, may induce a more pronounced inflammatory response, thereby affecting WBC and platelet levels more markedly than in early-stage disease (21, 22). The presence of macroscopic hematuria has also been reported to be associated with muscle invasion (23). In advanced-stage tumors, more prominent hematuria may lead to greater reductions in hemoglobin levels compared with early-stage disease, potentially explaining the observed correlation between lower hemoglobin and higher tumor stage. Similarly, advanced-stage tumors may contribute to alterations in creatinine and electrolyte levels through multiple mechanisms. 
In our analysis, urinary parameters such as erythrocyte count, leukocyte count, and urine specific gravity—used in our models—were also found to be significantly associated with tumor stage. Previous studies have shown that pyuria may be related to higher stage and grade in bladder cancer (24). Furthermore, several studies have reported a positive correlation between patient age and bladder cancer stage (25). The emergence of these variables through mutual information–based feature selection demonstrates that the models rely not only on complex mathematical representations but also on clinically interpretable and pathophysiologically meaningful patterns. In contrast, comorbidities, smoking history, and chemical exposure were not found to be associated with tumor stage in our dataset and were therefore excluded from model development. Ueno et al. reported, in a multicenter and multi-reader study, AUC values of 0.84 (range, 0.83–0.85) for inexperienced readers and 0.88 (range, 0.82–0.91) for experienced readers, with an overall mean AUC of 0.87 for all readers (8). In our study, the logistic regression model constructed using only the VI-RADS and pathology results yielded an AUC of 0.89, consistent with findings reported in the literature. The principal finding of our research, however, is that the inclusion of simple clinical and laboratory parameters alongside VI-RADS improved the discriminative ability and probability calibration for predicting muscle invasion compared with an image-only approach. While the logistic regression model based solely on VI-RADS demonstrated high accuracy, it showed limited sensitivity in identifying ≥T2 cases. In contrast, calibrated methods such as RF, SVM, and logistic regression exhibited statistically superior AUC values according to the Bootstrap test, and the RF model demonstrated a significantly lower Brier score and higher BSS compared with the VI-RADS-only model. 
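
The Brier score is the mean squared difference between predicted probabilities and observed outcomes, and the Brier skill score (BSS) rescales it against a reference forecast: BSS = 1 − BS/BS_ref. A minimal sketch follows. Using the observed prevalence as the constant reference forecast is an assumption, since the source does not state which reference was used, and the toy labels and probabilities are illustrative only.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def brier_skill_score(y_true, y_prob):
    """1 - BS / BS_ref, with the observed prevalence as the reference forecast."""
    prevalence = sum(y_true) / len(y_true)
    reference = brier_score(y_true, [prevalence] * len(y_true))
    return 1.0 - brier_score(y_true, y_prob) / reference


y_true = [0, 0, 0, 1]          # toy outcomes (1 = muscle-invasive)
y_prob = [0.1, 0.2, 0.3, 0.8]  # toy predicted probabilities
bs = brier_score(y_true, y_prob)
bss = brier_skill_score(y_true, y_prob)
print(f"Brier score={bs:.3f}, BSS={bss:.2f}")
```

A lower Brier score and a higher BSS both indicate probability estimates that sit closer to the observed outcomes; a BSS of 0 corresponds to doing no better than the reference forecast.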
These findings suggest that easily obtainable indicators, such as hematologic, biochemical, and clinical history parameters, provide additional independent information beyond imaging-based risk and may contribute to clinical benefit.

The PROBAST+AI framework emphasizes the importance of evaluating model calibration and explicitly recommends its analysis (26). For our isotonic regression–calibrated RF model, the calibration slope and intercept were 1.16 and −0.039, respectively. A slope slightly greater than 1 indicates a mild tendency toward underfitting, that is, predicted probabilities that are somewhat less extreme than the observed risks; the model showed slight optimism at lower probability ranges but remained acceptably calibrated overall. We believe the clinical relevance of these calibration results deserves particular attention. The significantly lower Brier score of the isotonic-calibrated RF model compared with the VI-RADS-only logistic regression model indicates not only superior predictive accuracy but also that the probability estimates produced by the RF model agree more closely with observed outcomes. Such an improvement in calibration may reduce the risk of overtreatment or delayed intervention near critical therapeutic thresholds, such as decisions regarding early radical cystectomy or neoadjuvant chemotherapy.

To further interpret the clinical utility of our model, the DCA results warrant discussion. The net benefit (NB) curve of the model remained above both the “treat all” and “treat none” strategies, demonstrating a clear clinical advantage. Alongside NB values calculated for different threshold probabilities, we also compared the proportions of predicted positive cases and true positives. When the threshold probability was set at 5%, corresponding to the highest calculated net benefit, treating patients with probabilities above this threshold as ≥T2 cases would correctly identify all true positives but would result in 29.7 false positives per 100 patients.
When the threshold was increased to 50%, only 2.7 false positives per 100 patients were observed, at the cost of missing 8 true positive cases. We suggest that threshold selection should be guided by clinician experience and preference.

The integration of machine learning models into clinical practice should be evaluated not only in terms of accuracy and calibration but also within the framework of ethical responsibility. Data were anonymized, securely stored on encrypted institutional servers, and accessible only to authorized researchers. To minimize algorithmic bias, particular care was taken during both the study design and model development phases.

Methodologically, one of the main internal validity strengths of this study was the independent and blinded reporting by radiologists and pathologists, ensuring that imaging data, including VI-RADS scores, did not influence pathology results. In addition, SMOTE was applied strictly within the training dataset, and imputation, scaling, and feature selection were performed within a cross-validation pipeline to prevent data leakage and optimism bias, which are common in imbalanced clinical datasets.

There are, however, several limitations to this study. It is a single-center, retrospective analysis. The VI-RADS score may be affected by protocols and observer variability across institutions. The use of a single reader and the lack of modeling for MRI protocol differences may limit the generalizability of our findings. Therefore, recalibration (e.g., using Platt scaling or isotonic regression) may be necessary before applying the model in external settings. Although SMOTE enhanced class separation by enriching class boundaries, it may have affected the probability distributions; external validation is therefore warranted. Another limitation is the sample size: the study included 372 patients, and the relatively modest performance of the DNN model may be attributable to insufficient sample size.
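The threshold trade-off discussed in the decision curve analysis above rests on the net benefit, which weights false positives by the odds of the threshold probability: NB = TP/N − (FP/N) × pt/(1 − pt). The sketch below illustrates this standard formula with the Python standard library. It is not the authors' implementation; the function names and toy values are hypothetical, and a patient is assumed to be treated when the predicted probability is at or above the threshold.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating every patient whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # harm assigned to one false positive
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the 'treat all' strategy at the same threshold."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy example (not study data): 3 muscle-invasive cases among 10 patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1, 0.1, 0.06, 0.04, 0.02]

nb_low = net_benefit(y_true, y_prob, 0.05)   # low threshold: no case missed, more FPs
nb_high = net_benefit(y_true, y_prob, 0.50)  # high threshold: fewer FPs, one case missed
nb_all_low = net_benefit_treat_all(y_true, 0.05)
```

The "treat none" strategy has a net benefit of 0 at every threshold, so a model is useful at a given threshold only when its net benefit exceeds both 0 and the "treat all" value.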
Finally, this study demonstrates that integrating low-cost, easily accessible laboratory parameters with VI-RADS can provide a scalable contribution to clinical decision support. Our findings suggest that enhanced preoperative stage prediction could enable more personalized decision-making in processes such as TUR-BT technique selection, early referral for neoadjuvant chemotherapy or radical cystectomy, and patient counseling. The next logical step is to evaluate the model’s portability and recalibration needs through multicenter, multi-reader, temporal external validation and to assess the feasibility and clinical impact (in terms of time, cost, and patient outcomes) of integrating a simple electronic decision-support tool (VI-RADS plus selected parameters) into the clinical workflow.

Because our study focused on preoperative stage prediction using routinely available parameters, non-routine laboratory assays were excluded. Future studies may explore the inclusion of less conventional variables that could improve stage prediction, as well as the use of VI-RADS and additional parameters to predict prognosis or recurrence risk. Moreover, applying radiomics-based machine learning could further enhance model performance by generating a combined imaging–clinical prediction with better calibration and interpretability. Such a model could eventually be deployed as a simple online calculator or mobile application for clinical use.

In summary, our study demonstrated that incorporating clinical and laboratory signals into VI-RADS–based prediction provides significant improvements in both discrimination and calibration, while maintaining internal validity through a leakage-free workflow and blinded evaluation. With further external validation and decision-based analyses, this approach has the potential to evolve into a practical clinical decision-support tool for preoperative staging in bladder cancer.
Conclusion

Machine learning algorithms, particularly when integrated with multiple clinical and biochemical parameters, serve as robust tools for predicting muscle invasion in bladder tumors, demonstrating superior discriminatory and calibration performance compared with the use of VI-RADS alone. If validated by multicenter prospective studies, VI-RADS–based hybrid machine learning models may serve as practical and scalable tools for clinical decision support. This approach could facilitate more precise preoperative risk stratification and improve multidisciplinary decision-making in bladder cancer management.

Declarations

Consent to Participate: Because this study was retrospective and involved no interventions beyond routine clinical care, the Institutional Review Board granted a waiver of informed consent. All patient data were anonymized prior to analysis.

Ethics Approval: This study was approved by the Scientific Research Ethics Committee of Başakşehir Çam and Sakura City Hospital (Approval No: 256; Date: October 16, 2024).

Funding: The authors received no financial support for the research, authorship, or publication of this article.

Clinical trial number: not applicable

References

1. GLOBOCAN 2022: Bladder Cancer 9th Most Common Worldwide. World Bladder Cancer Patient Coalition, 14 Feb. 2024. https://worldbladdercancer.org/news_events/globocan-2022-bladder-cancer-is-the-9th-most-commonly-diagnosed-worldwide/
2. Zhang Y, Rumgay H, Li M, Yu H, Pan H, Ni J. The global landscape of bladder cancer incidence and mortality in 2020 and projections to 2040. J Glob Health. 2023;13:04109. doi:10.7189/jogh.13.04109. PMID: 37712386; PMCID: PMC10502766.
3. Cumberbatch MGK, et al. Epidemiology of Bladder Cancer: A Systematic Review and Contemporary Update of Risk Factors in 2018. European Urology, vol. 74, no. 6, Dec. 2018, pp. 784–95. doi:10.1016/j.eururo.2018.09.001.
4. European Association of Urology. EAU Guidelines on Muscle-Invasive and Metastatic Bladder Cancer, 2025. European Association of Urology; 2025.
5. European Association of Urology. EAU Guidelines on Non-muscle-invasive Bladder Cancer, 2025. European Association of Urology; 2025.
6. Lee CH, et al. No. 6, June 2017, pp. 1193–205. doi:10.2214/ajr.16.17114.
7. Panebianco V, Narumi Y, Altun E, Bochner BH, Efstathiou JA, Hafeez S, Huddart R, Kennish S, Lerner S, Montironi R, Muglia VF, Salomon G, Thomas S, Vargas HA, Witjes JA, Takeuchi M, Barentsz J, Catto JWF. Multiparametric Magnetic Resonance Imaging for Bladder Cancer: Development of VI-RADS (Vesical Imaging-Reporting And Data System). Eur Urol. 2018;74(3):294–306. Epub 2018 May 10. PMID: 29755006; PMCID: PMC6690492.
8. Ueno Y, et al. VI-RADS: Multiinstitutional Multireader Diagnostic Accuracy and Interobserver Agreement Study. Am J Roentgenol. May 2021;216(5):1257–66. doi:10.2214/ajr.20.23604.
9. Jazayeri SB, et al. Diagnostic Accuracy of Vesical Imaging-Reporting and Data System (VI-RADS) in Suspected Muscle Invasive Bladder Cancer: A Systematic Review and Diagnostic Meta-Analysis. Urologic Oncology: Seminars and Original Investigations, vol. 40, no. 2, pp. 45–55. doi:10.1016/j.urolonc.2021.11.008.
10. Pecoraro M, et al. Multiparametric MRI for Bladder Cancer: A Practical Approach to the Clinical Application of VI-RADS. Radiology, vol. 314, no. 3, Mar. 2025. doi:10.1148/radiol.233459.
11. Holzbeierlein J, et al. Diagnosis and Treatment of Non–Muscle Invasive Bladder Cancer: AUA/SUO Guideline: 2024 Amendment. The Journal of Urology, 2024. American Urological Association.
12. Zheng Z, Xu F, Gu Z, Yan Y, Xu T, Liu S, Yao X. Combining Multiparametric MRI Radiomics Signature With the Vesical Imaging-Reporting and Data System (VI-RADS) Score to Preoperatively Differentiate Muscle Invasion of Bladder Cancer. Front Oncol. 2021;11:619893. PMID: 34055600; PMCID: PMC8155615.
13. Wang W. Integrating Radiomics with the Vesical Imaging-Reporting and Data System to Predict Muscle Invasion of Bladder Cancer. Urologic Oncology: Seminars and Original Investigations, vol. 41, no. 6, pp. 294.e1–294.e8. doi:10.1016/j.urolonc.2022.10.024.
14. Lin M, et al. Machine learning and multi-omics integration: advancing cardiovascular translational research and clinical practice. J Transl Med. 2025;23:388. https://doi.org/10.1186/s12967-025-06425-2.
15. You J, et al. Advancing Laboratory Medicine Practice With Machine Learning: Swift yet Exact. Ann Lab Med. 2025;45(1):22–35. doi:10.3343/alm.2024.0354. Epub 2024 Nov 26. PMID: 39587856; PMCID: PMC11609717.
16. Obermeyer Z, Emanuel EJ. Predicting the Future-Big Data, Machine Learning, and Clinical Medicine. N Engl J Med. 2016;375:1216–9.
17. Kaur H, Pannu HS, Malhi AK. A Systematic Review on Imbalanced Data Challenges in Machine Learning: Applications and Solutions. ACM Comput Surv. 2019;52:79.
18. Aubaidan BH, et al. A review of intelligent data analysis: Machine learning approaches for addressing class imbalance in healthcare - challenges and perspectives. Intell Data Anal. 2025;29(3):699–719. doi:10.1177/1088467X241305509.
19. Elreedy D. A Comprehensive Analysis of Synthetic Minority Oversampling Technique (SMOTE) for Handling Class Imbalance. Information Sciences, vol. 505, pp. 32–64. doi:10.1016/j.ins.2019.07.070.
20. Wei R, Wang J, Su M, Jia E, Chen S, Chen T, Ni Y. Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data. Sci Rep. 2018;8:663.
21. Chen C, et al. Preoperative Anemia as a Simple Prognostic Factor in Patients with Urinary Bladder Cancer. Med Sci Monit. 2017;23:3528–3535. doi:10.12659/msm.902855. PMID: 28723884; PMCID: PMC5531533.
22. Li DX, et al. Prognostic value of preoperative neutrophil-to-lymphocyte ratio in histological variants of non-muscle-invasive bladder cancer. Investig Clin Urol. 2021;62(6):641–9. https://doi.org/10.4111/icu.20210278.
23. Jakus D, et al. The Impact of the Initial Clinical Presentation of Bladder Cancer on Histopathological and Morphological Tumor Characteristics. J Clin Med. 2023;12(13):4259. doi:10.3390/jcm12134259. PMID: 37445294; PMCID: PMC10342402.
24. Poletajew S, et al. Preoperative pyuria predicts the presence of high-grade bladder carcinoma in patients with bladder tumors. Cent Eur J Urol. 2020;73(4):423–36. doi:10.5173/ceju.2020.0289. Epub 2020 Dec 3. PMID: 33552566; PMCID: PMC7848834.
25. Lin W, et al. Impact of Age at Diagnosis of Bladder Cancer on Survival: A Surveillance, Epidemiology, and End Results-Based Study 2004–2015. Cancer Control: J Moffitt Cancer Cent. 2023;30. https://doi.org/10.1177/10732748231152322.
26. Wolff RF, et al. PROBAST: A Tool to Assess the Risk of Bias and Applicability of Prediction Model Studies. Ann Intern Med. 2019;170(1):51–8.

Additional Declarations

No competing interests reported.

Reviews received at journal: 06 May, 2026
Reviewers agreed at journal: 30 Apr, 2026
Reviewers invited by journal: 29 Mar, 2026
Editor invited by journal: 17 Mar, 2026
Editor assigned by journal: 14 Mar, 2026
Submission checks completed at journal: 14 Mar, 2026
First submitted to journal: 10 Mar, 2026
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-9081158","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":614082565,"identity":"66216dae-3c71-41be-a64c-d5ddbb0caed7","order_by":0,"name":"Batu Akalin","email":"","orcid":"","institution":"Department of Urology, Health Science University, Basaksehir Cam Sakura City Hospital","correspondingAuthor":false,"prefix":"","firstName":"Batu","middleName":"","lastName":"Akalin","suffix":""},{"id":614082568,"identity":"ceef9eac-84a0-45d0-b774-6dc92f772a43","order_by":1,"name":"Akif Erbin","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA40lEQVRIiWNgGAWjYDACHsaGDwwMB3gY2BuAPAMLorQ0zgBr4TkA0iJBjBYGRpAWBgaJBBCXCC38PYcbGz7uuSNjLvn86oYfBRIM/O3dCXi1SJxtbGyc8ewZj+XsnLKbPUCHSZw5uwG/NecZ2x/zHDjMY3A7J+0GD1CLgUQufi3y5xkbm/+AtNw8k3bzDzFaDIAOa2YAabnBfuw2UbYYnjnY2Nhz4BmPwZkcttsyBhI8BP0idyb9YcOPA3fsDY4ff3bzzR8bOf72XgLeRwAeAzBJrHIQYH9AiupRMApGwSgYQQAAJWBQ5zqAwx4AAAAASUVORK5CYII=","orcid":"","institution":"Department of Urology, Health Science University, Basaksehir Cam Sakura City Hospital","correspondingAuthor":true,"prefix":"","firstName":"Akif","middleName":"","lastName":"Erbin","suffix":""},{"id":614082569,"identity":"dd86f6a4-c5e8-44e5-bb6e-d78edb3ef86d","order_by":2,"name":"Merve Sam Ozdemir","email":"","orcid":"","institution":"Department of Radiology, Health Science University, Basaksehir Cam Sakura City 
Hospital","correspondingAuthor":false,"prefix":"","firstName":"Merve","middleName":"Sam","lastName":"Ozdemir","suffix":""},{"id":614082574,"identity":"2d4cfe75-8a0b-4573-9754-9f7532aa0370","order_by":3,"name":"Halil Lutfi Canat","email":"","orcid":"","institution":"Department of Urology, Health Science University, Basaksehir Cam Sakura City Hospital","correspondingAuthor":false,"prefix":"","firstName":"Halil","middleName":"Lutfi","lastName":"Canat","suffix":""}],"badges":[],"createdAt":"2026-03-10 08:24:13","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-9081158/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-9081158/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":106069032,"identity":"7ac699a2-485a-49df-aee5-34f8486e29ca","added_by":"auto","created_at":"2026-04-03 06:22:14","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":50912,"visible":true,"origin":"","legend":"\u003cp\u003eFlowchart of the study\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-9081158/v1/c1a22c4369a2c2859a16f153.png"},{"id":106094955,"identity":"e745e5ea-e92e-4f40-a2e9-9d20af5ececc","added_by":"auto","created_at":"2026-04-03 11:43:45","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":45358,"visible":true,"origin":"","legend":"\u003cp\u003eROC Curve of the Model Constructed Using Only VI-RADS Score\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-9081158/v1/8223ae4b6ada9f4248af670e.png"},{"id":106094830,"identity":"d3ef0b43-33e9-4949-a6c4-7209f74f47d6","added_by":"auto","created_at":"2026-04-03 11:43:23","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":21164,"visible":true,"origin":"","legend":"\u003cp\u003eConfusion Matrix of the Logistic Regression Model 
Constructed Using Only VI-RADS Score\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-9081158/v1/1907a9afce268c9c15eb28e8.png"},{"id":106069034,"identity":"312e5478-3cf9-4a3b-919c-706446906e0f","added_by":"auto","created_at":"2026-04-03 06:22:14","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":59864,"visible":true,"origin":"","legend":"\u003cp\u003eUncalibrated Machine Learning Models\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-9081158/v1/b1dbe30f60f4c5ee3f2f2479.png"},{"id":106401726,"identity":"d968d5c5-ccd0-40d3-b7f4-f40ca03d29c7","added_by":"auto","created_at":"2026-04-08 09:09:19","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":69731,"visible":true,"origin":"","legend":"\u003cp\u003eMachine Learning Models Calibrated Using Isotonic Regression\u003c/p\u003e","description":"","filename":"5.png","url":"https://assets-eu.researchsquare.com/files/rs-9081158/v1/a4e6f076ad79d0d93b8aad0c.png"},{"id":106069037,"identity":"65e9bcc0-0d43-4ac9-94b0-09e1424e67d6","added_by":"auto","created_at":"2026-04-03 06:22:14","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":29949,"visible":true,"origin":"","legend":"\u003cp\u003ePrecision–Recall Curve of the Random Forest Model\u003c/p\u003e","description":"","filename":"6.png","url":"https://assets-eu.researchsquare.com/files/rs-9081158/v1/256668b54c4771a92ba224fc.png"},{"id":106069036,"identity":"65fdf4b6-30b1-453f-9de5-20f86dc4d7e8","added_by":"auto","created_at":"2026-04-03 06:22:14","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":39000,"visible":true,"origin":"","legend":"\u003cp\u003eRandom Forest Calibration 
Curve\u003c/p\u003e","description":"","filename":"7.png","url":"https://assets-eu.researchsquare.com/files/rs-9081158/v1/33dfb87b2407b9b976480da8.png"},{"id":106069038,"identity":"d1f7862a-8fd2-4450-a1fa-e4657bd8e6d2","added_by":"auto","created_at":"2026-04-03 06:22:14","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":83125,"visible":true,"origin":"","legend":"\u003cp\u003eDCA and Clinical Impact Curves of the Random Forest Model\u003c/p\u003e","description":"","filename":"8.png","url":"https://assets-eu.researchsquare.com/files/rs-9081158/v1/06da46a00e9c99f34b46c724.png"},{"id":106405528,"identity":"e33d4269-d3a1-4551-8fde-55558486371b","added_by":"auto","created_at":"2026-04-08 09:27:04","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1710089,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-9081158/v1/24742ad9-b045-4da0-a720-35e08132d128.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Hybrid Machine Learning Models Integrating VI-RADS and Clinical Metrics for Bladder Cancer Staging","fulltext":[{"header":"Introduction","content":"\u003cp\u003eBladder cancer is a common malignancy worldwide, with approximately 614,000 new cases diagnosed annually in both men and women. According to GLOBOCAN 2022, it ranks as the ninth most frequently diagnosed cancer globally (\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e). The age-standardized incidence rate (ASIR) is estimated at 5.6 per 100,000 for men and 1.9 for women. Despite advances in diagnosis and treatment, both the incidence and mortality of bladder cancer continue to rise globally (\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e). The development of bladder cancer is influenced by both environmental and genetic factors. 
Tobacco smoking remains the predominant risk factor, responsible for nearly half of all cases and increasing the risk approximately threefold compared to non-smokers (\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e). Occupational exposure to aromatic amines and polycyclic aromatic hydrocarbons, particularly in the dye, textile, and leather industries, is also a significant contributor (\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eAccurate staging plays a central role in determining prognosis and guiding management. The TNM classification system, developed by the American Joint Comitee on Cancer (AJCC) and Union Internationale Contre le Cancer (UICC), is the most widely adopted method for assessing the depth of bladder wall invasion, regional lymph node involvement, and distant metastasis (\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e). Tumors below stage T2 are classified as non\u0026ndash;muscle-invasive bladder cancer (NMIBC), while T2 or higher tumors are considered muscle-invasive bladder cancer (MIBC). This distinction is clinically crucial, as NMIBC can often be managed with transurethral resection (TURBT), whereas MIBC typically requires radical cystectomy and/or multimodal treatment (\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eCystoscopy remains the gold standard for diagnosis, offering direct visualization of bladder lesions (\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e). However, it is invasive, may cause patient discomfort, and provides limited information about the depth of invasion. Ultrasonography and computed tomography (CT) are useful for upper urinary tract evaluation but have limited accuracy in assessing the extent of muscle invasion (\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e). 
Multiparametric magnetic resonance imaging (mpMRI), standardized through the Vesical Imaging-Reporting and Data System (VI-RADS), has shown promising diagnostic accuracy in predicting muscle invasion (\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e). Nevertheless, interobserver variability and differences in implementation across centers remain significant limitations.\u003c/p\u003e \u003cp\u003eAccurate preoperative staging is critical for optimizing treatment strategy. Misclassification of tumor stage can result in unnecessary morbidity or suboptimal therapy, directly affecting patient outcomes. While VI-RADS has improved the radiologic assessment of bladder cancer, it relies solely on imaging data without incorporating clinical or biochemical variables that may also contribute to stage prediction.\u003c/p\u003e \u003cp\u003eThis study hypothesizes that integrating imaging findings with clinical and biochemical data through machine learning algorithms can enhance the predictive performance for muscle invasion in bladder cancer. Therefore, we aimed to develop machine learning\u0026ndash;based models combining VI-RADS scores with relevant clinical and biochemical parameters to predict muscle invasion preoperatively. Furthermore, we evaluated these models against VI-RADS alone and discussed their potential clinical applicability as preoperative decision-support tools.\u003c/p\u003e"},{"header":"Materials and Methods","content":"\u003cp\u003e\u003cstrong\u003e\u003cem\u003eCompliance with ethical standards\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis research was conducted at the Urology Clinic of the University of Health Sciences Basaksehir Cam and Sakura City Hospital, a tertiary center in Istanbul, Turkey. This article presents a thesis study by Dr. Batu Akalin, a urology resident. The thesis adviser was Prof. Dr. 
Halil Lutfi Canat from the urology department.\u003c/p\u003e\n\u003cp\u003eThe presented study was conducted in accordance with the principles of the Declaration of Helsinki (2013 revision) and Good Clinical Practice guidelines. Ethical approval was obtained from the Scientific Research Ethics Committee (Basaksehir Cam and Sakura City Hospital, decision No: 256; date: October 16, 2024). The study was retrospective, utilizing data acquired during clinical care without any supplementary interventions or sampling. Given that the study involved minimal risk and all personally identifiable information was removed, the committee granted a waiver of informed consent.\u003c/p\u003e\n\u003ch2\u003e\u003cstrong\u003e\u003cem\u003eStudy design\u003c/em\u003e\u003c/strong\u003e\u003c/h2\u003e\n\u003cp\u003eData from 924 patients who underwent TURBT between November 2020 and June 2024 were retrospectively reviewed. Patients who underwent TURBT for recurrent tumors, those with histopathological diagnoses indicating benign lesions or metastases from other primary malignancies, and those who did not undergo preoperative MRI in accordance with the VI-RADS protocol were excluded from the study. 
After applying these exclusion criteria, a total of 372 patients were included in the final analysis (Figure-1).\u003c/p\u003e\n\u003cp\u003eA total of 8 clinical (patient’s sex and age; history of smoking, macroscopic, and microscopic hematuria, hypertension, coronary artery disease, antiplatelet medication use, and chemical exposure) and 14 biochemical parameters (hemoglobin, white blood cell count (WBC), platelet count, serum creatinine, activated partial thromboplastin time (aPTT), international normalized ratio (INR), serum glucose, sodium, potassium, alanine aminotransferase (ALT), aspartate aminotransferase (AST), and urinalysis findings including erythrocyte count, leukocyte count, and urine specific gravity), selected by the investigators, were collected along with each patient’s pathological findings and VI-RADS scores. Complete urinalysis (CUA) parameters were obtained from urine samples provided at the time of the patients’ initial outpatient evaluation, whereas biochemical parameters were derived from routine preoperative blood tests performed at the time the surgical decision was made.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eImaging and pathological assessments were conducted in a mutually blinded fashion. Radiologists were blinded to the histopathological outcomes, while pathologists performed their evaluations independently of the VI-RADS findings.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003ePathological Assessment\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe pathological evaluation was based on specimens obtained from the initial TURBT. In cases where a repeat TURBT (re-TURBT) was performed, the pathology report indicating the more advanced stage between the two procedures was considered for analysis. All pathological assessments were performed and reported by the department of pathology at our institution. 
\u0026nbsp;For the purposes of this study, pathological stages were categorized into two groups: \u0026lt;T2 (non-muscle-invasive) and ≥T2 (muscle-invasive) disease.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eMultiparametric magnetic resonance imaging protocol\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eMagnetic resonance imaging (MRI) was performed in accordance with VI-RADS protocol using a 3.0 T scanner (Verio; Siemens, Erlangen, Germany). The multiplanar imaging capability of MRI minimized partial volume effects and optimized visualization for assessing the extent of muscular layer invasion. For contrast-enhanced T1-weighted sequences, a gadolinium-based, hydrophilic, nonionic macrocyclic contrast agent—gadobutrol (Gd-BT-DO3A; Gadovist, Bayer Schering Pharma AG, Berlin, Germany)—was administered as an intravenous bolus at a dose of 0.1 mL/kg using an MRI-compatible injector. To ensure adequate bladder distention, patients were instructed to drink 500–1000 mL of water approximately 30 minutes before imaging. All multiparametric MRI studies were interpreted by a single experienced urogenital radiologist.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eData Processing\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003ePatients with more than 10% missing data were excluded from the study. For the remaining 16 patients with partially missing values, numerical variables were imputed using the median of the corresponding feature, whereas categorical variables were imputed using the mode. The dataset was randomly divided into a training set (80%) and a validation set (20%). In the training set, 214 cases were classified as \u0026lt;T2 and 82 cases as ≥T2 based on pathological staging. 
To address class imbalance, Synthetic Minority Over-sampling Technique (SMOTE) was applied exclusively to the training data after the train–validation split.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eFeature Selection and Machine Learning\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFeature selection was performed using a Mutual Information (MI)–based filter approach implemented in Python (version 3.13). The relationship between each independent variable and the target variable (pathological outcome) was calculated individually using the \u003cem\u003escikit-learn\u003c/em\u003e library. The resulting MI scores were ranked in descending order, and the nine most informative variables were selected for model construction.\u003c/p\u003e\n\u003cp\u003eSubsequently, multiple supervised learning algorithms were developed, including logistic regression (LogReg), decision tree (DT), random forest (RF), Naïve Bayes, support vector machine (SVM), k-nearest neighbors (KNN), Light Gradient Boosting Machine (LightGBM), and Extreme Gradient Boosting (XGBoost). These models were implemented using the \u003cem\u003eNumPy\u003c/em\u003e, \u003cem\u003ePandas\u003c/em\u003e, \u003cem\u003escikit-learn\u003c/em\u003e, \u003cem\u003eLightGBM\u003c/em\u003e, and \u003cem\u003eXGBoost\u003c/em\u003e libraries. In addition, a deep neural network (DNN) model was constructed using the \u003cem\u003eTensorFlow\u003c/em\u003e framework.\u003c/p\u003e\n\u003cp\u003eAll models were tuned using a fivefold cross-validation approach implemented through the \u003cem\u003eGridSearchCV\u003c/em\u003e function in \u003cem\u003escikit-learn\u003c/em\u003e. The model exhibiting the highest predictive power was considered optimal for subsequent analyses.\u003c/p\u003e\n\u003cp\u003eFor the DT model, tree depth was varied from 1 to 10, and both \u003cem\u003eGini\u003c/em\u003e and \u003cem\u003eentropy\u003c/em\u003e criteria were evaluated. 
For RF, the number of trees ranged from 300 to 900, with \u003cem\u003eGini\u003c/em\u003e as the splitting criterion. Additional parameters explored included maximum depth (None, 10, 20, 30), minimum samples for split (2, 5, 10), minimum samples per leaf (1, 2, 4), and maximum features (\u003cem\u003esqrt\u003c/em\u003e, \u003cem\u003elog2\u003c/em\u003e). For SVM, \u003cem\u003elinear\u003c/em\u003e and \u003cem\u003eradial basis function (RBF)\u003c/em\u003e kernels were assessed, with \u003cem\u003eC\u003c/em\u003e ranging from 10⁻³ to 10³ and \u003cem\u003egamma\u003c/em\u003e tuned between \u003cem\u003escale\u003c/em\u003e and \u003cem\u003eauto\u003c/em\u003e.\u003c/p\u003e\n\u003cp\u003eFor KNN, the number of neighbors was varied between 3 and 21, with weighting schemes (\u003cem\u003euniform\u003c/em\u003e, \u003cem\u003edistance\u003c/em\u003e) and distance metrics (\u003cem\u003eMinkowski\u003c/em\u003e, p = 1 or 2, corresponding to Manhattan and Euclidean distances). For XGBoost, the learning rate (\u003cem\u003eeta\u003c/em\u003e) ranged from 0.01 to 0.2, maximum depth values of 3, 5, and 7 were tested, and \u003cem\u003esubsample\u003c/em\u003e ratios of 0.8 and 1.0 were evaluated. For LightGBM, the number of leaves (15, 31, 63), maximum depth (−1, 5, 10), and subsample ratios (0.8, 1.0) were examined.\u003c/p\u003e\n\u003cp\u003eThe DNN architecture was optimized by varying the number of hidden layers (1–3), neurons per layer (32, 64, 128), dropout rates (0.0, 0.2, 0.4), learning rates (0.001, 0.0003), number of epochs (50, 100), and batch sizes (32, 64). The ReLU activation function was used for hidden layers and sigmoid for the output layer, with optimization performed using the Adam algorithm.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eStatistical Analysis\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eDescriptive analyses were performed using the Python programming language (version 3.13). 
The Pandas, NumPy, and SciPy libraries were utilized for statistical analysis. Continuous variables were presented as mean ± standard deviation and median values, whereas categorical variables were expressed as counts and percentages. The normality of continuous variables was assessed using the Kolmogorov–Smirnov test. Continuous variables were compared between groups with the independent-samples t test or the Mann–Whitney U test according to their distributional properties, whereas categorical variables were compared using the chi-square test.\u003c/p\u003e\n\u003cp\u003eSubsequently, to establish a baseline for comparison with machine learning models, a logistic regression model was constructed using only VI-RADS scores and pathological outcomes. A receiver operating characteristic (ROC) curve was generated, and the corresponding area under the curve (AUC) was calculated. All models were then calibrated with isotonic regression. Differences in predictive performance between each model and the VI-RADS–only model were evaluated using the bootstrap test and DeLong’s test. The model with the best predictive performance was selected by the researchers for subsequent analysis. For this model, a calibration curve was plotted, the calibration slope and intercept were computed, and the Brier score was calculated. The Brier scores of the best predictive model and the VI-RADS–based logistic regression model were statistically compared using a paired bootstrap test.\u003c/p\u003e\n\u003cp\u003eFinally, the calibrated and uncalibrated versions of the model with the highest AUC were evaluated using decision curve analysis (DCA), with 95% confidence intervals presented. The maximum net benefit (NB), threshold probability–dependent net benefits, and net reduction in interventions per 100 patients (NAI/100) were computed, and a Clinical Impact Curve was generated for the calibrated model. 
A p-value \u0026lt; 0.05 was considered statistically significant.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003eDifferences in demographic characteristics, clinical features, and laboratory parameters between the two groups are summarized in Tables 1 and 2. Among the continuous variables, only patient age was normally distributed. Patient age, VI-RADS score, erythrocyte and leukocyte counts in urinalysis, history of macroscopic hematuria, as well as serum hemoglobin, creatinine, sodium, and ALT levels and platelet count were significantly associated with tumor stage.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable-1: Continuous Variables\u003c/strong\u003e\u003c/p\u003e\n\u003ctable style=\"width: 100%;\" cellspacing=\"3\"\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eVariable (unit)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eNon\u0026ndash;muscle invasive group (n = 268)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eMuscle invasive group (n= 104)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003ep-value\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eAge (years)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e64.66 \u0026plusmn; 10.73\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e69.82 \u0026plusmn; 9.88\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026lt;0.001\u003csup\u003e\u0026dagger;\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eHemoglobin (g/dL)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e13.80 [2.70]\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e12.80 [3.35]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026lt;0.001\u003csup\u003e\u0026Dagger;\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eWhite Blood Cell (\u0026times;10⁹/L)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e8.18 [3.16]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e8.40 [3.06]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.122\u003csup\u003e\u0026Dagger;\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003ePlatelet (\u0026times;10⁹/L)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e258.00 [104.00]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e279.00 [115.50]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.002\u003csup\u003e\u0026Dagger;\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eCreatinine (mg/dL)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.93 [0.29]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e1.12 [0.52]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026lt;0.001\u003csup\u003e\u0026Dagger;\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eActivated Partial Thromboplastin Time (s)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e29.30 [5.30]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e29.80 [4.35]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.278\u003csup\u003e\u0026Dagger;\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n 
\u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eInternational Normalized Ratio (INR)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e1.00 [0.09]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e1.00 [0.07]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.520\u003csup\u003e\u0026Dagger;\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eErythrocyte count (\u0026times;10⁶/\u0026micro;L)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e28.00 [260.75]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e409.00 [310.50]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026lt;0.001\u003csup\u003e\u0026Dagger;\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eUrinary Leukocyte (/HPF)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e4.00 [13.00]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e16.00 [40.00]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026lt;0.001\u003csup\u003e\u0026Dagger;\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eUrine Specific Gravity\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e1017.00 [10.25]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e1015.00 [9.00]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.058\u003csup\u003e\u0026Dagger;\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eSmoking (pack-years)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e30.00 [45.50]\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd\u003e\n \u003cp\u003e25.00 [45.00]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.168\u003csup\u003e\u0026Dagger;\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eGlucose (mg/dL)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e101.00 [38.50]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e106.00 [39.50]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.290\u003csup\u003e\u0026Dagger;\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eSodium (mmol/L)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e141.00 [3.00]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e139.00 [3.00]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.002\u003csup\u003e\u0026Dagger;\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003ePotassium (mmol/L)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e4.48 [0.53]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e4.51 [0.52]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.694\u003csup\u003e\u0026Dagger;\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eAlanine Aminotransferase (U/L)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e16.00 [9.00]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e15.00 [9.50]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.005\u003csup\u003e\u0026Dagger;\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eAspartate 
Aminotransferase (U/L)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e17.00 [7.00]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e16.00 [6.00]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.251\u003csup\u003e\u0026Dagger;\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u003cem\u003e\u0026dagger;:\u003c/em\u003e Independent-samples \u003cem\u003et\u003c/em\u003e-test; \u003cem\u003e\u0026Dagger;:\u003c/em\u003e Mann\u0026ndash;Whitney \u003cem\u003eU\u003c/em\u003e test\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable-2: Categorical Variables\u003c/strong\u003e\u003c/p\u003e\n\u003ctable style=\"width: 100%;\" cellspacing=\"3\"\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eVariable\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eCategory\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eNon\u0026ndash;muscle invasive group (n = 268)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eMuscle invasive group (n= 104)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003ep-value\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"6\"\u003e\n \u003cp\u003e\u003cstrong\u003eVI-RADS score\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eNo Visible Lesion\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e28 (10.4%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0 (0.0%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"6\"\u003e\n 
\u003cp\u003e\u0026lt;0.001\u003csup\u003e\u0026sect;\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e37 (13.8%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e1 (1.0%)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e122 (45.5%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e5 (4.9%)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e73 (27.2%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e29 (28.2%)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e5 (1.9%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e23 (22.3%)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e3 (1.1%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e45 (43.7%)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003e\u003cstrong\u003eMacroscopic hematuria\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eAbsent\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e232 (86.6%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e59 (57.3%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003e\u0026lt;0.001\u003csup\u003e\u0026sect;\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003ePresent\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd\u003e\n \u003cp\u003e36 (13.4%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e44 (42.7%)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003e\u003cstrong\u003eHypertension (HT)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eAbsent\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e145 (54.1%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e44 (42.7%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003e0.0645\u003csup\u003e\u0026sect;\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003ePresent\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e123 (45.9%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e59 (57.3%)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003e\u003cstrong\u003eCoronary artery disease (CAD)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eAbsent\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e202 (75.4%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e69 (67.0%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003e0.1339\u003csup\u003e\u0026sect;\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003ePresent\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e66 (24.6%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e34 (33.0%)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"3\"\u003e\n \u003cp\u003e\u003cstrong\u003eAntiplatelet use\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e181 
(67.5%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e60 (58.3%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"3\"\u003e\n \u003cp\u003e0.1194\u003csup\u003e\u0026sect;\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eYes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e87 (32.5%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e43 (41.7%)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eYes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e28 (10.4%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e14 (13.6%)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003e\u003cstrong\u003eChemical exposure\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e251 (93.7%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e97 (94.2%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003e1.0000\u003csup\u003e\u0026sect;\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eYes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e17 (6.3%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e6 (5.8%)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003e\u003cstrong\u003eSex\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eFemale\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e40 (14.9%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e22 (21.4%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003e0.1828\u003csup\u003e\u0026sect;\u003c/sup\u003e\u003c/p\u003e\n 
\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eMale\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e228 (85.1%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e81 (78.6%)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u003cem\u003e\u0026sect;:\u003c/em\u003e Chi-square test\u003c/p\u003e\n\u003cp\u003eAfter splitting the dataset into training and validation subsets, SMOTE was applied to the training data to address class imbalance. To reduce data noise and improve model performance, an MI-based filter approach was then used for feature ranking. The nine highest-ranked features were selected and used for model construction and validation (Table 3).\u003c/p\u003e\n\u003cp\u003eFor baseline comparison, a logistic regression model was developed using only the VI-RADS score (Figure 2). This model achieved an accuracy of 85.3%. It identified non\u0026ndash;muscle-invasive cases (pathology \u0026lt; T2) with a recall of 96.3%, whereas the recall for muscle-invasive cases (pathology \u0026ge; T2) was only 57.1%. The F1 score was 68.6%. Thus, despite its high specificity, the model had limited sensitivity for muscle invasion. 
The ROC curve was plotted, yielding an AUC of 0.89, and the corresponding confusion matrix was generated (Figure 3).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable-3: Parameters Used in the Models and Their MI Scores\u003c/strong\u003e\u003c/p\u003e\n\u003ctable style=\"width: 100%;\" cellspacing=\"3\"\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eFeature\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eMI Score\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eVI-RADS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.275\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eErythrocyte (urinalysis)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.090\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eLeukocyte (urinalysis)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.067\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eAge\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.048\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003ePlatelet count\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.030\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eHistory of macroscopic hematuria\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.024\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eUrine specific gravity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.018\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eWhite 
blood cell count (WBC)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.017\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eAlanine aminotransferase (ALT)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.014\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eMachine learning models were constructed and subsequently calibrated using isotonic regression. ROC curves were generated for both calibrated and uncalibrated models (Figures 4 and 5). Among the calibrated models, logistic regression achieved an accuracy of 87%, precision of 79%, F1 score of 75%, recall of 71%, and an AUC of 0.94 when trained with the top nine features. The decision tree model reached 84% accuracy, 85% precision, 65% F1 score, 52% recall, and an AUC of 0.91. The random forest model demonstrated the strongest overall performance, with 89% accuracy, 84% precision, 80% F1 score, 76.2% recall, and an AUC of 0.95. The support vector machine (SVM) achieved 83% accuracy, 67% precision, 71% F1 score, 76% recall, and an AUC of 0.95. The k-nearest neighbors (KNN) model yielded 81% accuracy, 65% precision, 68% F1 score, 71% recall, and an AUC of 0.90. The Na\u0026iuml;ve Bayes model achieved 80% accuracy, 64% precision, 65% F1 score, 67% recall, and an AUC of 0.93. The XGBoost model reached 85% accuracy, 73% precision, 74% F1 score, 76% recall, and an AUC of 0.92. The LightGBM model showed 87% accuracy, 79% precision, 75% F1 score, 71% recall, and an AUC of 0.92. The deep neural network (DNN) model achieved an accuracy of 86.7%, precision of 82.4%, F1 score of 73.7%, recall of 66.7%, specificity of 94.4%, and an AUC of 0.90 (Table 4). 
Calibration curves were plotted for all models, and their Brier scores were calculated.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable-4:\u0026nbsp;\u003c/strong\u003eCharacteristics of the Models\u003c/p\u003e\n\u003ctable cellspacing=\"3\" width=\"100%\"\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eModel\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eAUC\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eBrier Score\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eAccuracy\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003ePrecision\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eRecall\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eF1 Score\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eLogistic Regression (VI-RADS only)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.887\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.1074\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.853\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.857\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.571\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.686\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eSupport Vector Machine (SVM)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.947\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.0927\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.827\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.667\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.762\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.711\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eRandom Forest\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.946\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.0790\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.893\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.842\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.762\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.800\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eLogistic Regression\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.944\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.0832\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.867\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.789\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.714\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.750\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eNa\u0026iuml;ve Bayes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.930\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.1034\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.800\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.636\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.667\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.651\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eXGBoost\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n 
\u003cp\u003e0.918\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.1012\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.853\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.727\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.762\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.744\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eDecision Tree\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.906\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.0982\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.840\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.846\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.524\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.647\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003ek-Nearest Neighbors (KNN)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.903\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.1223\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.813\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.652\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.714\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.682\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eLightGBM\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.896\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.1062\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.867\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.789\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.714\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.750\u003c/p\u003e\n 
\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eDeep Neural Network (DNN)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.900\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.1152\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.867\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.824\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.667\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.737\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eAmong the calibrated models, the Support Vector Machine (SVM) model had the highest AUC (0.947), whereas the LightGBM model had the lowest (0.896). Overall, the Random Forest model performed best, combining high accuracy, a strong F1 score, high specificity, and balanced sensitivity with the lowest Brier score among all models. The remaining algorithms showed moderate to high accuracy and F1 scores.\u003c/p\u003e\n\u003cp\u003eThe predictive performance of each machine learning model was compared with that of the model constructed using only the VI-RADS score, using both the bootstrap and DeLong tests (Table 5). According to the bootstrap analysis, the Random Forest, SVM, and Logistic Regression models were statistically superior to the VI-RADS\u0026ndash;only model, with p-values of 0.005, 0.030, and 0.0345, respectively. 
In contrast, the DeLong test indicated that only the Random Forest model demonstrated a statistically significant improvement in predictive performance compared with the VI-RADS\u0026ndash;only model (p = 0.030).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable-5:\u0026nbsp;\u003c/strong\u003eComparison of the Models with the VI-RADS\u0026ndash;Only Model\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;\u003c/p\u003e\n\u003ctable cellspacing=\"3\" width=\"100%\"\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eModel\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eBootstrap p-value\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eDeLong p-value\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eSupport Vector Machine\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.0300\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.075\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eRandom Forest\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.0050\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.030\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eLogistic Regression\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.0345\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.091\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eNa\u0026iuml;ve Bayes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.1220\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.274\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n 
\u003cp\u003eXGBoost\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.1605\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.317\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eDecision Tree\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.2175\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.444\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003ek-Nearest Neighbors\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.3445\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.702\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eLightGBM\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.3845\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.756\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eDeep Neural Network\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.1805\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.354\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003ePrecision\u0026ndash;recall (Figure 6) and calibration curves (Figure 7) were generated for the Random Forest model, which exhibited the highest predictive performance. The calibration slope and intercept were calculated as 1.16 and \u0026minus;0.039, respectively. The Brier scores for the logistic regression model based solely on the VI-RADS score and for the calibrated Random Forest model were 0.107 and 0.079, corresponding to Brier Skill Scores (BSS) of 37.6 and 54, respectively. 
A paired bootstrap test was conducted to compare these two models, yielding a p-value of 0.027 (95% CI: \u0026minus;0.0573 to \u0026minus;0.0031), indicating a statistically significant difference.\u003c/p\u003e\n\u003cp\u003eFor the same Random Forest model, DCA and clinical impact curves were plotted with 95% confidence intervals (Figure 8). The maximum net benefit of the model was calculated as 0.26. Based on different threshold probabilities, the corresponding net benefits and the net reduction in interventions per 100 patients (NAI/100) were calculated and summarized in Table 6.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable-6:\u0026nbsp;\u003c/strong\u003eNet Benefit and NAI/100 Values of the Random Forest Model\u003c/p\u003e\n\u003ctable cellspacing=\"3\"\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eThreshold Probability (pₜ)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eNet Benefit (Model)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eNet Benefit (Treat All)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eNAI / 100 Patients\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003ePositive (%)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eTrue Positive (%)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e0.05\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e263\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e239\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e42.7\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e57.3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n 
\u003cp\u003e28.0\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e0.10\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e234\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e195\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e33.4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e41.3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e25.3\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e0.20\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e212\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e90\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e46.2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e34.7\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e24.0\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e0.30\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e184\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-46\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e50.9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e32.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e22.7\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e0.40\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e184\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-178\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e56.9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e29.3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e22.7\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e0.50\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e174\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-416\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e61.1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e22.7\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e20.0\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e0.60\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e134\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-775\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e62.1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e20.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e17.3\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u003cbr\u003e\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eSince its introduction by Panebianco et al. in 2018, the VI-RADS system has been increasingly utilized to predict the stage of bladder cancer (7). Initially, small-scale studies aimed at testing its accuracy and applicability were conducted; over time, these were followed by multicenter prospective studies evaluating interobserver agreement, sensitivity, and specificity of the system (8, 9). Different versions and adaptations of VI-RADS have subsequently emerged, demonstrating that the scoring system remains reliable despite variations in image quality, MRI protocols, and observer experience across centers (10). The system has now been incorporated into both European Association of Urology (EAU) and American Urological Association (AUA) guidelines (4, 11) and has become progressively integrated into routine clinical practice. 
One notable example of this integration is its use as an important reference point during multidisciplinary tumor board discussions of cases with suspected muscle invasion.\u003c/p\u003e\n\u003cp\u003eHowever, the increasing clinical adoption of VI-RADS has also prompted new research directions. The observer-dependent nature of radiologic scoring and the variability in image quality have highlighted the need for more objective and automated approaches. In this context, artificial intelligence and machine learning\u0026ndash;based methods hold promise for enhancing diagnostic accuracy by supporting the interpretation of VI-RADS scores. Numerous studies in the literature have investigated the integration of VI-RADS with machine learning, most of which have focused on augmenting the score with radiomic features (12, 13).\u003c/p\u003e\n\u003cp\u003eMachine learning has shown substantial progress over the past decade across multiple disciplines. Applications have been developed using diverse data types, including genomic, transcriptomic, and proteomic data, imaging data, electronic health records, and clinical laboratory data (14, 15). One of the important applications of machine learning, as in the present study, is the improvement of diagnostic accuracy. Obermeyer et al. suggested that applying machine learning to clinical laboratory data could markedly enhance both prognostic and diagnostic precision (16). Accordingly, a decision-support system based on VI-RADS scores and clinical laboratory parameters may provide a rapid and cost-effective solution for improving diagnostic accuracy.\u003c/p\u003e\n\u003cp\u003eIn our study, clinical laboratory data were integrated with machine learning algorithms to improve the preoperative prediction of bladder cancer stage. One of the primary challenges encountered was the imbalance of the dataset. An unequal distribution between classes is referred to as imbalanced data (17). 
Numerous studies have demonstrated that imbalanced datasets can cause classification problems during the training of machine learning algorithms (18). Similarly, in our dataset, the class imbalance was expected to adversely affect the classification performance of the models. To mitigate this issue, we applied SMOTE, one of the most widely used oversampling methods. After splitting the dataset into training and test subsets, SMOTE was applied exclusively to the training data to synthetically generate new samples for the minority class, thereby balancing the class distribution. This approach is widely recognized to improve classification performance and enhance the generalizability of machine learning models (19).\u003c/p\u003e\n\u003cp\u003eAnother challenge in this study was the presence of missing data in some patients. Patients without VI-RADS scores were excluded at the beginning of the study. Since the remaining variables were routinely collected during preoperative evaluation, missing data were limited to only 16 patients. To ensure data integrity and model reliability, patients with more than 10% missing data were excluded from the analysis. The literature reports several effective methods to address missing data, including collecting additional samples, excluding incomplete cases, imputing missing values using mean or median substitution, and applying data imputation techniques (20). In our study, missing continuous variables were imputed with the median value of the corresponding variable group, whereas missing categorical variables were replaced with the mode of their respective groups.\u003c/p\u003e\n\u003cp\u003eThe associations between biochemical parameters and bladder cancer stage and prognosis have been investigated in numerous prior studies. Although anemia has not been consistently associated with stage, it has been linked to prognosis. 
Additionally, studies have reported that hematologic indices incorporating WBC and platelet counts correlate with higher tumor stage. Advanced-stage bladder cancer, due to its higher tumor burden, may induce a more pronounced inflammatory response, thereby affecting WBC and platelet levels more markedly than in early-stage disease (21, 22). The presence of macroscopic hematuria has also been reported to be associated with muscle invasion (23). In advanced-stage tumors, more prominent hematuria may lead to greater reductions in hemoglobin levels compared with early-stage disease, potentially explaining the observed correlation between lower hemoglobin and higher tumor stage. Similarly, advanced-stage tumors may contribute to alterations in creatinine and electrolyte levels through multiple mechanisms.\u003c/p\u003e\n\u003cp\u003eIn our analysis, urinary parameters such as erythrocyte count, leukocyte count, and urine specific gravity\u0026mdash;used in our models\u0026mdash;were also found to be significantly associated with tumor stage. Previous studies have shown that pyuria may be related to higher stage and grade in bladder cancer (24). Furthermore, several studies have reported a positive correlation between patient age and bladder cancer stage (25). The emergence of these variables through mutual information\u0026ndash;based feature selection demonstrates that the models rely not only on complex mathematical representations but also on clinically interpretable and pathophysiologically meaningful patterns. In contrast, comorbidities, smoking history, and chemical exposure were not found to be associated with tumor stage in our dataset and were therefore excluded from model development.\u003c/p\u003e\n\u003cp\u003eUeno et al. 
reported, in a multicenter and multi-reader study, AUC values of 0.84 (range, 0.83\u0026ndash;0.85) for inexperienced readers and 0.88 (range, 0.82\u0026ndash;0.91) for experienced readers, with an overall mean AUC of 0.87 for all readers (8). In our study, the logistic regression model that used the VI-RADS score as its only predictor of pathological muscle invasion yielded an AUC of 0.89, consistent with findings reported in the literature. The principal finding of our research, however, is that the inclusion of simple clinical and laboratory parameters alongside VI-RADS improved the discriminative ability and probability calibration for predicting muscle invasion compared with an imaging-only approach. While the logistic regression model based solely on VI-RADS demonstrated high accuracy, it showed limited sensitivity in identifying \u0026ge;T2 cases. In contrast, the calibrated RF, SVM, and logistic regression models exhibited statistically superior AUC values according to the Bootstrap test (only the RF model remained significant in the DeLong test), and the RF model demonstrated a significantly lower Brier score and higher BSS compared with the VI-RADS-only model. These findings suggest that easily obtainable indicators such as hematologic, biochemical, and clinical history parameters provide additional independent information beyond imaging-based risk and may contribute to clinical benefit.\u003c/p\u003e\n\u003cp\u003eThe PROBAST+AI framework emphasizes the importance of evaluating model calibration and explicitly recommends its analysis (26). For our isotonic regression\u0026ndash;calibrated RF model, the calibration slope and intercept were calculated as 1.16 and \u0026minus;0.039, respectively. A slope slightly greater than 1 suggests that the predicted probabilities were somewhat too moderate (mild underfitting), and the model showed slight optimism in the lower probability range; overall, calibration remained acceptable. We believe the clinical relevance of these calibration results deserves particular attention. 
The significantly lower Brier score of the isotonic-calibrated RF model compared with the VI-RADS-only logistic regression model indicates not only superior predictive accuracy but also that the probability estimates produced by the RF model more closely match the observed frequency of muscle invasion. Such calibration improvement may reduce the risk of overtreatment or delayed intervention near critical therapeutic thresholds, such as decisions regarding early radical cystectomy or neoadjuvant chemotherapy.\u003c/p\u003e\n\u003cp\u003eTo further interpret the clinical utility of our model, the DCA results warrant discussion. The net benefit (NB) curve of the model remained above both the \u0026ldquo;treat all\u0026rdquo; and \u0026ldquo;treat none\u0026rdquo; strategies, demonstrating a clear clinical advantage. Alongside NB values calculated for different threshold probabilities, we also compared the proportions of predicted positive cases and true positives (Table 6). When the threshold probability was set at 5%, corresponding to the highest calculated net benefit, treating patients with probabilities above this threshold as \u0026ge;T2 cases would correctly identify all true positives but would result in 29.3 false positives per 100 patients (57.3% predicted positive versus 28.0% true positive). When the threshold was increased to 50%, only 2.7 false positives per 100 patients were observed, at the cost of missing 8 true-positive cases per 100 patients. We suggest that threshold selection should be guided by clinician experience and preference.\u003c/p\u003e\n\u003cp\u003eThe integration of machine learning models into clinical practice should be evaluated not only in terms of accuracy and calibration but also within the framework of ethical responsibility. Data were anonymized, securely stored on encrypted institutional servers, and accessible only to authorized researchers. To minimize algorithmic bias, particular care was taken during both the study design and model development phases. 
Methodologically, one of the main internal validity strengths of this study was the independent and blinded reporting by radiologists and pathologists, ensuring that imaging data, including VI-RADS scores, did not influence pathology results. In addition, SMOTE was applied strictly within the training dataset, and imputation, scaling, and feature selection were performed within a cross-validation pipeline to prevent data leakage and optimism bias, which are common in imbalanced clinical datasets.\u003c/p\u003e\n\u003cp\u003eThere are, however, several limitations to this study. It is a single-center, retrospective analysis. The VI-RADS score may be affected by protocols and observer variability across institutions. The use of a single reader and the lack of modeling for MRI protocol differences may limit the generalizability of our findings. Therefore, recalibration (e.g., using Platt scaling or isotonic regression) may be necessary before applying the model in external settings. Although SMOTE enhanced class separation by enriching class boundaries, it may have affected the probability distributions; therefore external validation is warranted. Another limitation is the sample size: the study included 372 patients, and the relatively modest performance of the DNN model may be attributable to insufficient sample volume.\u003c/p\u003e\n\u003cp\u003eFinally, this study demonstrates that the integration of low-cost, easily accessible laboratory parameters with VI-RADS can provide a scalable contribution to clinical decision support. Our findings suggest that enhanced preoperative stage prediction could enable more personalized decision-making in processes such as TUR-BT technique selection, early referral for neoadjuvant chemotherapy or radical cystectomy, and patient counseling. 
The next logical step is to evaluate the model\u0026rsquo;s portability and recalibration needs through multicenter, multi-reader, time-based external validation and to assess the feasibility and clinical impact (in terms of time, cost, and patient outcomes) of integrating a simple electronic decision-support tool (VI-RADS plus selected parameters) into clinical workflow. Since our study focused on preoperative stage prediction using routinely available parameters, non-routine laboratory assays were excluded. Future studies may explore the inclusion of less conventional variables that could improve stage prediction, as well as the use of VI-RADS and additional parameters to predict prognosis or recurrence risk. Moreover, applying radiomics-based machine learning could further enhance model performance by allowing artificial intelligence to generate a combined imaging\u0026ndash;clinical prediction with higher calibration and interpretability. Such a model could eventually be deployed as a simple online calculator or mobile application for clinical use.\u003c/p\u003e\n\u003cp\u003eIn summary, our study demonstrated that incorporating clinical and laboratory signals into VI-RADS\u0026ndash;based prediction provides significant improvements in both discrimination and calibration performance, while maintaining internal validity through a leakage-free workflow and blinded evaluation. With further external validation and decision-based analyses, this approach has the potential to evolve into a practical clinical decision-support tool for preoperative staging in bladder cancer.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eMachine learning algorithms, particularly when integrated with multiple clinical and biochemical parameters, serve as robust tools for predicting muscle invasion in bladder tumors, demonstrating superior discriminatory and calibration performance compared with the use of the VI-RADS alone. 
If validated by multicenter prospective studies, VI-RADS–based hybrid machine learning models may serve as practical and scalable tools for clinical decision support. This approach could facilitate more precise preoperative risk stratification and improve multidisciplinary decision-making in bladder cancer management.\u003c/p\u003e\n\u003cp\u003e\u003cbr\u003e\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eConsent to Participate:\u003c/strong\u003e Because this study was retrospective and involved no interventions beyond routine clinical care, the Institutional Review Board granted a waiver of informed consent. All patient data were anonymized prior to analysis.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics Approval:\u003c/strong\u003e This study was approved by the Scientific Research Ethics Committee of Başakşehir Çam and Sakura City Hospital (Approval No: 256; Date: October 16, 2024).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding:\u003c/strong\u003e The authors received no financial support for the research, authorship, or publication of this article.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eClinical trial number:\u003c/strong\u003e not applicable\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eGLOBOCAN 2022: Bladder Cancer 9th Most Common Worldwide. World Bladder Cancer Patient Coalition, 14 Feb. 2024. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://worldbladdercancer.org/news_events/globocan-2022-bladder-cancer-is-the-9th-most-commonly-diagnosed-worldwide/\u003c/span\u003e\u003cspan address=\"https://worldbladdercancer.org/news_events/globocan-2022-bladder-cancer-is-the-9th-most-commonly-diagnosed-worldwide/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang Y, Rumgay H, Li M, Yu H, Pan H, Ni J. 
The global landscape of bladder cancer incidence and mortality in 2020 and projections to 2040. J Glob Health. 2023;13:04109. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.7189/jogh.13.04109\u003c/span\u003e\u003cspan address=\"10.7189/jogh.13.04109\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. PMID: 37712386; PMCID: PMC10502766.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCumberbatch MG, Kwesi et al. Dec. Epidemiology of Bladder Cancer: A Systematic Review and Contemporary Update of Risk Factors in 2018. European Urology, vol. 74, no. 6, 2018, pp. 784\u0026ndash;95. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.eururo.2018.09.001\u003c/span\u003e\u003cspan address=\"10.1016/j.eururo.2018.09.001\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEuropean Association of Urology. EAU Guidelines on Muscle-Invasive and Metastatic Bladder Cancer, 2025. European Association of Urology; 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEuropean Association of Urology. EAU Guidelines on Non-muscle-invasive Bladder Cancer, 2025. European Association of Urology; 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLee C, Hung no et al. 6, June 2017, pp. 1193\u0026ndash;205. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.2214/ajr.16.17114\u003c/span\u003e\u003cspan address=\"10.2214/ajr.16.17114\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePanebianco V, Narumi Y, Altun E, Bochner BH, Efstathiou JA, Hafeez S, Huddart R, Kennish S, Lerner S, Montironi R, Muglia VF, Salomon G, Thomas S, Vargas HA, Witjes JA, Takeuchi M, Barentsz J, Catto JWF. 
Multiparametric Magnetic Resonance Imaging for Bladder Cancer: Development of VI-RADS (Vesical Imaging-Reporting And Data System). Eur Urol. 2018;74(3):294\u0026ndash;306. Epub 2018 May 10. PMID: 29755006; PMCID: PMC6690492.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eUeno Y, et al. VI-RADS: Multiinstitutional Multireader Diagnostic Accuracy and Interobserver Agreement Study. Am J Roentgenol. 2021;216(5):1257\u0026ndash;66. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.2214/ajr.20.23604\u003c/span\u003e\u003cspan address=\"10.2214/ajr.20.23604\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJazayeri SB, et al. Diagnostic Accuracy of Vesical Imaging-Reporting and Data System (VI-RADS) in Suspected Muscle Invasive Bladder Cancer: A Systematic Review and Diagnostic Meta-Analysis. Urologic Oncology: Seminars and Original Investigations. 40(2):45\u0026ndash;55. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.urolonc.2021.11.008\u003c/span\u003e\u003cspan address=\"10.1016/j.urolonc.2021.11.008\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePecoraro M, et al. Multiparametric MRI for Bladder Cancer: A Practical Approach to the Clinical Application of VI-RADS. Radiology. 2025;314(3). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1148/radiol.233459\u003c/span\u003e\u003cspan address=\"10.1148/radiol.233459\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHolzbeierlein J, et al. Diagnosis and Treatment of Non\u0026ndash;Muscle Invasive Bladder Cancer: AUA/SUO Guideline\u0026mdash;2024 Amendment. The Journal of Urology, 2024. 
American Urological Association.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZheng Z, Xu F, Gu Z, Yan Y, Xu T, Liu S, Yao X. Combining Multiparametric MRI Radiomics Signature With the Vesical Imaging-Reporting and Data System (VI-RADS) Score to Preoperatively Differentiate Muscle Invasion of Bladder Cancer. Front Oncol. 2021;11:619893. PMID: 34055600; PMCID: PMC8155615.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang W. Integrating Radiomics with the Vesical Imaging-Reporting and Data System to Predict Muscle Invasion of Bladder Cancer. Urologic Oncology: Seminars and Original Investigations. 41(6):294.e1\u0026ndash;294.e8. doi:10.1016/j.urolonc.2022.10.024.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLin M, et al. Machine learning and multi-omics integration: advancing cardiovascular translational research and clinical practice. J Transl Med. 2025;23:388. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s12967-025-06425-2\u003c/span\u003e\u003cspan address=\"10.1186/s12967-025-06425-2\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYou J, et al. Advancing Laboratory Medicine Practice With Machine Learning: Swift yet Exact. Ann Lab Med. 2025;45(1):22\u0026ndash;35. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3343/alm.2024.0354\u003c/span\u003e\u003cspan address=\"10.3343/alm.2024.0354\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Epub 2024 Nov 26. PMID: 39587856; PMCID: PMC11609717.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eObermeyer Z, Emanuel EJ. Predicting the Future-Big Data, Machine Learning, and Clinical Medicine. N Engl J Med. 
2016;375:1216\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKaur H, Pannu HS, Malhi AK. A Systematic Review on Imbalanced Data Challenges in Machine Learning: Applications and Solutions. ACM Comput Surv. 2019;52:79.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAubaidan BH, et al. A review of intelligent data analysis: Machine learning approaches for addressing class imbalance in healthcare - challenges and perspectives. Intell Data Anal. 2025;29(3):699\u0026ndash;719. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1177/1088467X241305509\u003c/span\u003e\u003cspan address=\"10.1177/1088467X241305509\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eElreedy D. A Comprehensive Analysis of Synthetic Minority Oversampling Technique (SMOTE) for Handling Class Imbalance. Information Sciences, 505, pp. 32\u0026ndash;64, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.ins.2019.07.070\u003c/span\u003e\u003cspan address=\"10.1016/j.ins.2019.07.070\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWei R, Wang J, Su M, Jia E, Chen S, Chen T, Ni Y. Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data. Sci Rep. 2018;8:663.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen C et al. Preoperative Anemia as a Simple Prognostic Factor in Patients with Urinary Bladder Cancer. Med Sci Monit. 2017;23:3528\u0026ndash;3535. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.12659/msm.902855\u003c/span\u003e\u003cspan address=\"10.12659/msm.902855\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. 
PMID: 28723884; PMCID: PMC5531533.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi DX, et al. Prognostic value of preoperative neutrophil-to-lymphocyte ratio in histological variants of non-muscle-invasive bladder cancer. Investig Clin Urol. 2021;62(6):641\u0026ndash;9. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.4111/icu.20210278\u003c/span\u003e\u003cspan address=\"10.4111/icu.20210278\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJakus D, et al. The Impact of the Initial Clinical Presentation of Bladder Cancer on Histopathological and Morphological Tumor Characteristics. J Clin Med. 2023;12(13):4259. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3390/jcm12134259\u003c/span\u003e\u003cspan address=\"10.3390/jcm12134259\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. PMID: 37445294; PMCID: PMC10342402.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePoletajew S, et al. Preoperative pyuria predicts the presence of high-grade bladder carcinoma in patients with bladder tumors. Cent Eur J Urol. 2020;73(4):423\u0026ndash;36. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.5173/ceju.2020.0289\u003c/span\u003e\u003cspan address=\"10.5173/ceju.2020.0289\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Epub 2020 Dec 3. PMID: 33552566; PMCID: PMC7848834.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLin W, et al. Impact of Age at Diagnosis of Bladder Cancer on Survival: A Surveillance, Epidemiology, and End Results-Based Study 2004\u0026ndash;2015. Cancer Control: J Moffitt Cancer Cent. 2023;30. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1177/10732748231152322\u003c/span\u003e\u003cspan address=\"10.1177/10732748231152322\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWolff RF, et al. PROBAST: A Tool to Assess the Risk of Bias and Applicability of Prediction Model Studies. Ann Intern Med. 2019;170(1):51\u0026ndash;8.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"bmc-urology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"buro","sideBox":"Learn more about [BMC Urology](http://bmcurol.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/buro/default.aspx","title":"BMC Urology","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Bladder Neoplasms, Vesical Imaging-Reporting and Data System (VI-RADS), Machine Learning, Magnetic Resonance Imaging, Random Forest Model, Artificial Intelligence, Muscle Invasion","lastPublishedDoi":"10.21203/rs.3.rs-9081158/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9081158/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eObjective\u003c/h2\u003e \u003cp\u003eIn bladder cancer, integrating imaging and non-imaging parameters may enhance diagnostic performance beyond the Vesical Imaging-Reporting and Data System (VI-RADS). This study aimed to develop and validate machine learning models incorporating VI-RADS scores with clinical and laboratory variables to predict muscle invasion and support individualized treatment decisions.\u003c/p\u003e\u003ch2\u003eMaterials and Methods\u003c/h2\u003e \u003cp\u003eA total of 372 patients who underwent transurethral resection of bladder tumor between 2019 and 2024 and had preoperative mpMRI performed according to the VI-RADS protocol were retrospectively evaluated. VI-RADS scores were combined with demographic data, hematological indices, biochemical markers, and urinalysis findings to construct predictive models. 
Machine learning algorithms\u0026mdash;including logistic regression, random forest, support vector machines, extreme gradient boosting, light gradient boosting machine, and deep neural networks\u0026mdash;were developed and optimized. Model performance was assessed using receiver operating characteristic area under the curve (AUC), sensitivity, specificity, Brier score, and decision curve analysis (DCA) and compared with VI-RADS alone.\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e \u003cp\u003ePathological muscle invasion (\u0026ge;\u0026thinsp;T2) was identified in 103 (27.8%) of the 372 patients. VI-RADS alone yielded an AUC of 0.89. Models supported with clinical and laboratory parameters demonstrated significant improvement, particularly random forest (AUC\u0026thinsp;=\u0026thinsp;0.95), support vector machines (AUC\u0026thinsp;=\u0026thinsp;0.95), and logistic regression (AUC\u0026thinsp;=\u0026thinsp;0.94). Calibration analysis of the isotonic regression\u0026ndash;adjusted random forest model yielded a slope of 1.16 and an intercept of \u0026minus;\u0026thinsp;0.039, indicating probability estimates closely aligned with clinical reality. In DCA, the RF model outperformed both the \u0026ldquo;treat-all\u0026rdquo; and \u0026ldquo;treat-none\u0026rdquo; strategies, demonstrating clear net clinical benefit.\u003c/p\u003e\u003ch2\u003eConclusion\u003c/h2\u003e \u003cp\u003eIntegrating VI-RADS with clinical and laboratory parameters improves discrimination and calibration in predicting muscle invasion compared with imaging alone. 
The random forest model, in particular, may reduce misclassification at critical decision points\u0026mdash;such as early radical cystectomy or neoadjuvant chemotherapy\u0026mdash;and provide more reliable information for patient counseling.\u003c/p\u003e","manuscriptTitle":"Hybrid Machine Learning Models Integrating VI-RADS and Clinical Metrics for Bladder Cancer Staging","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-04-03 06:22:07","doi":"10.21203/rs.3.rs-9081158/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"editorInvitedReview","content":"","date":"2026-05-06T16:39:36+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"149833767283217382163889483442490186938","date":"2026-04-30T14:18:33+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2026-03-29T15:58:15+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2026-03-17T20:13:53+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2026-03-14T05:15:15+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2026-03-14T05:14:19+00:00","index":"","fulltext":""},{"type":"submitted","content":"BMC Urology","date":"2026-03-10T08:12:33+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"bmc-urology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"buro","sideBox":"Learn more about [BMC Urology](http://bmcurol.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/buro/default.aspx","title":"BMC Urology","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"fc57c908-8e3e-466f-9578-d4949714989c","owner":[],"postedDate":"April 3rd, 2026","published":true,"recentEditorialEvents":[{"type":"editorInvitedReview","content":"","date":"2026-05-06T16:39:36+00:00","index":34,"fulltext":""},{"type":"reviewerAgreed","content":"149833767283217382163889483442490186938","date":"2026-04-30T14:18:33+00:00","index":33,"fulltext":""}],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[],"tags":[],"updatedAt":"2026-04-03T06:22:07+00:00","versionOfRecord":[],"versionCreatedAt":"2026-04-03 06:22:07","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-9081158","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-9081158","identity":"rs-9081158","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}