Deep Neural Network Architectures for Brain Metastasis Segmentation: A Comparison of 3D U-Net and 3D nnU-Net on FLAIR-Weighted MR Imaging

doi:10.21203/rs.3.rs-8754688/v1

2026 · doi:10.21203/rs.3.rs-8754688/v1

preprint OA: closed

Research Article

Mohamad Ali Kavehpour, Daryoush Shahbazi-Gahrouei, Mahnaz Etehad Tavakol, and 2 more

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-8754688/v1
This work is licensed under a CC BY 4.0 License. Status: Under Review. Version 1 posted 12

Abstract

Background
Brain metastasis is one of the most common intracranial tumors, and accurate diagnosis is crucial for effective treatment planning and overall patient survival. Manual segmentation of tumors and their sub-regions is often time-consuming and costly. In recent years, artificial intelligence has made remarkable advances in medical imaging.
The study aims to compare two U-Net-based neural network architectures (3D U-Net and 3D nnU-Net) and to evaluate their performance in clinical settings using magnetic resonance imaging (MRI).

Methods
This study is a retrospective case-cohort study. A dataset of 200 cases from the Brain Tumor Segmentation (BraTS) challenge was used for training, and 15 cases from multiple imaging centers were used for testing. Two neural network models (3D U-Net and 3D nnU-Net) were trained and compared. Each multicenter case included FLAIR-weighted MR imaging. Tumor segmentation was evaluated using the Dice similarity coefficient (DSC), the 95th-percentile Hausdorff distance (HD95), precision, the Jaccard index, and recall. Statistical analysis included descriptive statistics, Wilcoxon signed-rank testing, and agreement analysis (significance level P < 0.05).

Results
The 3D U-Net showed a mean tumor DSC of 0.791 ± 0.01, while the 3D nnU-Net showed a DSC of 0.764 ± 0.25. HD95 was 15.11 ± 1.33 for 3D U-Net and 13.74 ± 26.89 for 3D nnU-Net. The Jaccard index was 0.669 ± 0.001 and 0.607 ± 0.21, and precision was 0.834 ± 0.01 and 0.7 ± 0.22, for 3D U-Net and 3D nnU-Net, respectively.

Conclusions
A visual evaluation of the segmentation outputs demonstrated that 3D U-Net provided more consistent delineation of necrotic regions, whereas 3D nnU-Net showed higher robustness for segmenting edema and enhancing tumor regions. Overall, 3D nnU-Net demonstrated improved performance compared to the 3D U-Net model. However, these differences were not statistically significant.

Keywords: Brain cancers, Deep learning, Magnetic resonance imaging, 3D nnU-Net, 3D U-Net

1. Introduction

Brain metastases (BMs) constitute the most common type of intracranial (IC) malignancies and, despite substantial advancements in therapeutic and supportive care strategies, are still associated with markedly poor overall survival (OS) (1).
They generally arise from lung, breast tumors, and melanoma, which constitute the most frequent primary sources ( 2 ). Neurological manifestations in patients with BMs primarily arise from extensive peritumoral vasogenic edema, which can exceed the size of the lesion and disrupt neural function. Common symptoms include headache, visual disturbances, focal deficits, increased intracranial pressure, and epileptic seizures ( 3 ). Early diagnosis of BMs is clinically critical, as it enables prompt intervention and can lead to clinically meaningful improvements in patients’ quality of life ( 4 ). Radiation therapy, surgery, chemotherapy, targeted agents, immunotherapy, or a combination of these constitute the principal therapeutic modalities for BMs, with treatment selection primarily determined by the number, size, and anatomical location of the lesions ( 5 , 6 ). Tumor segmentation is a foundational step in medical image analysis, aimed at the automated or semi-automated delineation of neoplastic regions from surrounding healthy brain tissue to achieve precise quantification of tumor location, volumetric extent, morphological characteristics, and spatially explicit boundaries ( 7 ). Accurate segmentation also helps prevent unnecessary and costly interventions. Therefore, developing non-invasive methods capable of automating tumor segmentation would provide substantial clinical value for diagnostic and adaptive treatment planning. Computed tomography (CT) is often used as the initial imaging modality, particularly in emergency settings where rapid assessment of neurological symptoms is required; however, its sensitivity is limited compared with Magnetic Resonance Imaging (MRI). High-resolution contrast-enhanced MRI has been well established as the gold standard for detecting BMs ( 8 ). 
Nevertheless, manual segmentation remains a labor intensive, time consuming, and operator dependent process, the inherent subjectivity of which can introduce substantial errors that adversely affect treatment planning. Thresholding and k-means clustering are among the traditional automated approaches used for segmentation. Multilevel thresholding techniques have been proposed for brain metastasis segmentation, demonstrating superior performance metrics, including higher Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM), compared with conventional baseline methods. However, these approaches exhibit suboptimal performance when applied to heterogeneous tumors or lesions of small volume ( 9 ). Basin-based methods delineate boundaries by exploiting gradient information. However, they are inherently sensitive to noise and often require substantial preprocessing to ensure stable performance ( 10 ). Artificial intelligence (AI) has witnessed rapid growth within healthcare systems and is increasingly regarded as one of the most powerful tools for disease diagnosis and treatment ( 11 ). Deep learning, a prominent branch of artificial intelligence, has the capacity to learn task relevant representations directly from raw data, thereby substantially reducing the need for manual feature engineering. Its capabilities have been successfully demonstrated across a wide range of domains, including image recognition, speech processing, and natural language understanding ( 12 ). Building on these capabilities, recent studies have demonstrated that advanced deep learning models, such as hybrid U-Net architectures and CNN–LSTM frameworks, can further enhance medical image segmentation performance. By incorporating transfer learning, feature engineering, and temporal information, these approaches achieve improved DSCs even when training data are limited ( 13 ). In addition, Naser et al. 
(14) investigated a hybrid architecture that combines VGG16 and U-Net for glioma detection, grading, and tumor segmentation. Their model achieved a mean DSC of 0.84 for tumor segmentation. However, the final evaluation in that study did not use an independent, multicenter dataset. Moreover, the DSC was calculated only between a single neural network model and the manual masks. In another study, Rasool et al. (15) used the Brain Tumor Segmentation (BraTS) 2018 dataset to segment high-grade gliomas with a network based on the 3D U-Net model. The DSCs for the whole tumor, tumor core, and enhancing tumor were 0.89, 0.95, and 0.90, respectively. However, no post-processing techniques were applied, and the 3D U-Net architecture was a pure baseline without any architectural modification. Subsequently, Ahsan et al. (16) conducted a comparative analysis of several neural network architectures for pixel-wise tumor segmentation. All models were trained on two publicly available benchmark datasets, namely Brain Tumor Figshare (BTF) and BraTS 2018. They concluded that the combination of YOLOv5 with a 2D U-Net achieved the highest tumor-segmentation performance on the BTF dataset and outperformed both the standard 2D U-Net and Mask R-CNN architectures. The analysis covered a limited number of tumor classes, which may affect the generalizability of the results, despite the inclusion of the BraTS 2018 dataset for glioma grading. In addition, the proposed framework relies on two separately trained models for detection and segmentation, which increases computational complexity. Finally, Tabassum et al. (17) utilized a hybrid model combining 3D nnU-Net and transfer learning for the segmentation of various brain tumors, including meningiomas, brain metastases, and gliomas.
That study achieved DSCs of 0.86 and 0.81 for meningioma and metastasis, respectively. However, it lacks validation in clinical settings, which may affect the assessment of its practical applicability. Data limitations and heterogeneity are additional concerns. With the rapid advancement of medical technologies and their expanding role in clinical practice, early detection and timely management of cancer have become essential priorities. As mentioned above, MRI, a safe imaging modality, plays a valuable role in detecting and distinguishing soft tissues, masses, blood vessels, and edema. Its integration with neural network architectures has contributed to significant medical advances. This work aims to assess the performance of 3D U-Net and 3D nnU-Net architectures by combining MRI sequences with comprehensive clinical data for testing and to compare their effectiveness in the automated segmentation of BMs. Finally, by incorporating population-specific factors such as ethnic background, environmental exposures, and regional climatic conditions, this research aims to establish a locally adapted and robust predictive framework capable of improving clinical decision-making for patients with BMs.

2. Methods and Materials

2.1. Study Design and Ethical Approval
This retrospective study utilized patient data from multiple MR imaging centers (Ethical codes: IR.MUI.MED.REC.1403.499 and IR.ARI.MUI.REC.1404.037). All procedures were performed in accordance with the Declaration of Helsinki. The requirement for informed consent was waived due to the retrospective nature of the study. Data confidentiality was rigorously maintained.

2.2. Datasets and Sample size
Two datasets were used to develop and evaluate the proposed artificial intelligence-based segmentation framework.

2.2.1. Public training dataset: A total of 200 cases from the Brain Tumor Segmentation (BraTS) 2025 Challenge were used for model training.
The challenge comprises several disease-specific tasks, including adult glioma, intracranial meningioma, brain metastases, pediatric brain tumors, and a cross-entity generalizability task. In this study, we exclusively used data from the brain metastases task. Each case included a FLAIR-weighted volume. Four classes are annotated for each case: edema, tumor core, necrosis, and background. These data were obtained from the Synapse platform, the official hosting environment for the BraTS Challenges (https://www.synapse.org).

2.2.2. Clinical evaluation dataset: The clinical dataset comprised MRI scans from 15 patients diagnosed with BMs, collected from multiple imaging centers. Each case included the FLAIR modality and corresponding expert annotations. A rigorous quality control (QC) process was conducted to ensure complete modality availability, sufficient image quality, and the absence of artifacts. All MRI volumes and segmentation masks were independently reviewed by a radiologist using ITK-SNAP to verify anatomical consistency and labeling accuracy (18).

2.3. Image Preprocessing
Initially, the curated BraTS dataset was checked to confirm the presence of the required MRI modality (FLAIR-weighted) and of the corresponding segmentation channels for the three tumor sub-regions (tumor core, necrosis, and edema). These procedures were implemented in Google Colab using the SimpleITK, MONAI, and NiBabel libraries. All images were reoriented to a common RAS anatomical convention and resampled to an isotropic voxel size of 1×1×1 mm³. To eliminate non-informative background and reduce computational load, foreground cropping was performed using the FLAIR volume, followed by symmetric padding or cropping to standardize all modalities and segmentation masks to 240×240×155 voxels.
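The final standardization step (symmetric padding or cropping to 240×240×155 voxels) can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function name `pad_or_crop_center`, the zero padding value, and the choice to put any odd leftover voxel on the trailing side are assumptions made for illustration only.

```python
import numpy as np


def pad_or_crop_center(volume, target_shape=(240, 240, 155)):
    """Symmetrically zero-pad or center-crop a 3D array to target_shape.

    Hypothetical helper: the padding value (0) and the rounding of odd
    margins are assumptions, not details taken from the paper.
    """
    out = np.asarray(volume)
    for axis, target in enumerate(target_shape):
        size = out.shape[axis]
        if size < target:
            # Pad: split the missing voxels between both sides of the axis.
            total = target - size
            before = total // 2
            pad = [(0, 0)] * out.ndim
            pad[axis] = (before, total - before)
            out = np.pad(out, pad, mode="constant", constant_values=0)
        elif size > target:
            # Crop: keep the central `target` voxels along this axis.
            start = (size - target) // 2
            out = np.take(out, np.arange(start, start + target), axis=axis)
    return out


# Example: a cropped foreground volume brought back to the common grid.
cropped = np.ones((200, 250, 155), dtype=np.float32)
standardized = pad_or_crop_center(cropped)
print(standardized.shape)
```

The same function would need to be applied to the segmentation mask with identical arguments so that image and labels stay voxel-aligned.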
A subject-specific brain mask was automatically extracted from the FLAIR volume using Otsu thresholding and applied for modality-wise normalization. In-mask voxels were z-score normalized, whereas out-of-mask voxels were set to zero. The resulting mask was saved as an additional NIfTI file for each case. The segmentation label map was explicitly co-registered to the FLAIR affine to ensure strict geometric consistency across volumes (19). A three-dimensional Gaussian smoothing kernel was applied to estimate a low-frequency approximation of the underlying bias field. Within the brain mask, voxel-wise intensity standard deviations were computed for both the native MRI volume and its smoothed bias approximation. For the resulting Global Non-Uniformity (GNU) metric, smaller values reflect higher intensity homogeneity, whereas elevated values indicate pronounced bias-related variation. For any modality in which the GNU metric reached the predefined instability threshold (GNU ≥ 0.15), a more structurally robust and bias-resistant mask was reconstructed to replace the previous mask, which exhibited significant intensity inhomogeneity. Figure 1 illustrates the preprocessing of the raw data before training the neural network.

2.4. Train/test/validation
For the 3D U-Net baseline and optimized configurations, the dataset was split into training, validation, and testing sets, with 80% allocated for training, 10% for validation, and 10% for testing. Such a split is a common and effective practice for guarding against overfitting in machine-learning studies, particularly in deep learning (20). In contrast, the 3D nnU-Net architecture employed a fully automated 5-fold cross-validation strategy, consistent with its self-configuring design, which eliminates the need for manual data partitioning.

2.5. Data augmentation
In this study, on-the-fly GAN-based augmentation was integrated into the training pipeline of the deep learning architectures by employing GliGAN to generate synthetic tumors during training. This augmentation requires no additional storage and improves lesion diversity and sensitivity to small or multi-lesion cases, while validation data remain unchanged (21).

2.6. GPU specification
All model training experiments were conducted on a high-performance computing (HPC) system equipped with an NVIDIA Tesla V100-SXM2 GPU (32 GB memory) and 64 GB of system RAM. The CUDA and driver versions were 12.2 and 535.86.10, respectively. Model implementation was performed using Python 3.10.8, PyTorch 2.7.1, and MONAI. Data preprocessing and clinical inference were additionally performed in Google Colab Premium using an NVIDIA A100-SXM4 GPU with 80 GB of memory (CUDA 12.4, driver version 550.54.15).

2.7. 3D U-Net architecture
The U-Net architecture is an encoder-decoder convolutional neural network specifically designed for medical image segmentation (22). In the 3D U-Net used in this study, input MRI volumes of size 240×240×160 are fed into the network, and the spatial dimensions are reduced by a factor of two at each encoder level. Each pair of convolutional layers is followed by a ReLU activation and then by a 3D max pooling operation (2×2×2) with a stride of 2 in all dimensions. In the final layer, a 1×1×1 convolution reduces the number of output channels to the desired number of segmentation labels. The model contained a total of 5,652,097 parameters, of which 5,649,153 were trainable and 2,944 were non-trainable. Two separate training configurations were evaluated.

2.7.1. First training configuration: The model was trained for 300 epochs using a batch size of 2 and a learning rate of 1×10⁻⁴. A combined Dice and binary cross-entropy loss function was used to supervise network training.
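As an illustration of how a combined Dice and binary cross-entropy loss behaves, the following NumPy sketch evaluates both terms on predicted foreground probabilities. It is a sketch under stated assumptions rather than the training code: the equal weighting of the two terms, the smoothing constant `smooth = 1.0`, and the function name `dice_bce_loss` are illustrative choices, and an actual training loop would use a differentiable PyTorch or MONAI equivalent.

```python
import numpy as np


def dice_bce_loss(probs, target, smooth=1.0, eps=1e-7):
    """Sum of soft Dice loss and binary cross-entropy (assumed 1:1 weighting).

    probs  : predicted foreground probabilities in [0, 1]
    target : binary ground-truth mask with the same shape
    """
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    target = np.asarray(target, dtype=np.float64)

    # Soft Dice: overlap between probabilities and mask, smoothed to avoid 0/0.
    intersection = np.sum(probs * target)
    dice = (2.0 * intersection + smooth) / (np.sum(probs) + np.sum(target) + smooth)

    # Binary cross-entropy averaged over all voxels.
    bce = -np.mean(target * np.log(probs) + (1.0 - target) * np.log(1.0 - probs))

    return (1.0 - dice) + bce


mask = np.array([1, 0, 1, 0])
good = dice_bce_loss(np.array([0.9, 0.1, 0.8, 0.2]), mask)
bad = dice_bce_loss(np.array([0.2, 0.8, 0.3, 0.9]), mask)
print(good < bad)
```

The Dice term responds to region overlap and is less affected by class imbalance, while the cross-entropy term provides a per-voxel gradient, which is the usual motivation for combining them.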
The Adam optimizer was employed for parameter optimization due to its high computational efficiency, stability, and reduced need for manual hyperparameter tuning (23).

2.7.2. Second training configuration (optimized): For the second training, several hyperparameters were modified. The loss function was changed from a combination of Dice and cross-entropy to a composite of Dice, focal, and boundary losses. The Dice loss measures the overlap between the predicted and ground-truth segmentation masks. To avoid division by zero, a smoothing term (smooth = 1.0) is applied. The focal loss is calculated as:

\[ \mathrm{Focal\ Loss} = -\alpha_t \, (1 - p_t)^{\gamma} \, \log(p_t) \]

where γ (gamma) controls the focusing power (default value: 2.0), p_t denotes the softmax probabilities, and α (alpha) adjusts the weight for each class (default value). The boundary loss is defined as:

\[ \mathrm{Boundary\ Loss} = \sum \left( \mathrm{Probabilities} \times \mathrm{Distance\ Map} \right) \]

where the distance map measures the distance to the nearest object boundary, encouraging the model to focus on accurately predicting object boundaries. To improve the efficiency of learning from small lesions, we adopted 3D patch-based training with subvolumes (128×128×128) instead of full 240×240×155 volumes. The network capacity was increased by doubling the base number of feature channels from 16 to 32. Furthermore, the optimizer was switched from Adam to AdamW (learning rate = 1×10⁻⁴, weight decay = 1×10⁻⁴), and a learning rate scheduler (cosine annealing or ReduceLROnPlateau) was introduced to stabilize convergence. The optimized 3D U-Net architecture consists of four encoder-decoder blocks in which the number of feature maps doubles at each level, starting with 32 channels (BASE_CH = 32), followed by BatchNorm3d layers and ReLU activations. The final output layer uses a 1×1×1 convolution to produce the segmentation output for the four classes (NUM_CLASSES = 4).

2.7.3. 3D nnU-Net
3D nnU-Net is a deep learning framework endowed with fully self-configuring capabilities.
It automatically adapts its preprocessing, training, and post-processing pipelines to the characteristics of the input data (24). A total of 200 label maps and 800 MRI volumes (comprising the four standard modalities) were provided to the model. One of the key advantages of 3D nnU-Net is its fully automated preprocessing pipeline. The framework autonomously determined critical settings such as batch size (2), patch size ([128, 160, 112]), and Z-score intensity normalization; the median image size was [134.0, 172.0, 134.0] voxels. The architecture employs 3×3×3 convolutional kernels with feature map dimensions of (32, 64, 128, 256, 320, 320) across successive layers. Training was conducted for 720 epochs, with the learning rate decreasing from 0.0099 to 0.00318. Total training time was approximately 12 hours and 2 minutes.

2.8. Manual segmentation
The clinical data were segmented by a radiologist into four regions: background, necrosis, edema, and tumor. A separate binary mask (0 and 1) was generated for each class. Each class mask created by the radiologist was then compared with the corresponding class mask generated by each of the two trained neural network architectures. To align the masks before comparison, they were checked for differences in intensity, orientation, and dimensions and adjusted accordingly.

2.9. Evaluation metrics
To evaluate and compare the performance of the neural network architectures, DSC, HD95, precision, Jaccard index, and recall were used. For each case, the DSC for the whole tumor is calculated over the three regions (edema, necrosis, and enhancing tumor), while the DSC for the central tumor (tumor core) is calculated over necrosis and enhancing tumor.
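The whole-tumor and tumor-core scores described above can be sketched in a few lines of NumPy. The sketch assumes that each composite region is the union of its sub-region labels (the usual BraTS convention) and that the label ids are 1 = necrosis, 2 = edema, 3 = enhancing tumor; neither detail is stated explicitly in the text, and the function names are hypothetical.

```python
import numpy as np

# Assumed label ids (not specified in the paper): 0 is background.
NECROSIS, EDEMA, ENHANCING = 1, 2, 3


def dice(pred_mask, true_mask):
    """Dice similarity coefficient between two binary masks."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    true_mask = np.asarray(true_mask, dtype=bool)
    denom = pred_mask.sum() + true_mask.sum()
    if denom == 0:
        # Region absent in both masks: the score is undefined.
        return float("nan")
    return 2.0 * np.logical_and(pred_mask, true_mask).sum() / denom


def region_dice(pred_labels, true_labels):
    """DSC for whole tumor (WT) and tumor core (TC) from integer label maps."""
    pred_labels = np.asarray(pred_labels)
    true_labels = np.asarray(true_labels)
    whole_tumor = (NECROSIS, EDEMA, ENHANCING)  # union of all tumor labels
    tumor_core = (NECROSIS, ENHANCING)          # union without edema
    return {
        "WT": dice(np.isin(pred_labels, whole_tumor), np.isin(true_labels, whole_tumor)),
        "TC": dice(np.isin(pred_labels, tumor_core), np.isin(true_labels, tumor_core)),
    }


# Toy 1D example: the prediction confuses one edema voxel with enhancing tumor.
scores = region_dice([0, 1, 2, 3], [0, 1, 2, 2])
print(scores)
```

In the toy example the whole-tumor masks coincide, so only the tumor-core score is penalized for the mislabeled voxel.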
HD95 is the 95th percentile of the distances between the boundaries of two point sets (here, pixel or voxel sets in images); it is a robust variant of the Hausdorff distance, which measures the largest discrepancy between the two sets. Precision indicates the proportion of instances that are truly positive among all instances predicted as positive by the model. The Jaccard index measures the similarity between two sets. Recall indicates the model's ability to identify all positive instances (25). Statistical analysis included descriptive statistics, Wilcoxon signed-rank testing, and agreement analysis (significance level P < 0.05).

3. Results

3.1. Quantitative Segmentation Performance
In this study, the quantitative performance of the U-Net models for tumor segmentation was evaluated using DSC, HD95, precision, Jaccard index, and recall. The results are shown in Table 1.

Table 1. Validation Dice results.

Model | DSC WT | DSC TC | Mean tumor DSC | Epochs
3D U-Net (baseline) | 0.703 ± 0.01 | 0.773 ± 0.01 | 0.791 ± 0.01 | 300
3D U-Net (optimized configuration) | 0.738 ± 0.07 | 0.717 ± 0.03 | 0.762 ± 0.03 | 200
3D nnU-Net | 0.720 ± 0.02 | 0.770 ± 0.02 | 0.764 ± 0.25 | 720

Model | Class | HD95 | Precision | Recall | Jaccard index
3D U-Net (baseline) | 1 | 17.950 ± 2.84 | 0.723 ± 0.03 | 0.776 ± 0.01 | 0.624 ± 0.02
3D U-Net (baseline) | 2 | 15.110 ± 1.33 | 0.834 ± 0.01 | 0.770 ± 0.01 | 0.669 ± 0.001
3D U-Net (baseline) | 3 | 18.280 ± 1.25 | 0.824 ± 0.02 | 0.761 ± 0.01 | 0.680 ± 0.01
3D U-Net (optimized configuration) | 1 | 20.870 ± 2.70 | 0.690 ± 0.19 | 0.616 ± 0.01 | 0.478 ± 0.10
3D U-Net (optimized configuration) | 2 | 31.550 ± 5.67 | 0.840 ± 0.05 | 0.755 ± 0.01 | 0.654 ± 0.05
3D U-Net (optimized configuration) | 3 | 35.260 ± 4.93 | 0.813 ± 0.05 | 0.645 ± 0.08 | 0.557 ± 0.04
3D nnU-Net | 1 | 19.510 ± 39.00 | 0.610 ± 0.32 | 0.637 ± 0.34 | 0.404 ± 0.28
3D nnU-Net | 2 | 13.740 ± 26.89 | 0.664 ± 0.26 | 0.676 ± 0.27 | 0.532 ± 0.26
3D nnU-Net | 3 | 20.310 ± 31.31 | 0.700 ± 0.22 | 0.700 ± 0.25 | 0.6067 ± 0.21

WT: whole tumor; TC: tumor core.

3.1.1. 3D U-Net
Baseline configuration: The baseline 3D U-Net achieved a DSC of 0.703 ± 0.01 for Whole Tumor (WT) segmentation, 0.773 ± 0.01 for Tumor Core (TC), and a mean tumor DSC of 0.791 ± 0.01 (Fig. 2 A, B). The best per-class Dice values for classes 1, 2, and 3 were 0.7682 at epoch 141, 0.8018 at epoch 179, and 0.8089 at epoch 154, respectively (Fig. 2 C). The HD95, precision, Jaccard index, and recall values for all three classes are reported in Table 1. Figure 2 D presents box plots of the HD95 values for all three classes. The best Jaccard index values for classes 1, 2, and 3 were 0.6237 (at epoch 141), 0.6691 (at epoch 179), and 0.6803 (at epoch 154), respectively (Fig. 5 A).

Optimized configuration: The optimized 3D U-Net, trained for 200 epochs, achieved a DSC of 0.738 ± 0.07 for WT, 0.717 ± 0.03 for TC, and a mean tumor DSC of 0.762 ± 0.03 (Fig. 3 A, 3 B). The best per-class DSC values for classes 1, 2, and 3 were 0.7220 at epoch 179, 0.8184 at epoch 165, and 0.7505 at epoch 188, respectively (Fig. 3 C). The HD95, precision, and recall values per class are reported in Table 1. The best Jaccard index for classes 1, 2, and 3 was recorded at epoch 170, epoch 165, and epoch 200, respectively (Table 1 and Fig. 5 B).

3.1.2. 3D nnU-Net
The 3D nnU-Net model was trained for 720 epochs. Early stopping was performed to prevent overfitting. This architecture achieved WT and TC DSCs of 0.720 ± 0.02 and 0.770 ± 0.02, respectively (Fig. 4 A). The best mean tumor DSC was 0.764 ± 0.25. The DSC, HD95, precision, and recall for each class are presented in Table 1. The best per-class DSCs for classes 1, 2, and 3 were 0.8481 (epoch 105), 0.8720 (epoch 672), and 0.8095 (epoch 705), respectively (Fig. 4 C). As shown in Fig. 5, box plots of the Jaccard index were generated for 40 validation cases per class.

3.2. Training and Validation Loss results

3.2.1. 3D U-Net (baseline)
For the baseline 3D U-Net, the training and validation losses started at 1.2461 and 1.1688, respectively (Fig. 6 A). The sharp decrease in the early stages indicates that the model improved quickly. At the end of training, the training and validation losses were 0.1116 and 0.1763, respectively. The lowest training and validation losses were 0.1082 and 0.1755, respectively (epoch 291), while signs of overfitting emerged after epoch 140.

3.2.2. 3D U-Net (with configuration)
The initial training and validation losses were 0.7622 and 0.6195, respectively. At the end of training, the training and validation losses were 0.1682 and 0.1788. The lowest training and validation losses were 0.1682 at epoch 200 and 0.1738 at epoch 185, with overfitting beginning around epoch 115 (Fig. 6 B).

3.2.3. 3D nnU-Net
The training and validation losses started at -0.0381 and -0.2186, respectively. As indicated in Fig. 6 C, the minimum training loss occurred at epoch 676 with a value of -0.7904, and the minimum validation loss occurred at epoch 668 with a value of -0.6614. Overfitting appears to start around epoch 668.

3.3. Clinical Test Cohort Characteristics
After training the 3D U-Net and 3D nnU-Net models, data from 15 clinical cases, obtained from various imaging centers, were used as test data for the models. A total of 74 metastatic tumors were identified across the evaluated clinical cases. The mean age of the patients was 47.93 years. Ten male and five female patients were included as test subjects. The metastases mainly originated from primary tumors of the lung and prostate.

3.4. MRI Scanners Specification
Table 2 summarizes the specifications of the MRI scanners used in this study.
It lists the field of view (FOV) and slice thickness, both in millimeters, for each scanner brand.

Table 2. MRI scanner characteristics.

MRI scanner brand | FOV (mm) | Slice thickness (mm)
Philips | 250 × 250 | 1
Symphony | 512 × 448 | 5.5
Avanto | 230 × 230 | 5
Interna | 220 × 220 | 5

3.5. Visual Assessment
Because the radiologist's mask was drawn on the FLAIR image, the 3D U-Net and 3D nnU-Net models were trained using this modality, which served as the reference modality. For this assessment, the baseline 3D U-Net, which demonstrated relatively better performance than the modified model, was compared with 3D nnU-Net, whose hyperparameters are automatically optimized at each epoch. In contrast, the 3D nnU-Net model exhibited high sensitivity to domain shift. The masks manually delineated by a radiologist for each class, alongside those generated by the neural network architectures, are presented. Table 4 shows masks for the same cases listed in Table 3. Per-case DSCs for each class are presented in Table 5 for 3D U-Net and in Table 6 for 3D nnU-Net.

Table 5. Results of DSC for each case (3D U-Net).

Case | DSC class 1 | DSC class 2 | DSC class 3
1 | 0.8 | 0.6 | 0.75
2 | 0.84 | 0.476 | 0.6315
3 | - | 0.19 | 0.6
4 | 0.85 | 0.35 | 0.6

Table 6. Results of DSC for each case (3D nnU-Net).

Case | DSC class 1 | DSC class 2 | DSC class 3
1 | 0.65 | 0.6 | 0.7
2 | 0.6842 | 0.7458 | 0.842
3 | - | 0.22 | 0.55
4 | 0.85 | 0.55 | 0.6

3.6. Statistical Analysis
Table 7 presents the statistical comparison of DSC for each class and for the mean tumor between the 3D U-Net and 3D nnU-Net models. As shown in Table 7, the 3D U-Net achieved a slightly higher mean DSC for Class 1, whereas 3D nnU-Net demonstrated higher DSC values for Classes 2 and 3. However, the Wilcoxon signed-rank test revealed that none of these differences were statistically significant (p > 0.05).
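To make the paired comparison concrete, the sketch below implements an exact two-sided Wilcoxon signed-rank test for small samples in plain Python. It is an illustrative implementation, not the software used in the study: the function name is hypothetical, zero differences are dropped and tied magnitudes receive average ranks by assumption, and the p-value is obtained by enumerating all sign assignments, which is only practical for small cohorts such as the 15 cases here. The example input is the four-case Class 3 subset from Tables 5 and 6, so it does not reproduce the full-cohort p-values. In practice, `scipy.stats.wilcoxon` offers a maintained implementation.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Assumptions of this sketch: zero differences are discarded and tied
    absolute differences share their average rank.
    Returns (W_plus, p_value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # Test statistic: sum of ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2.0
    observed_deviation = abs(w_plus - expected)

    # Exact null distribution: every sign pattern is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - expected) >= observed_deviation - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Class 3 DSCs for the four cases listed in Tables 5 and 6.
unet_class3 = [0.75, 0.6315, 0.6, 0.6]
nnunet_class3 = [0.7, 0.842, 0.55, 0.6]
statistic, p_value = wilcoxon_signed_rank_exact(unet_class3, nnunet_class3)
print(statistic, p_value)
```

With so few pairs the test has almost no power, which is one reason non-significant results on small cohorts should be read as inconclusive rather than as evidence of equivalence.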
Table 7. Statistical comparison of Dice similarity coefficients per class using the Wilcoxon signed-rank test.

Dice similarity coefficient | 3D U-Net | 3D nnU-Net | p-value (significance level 0.05)
Class 1 | 0.771 ± 0.15 | 0.71 ± 0.15 | 0.063
Class 2 | 0.571 ± 0.25 | 0.65 ± 0.23 | 0.167
Class 3 | 0.62 ± 0.2 | 0.676 ± 0.17 | 0.443
Mean tumor | 0.652 ± 0.19 | 0.672 ± 0.18 | 0.776

Values are reported as mean ± standard deviation, and the Wilcoxon signed-rank test was used to assess statistical significance.

4. Discussion

In this study, 3D U-Net and 3D nnU-Net neural network models were trained using the BraTS 2025 dataset to evaluate and compare these models for the segmentation of different regions of brain metastasis. After training, the models were assessed using clinical MRI data. According to Table 1, the highest average DSC for the tumor class was observed in the baseline 3D U-Net model around the midpoint of the training process. However, the optimized 3D U-Net model, after several adjustments to its code, showed a gradual and consistent increase in the DSC (Fig. 3 A) compared to the baseline 3D U-Net (Fig. 2 A). The highest DSC for this model was 0.760, recorded during the final epoch of training. Notably, the overall DSC for the tumor class was considerably higher than that of the baseline model, although the DSC for the central tumor was still lower. These results indicate that the baseline 3D U-Net model performed better in capturing the masses than the adjusted version. Related works (15, 26) have shown that the 3D U-Net model performs better in training and segmenting glioma tumors and their various regions than in the present study. In general, brain metastases are often smaller and more dispersed than gliomas, making their detection significantly more challenging.
Additionally, studies on tumor segmentation have frequently employed hybrid deep learning models or transfer learning, which have yielded higher results. The HD95 results showed that the modifications made to the 3D U-Net model (Table 1, Fig. 2 D and 3 D) did not reduce this value but actually increased it. In terms of precision, however, the modified 3D U-Net performed better than the baseline model in detecting positive instances of edema, with the maximum precision observed towards the end of training. This indicates that the modified model has the potential to achieve significantly better results than the baseline model with longer training. The baseline 3D U-Net model demonstrates higher precision for Classes 1 and 3, while the modified model shows superior precision for Class 2. In terms of recall, the baseline 3D U-Net outperforms the modified architecture, with recall decreasing in all three classes after modification. This indicates that the baseline model exhibits higher sensitivity across all three classes, with the highest sensitivity observed for Class 2. In the study by Beers et al. (27), a 3D U-Net was used for brain tumor segmentation with input data from the BraTS 2018 dataset, achieving DSCs of 0.732 and 0.73 for the enhancing tumor and tumor core, respectively. By comparison, the model in the present study performed better on the tumor core, with a DSC of 0.773 ± 0.01, while its DSC for the whole tumor was slightly lower. In the related study (17), which achieved higher DSCs in brain tumor segmentation, the use of a hybrid version consisting of the 3D nnU-Net architecture and transfer learning enabled better performance in class detection. However, this model had lower performance in Class 1. This is again because the cases with necrosis were fewer in number.
According to the analysis conducted on the validation data, the number of lesions corresponding to Classes 1, 2, and 3 was 100, 216, and 273, respectively. Of these, the model successfully identified 75, 149, and 160 lesions, respectively, corresponding to lesion-wise detection rates of 75.0%, 69.0%, and 58.6%. The DSC calculated for all three classes using the 3D nnU-Net architecture was higher than that of both the baseline 3D U-Net model and the modified version, indicating the strong performance of this model in detecting all three classes. The DSC for both 3D U-Net architectures (Figs. 2C and 3C) is markedly higher for Class 2 than for Classes 1 and 3. This is attributed to the larger extent of Class 2, which makes it easier for the trained model to segment. Another reason is that some brain metastasis cases had no necrosis, or had such small necrotic areas that detection by the model was difficult. Class 3, which lies between Class 1 and Class 2, is smaller than Class 2.

Although the 3D nnU-Net model, with its self-adjusting architecture in both preprocessing and model training, was expected to perform better, the results presented in Table 1 indicate a lower DSC for the whole tumor than that of the modified 3D U-Net model, while still outperforming the baseline 3D U-Net model. The Dice for the tumor core, however, is comparable to that of the baseline 3D U-Net. Despite this, the average DSC of the tumor is lower than that of the baseline model. The HD95 values are better than those of the optimized 3D U-Net configuration for all classes, but Classes 1 and 3 still show higher HD95 values than Class 2. This weaker performance is attributed to the fact that the model was used in its purely foundational form, without any additional adjustments or integration with other architectures. As shown in Table 1, the Jaccard index is reported separately for all three classes and for each neural network architecture. The Jaccard index for the baseline 3D U-Net architecture is markedly higher than that of the optimized 3D U-Net and 3D nnU-Net.
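The DSC and the Jaccard index discussed here are both overlap measures derived from the same voxel counts, which is why they tend to rank models similarly. The sketch below is a minimal illustration of that relationship under the assumption of binary masks flattened to sequences of 0/1 values. The function name `dice_and_jaccard` and the example masks are hypothetical.

```python
def dice_and_jaccard(pred, truth):
    """Return (Dice, Jaccard) for two equally long binary masks.

    Dice    = 2*TP / (2*TP + FP + FN)
    Jaccard =   TP / (TP + FP + FN)
    If both masks are empty, both scores are defined as 1.0 here
    (a convention chosen for this sketch).
    """
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    if tp + fp + fn == 0:
        return 1.0, 1.0
    dice = 2 * tp / (2 * tp + fp + fn)
    jaccard = tp / (tp + fp + fn)
    return dice, jaccard


# Hypothetical flattened masks for one class (1 = voxel belongs to the class).
predicted = [1, 1, 0, 0, 1, 0]
reference = [1, 0, 1, 0, 1, 0]
d, j = dice_and_jaccard(predicted, reference)
# The two metrics are linked by J = D / (2 - D), so J never exceeds D.
print(d, j, d / (2 - d))
```

For a multi-class label map, the same function can be applied per class by binarizing the map against each class label in turn.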
In addition, for each architecture, the Jaccard index of Class 2 is higher than that of the other two classes. For the baseline 3D U-Net architecture, this metric was comparable to that reported by Grøvik et al. (28), who employed the DeepLab V3 network. However, its value for 3D nnU-Net is somewhat lower.

According to Figs. 6A and 6B, the baseline model achieved a better loss during training, whereas the modified model performed better during validation. This suggests that, with continued training, the second model has the potential to yield more favorable results. The baseline 3D U-Net enters the learning process with relatively high loss values (1.24 for training and 1.17 for validation). However, the sharp decrease in these early losses shows that the network quickly grasps initial features. By the end of training, the losses converge to approximately 0.1116 (training) and 0.1763 (validation), indicating a reasonable balance between learning and generalizability. After about epoch 140, the gap between the training and validation curves widens, suggesting that the model is becoming overly tuned to the training data. Overall, this architecture performs steadily and predictably, reaching a sensible compromise between accuracy and robustness despite its slower loss reduction.

By computing the DSC between the radiologist-generated masks and the masks produced by the neural network models, we obtained the results presented in Tables 5 and 6 for a subset of the evaluated cases. In Case 1, which contains two large tumors, the 3D U-Net model demonstrated superior performance across all three classes compared to the 3D nnU-Net. Similarly, in Case 2, the 3D U-Net again outperformed the 3D nnU-Net; in this instance, the model showed particularly improved detection of the tumor region (Class 3).
Case 3 contains a very large number of small tumors, for which both models demonstrated low accuracy in segmenting Classes 2 and 3. However, the 3D nnU-Net achieved slightly better performance in segmenting edema, whereas the 3D U-Net showed higher accuracy in detecting the tumor region. In Case 4, the 3D U-Net demonstrated superior segmentation performance for Classes 1 and 3, whereas the 3D nnU-Net achieved higher accuracy for Class 2. These findings further support the observation that the 3D nnU-Net tends to perform better in identifying peritumoral edema and enhancing tumor, while the 3D U-Net provides more accurate delineation of necrotic regions. This distinction is likely attributable to the architectural modifications applied to the 3D U-Net, which enhanced its sensitivity to necrotic components.

It is important to note that, in this study, the 3D nnU-Net architecture was employed in its standard self-configuring form, without any supplementary architectural refinements or enhanced preprocessing strategies. Therefore, it is reasonable to expect that applying similar modifications, such as optimized preprocessing pipelines, architectural refinements, hybrid model integration, or transfer learning, could further improve the performance of both models. In particular, these strategies have the potential to strengthen the capabilities of the 3D nnU-Net, enhancing not only its training efficiency but also its robustness and accuracy when evaluated on heterogeneous clinical datasets.

Based on quantitative evaluation metrics, visual inspection, and statistical analysis of clinical test cases, the 3D nnU-Net architecture exhibited more favorable performance than the 3D U-Net. This observation may be attributed to several factors. One explanation is that hyperparameter tuning in the 3D U-Net, including preprocessing and training settings, was performed manually.
In contrast, the 3D nnU-Net employs a fully automated configuration strategy, which resulted in improved performance. This automation not only enhances general applicability but may also lead to better accuracy across different clinical scenarios. Furthermore, although the 3D U-Net architecture supports multi-modality integration, the nnU-Net demonstrated competitive segmentation performance when applied to a single-modality configuration.

5. Limitations and Future Directions

Despite recent progress in convolutional architectures, accurate segmentation of brain metastases remains challenging. Models such as 3D U-Net, 3D nnU-Net, and their derivatives have improved the delineation of enhancing tumor, necrotic core, and edematous regions, yet several methodological and practical limitations remain. First, collecting large amounts of clinical data from imaging centers was challenging. Many of the datasets were either unavailable or incomplete, leading to their exclusion from the study. Another limitation is the difference in the number of slices per modality. The clinical data contained between 16 and 21 slices, whereas the collected BraTS data contained about 150 slices. Even though resizing and patching were performed to match the number of slices to that on which the model was trained, this introduces some bias. Another issue was that the neural network architectures were not able to accurately segment tumors located at the edges of the brain or in the uppermost slices. Finally, domain shift posed a significant challenge. The multi-center data that were collected varied in resolution, noise, intensity, scanner type, and voxel size. Additionally, the 3D nnU-Net settings caused the fusion of masks on the images to be inferior to that of the 3D U-Net. Following the training of the neural network models, clinical testing was conducted to assess their accuracy, reliability, and applicability in real-world clinical environments.
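To make the slice-count limitation concrete, the sketch below shows one possible way to resample a volume along the slice axis by linear interpolation. This is an assumption for illustration only: the paper states that resizing and patching were performed but does not describe the method. The function name `resample_slices` and the toy volume are hypothetical. Going from roughly 16 to 21 acquired slices to about 150 means most output slices are interpolated rather than measured, which is one way the bias mentioned above can arise.

```python
def resample_slices(volume, n_out):
    """Linearly interpolate a volume along the slice axis.

    `volume` is a list of slices, each slice a flat list of intensities
    of equal length. Returns a new list with `n_out` slices.
    """
    n_in = len(volume)
    if n_in == 1 or n_out == 1:
        # Nothing to interpolate between: repeat the first slice.
        return [list(volume[0]) for _ in range(n_out)]
    resampled = []
    for k in range(n_out):
        # Fractional position of output slice k on the input slice grid.
        pos = k * (n_in - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        w = pos - lo
        resampled.append([(1 - w) * a + w * b
                          for a, b in zip(volume[lo], volume[hi])])
    return resampled


# Toy volume: 2 slices of 3 voxels each, upsampled to 5 slices.
toy = [[0.0, 10.0, 20.0], [4.0, 14.0, 24.0]]
for s in resample_slices(toy, 5):
    print(s)
```

Linear interpolation is only appropriate for image intensities. Label masks would need nearest-neighbor resampling so that class indices are not blended into meaningless intermediate values.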
The cases included in this study were obtained from multiple medical centers, each utilizing different MRI scanners from various manufacturers (e.g., Philips, Symphony, Siemens, and others) and differing in slice thickness, field of view (FOV), and image resolution. This heterogeneity allows a more rigorous evaluation of model performance across diverse acquisition settings and clinical conditions. Additionally, both architectures had difficulty accurately segmenting small lesions and tumors located at the brain periphery, highlighting an ongoing limitation of convolutional neural networks in handling extreme spatial variability.

Several approaches are considered for future work. First, integrating different neural network architectures to achieve improved segmentation performance is of interest. For example, a VGG16-based U-Net architecture could be explored. Alternatively, a hybrid approach that combines a 2D VGG16 model with a 3D network may be investigated to provide an improved volumetric representation of tumors. Second, tumor-related and other class-specific information could be extracted using radiomics. Third, a substantially larger dataset is recommended to enable more rigorous validation of the trained model on clinical data. Finally, applying task-specific enhancements to nnU-Net, such as customized preprocessing or transfer learning, may further improve its performance in heterogeneous clinical environments.

6. Conclusions

In this study, 3D U-Net and 3D nnU-Net neural network architectures were evaluated for the segmentation of different classes of brain metastases. Both models were trained using the BraTS 2025 benchmark dataset and tested on clinical MRI data collected from multiple imaging centers, highlighting their applicability to heterogeneous real-world imaging conditions.
The FLAIR modality plays a significant role in the detection and differentiation of these lesions, and its integration with deep learning enhances segmentation performance. Based on the quantitative results, the 3D U-Net architecture achieved a higher DSC than nnU-Net (0.791 ± 0.01 vs. 0.764 ± 0.25). However, observations from clinical cases indicate that 3D nnU-Net is better suited to clinical environments. Further work on improving segmentation accuracy, either by integrating different network architectures or by increasing the size and diversity of the training dataset, is recommended. Overall, based on both quantitative outcomes and visual assessment of clinical cases, 3D nnU-Net demonstrated improved performance compared to the 3D U-Net model. However, these differences were not statistically significant.

Declarations

Author Contributions: Conceptualization, D.S.-G., M.K.; Methodology, D.S.-G., M.K., M.E., S.H., and P.A.; Validation, D.S.-G., M.K., M.E., and S.H.; Investigation, M.K.; Resources, D.S.-G.; Data Curation, D.S.-G., M.K., M.E., S.H., and P.A.; Writing (Original Draft Preparation), M.K.; Writing (Review and Editing), D.S.-G., M.E.; Supervision, D.S.-G.; Project Administration, D.S.-G.; Funding Acquisition, D.S.-G. All authors have read and agreed to the published version of the manuscript.

Funding: This study was funded by Isfahan University of Medical Sciences, Isfahan, Iran (Grant Numbers: 3403806 and 1403341).

Institutional Review Board Statement: This article contains no studies with human participants or animals performed by the authors.

Ethics approval and consent to participate: This article contains no studies with human participants or animals performed by the authors. Ethical codes: IR.MUI.MED.REC.1403.499 and IR.ARI.MUI.REC.1404.037, issued by Isfahan University of Medical Sciences, Isfahan, Iran.

Consent for publication: Not applicable.
Data Availability Statement: The data presented in this study are available on request from the corresponding author.

Conflicts of Interest: The authors declare they have no conflicts of interest.

Ethical Considerations and AI Use Declaration: All procedures complied with the IRB and Declaration of Helsinki standards. This manuscript was reviewed using Grok 3 solely for grammar correction and language refinement, as the authors are non-native English speakers. The intellectual content, data analysis, and conclusions are the sole responsibility of the authors.

References

1. Zuccato JA, Mamatjan Y, Nassiri F, Ajisebutu A, Liu JC, Muazzam A, et al. Prediction of brain metastasis development with DNA methylation signatures. Nat Med. 2025;31(1):116–25.
2. Jiang K, Parker M, Materi J, Azad TD, Kamson DO, Kleinberg L, et al. Epidemiology and survival outcomes of synchronous and metachronous brain metastases: a retrospective population-based study. Neurosurg Focus. 2023;55(2):E3.
3. Le Rhun E, Weller M, Anders C, Larkin J, Li J, Moss NS, et al. Symptomatic melanoma brain metastases: A call for clear definitions and adoption of standardized tools. Eur J Cancer. 2024;208:114202.
4. Mirmoeeni S, Azari Jafari A, Shah M, Salemi F, Hashemi SZ, Seifi A. The clinical, diagnostic, therapeutic, and prognostic characteristics of brain metastases in prostate cancer: a systematic review. Prostate Cancer. 2022;2022(1):5324600.
5. Nieblas-Bedolla E, Nayyar N, Singh M, Sullivan RJ, Brastianos PK. Emerging immunotherapies in the treatment of brain metastases. Oncologist. 2021;26(3):231–41.
6. Bodensohn R, Kaempfel A-L, Boulesteix A-L, Orzelek AM, Corradini S, Fleischmann DF, et al. Stereotactic radiosurgery versus whole-brain radiotherapy in patients with 4–10 brain metastases: A nonrandomized controlled trial. Radiother Oncol. 2023;186:109744.
7. Rajendran S, Rajagopal SK, Thanarajan T, Shankar K, Kumar S, Alsubaie NM, et al. Automated segmentation of brain tumor MRI images using deep learning. IEEE Access. 2023;11:64758–68.
8. Soltaninejad A, Shahbazi-Gahrouei D, Khorasani A, Hemati S. Evaluation of CNN-based deep learning models for auto-contouring in Glioblastoma radiotherapy: A review. Radiat Oncol. 2026;20:169.
9. Sharma SR, Alshathri S, Singh B, Kaur M, Mostafa RR, El-Shafai W. Hybrid multilevel thresholding image segmentation approach for brain MRI. Diagnostics. 2023;13(5):925.
10. Mohammed AS, Mihoub Z, editors. A Review of Image Segmentation Strategies from Classical Methods to Deep Learning. 2024 Conference of Young Researchers in Electrical and Electronic Engineering (ElCon); 2024: IEEE.
11. Rezaei M, Rahmani E, Khouzani SJ, Rahmannia M, Ghadirzadeh E, Bashghareh P, et al. Role of artificial intelligence in the diagnosis and treatment of diseases. Kindle. 2023;3(1):1–160.
12. Chen X, Hu X, Huang Y, Jiang H, Ji W, Jiang Y, et al. Deep learning-based software engineering: progress, challenges, and opportunities. Sci China Inform Sci. 2025;68(1):111102.
13. Etehadtavakol M, Etehadtavakol M, Ng E. Optimizing thyroid nodule segmentation in thermal imaging with temporal sequences and advanced deep learning backbones. Expert Syst Appl. 2025;296:129105.
14. Naser MA, Deen MJ. Brain tumor segmentation and grading of lower-grade glioma using deep learning in MRI images. Comput Biol Med. 2020;121:103758.
15. Rasool N, Bhat J. Multimodal Brain Tumor Segmentation using 3D-U-Net. 2023.
16. Ahsan R, Shahzadi I, Najeeb F, Omer H. Brain tumor detection and segmentation using deep learning. Magn Reson Mater Phys Biol Med. 2025;38(1):13–22.
17. Tabassum M, Di Ieva A, Liu S. Meta transfer learning for brain tumor segmentation using nnUNet in meningioma and metastasis cases. Sci Rep. 2025;15.
18. Yushkevich PA, Yang G, Gerig G. ITK-SNAP: An interactive tool for semi-automatic segmentation of multi-modality biomedical images. Annu Int Conf IEEE Eng Med Biol Soc. 2016;2016:3342–5.
19. Huang J, Yagmurlu B, Molleti P, Lee R, VanderPloeg A, Noor H, et al. Brain tumor segmentation using deep learning: high performance with minimized MRI data. Front Radiol. 2025;5:1616293.
20. Sivakumar M, Parthasarathy S, Padmapriya T. Trade-off between training and testing ratio in machine learning for medical image processing. PeerJ Comput Sci. 2024;10:e2245.
21. Jain I, Willems S, Latre S, De Schepper T. On-the-Fly Data Augmentation for Brain Tumor Segmentation. arXiv preprint arXiv:2509.24973. 2025.
22. Agrawal P, Katal N, Hooda N. Segmentation and classification of brain tumor using 3D-UNet deep neural networks. Int J Cogn Comput Eng. 2022;3:199–210.
23. Zhang Z, editor. Improved Adam optimizer for deep neural networks. 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS); 2018: IEEE.
24. Luu HM, Park S-H, editors. Extending nn-UNet for brain tumor segmentation. International MICCAI Brainlesion Workshop. Springer; 2021.
25. Rainio O, Teuho J, Klén R. Evaluation metrics and statistical tests for machine learning. Sci Rep. 2024;14(1):6086.
26. Jiang H, Imran M, Muralidharan P, Patel A, Pensa J, Liang M, et al. MicroSegNet: A deep learning approach for prostate segmentation on micro-ultrasound images. Comput Med Imaging Graph. 2024;112:102326.
27. Beers A, Chang K, Brown J, Sartor E, Mammen C, Gerstner E, et al. Sequential 3D U-Nets for biologically-informed brain tumor segmentation. arXiv preprint arXiv:1709.02967. 2017.
28. Grøvik E, Yi D, Iv M, Tong E, Nilsen LB, Latysheva A, et al. Handling missing MRI sequences in deep learning segmentation of brain metastases: a multicenter study. NPJ Digit Med. 2021;4(1):33.

Tables 3 and 4

Tables 3 and 4 are available in the Supplementary Files section.

Additional Declarations

No competing interests reported.
Supplementary Files

Table34.docx

Status: Under Review. Version 1 posted.

Editorial timeline (most recent first):
27 Apr, 2026: Editorial decision: Revision requested
26 Apr, 2026: Reviews received at journal
25 Apr, 2026: Reviews received at journal
23 Apr, 2026: Reviewers agreed at journal
20 Apr, 2026: Reviews received at journal
20 Apr, 2026: Reviewers agreed at journal
17 Apr, 2026: Reviewers agreed at journal
15 Apr, 2026: Reviewers agreed at journal
15 Apr, 2026: Reviewers invited by journal
09 Feb, 2026: Editor assigned by journal
06 Feb, 2026: Submission checks completed at journal
06 Feb, 2026: First submitted to journal
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-8754688","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":626532376,"identity":"8a9328e5-4544-4d7c-b383-b7b1b03db7c8","order_by":0,"name":"Mohamad Ali Kavehpour","email":"","orcid":"","institution":"Isfahan University of Medical Sciences","correspondingAuthor":false,"prefix":"","firstName":"Mohamad","middleName":"Ali","lastName":"Kavehpour","suffix":""},{"id":626532377,"identity":"e1486d87-3169-4188-ba69-837afe68744f","order_by":1,"name":"Daryoush Shahbazi-Gahrouei","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA5UlEQVRIiWNgGAWjYDACCQY2ZiDFww9mAxlg0QRitEg2kKqFweAARAthID+7+dnjgpo6GeNrhw/eYMw5LMPAfvgBw8M9uLUY3DlmbjzjGBuP2e20ZAvGbYd5GHjSDBgSnuHRIpFgJs3DxgPUkmMmAdbCkAP0ywE8DpuR/k2a558Ej/Hs/G8QLfxv8GthuJFjJs3bZsBjIJ3DBtEiQcAWgxs5ZdIz+xJ4JG6nGVskbkvnYZN4ZnCAgMO2SRd8q7Pnn5388MbHbdb2/PzJDx/+wOcwFJAAxGxATLSGUTAKRsEoGAXYAQD3D0akME/2AAAAAABJRU5ErkJggg==","orcid":"","institution":"Isfahan University of Medical Sciences","correspondingAuthor":true,"prefix":"","firstName":"Daryoush","middleName":"","lastName":"Shahbazi-Gahrouei","suffix":""},{"id":626532378,"identity":"38c24036-728c-4535-bdc1-273b69c2c24a","order_by":2,"name":"Mahnaz Etehad Tavakol","email":"","orcid":"","institution":"Isfahan University of Medical Sciences","correspondingAuthor":false,"prefix":"","firstName":"Mahnaz","middleName":"Etehad","lastName":"Tavakol","suffix":""},{"id":626532379,"identity":"9f76a36d-45de-44a0-aada-18496bba18f5","order_by":3,"name":"Simin 
Hemati","email":"","orcid":"","institution":"Isfahan Universi","correspondingAuthor":false,"prefix":"","firstName":"Simin","middleName":"","lastName":"Hemati","suffix":""},{"id":626532380,"identity":"adab667f-2cc4-4122-9cd2-584bad15a874","order_by":4,"name":"Pooya Akbari","email":"","orcid":"","institution":"Isfahan Universi","correspondingAuthor":false,"prefix":"","firstName":"Pooya","middleName":"","lastName":"Akbari","suffix":""}],"badges":[],"createdAt":"2026-02-01 08:09:56","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-8754688/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-8754688/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":107707224,"identity":"ae9484b7-6ef5-43f1-aee5-68d09fc9e562","added_by":"auto","created_at":"2026-04-24 09:19:50","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":186158,"visible":true,"origin":"","legend":"\u003cp\u003e(A) Raw MRI data before preprocessing and Preprocessed MRI data (B) Distinct segmentation classes for a representative case (from left to right: background, necrosis, edema, and enhancing lesion)\u003c/p\u003e","description":"","filename":"Figure1.png","url":"https://assets-eu.researchsquare.com/files/rs-8754688/v1/a41197f9ea8e55cf3e5a851f.png"},{"id":107622180,"identity":"6a8b20b4-f2e6-4c81-a1bb-a03db3ab54ab","added_by":"auto","created_at":"2026-04-23 09:49:35","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":315636,"visible":true,"origin":"","legend":"\u003cp\u003e(A) Mean tumor dice for 300 epochs after training 3D U-Net classic (B) Whole tumor and tumor core dice for 300 epochs after training 3D U-Net classic (C) Dice per class for training of 3D U-Net baseline (D) HD95 per class after training 300 epochs of classical 3D 
U-Net.\u003c/p\u003e","description":"","filename":"Figure2.png","url":"https://assets-eu.researchsquare.com/files/rs-8754688/v1/45cb96661b967a9383abbc53.png"},{"id":107622182,"identity":"d09b1561-0bd2-488a-b5dc-6f3b6a865621","added_by":"auto","created_at":"2026-04-23 09:49:35","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":302229,"visible":true,"origin":"","legend":"\u003cp\u003e(A) mean tumor dice (training on 200 epoch) (B) WT and TC dice for second configuration 3D U-Net (training on 200 (C) Dice per class trained on 200 epoch (second configuration 3D U-Net), (D) HD95 per class (trained on second configuration 3D U-Net).\u003c/p\u003e","description":"","filename":"Figure3.png","url":"https://assets-eu.researchsquare.com/files/rs-8754688/v1/1b08710078131e11d4b3464a.png"},{"id":107707482,"identity":"eb838bc2-fba8-482f-b0fa-90899d03564b","added_by":"auto","created_at":"2026-04-24 09:20:25","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":398365,"visible":true,"origin":"","legend":"\u003cp\u003e(A) WT and TC dice for 3D nnU-Net (B) Mean tumor dice training on 3D nnU-Net (C) DSC per class (D) HD95 per class for 40 validation cases.\u003c/p\u003e","description":"","filename":"Figure4.png","url":"https://assets-eu.researchsquare.com/files/rs-8754688/v1/dba9ec49bcc74ac7afaf6e96.png"},{"id":107706821,"identity":"cf36004d-1f66-4a20-b595-9b66ca198ddf","added_by":"auto","created_at":"2026-04-24 09:18:49","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":88452,"visible":true,"origin":"","legend":"\u003cp\u003eJaccard index box plot per class. 
(A), (B), and (C) correspond to the baseline 3D U-Net, the optimized 3D U-Net, and the 3D nnU-Net, respectively.\u003c/p\u003e","description":"","filename":"Figure5.png","url":"https://assets-eu.researchsquare.com/files/rs-8754688/v1/99c1682701a23c8b3abf1b95.png"},{"id":107706822,"identity":"9401d54e-a757-400a-aa03-5522253c0230","added_by":"auto","created_at":"2026-04-24 09:18:49","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":243138,"visible":true,"origin":"","legend":"\u003cp\u003etraining and validation loss curve (A) 3D U-Net (300 epoch), (B) 3D U-Net (200 epoch), and (C) 3D nnU-Net.\u003c/p\u003e","description":"","filename":"Figure6.png","url":"https://assets-eu.researchsquare.com/files/rs-8754688/v1/c8c8b1d934a1291bf60cc47e.png"},{"id":107709092,"identity":"961dc13a-287f-4399-89c5-5ef2be576b1a","added_by":"auto","created_at":"2026-04-24 09:34:46","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1816165,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8754688/v1/310f55cc-84a2-4266-a103-3cdec49e6566.pdf"},{"id":107622179,"identity":"c038ceb3-088e-4397-8d57-aef574d616d0","added_by":"auto","created_at":"2026-04-23 09:49:35","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":1378672,"visible":true,"origin":"","legend":"","description":"","filename":"Table34.docx","url":"https://assets-eu.researchsquare.com/files/rs-8754688/v1/6c45a1f29dc38d1375180502.docx"}],"financialInterests":"No competing interests reported.","formattedTitle":"Deep Neural Network Architectures for Brain Metastasis Segmentation: A Comparison of 3D U-Net and 3D nnU-Net on FLAIR-Weighted MR Imaging","fulltext":[{"header":"1. 
Introduction","content":"\u003cp\u003eBrain metastases (BMs) constitute the most common type of intracranial (IC) malignancies and, despite substantial advancements in therapeutic and supportive care strategies, are still associated with markedly poor overall survival (OS) (\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e). They generally arise from lung, breast tumors, and melanoma, which constitute the most frequent primary sources (\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e). Neurological manifestations in patients with BMs primarily arise from extensive peritumoral vasogenic edema, which can exceed the size of the lesion and disrupt neural function. Common symptoms include headache, visual disturbances, focal deficits, increased intracranial pressure, and epileptic seizures (\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e). Early diagnosis of BMs is clinically critical, as it enables prompt intervention and can lead to clinically meaningful improvements in patients\u0026rsquo; quality of life (\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e). 
Radiation therapy, surgery, chemotherapy, targeted agents, immunotherapy, or a combination of these constitute the principal therapeutic modalities for BMs, with treatment selection primarily determined by the number, size, and anatomical location of the lesions (\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eTumor segmentation is a foundational step in medical image analysis, aimed at the automated or semi-automated delineation of neoplastic regions from surrounding healthy brain tissue to achieve precise quantification of tumor location, volumetric extent, morphological characteristics, and spatially explicit boundaries (\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e). Accurate segmentation also helps prevent unnecessary and costly interventions. Therefore, developing non-invasive methods capable of automating tumor segmentation would provide substantial clinical value for diagnostic and adaptive treatment planning. Computed tomography (CT) is often used as the initial imaging modality, particularly in emergency settings where rapid assessment of neurological symptoms is required; however, its sensitivity is limited compared with Magnetic Resonance Imaging (MRI). High-resolution contrast-enhanced MRI has been well established as the gold standard for detecting BMs (\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e). Nevertheless, manual segmentation remains a labor intensive, time consuming, and operator dependent process, the inherent subjectivity of which can introduce substantial errors that adversely affect treatment planning.\u003c/p\u003e \u003cp\u003eThresholding and k-means clustering are among the traditional automated approaches used for segmentation. 
Multilevel thresholding techniques have been proposed for brain metastasis segmentation, demonstrating superior performance metrics, including higher Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM), compared with conventional baseline methods. However, these approaches exhibit suboptimal performance when applied to heterogeneous tumors or lesions of small volume (\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e). Basin-based methods delineate boundaries by exploiting gradient information. However, they are inherently sensitive to noise and often require substantial preprocessing to ensure stable performance (\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eArtificial intelligence (AI) has witnessed rapid growth within healthcare systems and is increasingly regarded as one of the most powerful tools for disease diagnosis and treatment (\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e). Deep learning, a prominent branch of artificial intelligence, has the capacity to learn task relevant representations directly from raw data, thereby substantially reducing the need for manual feature engineering. Its capabilities have been successfully demonstrated across a wide range of domains, including image recognition, speech processing, and natural language understanding (\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e). Building on these capabilities, recent studies have demonstrated that advanced deep learning models, such as hybrid U-Net architectures and CNN\u0026ndash;LSTM frameworks, can further enhance medical image segmentation performance. By incorporating transfer learning, feature engineering, and temporal information, these approaches achieve improved DSCs even when training data are limited (\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e). 
In addition, Naser et al. (\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e) investigated a hybrid architecture that combines VGG16 and U-Net for glioma detection, grading, and tumor segmentation. Regarding the application of deep learning in segmentation. Their model achieved a mean DSC of 0.84 for tumor segmentation. However, an independent and multicenter dataset was not used as the final evaluation in this study. Moreover, in this study, the DSC was calculated only between a single neural network model and the manually masked. Another study, Rasool et al. (\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e), used the Brain Tumor Segmentation (BraTS) 2018 dataset for the segmentation of high-grade glioma tumors and the architecture of their network is based on 3D U-Net model. The DSCs for the whole tumor, tumor core, and enhancing tumor were 0.89, 0.95, and 0.90, respectively. However, no post-processing techniques were applied, and the employed 3D U-Net architecture was entirely baseline, without any architectural modification. After that, Ahsan et al. (\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e) conducted a comparative analysis of several neural network architectures for pixel-wise tumor segmentation. All models were trained on two publicly available benchmark datasets, namely Brain Tumor Figshare (BTF) and BraTS 2018. They concluded that the combination of YOLOv5 with a 2D U-Net achieved the highest tumor-segmentation performance on the BTF dataset and outperformed both the standard 2D U-Net and Mask R-CNN architectures. The analysis was conducted on a limited number of tumor classes, which may affect the generalizability of the results, despite the inclusion of the BraTS 2018 dataset for glioma grading. In addition, the proposed framework relies on two separately trained models for detection and segmentation, increasing computational complexity. 
Tabassum et al. (\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e) utilized a hybrid model combining 3D nnU-Net and transfer learning for the segmentation of various brain tumors, including meningiomas, brain metastases, and gliomas. The study achieved DSCs of 0.86 and 0.81 for meningioma and metastasis, respectively. However, it lacks validation in clinical settings, which may affect the assessment of its practical applicability. Additionally, data limitations and heterogeneity are among the concerns in this study.\u003c/p\u003e \u003cp\u003eWith the rapid advancement of medical technologies and their expanding role in clinical practice, early detection and timely management of cancer have become essential priorities. As mentioned above, MRI, a safe imaging modality, plays a valuable role in detecting and distinguishing soft tissues, masses, blood vessels, and edema. Its integration with neural network architectures has contributed to significant medical advances. This work aims to assess the performance of 3D U-Net and 3D nnU-Net architectures by combining MRI sequences with comprehensive clinical data for testing and to compare their effectiveness in the automated segmentation of BMs. Finally, by incorporating population-specific factors such as ethnic background, environmental exposures, and regional climatic conditions, this research aims to establish a locally adapted and robust predictive framework capable of improving clinical decision-making for patients with BMs.\u003c/p\u003e"},{"header":"2. Methods and Materials","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1. Study Design and Ethical Approval\u003c/h2\u003e \u003cp\u003e This retrospective study utilized patient data from multiple MR imaging centers (Ethical codes: IR.MUI.MED.REC.1403.499 and IR.ARI.MUI.REC.1404.037). All procedures were performed in accordance with the Declaration of Helsinki. 
The requirement for informed consent was waived due to the retrospective nature of the study. Data confidentiality was rigorously maintained.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.2. Datasets and Sample Size\u003c/h2\u003e \u003cp\u003eTwo datasets were used to develop and evaluate the proposed artificial intelligence\u0026ndash;based segmentation framework.\u003c/p\u003e \u003cdiv id=\"Sec5\" class=\"Section3\"\u003e \u003ch2\u003e2.2.1. Public training dataset:\u003c/h2\u003e \u003cp\u003eA total of 200 cases from the Brain Tumor Segmentation (BraTS) 2025 Challenge were used for model training. The challenge comprises several disease-specific tasks, including adult glioma, intracranial meningioma, brain metastases, pediatric brain tumors, and a cross-entity generalizability task. In this study, we exclusively used data from the brain metastases task. Each case included a FLAIR-weighted volume. Four labels are annotated for each case: edema, tumor core, necrosis, and background. These data were obtained from the Synapse platform, the official hosting environment for the BraTS Challenges (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.synapse.org\u003c/span\u003e\u003cspan address=\"https://www.synapse.org\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e).\u003c/span\u003e\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section3\"\u003e \u003ch2\u003e2.2.2. Clinical evaluation dataset:\u003c/h2\u003e \u003cp\u003eThe clinical dataset comprised MRI scans from 15 patients diagnosed with BMs, collected from multiple imaging centers. Each case included the FLAIR modality and corresponding expert annotations. A rigorous quality control (QC) process was conducted to ensure complete modality availability, sufficient image quality, and the absence of artifacts. 
All MRI volumes and segmentation masks were independently reviewed by a radiologist using ITK-SNAP to verify anatomical consistency and labeling accuracy (\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e2.3. Image Preprocessing\u003c/h2\u003e \u003cp\u003eInitially, the curated BraTS dataset was verified to confirm the presence of the required FLAIR-weighted MRI modality and the corresponding segmentation channels for the three tumor sub-regions (tumor core, necrosis, and edema). These procedures were implemented in Google Colab using the SimpleITK, MONAI, and NiBabel libraries for data processing and preprocessing workflows.\u003c/p\u003e \u003cp\u003eAll images were reoriented to a common RAS anatomical convention and resampled to an isotropic voxel size of 1\u0026times;1\u0026times;1 mm\u0026sup3;. To eliminate non-informative background and reduce computational load, foreground cropping was performed using the FLAIR volume, followed by symmetric padding or cropping to standardize all modalities and segmentation masks to 240\u0026times;240\u0026times;155 voxels.\u003c/p\u003e \u003cp\u003eA subject-specific brain mask was automatically extracted from the FLAIR volume using Otsu thresholding and applied for modality-wise normalization. In-mask voxels were z-score normalized, whereas out-of-mask voxels were set to zero. The resulting mask was saved as an additional NIfTI file for each case. The segmentation label map was explicitly co-registered to the FLAIR affine to ensure strict geometric consistency across volumes (\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eA three-dimensional Gaussian smoothing kernel was applied to estimate a low-frequency approximation of the underlying bias field. 
Within the brain mask, voxel-wise intensity standard deviations were computed for both the native MRI volume and its smoothed bias approximation. The Global Non-Uniformity (GNU) metric was then derived from these values; smaller values reflect higher intensity homogeneity, whereas elevated values indicate pronounced bias-related variation. For any modality in which the GNU metric exceeded the predefined instability threshold (GNU\u0026thinsp;\u0026ge;\u0026thinsp;0.15), a more structurally robust and bias-resistant mask was reconstructed, replacing the previous mask that exhibited significant intensity inhomogeneity. Figure\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e illustrates the preprocessing of the raw data before training the neural network.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003e2.4. Train/test/validation\u003c/h2\u003e \u003cp\u003eFor the baseline and optimized 3D U-Net, the dataset was split into training, validation, and test sets, with 80% allocated for training, 10% for validation, and 10% for testing. This form of data splitting is a common and effective practice for guarding against overfitting in machine-learning studies, particularly in deep learning (\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eIn addition, the 3D nnU-Net architecture employed a fully automated 5-fold cross-validation strategy, consistent with its self-configuring design, eliminating the need for manual data partitioning.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e2.5. 
Data augmentation\u003c/h2\u003e \u003cp\u003eIn this study, on-the-fly GAN-based augmentation was integrated into the training pipeline of the deep learning architectures by employing GliGAN to generate synthetic tumors during training. This augmentation requires no additional storage and improves lesion diversity and sensitivity to small or multi-lesion cases, while validation data remain unchanged (\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003e2.6. GPU specification\u003c/h2\u003e \u003cp\u003eAll model training experiments were conducted on a high-performance computing (HPC) system equipped with an NVIDIA Tesla V100-SXM2 GPU (32 GB memory) and 64 GB of system RAM. The CUDA and driver versions were 12.2 and 535.86.10, respectively. Model implementation was performed using Python 3.10.8, PyTorch 2.7.1, and MONAI.\u003c/p\u003e \u003cp\u003eData preprocessing and clinical inference were additionally performed in Google Colab Premium using an NVIDIA A100-SXM4 GPU with 80 GB of memory (CUDA 12.4, driver version 550.54.15).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003e2.7. 3D U-Net architecture\u003c/h2\u003e \u003cp\u003eThe U-Net architecture is an encoder\u0026ndash;decoder convolutional neural network specifically designed for medical image segmentation (\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e). In the 3D U-Net used in this study, input MRI volumes of size 240\u0026times;240\u0026times;160 are fed into the network, and at each encoder level the spatial dimensions are reduced by a factor of two. Following each pair of convolutional layers, a ReLU activation is applied, succeeded by a 3D max pooling operation (2\u0026times;2\u0026times;2) with a stride of 2 in all dimensions. 
In the final layer, a 1\u0026times;1\u0026times;1 convolution is applied to reduce the number of output channels to the desired number of segmentation labels. After training, the model contained a total of 5,652,097 parameters, of which 5,649,153 were trainable, and 2,944 were non-trainable. Two separate training configurations were evaluated.\u003c/p\u003e \u003cdiv id=\"Sec12\" class=\"Section3\"\u003e \u003ch2\u003e2.7.1. First training configuration:\u003c/h2\u003e \u003cp\u003eThe model was trained for 300 epochs using a batch size of 2 and a learning rate of 1\u0026times;10⁻⁴. A combined Dice and binary cross-entropy loss function was used to supervise network training. The Adam optimizer was employed for parameter optimization due to its high computational efficiency, stability, and reduced need for manual hyperparameter tuning (\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section3\"\u003e \u003ch2\u003e\u003cb\u003e2.7.2. Second training configuration (optimized)\u003c/b\u003e:\u003c/h2\u003e \u003cp\u003eFor the second training, several hyperparameters were modified. The loss function was changed from a combination of Dice and binary cross-entropy losses to a composite of Dice, focal, and boundary losses. The Dice loss measures the overlap between predicted and ground truth segmentation masks. To avoid division by zero, a smoothing term (smooth\u0026thinsp;=\u0026thinsp;1.0) is applied. 
The focal loss is calculated as:\u003c/p\u003e \u003cp\u003eFocal Loss=\u0026minus;α\u003csub\u003et\u003c/sub\u003e\u0026sdot;(1\u0026thinsp;\u0026minus;\u0026thinsp;p\u003csub\u003et\u003c/sub\u003e)\u003csup\u003eγ\u003c/sup\u003e\u0026sdot;log(p\u003csub\u003et\u003c/sub\u003e)\u003c/p\u003e \u003cp\u003ewhere gamma (γ) controls the focusing power (default value: 2.0), p\u003csub\u003et\u003c/sub\u003e is the softmax probability assigned to the true class, and alpha (α) adjusts the weight for each class (default value). The boundary loss is defined by:\u003c/p\u003e \u003cp\u003eBoundary Loss=\u0026sum;(Probabilities\u0026times;Distance Map)\u003c/p\u003e \u003cp\u003ewhere the distance map measures the distance to the nearest object boundary, encouraging the model to focus on accurately predicting object boundaries.\u003c/p\u003e \u003cp\u003eTo improve the efficiency of learning from small lesions, we adopted 3D patch-based training with subvolumes (128\u0026times;128\u0026times;128) instead of full 240\u0026times;240\u0026times;155 volumes. The network capacity was increased by doubling the base number of feature channels from 16 to 32. Furthermore, the optimizer was switched from Adam to AdamW (learning rate\u0026thinsp;=\u0026thinsp;1\u0026times;10⁻⁴, weight decay\u0026thinsp;=\u0026thinsp;1\u0026times;10⁻⁴), and a learning rate scheduler (cosine annealing or ReduceLROnPlateau) was introduced to stabilize convergence.\u003c/p\u003e \u003cp\u003eThe optimized 3D U-Net architecture consists of four encoder-decoder blocks in which the number of feature maps doubles at each level, starting from 32 channels (BASE_CH\u0026thinsp;=\u0026thinsp;32), with convolutions followed by BatchNorm3d layers and ReLU activations. 
The final output layer uses a 1\u0026times;1\u0026times;1 convolution to produce the segmentation output for the four classes (NUM_CLASSES\u0026thinsp;=\u0026thinsp;4).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section3\"\u003e \u003ch2\u003e2.7.3 3D nnU-Net\u003c/h2\u003e \u003cp\u003e3D nnU-Net is a deep learning framework with fully self-configuring capabilities. It automatically adapts its preprocessing, training, and post-processing pipelines to the characteristics of the input data (\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e). A total of 200 label maps and 800 MRI volumes (comprising the four standard modalities) were provided to the model. One of the key advantages of 3D nnU-Net is its fully automated preprocessing pipeline. The framework autonomously configures critical hyperparameters such as batch size (batch\u0026thinsp;=\u0026thinsp;2), patch size ([128, 160, 112]), median image size in voxels ([134.0, 172.0, 134.0]), and Z-score intensity normalization. The architecture employs 3\u0026times;3\u0026times;3 convolutional kernels with feature map dimensions of (32, 64, 128, 256, 320, 320) across successive layers. Training was conducted for 720 epochs, with the learning rate decreasing from 0.0099 to 0.00318. Total training time was approximately 12 hours and 2 minutes.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003e2.8. Manual segmentation\u003c/h2\u003e \u003cp\u003eThe clinical data were segmented by a radiologist into four regions: background, necrosis, edema, and tumor. A separate mask was generated for each class, and all masks were stored in binary format (0 and 1). In the next step, the mask created by the radiologist for each class was compared with the corresponding class mask generated by the two trained neural network architectures. 
To align the masks, they were first compared in terms of intensity, orientation, and dimensions and then adjusted to account for any differences.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003e2.9. Evaluation metrics\u003c/h2\u003e \u003cp\u003eTo evaluate and compare the performance of the neural network architectures, several metrics were used: DSC, HD95, precision, Jaccard index, and recall. The whole-tumor DSC is computed over the three regions (edema, necrosis, and enhancing tumor), while the tumor-core DSC is computed over necrosis and enhancing tumor for each case. HD95 is the 95th percentile of the distances between the boundary points of two sets, typically pixel or voxel sets in images; it is a robust variant of the Hausdorff distance, which measures the largest discrepancy. Precision indicates the proportion of instances that are truly positive among all instances predicted as positive by the model. The Jaccard index measures the similarity between two sets as the ratio of their intersection to their union. Recall indicates the model's ability to identify all positive instances (\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eStatistical analysis included descriptive statistics, Wilcoxon signed-rank testing, and agreement analysis; P\u0026thinsp;\u0026lt;\u0026thinsp;0.05 was considered statistically significant.\u003c/p\u003e \u003c/div\u003e"},{"header":"3. Results","content":"\u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003e3.1. Quantitative Segmentation Performance\u003c/h2\u003e \u003cp\u003eIn this study, the quantitative performance of the U-Net-based models for tumor segmentation was evaluated using the DSC, HD95, precision, Jaccard index, and recall. 
The tumor segmentation results are presented in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eValidation Dice results.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"9\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDSC WT\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDSC\u003c/p\u003e \u003cp\u003eTC\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eMean tumor DSC\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eHD95\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e 
\u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eJaccard\u003c/p\u003e \u003cp\u003eindex\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003eEpoch\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e3D U-Net\u003c/p\u003e \u003cp\u003e\u003cb\u003e(baseline)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.703\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.773\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.791\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eClass 1: 17.950\u0026thinsp;\u0026plusmn;\u0026thinsp;2.84\u003c/p\u003e \u003cp\u003eClass 2:\u003c/p\u003e \u003cp\u003e15.110\u0026thinsp;\u0026plusmn;\u0026thinsp;1.33\u003c/p\u003e \u003cp\u003eClass 3: 18.280\u0026thinsp;\u0026plusmn;\u0026thinsp;1.25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eClass 1:\u003c/p\u003e \u003cp\u003e0.723\u0026thinsp;\u0026plusmn;\u0026thinsp;0.03\u003c/p\u003e \u003cp\u003eClass 2:\u003c/p\u003e \u003cp\u003e0.834\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e \u003cp\u003eClass 3:\u003c/p\u003e \u003cp\u003e0.824\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eClass 1:\u003c/p\u003e \u003cp\u003e0.776\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e \u003cp\u003eClass 2:\u003c/p\u003e 
\u003cp\u003e0.770\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e \u003cp\u003eClass 3:\u003c/p\u003e \u003cp\u003e0.761\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eClass 1:\u003c/p\u003e \u003cp\u003e0.624\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02\u003c/p\u003e \u003cp\u003eClass 2:\u003c/p\u003e \u003cp\u003e0.669\u0026thinsp;\u0026plusmn;\u0026thinsp;0.001\u003c/p\u003e \u003cp\u003eClass 3:\u003c/p\u003e \u003cp\u003e0.680\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e300\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e3D U-Net\u003c/p\u003e \u003cp\u003e\u003cb\u003e(optimized configuration)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.738\u0026thinsp;\u0026plusmn;\u0026thinsp;0.07\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.717\u0026thinsp;\u0026plusmn;\u0026thinsp;0.03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.762\u0026thinsp;\u0026plusmn;\u0026thinsp;0.03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eClass 1:\u003c/p\u003e \u003cp\u003e20.870\u0026thinsp;\u0026plusmn;\u0026thinsp;2.70\u003c/p\u003e \u003cp\u003eClass 2:\u003c/p\u003e \u003cp\u003e31.550\u0026thinsp;\u0026plusmn;\u0026thinsp;5.67\u003c/p\u003e \u003cp\u003eClass 3:\u003c/p\u003e \u003cp\u003e35.260\u0026thinsp;\u0026plusmn;\u0026thinsp;4.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eClass 1:\u003c/p\u003e \u003cp\u003e0.690\u0026thinsp;\u0026plusmn;\u0026thinsp;0.19\u003c/p\u003e \u003cp\u003eClass 2:\u003c/p\u003e \u003cp\u003e0.840\u0026thinsp;\u0026plusmn;\u0026thinsp;0.05\u003c/p\u003e 
\u003cp\u003eClass 3:\u003c/p\u003e \u003cp\u003e0.813\u0026thinsp;\u0026plusmn;\u0026thinsp;0.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eClass 1:\u003c/p\u003e \u003cp\u003e0.616\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e \u003cp\u003eClass 2:\u003c/p\u003e \u003cp\u003e0.755\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e \u003cp\u003eClass 3:\u003c/p\u003e \u003cp\u003e0.645\u0026thinsp;\u0026plusmn;\u0026thinsp;0.08\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eClass 1:\u003c/p\u003e \u003cp\u003e0.478\u0026thinsp;\u0026plusmn;\u0026thinsp;0.10\u003c/p\u003e \u003cp\u003eClass 2:\u003c/p\u003e \u003cp\u003e0.654\u0026thinsp;\u0026plusmn;\u0026thinsp;0.05\u003c/p\u003e \u003cp\u003eClass 3:\u003c/p\u003e \u003cp\u003e0.557\u0026thinsp;\u0026plusmn;\u0026thinsp;0.04\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e200\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e3D nnU-Net\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.720\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.770\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.764\u0026thinsp;\u0026plusmn;\u0026thinsp;0.25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eClass 1:\u003c/p\u003e \u003cp\u003e19.510\u0026thinsp;\u0026plusmn;\u0026thinsp;39.00\u003c/p\u003e \u003cp\u003eClass 2:\u003c/p\u003e \u003cp\u003e13.740\u0026thinsp;\u0026plusmn;\u0026thinsp;26.89\u003c/p\u003e \u003cp\u003eClass 3:\u003c/p\u003e \u003cp\u003e20.310\u0026thinsp;\u0026plusmn;\u0026thinsp;31.31\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c6\"\u003e \u003cp\u003eClass 1:\u003c/p\u003e \u003cp\u003e0.610\u0026thinsp;\u0026plusmn;\u0026thinsp;0.32\u003c/p\u003e \u003cp\u003eClass 2:\u003c/p\u003e \u003cp\u003e0.664\u0026thinsp;\u0026plusmn;\u0026thinsp;0.26\u003c/p\u003e \u003cp\u003eClass 3:\u003c/p\u003e \u003cp\u003e0.700\u0026thinsp;\u0026plusmn;\u0026thinsp;0.22\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eClass 1:\u003c/p\u003e \u003cp\u003e0.637\u0026thinsp;\u0026plusmn;\u0026thinsp;0.34\u003c/p\u003e \u003cp\u003eClass 2:\u003c/p\u003e \u003cp\u003e0.676\u0026thinsp;\u0026plusmn;\u0026thinsp;0.27\u003c/p\u003e \u003cp\u003eClass 3:\u003c/p\u003e \u003cp\u003e0.700\u0026thinsp;\u0026plusmn;\u0026thinsp;0.25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eClass 1:\u003c/p\u003e \u003cp\u003e0.404\u0026thinsp;\u0026plusmn;\u0026thinsp;0.28\u003c/p\u003e \u003cp\u003eClass 2:\u003c/p\u003e \u003cp\u003e0.532\u0026thinsp;\u0026plusmn;\u0026thinsp;0.26\u003c/p\u003e \u003cp\u003eClass 3:\u003c/p\u003e \u003cp\u003e0.6067\u0026thinsp;\u0026plusmn;\u0026thinsp;0.21\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e720\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"9\"\u003e*: WT: whole tumor and TC: tumor core\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cdiv id=\"Sec19\" class=\"Section3\"\u003e \u003ch2\u003e3.1.1. 
3D U-Net\u003c/h2\u003e \u003cp\u003e \u003cb\u003eBaseline configuration\u003c/b\u003e:\u003c/p\u003e \u003cp\u003eThe baseline 3D U-Net achieved a DSC of 0.703\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01 for Whole Tumor (WT) segmentation, 0.773\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01 for Tumor Core (TC), and a mean tumor DSC of 0.791\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01 (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA, B). The best per-class DSC values for classes 1, 2, and 3 were 0.7682 at epoch 141, 0.8018 at epoch 179, and 0.8089 at epoch 154, respectively (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eC). In addition, the HD95, precision, Jaccard index, and recall values for all three classes are reported in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. Figure\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eD presents box plots of the HD95 values for all three classes. The best Jaccard index values for classes 1, 2, and 3 were 0.6237 (at epoch 141), 0.6691 (at epoch 179), and 0.6803 (at epoch 154), respectively (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e5\u003c/span\u003eA).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eOptimized configuration\u003c/b\u003e:\u003c/p\u003e \u003cp\u003eThe optimized 3D U-Net, trained for 200 epochs, achieved a DSC of 0.738\u0026thinsp;\u0026plusmn;\u0026thinsp;0.07 for WT, 0.717\u0026thinsp;\u0026plusmn;\u0026thinsp;0.03 for TC, and a mean tumor DSC of 0.762\u0026thinsp;\u0026plusmn;\u0026thinsp;0.03 (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e3\u003c/span\u003eA, \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e3\u003c/span\u003eB). 
The best per-class DSC values for classes 1, 2, and 3 were 0.7220 at epoch 179, 0.8184 at epoch 165, and 0.7505 at epoch 188, respectively (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e3\u003c/span\u003eC). The per-class HD95, precision, and recall values are reported in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. In addition, the best Jaccard index for classes 1, 2, and 3 was recorded at epoch 170, epoch 165, and epoch 200, respectively (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e and Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e5\u003c/span\u003eB).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e\u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e\u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e3\u003c/span\u003e\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec20\" class=\"Section3\"\u003e \u003ch2\u003e3.1.2. 3D nnU-Net\u003c/h2\u003e \u003cp\u003eThe 3D nnU-Net model was trained for 720 epochs. Early stopping was performed to prevent overfitting. This architecture achieved WT and TC DSCs of 0.720\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02 and 0.770\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02, respectively (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e4\u003c/span\u003eA). The best mean tumor DSC was 0.764\u0026thinsp;\u0026plusmn;\u0026thinsp;0.25. The DSC, HD95, precision, and recall values for each class are presented in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. 
The DSCs for classes 1, 2, and 3 reached 0.8481 (by epoch 105), 0.8720 (by epoch 672), and 0.8095 (by epoch 705), respectively (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e4\u003c/span\u003eC). As shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e5\u003c/span\u003e, Jaccard index box plots were generated for 40 validation cases per class.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e4\u003c/span\u003e\u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e5\u003c/span\u003e\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003e3.2. Training and Validation Loss results\u003c/h2\u003e \u003cdiv id=\"Sec22\" class=\"Section3\"\u003e \u003ch2\u003e3.2.1. 3D U-Net (baseline)\u003c/h2\u003e \u003cp\u003eFor the baseline 3D U-Net, the training and validation losses started at 1.2461 and 1.1688, respectively (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eA). In the early epochs, a sharp decrease in loss indicates that the model improved quickly. At the end of training, the training and validation losses were 0.1116 and 0.1763, respectively. The lowest training and validation losses were 0.1082 and 0.1755, with the minimum reported at epoch 291, while signs of overfitting emerged after epoch 140.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec23\" class=\"Section3\"\u003e \u003ch2\u003e3.2.2. 3D U-Net (optimized configuration)\u003c/h2\u003e \u003cp\u003eThe training and validation losses started at 0.7622 and 0.6195, respectively. At the end of training, the training and validation losses were 0.1682 and 0.1788. 
The lowest training and validation losses were 0.1682 at epoch 200 and 0.1738 at epoch 185, with overfitting beginning around epoch 115 (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eB).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec24\" class=\"Section3\"\u003e \u003ch2\u003e3.2.3. 3D nnU-Net\u003c/h2\u003e \u003cp\u003eThe training and validation losses started at \u0026minus;0.0381 and \u0026minus;0.2186, respectively. As indicated in Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eC, the minimum training loss occurred at epoch 676 with a value of \u0026minus;0.7904, and the minimum validation loss occurred at epoch 668 with a value of \u0026minus;0.6614. Overfitting appears to begin around epoch 668.\u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec25\" class=\"Section2\"\u003e \u003ch2\u003e3.3. Clinical Test Cohort Characteristics\u003c/h2\u003e \u003cp\u003eAfter training the 3D U-Net and 3D nnU-Net models, data from 15 clinical cases, obtained from various imaging centers, were used as test data for the models. A total of 74 metastatic tumors were identified across the evaluated clinical cases. The mean age of the patients was 47.93 years. Ten male and five female patients were included as test subjects in the study. The metastases in the test data mainly originated from primary tumors of the lung and prostate.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec26\" class=\"Section2\"\u003e \u003ch2\u003e3.4. MRI Scanners Specification\u003c/h2\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e summarizes the specifications of the MRI scanners used in this study. The table lists the scanner brand, the field of view (FOV) in millimeters, and the slice thickness in millimeters for each device. 
The FOV values are provided for different MRI scanners with varying configurations.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eMRI scanner characteristics.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\"\u0026times;\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMRI Scanner Brand\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFOV (in millimeters)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSlice Thickness (in millimeters)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePhilips\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026times;\" colname=\"c2\"\u003e \u003cp\u003e250 \u0026times; 250\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSymphony\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026times;\" colname=\"c2\"\u003e \u003cp\u003e512 \u0026times; 448\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e5.5\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAvanto\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026times;\" colname=\"c2\"\u003e \u003cp\u003e230 \u0026times; 230\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eInterna\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026times;\" colname=\"c2\"\u003e \u003cp\u003e220 \u0026times; 220\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec27\" class=\"Section2\"\u003e \u003ch2\u003e3.5. Visual Assessment\u003c/h2\u003e \u003cp\u003eBecause the radiologist\u0026rsquo;s masks were drawn on the FLAIR images, the 3D U-Net and 3D nnU-Net models were trained using this modality, which served as the reference modality. For this comparison, the baseline 3D U-Net, which performed relatively better than the modified model, was compared with the 3D nnU-Net, whose hyperparameters are configured automatically. The 3D nnU-Net model, however, exhibited high sensitivity to domain shift.\u003c/p\u003e \u003cp\u003eThe masks manually delineated by a radiologist for each class, alongside those generated by the neural network architectures, are presented. Also, Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e shows masks for the same cases listed in Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e. 
The DSC for each class, as well as for 3D U-Net and 3D nnU-Net neural networks, is presented separately in Tables\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e and \u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e, respectively.\u003c/p\u003e \u003cdiv\u003e\n \u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 5\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eResults of DSC for each case (3D U-Net).\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eCase\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eDSC class 1\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003eDSC class 2\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003eDSC class 3\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003e0.8\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e0.6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.75\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003e0.84\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e0.476\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.6315\u003c/p\u003e\n \u003c/td\u003e\n 
\u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e0.19\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.6\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003e0.85\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e0.35\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.6\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cdiv\u003e\n \u003cdiv align=\"char\" char=\".\" colname=\"c3\" colnum=\"3\"\u003e\u003cbr\u003e\u003c/div\u003e\u0026nbsp;\u003ctable float=\"Yes\" id=\"Tab6\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 6\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eResults of DSC for each case (3D nnU-Net).\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003ecase\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eDSC class 1\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003eDSC class 2\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003eDSC class 3\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003e0.65\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e0.6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.7\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003e0.6842\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e0.7458\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.842\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e0.22\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.55\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003e0.85\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e0.55\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.6\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec28\"\u003e\n \u003ch2\u003e3.6. Statistical Analysis\u003c/h2\u003e\n \u003cp\u003eTable 7 indicates the statistical comparison of DSC for each class and mean tumor between 3D U-Net and 3D nnU-Net models. 
As shown in Table 7, the 3D U-Net achieved a slightly higher mean DSC for Class 1, whereas the 3D nnU-Net demonstrated higher mean DSC values for Classes 2 and 3. However, the Wilcoxon signed-rank test revealed that none of these differences were statistically significant (p \u0026gt; 0.05).\u003c/p\u003e\n \u003ctable float=\"Yes\" id=\"Tab7\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 7\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eStatistical comparison of Dice similarity coefficients per class using the Wilcoxon signed-rank test.\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eDice Similarity Coefficients\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003e3D U-Net\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003e3D nnU-Net\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003ep-value\u003c/p\u003e\n \u003cp\u003e(p \u0026lt; 0.05)\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eClass 1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\"±\" colname=\"c2\"\u003e\n \u003cp\u003e0.771 ± 0.15\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\"±\" colname=\"c3\"\u003e\n \u003cp\u003e0.71 ± 0.15\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.063\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eClass 2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\"±\" colname=\"c2\"\u003e\n \u003cp\u003e0.571 ± 0.25\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\"±\" colname=\"c3\"\u003e\n \u003cp\u003e0.65 
± 0.23\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.167\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eClass 3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\"±\" colname=\"c2\"\u003e\n \u003cp\u003e0.62 ± 0.2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\"±\" colname=\"c3\"\u003e\n \u003cp\u003e0.676 ± 0.17\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.443\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eMean tumor\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\"±\" colname=\"c2\"\u003e\n \u003cp\u003e0.652 ± 0.19\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\"±\" colname=\"c3\"\u003e\n \u003cp\u003e0.672 ± 0.18\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.776\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003ctfoot\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"4\"\u003eValues are reported as mean ± standard deviation, and the Wilcoxon signed-rank test was used to assess statistical significance.\u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tfoot\u003e\n \u003c/table\u003e\n \u003cp\u003eTable 3\u003c/p\u003e\n \u003cp\u003eTable\u0026nbsp;4\u003c/p\u003e\n \u003cp\u003eTable\u0026nbsp;5\u003c/p\u003e\n \u003cp\u003eTable\u0026nbsp;6\u003c/p\u003e\n \u003cp\u003eTable\u0026nbsp;7\u003c/p\u003e\n\u003c/div\u003e"},{"header":"4. Discussion","content":"\u003cp\u003eIn this study, 3D U-Net and 3D nnU-Net neural network models were trained using the BraTS 2025 dataset to evaluate and compare these models for the segmentation of different regions of brain metastasis. 
After training, the performance of the models was assessed using clinical MRI data.\u003c/p\u003e \u003cp\u003eAccording to Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, the highest average DSC for the tumor class was observed in the baseline 3D U-Net model around the midpoint of the training process. However, the optimized 3D U-Net model, after several adjustments to its code, showed a gradual and consistent increase in the DSC (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e3\u003c/span\u003eA) compared to the baseline 3D U-Net (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA). The highest DSC for this model, 0.760, was recorded during the final epoch of training. Notably, the overall DSC for the tumor class was considerably higher than that of the baseline model, although the DSC for the central tumor was still lower. These results indicate that the baseline 3D U-Net model performed better in capturing the masses compared to the adjusted version. Related works (\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e, \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e) have shown that the 3D U-Net model performs better when trained on and segmenting glioma tumors and their various regions than it did in the present study. In general, brain metastases are often smaller and more dispersed than glioma tumors, making their detection considerably more challenging. Additionally, tumor segmentation studies have frequently employed hybrid deep learning models or transfer learning, which have yielded higher results. 
However, the calculation of HD95 showed that the modifications made to the 3D U-Net model (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eD and \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e3\u003c/span\u003eD) did not reduce the value but actually increased it. In terms of precision, by contrast, the modified 3D U-Net model demonstrated better performance in detecting positive instances of edema compared to the baseline model, with the maximum precision observed towards the end of training. This suggests that the modified model may achieve better results than the baseline model with longer training.\u003c/p\u003e \u003cp\u003eThe baseline 3D U-Net model demonstrates higher precision for Classes 1 and 3, while the modified model shows superior precision for Class 2. In terms of recall, the baseline 3D U-Net model outperforms the modified architecture, for which all three classes show a marked decrease. This indicates that the baseline model exhibits higher sensitivity across all three classes, with the highest sensitivity observed for Class 2. In the study by Beers et al. (\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e), a 3D U-Net was used for brain tumor segmentation with input data from the BraTS 2018 dataset, achieving DSCs of 0.732 and 0.73 for the enhancing tumor and tumor core, respectively. The model performed better on the tumor core, with a DSC of 0.773\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01, while the DSC for the whole tumor decreased slightly. 
In a related study (\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e) that achieved higher DSC in brain tumor segmentation, a hybrid approach combining the 3D nnU-Net architecture with transfer learning enabled better performance in class detection. However, this model had lower performance in Class 1. This is again because the cases with necrosis were fewer in number. According to the analysis conducted on the validation data, the number of lesions corresponding to Classes 1, 2, and 3 was 100, 216, and 273, respectively. Of these, the model successfully identified 75, 149, and 160 lesions, respectively (approximately 75%, 69%, and 59%). The DSC calculated for all three classes using the 3D nnU-Net architecture was higher than that of both the baseline 3D U-Net model and the modified version, indicating the strong performance of this model in detecting all three classes. The DSC for both 3D U-Net architectures (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eC and \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e3\u003c/span\u003eC) is considerably higher for Class 2 compared to Classes 1 and 3. This is because Class 2 covers a larger extent and is easier for the trained model to segment. Another reason is that some brain metastasis cases did not have necrosis or had such small areas of it that detection by the model was difficult. Class 3, which lies between Class 1 and Class 2, is smaller than Class 2. Although the 3D nnU-Net model, with its self-adjusting architecture in both preprocessing and model training, was expected to perform better, the results presented in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e indicated a lower DSC for the whole tumor compared to the modified 3D U-Net model, although it outperformed the baseline 3D U-Net model. 
However, the DSC for the tumor core is comparable to that of the baseline 3D U-Net. Despite this, the average DSC of the tumor is lower than that of the baseline model. The HD95 values are better than those of the optimized 3D U-Net configuration for all classes, but Classes 1 and 3 still show higher values than Class 2. These lower results arise because the model used is based on a purely foundational architecture without any additional adjustments or integration with other architectures.\u003c/p\u003e \u003cp\u003eAs shown in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, the Jaccard index is reported separately for all three classes and for each neural network architecture. The Jaccard index for the baseline 3D U-Net architecture is considerably higher than that of the optimized 3D U-Net and 3D nnU-Net. In addition, for each architecture, the Jaccard index of Class 2 is higher than that of the other two classes. For the baseline 3D U-Net architecture, this metric is comparable to that reported in a related study by Gr\u0026oslash;vik et al. (\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e), which employed the DeepLab V3 network. However, its value for 3D nnU-Net is somewhat lower.\u003c/p\u003e \u003cp\u003eAccording to Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eA and \u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eB, the baseline model performed better in terms of training loss, whereas the modified model performed better in terms of validation loss. This suggests that, with continued training, the modified model has the potential to yield more favorable results. The baseline 3D U-Net enters the learning process with relatively high loss values (1.24 for training and 1.17 for validation). 
However, the sharp decrease in these early losses shows that the network quickly grasps initial features. By the end of training, the losses converge to approximately 0.1116 (training) and 0.1763 (validation), indicating a reasonable balance between learning and generalizability. After about epoch 140, the gap between the training and validation curves widens, suggesting the model is becoming overly tuned to the training data. Overall, this architecture performs steadily and predictably, reaching a sensible compromise between accuracy and robustness despite its slower loss reduction.\u003c/p\u003e \u003cp\u003eBy computing the DSC between the radiologist-generated masks and the masks produced by the neural network models, we obtained the results presented in Tables\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e and \u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e for a subset of the evaluated cases. Case 1 contains two large tumors; here the 3D U-Net achieved higher DSC than the 3D nnU-Net for Classes 1 and 3, with equal DSC for Class 2. In Case 2, the 3D U-Net again performed better for Class 1, whereas the 3D nnU-Net achieved markedly higher DSC for Classes 2 and 3 (0.7458 vs. 0.476 and 0.842 vs. 0.6315). Case 3 contains a very large number of small tumors, for which both models demonstrated low accuracy in segmenting Classes 2 and 3; the 3D nnU-Net achieved slightly better performance in segmenting edema, whereas the 3D U-Net showed higher accuracy in detecting the tumor region. In Case 4, the two models achieved identical DSC for Classes 1 and 3, whereas the 3D nnU-Net achieved higher accuracy for Class 2.\u003c/p\u003e \u003cp\u003eThese findings further support the observation that the 3D nnU-Net tends to perform better in identifying peritumoral edema and enhancing tumor, while the 3D U-Net provides more accurate delineation of necrosis regions. 
This distinction may be attributable to the architectural modifications applied to the 3D U-Net, which enhanced its sensitivity to necrotic components. It is important to note that, in this study, the 3D nnU-Net architecture was employed in its standard self-configuring form, without any supplementary architectural refinements or enhanced preprocessing strategies. Therefore, it is reasonable to expect that applying similar modifications, such as optimized preprocessing pipelines, architectural refinements, hybrid model integration, or the use of transfer learning, could further improve the performance of both models. In particular, these strategies have the potential to strengthen 3D nnU-Net\u0026rsquo;s capabilities, enhancing not only its training efficiency but also its robustness and accuracy when evaluated on heterogeneous clinical datasets.\u003c/p\u003e \u003cp\u003eBased on quantitative evaluation metrics, visual inspection, and statistical analysis of clinical test cases, the 3D nnU-Net architecture exhibited more favorable performance than the 3D U-Net, although the differences were not statistically significant. This observation may be attributed to several factors. One explanation is that hyperparameter tuning in 3D U-Net, including preprocessing and training settings, was performed manually. In contrast, 3D nnU-Net employs a fully automated configuration strategy, which may have contributed to its performance. This automation may improve both general applicability and accuracy across different clinical scenarios. Furthermore, although the 3D U-Net architecture supports multi-modality integration, nnU-Net demonstrated competitive segmentation performance when applied to a single-modality configuration.\u003c/p\u003e"},{"header":"5. Limitations and Future Directions","content":"\u003cp\u003eDespite recent progress in convolutional architectures, accurate segmentation of brain metastases remains challenging. 
Models such as 3D U-Net, 3D nnU-Net, and their derivatives have improved delineation of enhancing tumor, necrotic-core, and edematous regions, yet several methodological and practical limitations remain.\u003c/p\u003e \u003cp\u003eFirst, collecting large amounts of clinical data from imaging centers was one of the challenges. Many of the datasets were either unavailable or incomplete, leading to their exclusion from the study. Another limitation is the difference in the number of slices between the clinical and training data. The clinical scans contained between 16 and 21 slices, whereas the collected BraTS data contained about 150 slices. Even though resizing and patching were performed to match the number of slices to those on which the model was trained, this introduces some bias. Another issue was that the neural network architectures were not able to accurately segment tumors located at the brain's edges or those in the highest slices. Finally, domain shift posed a significant challenge. The multi-center data that were collected varied in resolution, noise, intensity, scanner type, and voxel size.\u003c/p\u003e \u003cp\u003eAdditionally, the 3D nnU-Net settings caused the fusion of masks on the images to be inferior to that of the 3D U-Net. Following the training of the neural network models, clinical testing was conducted to assess their accuracy, reliability, and applicability in real-world clinical environments. The cases included in this study were obtained from multiple medical centers, each utilizing different MRI scanners from various manufacturers (e.g., Philips, Symphony, Siemens, and others), as well as exhibiting variations in slice thickness, field of view (FOV), and image resolution. These heterogeneities allow a more rigorous evaluation of the models\u0026rsquo; performance across diverse acquisition settings and clinical conditions. 
Additionally, both architectures demonstrated difficulty in accurately segmenting small lesions and tumors located at the brain periphery, highlighting an ongoing limitation of convolutional neural networks in handling extreme spatial variability.\u003c/p\u003e \u003cp\u003eSeveral approaches are considered for future work. First, integrating different neural network architectures to achieve improved segmentation performance is of interest. For example, employing a VGG16-based U-Net architecture could be explored. Alternatively, a hybrid approach that combines a 2D VGG16 model with a 3D network may be investigated to provide improved volumetric representation of tumors. Second, a potential approach involves extracting tumor-related and other class-specific information using radiomics. Third, employing a substantially larger dataset is recommended to enable more rigorous validation of the trained model on clinical data. Finally, applying task-specific enhancements to nnU-Net, such as customized preprocessing or transfer learning, may further improve its performance in heterogeneous clinical environments.\u003c/p\u003e"},{"header":"6. Conclusions","content":"\u003cp\u003eIn this study, 3D U-Net and 3D nnU-Net neural network architectures were evaluated for the segmentation of different classes of brain metastases. Both models were trained using the BraTS 2025 benchmark dataset and tested on clinical MRI data collected from multiple imaging centers, highlighting their applicability to heterogeneous real-world imaging conditions. The use of the FLAIR modality plays a significant role in the detection and differentiation of these lesions, and its integration with deep learning substantially enhances segmentation performance.\u003c/p\u003e \u003cp\u003eBased on the quantitative results, the 3D U-Net architecture achieved a higher DSC than the 3D nnU-Net (0.791\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01 vs. 0.764\u0026thinsp;\u0026plusmn;\u0026thinsp;0.25). 
However, observations from clinical cases indicate that 3D nnU-Net is better suited to clinical environments. Further work to improve segmentation accuracy, by integrating different network architectures or by increasing the size and diversity of the training dataset, is recommended. Overall, based on both quantitative outcomes and visual assessment of clinical cases, 3D nnU-Net demonstrated improved performance compared to the 3D U-Net model. However, these differences were not statistically significant.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAuthor Contributions: \u003c/strong\u003eConceptualization, D.S.-G., M.K.; Methodology, D.S.-G., M.K., M.E., S.H., and P.A.; Validation, D.S.-G., M.K., M.E., and S.H.; Investigation, M.K.; Resources, D.S.-G.; Data Curation, D.S.-G., M.K., M.E., S.H., and P.A.; Writing\u0026mdash;Original Draft Preparation, M.K.; Writing\u0026mdash;Review and Editing, D.S.-G., M.E.; Supervision, D.S.-G.; Project Administration, D.S.-G.; Funding Acquisition, D.S.-G. All authors have read and agreed to the published version of the manuscript.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eFunding: \u003c/strong\u003eThis study was funded by Isfahan University of Medical Sciences, Isfahan, Iran (Grant Numbers: 3403806 and 1403341).\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eInstitutional Review Board Statement: \u003c/strong\u003eThis article contains no studies with human participants or animals performed by the authors.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate: \u003c/strong\u003eThis article contains no studies with human participants or animals performed by the authors. 
Ethical codes: IR.MUI.MED.REC.1403.499 and IR.ARI.MUI.REC.1404.037 by Isfahan University of Medical Sciences, Isfahan, Iran.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication: \u003c/strong\u003eNot Applicable.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eData Availability Statement: \u003c/strong\u003eThe data presented in this study are available on request from the corresponding author.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eConflicts of Interest: \u003c/strong\u003eThe authors declare they have no conflicts of interest. \u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthical Considerations and AI Use Declaration: \u003c/strong\u003eAll procedures complied with the IRB and Declaration of Helsinki standards. This manuscript was reviewed using Grok 3 solely for grammar correction and language refinement, as the authors are non-native English speakers. The intellectual content, data analysis, and conclusions are the sole responsibility of the authors.\u003c/p\u003e\n"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eZuccato JA, Mamatjan Y, Nassiri F, Ajisebutu A, Liu JC, Muazzam A, et al. Prediction of brain metastasis development with DNA methylation signatures. Nat Med. 2025;31(1):116\u0026ndash;25.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJiang K, Parker M, Materi J, Azad TD, Kamson DO, Kleinberg L, et al. Epidemiology and survival outcomes of synchronous and metachronous brain metastases: a retrospective population-based study. NeuroSurg Focus. 2023;55(2):E3.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLe Rhun E, Weller M, Anders C, Larkin J, Li J, Moss NS, et al. Symptomatic melanoma brain metastases: A call for clear definitions and adoption of standardized tools. Eur J Cancer. 2024;208:114202.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMirmoeeni S, Azari Jafari A, Shah M, Salemi F, Hashemi SZ, Seifi A. 
The clinical, diagnostic, therapeutic, and prognostic characteristics of brain metastases in prostate cancer: a systematic review. Prostate Cancer. 2022;2022(1):5324600.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNieblas-Bedolla E, Nayyar N, Singh M, Sullivan RJ, Brastianos PK. Emerging immunotherapies in the treatment of brain metastases. Oncologist. 2021;26(3):231\u0026ndash;41.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBodensohn R, Kaempfel A-L, Boulesteix A-L, Orzelek AM, Corradini S, Fleischmann DF, et al. Stereotactic radiosurgery versus whole-brain radiotherapy in patients with 4\u0026ndash;10 brain metastases: A nonrandomized controlled trial. Radiother Oncol. 2023;186:109744.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRajendran S, Rajagopal SK, Thanarajan T, Shankar K, Kumar S, Alsubaie NM, et al. Automated segmentation of brain tumor MRI images using deep learning. IEEE Access. 2023;11:64758\u0026ndash;68.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSoltaninejad A, Shahbazi-Gahrouei D, Khorasani A, Hemati S. Evaluation of CNN-based deep learning models for auto-contouring in Glioblastoma radiotherapy: A review. Radiat Oncol. 2026;20:169.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSharma SR, Alshathri S, Singh B, Kaur M, Mostafa RR, El-Shafai W. Hybrid multilevel thresholding image segmentation approach for brain MRI. Diagnostics. 2023;13(5):925.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMohammed AS, Mihoub Z, editors. A Review of Image Segmentation Strategies from Classical Methods to Deep Learning. 2024 Conference of Young Researchers in Electrical and Electronic Engineering (ElCon); 2024: IEEE.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRezaei M, Rahmani E, Khouzani SJ, Rahmannia M, Ghadirzadeh E, Bashghareh P, et al. Role of artificial intelligence in the diagnosis and treatment of diseases. Kindle. 
2023;3(1):1\u0026ndash;160.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen X, Hu X, Huang Y, Jiang H, Ji W, Jiang Y, et al. Deep learning-based software engineering: progress, challenges, and opportunities. Sci China Inform Sci. 2025;68(1):111102.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEtehadtavakol M, Etehadtavakol M, Ng E. Optimizing thyroid nodule segmentation in thermal imaging with temporal sequences and advanced deep Learning backbones. Expert Syst Appl. 2025;296:129105.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNaser MA, Deen MJ. Brain tumor segmentation and grading of lower-grade glioma using deep learning in MRI images. Comput Biol Med. 2020;121:103758.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRasool N, Bhat J. Multimodal Brain Tumor Segmentation using 3D-U-Net. 2023.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAhsan R, Shahzadi I, Najeeb F, Omer H. Brain tumor detection and segmentation using deep learning. Magn Reson Mater Phys Biol Med. 2025;38(1):13\u0026ndash;22.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTabassum M, Di Ieva A, Liu S. Meta transfer learning for brain tumor segmentation using nnUNet in meningioma and metastasis cases. Sci Rep. 2025;15.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYushkevich PA, Yang G, Gerig G. ITK-SNAP: An interactive tool for semi-automatic segmentation of multi-modality biomedical images. Annu Int Conf IEEE Eng Med Biol Soc. 2016;2016:3342\u0026ndash;5.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHuang J, Yagmurlu B, Molleti P, Lee R, VanderPloeg A, Noor H, et al. Brain tumor segmentation using deep learning: high performance with minimized MRI data. Front Radiol. 2025;5:1616293.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSivakumar M, Parthasarathy S, Padmapriya T. 
Trade-off between training and testing ratio in machine learning for medical image processing. PeerJ Comput Sci. 2024;10:e2245.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJain I, Willems S, Latre S, De Schepper T. On-the-Fly Data Augmentation for Brain Tumor Segmentation. arXiv preprint arXiv:2509.24973. 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAgrawal P, Katal N, Hooda N. Segmentation and classification of brain tumor using 3D-UNet deep neural networks. Int J Cogn Comput Eng. 2022;3:199\u0026ndash;210.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang Z, editor. Improved Adam optimizer for deep neural networks. 2018 IEEE/ACM 26th international symposium on quality of service (IWQoS); 2018: IEEE.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLuu HM, Park S-H, editors. Extending nn-UNet for brain tumor segmentation. International MICCAI brainlesion workshop. Springer; 2021.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRainio O, Teuho J, Kl\u0026eacute;n R. Evaluation metrics and statistical tests for machine learning. Sci Rep. 2024;14(1):6086.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJiang H, Imran M, Muralidharan P, Patel A, Pensa J, Liang M, et al. MicroSegNet: A deep learning approach for prostate segmentation on micro-ultrasound images. Comput Med Imaging Graph. 2024;112:102326.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBeers A, Chang K, Brown J, Sartor E, Mammen C, Gerstner E, et al. Sequential 3D U-Nets for biologically-informed brain tumor segmentation. arXiv preprint arXiv:1709.02967. 2017.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGr\u0026oslash;vik E, Yi D, Iv M, Tong E, Nilsen LB, Latysheva A, et al. Handling missing MRI sequences in deep learning segmentation of brain metastases: a multicenter study. NPJ Digit Med. 
2021;4(1):33.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"},{"header":"Table 3 and 4","content":"\u003cp\u003eTable 3 and 4 are available in the Supplementary Files section.\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"bmc-medical-imaging","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"bmim","sideBox":"Learn more about [BMC Medical Imaging](http://bmcmedimaging.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/bmim/default.aspx","title":"BMC Medical Imaging","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Brain cancers, Deep learning, Magnetic resonance imaging, 3D nnU-Net, 3D U-Net","lastPublishedDoi":"10.21203/rs.3.rs-8754688/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8754688/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBackground\u003c/h2\u003e \u003cp\u003eBrain metastasis is one of the most common intracranial tumors, and accurate diagnosis is crucial for effective treatment planning and overall patient survival. Manual segmentation of tumors and their sub-regions is often time-consuming and costly. In recent years, artificial intelligence has made remarkable advances in medical imaging. 
The study aims to compare two U-Net-based neural network architectures (3D U-Net and 3D nnU-Net) and to evaluate their performance in clinical settings using magnetic resonance imaging (MRI).\u003c/p\u003e\u003ch2\u003eMethods\u003c/h2\u003e \u003cp\u003eThis is a retrospective case-cohort study. Two hundred case images from the Brain Tumor Segmentation (BraTS) dataset were used for training, and 15 cases from multiple imaging centers were used for testing. Two neural network models (3D U-Net and 3D nnU-Net) were utilized and compared. Each multicenter case included FLAIR-weighted MR imaging. Tumor segmentation was evaluated with the Dice similarity coefficient (DSC), Hausdorff Distance 95 (HD95), precision, Jaccard index, and recall. Statistical analysis included descriptive statistics, Wilcoxon signed-rank testing, and agreement analysis (P\u0026thinsp;\u0026lt;\u0026thinsp;0.05).\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e \u003cp\u003eThe 3D U-Net achieved a mean tumor DSC of 0.791\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01, while the 3D nnU-Net achieved a DSC of 0.764\u0026thinsp;\u0026plusmn;\u0026thinsp;0.25. HD95 was 15.11\u0026thinsp;\u0026plusmn;\u0026thinsp;1.33 for 3D U-Net and 13.74\u0026thinsp;\u0026plusmn;\u0026thinsp;26.89 for 3D nnU-Net. The Jaccard index was 0.669\u0026thinsp;\u0026plusmn;\u0026thinsp;0.001 and 0.607\u0026thinsp;\u0026plusmn;\u0026thinsp;0.21, and precision was 0.834\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01 and 0.7\u0026thinsp;\u0026plusmn;\u0026thinsp;0.22, for 3D U-Net and 3D nnU-Net, respectively.\u003c/p\u003e\u003ch2\u003eConclusions\u003c/h2\u003e \u003cp\u003eA visual evaluation of the segmentation outputs showed that 3D U-Net provided more consistent delineation of necrotic regions, whereas 3D nnU-Net showed higher robustness for segmenting edema and enhancing tumor regions. Overall, 3D nnU-Net demonstrated improved performance compared to the 3D U-Net model. 
However, these differences were not statistically significant.\u003c/p\u003e","manuscriptTitle":"Deep Neural Network Architectures for Brain Metastasis Segmentation: A Comparison of 3D U-Net and 3D nnU-Net on FLAIR-Weighted MR Imaging","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-04-23 09:49:31","doi":"10.21203/rs.3.rs-8754688/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2026-04-27T09:29:02+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-04-26T11:26:24+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-04-26T03:19:09+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"157159985349529984006603504980081504694","date":"2026-04-23T14:25:10+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-04-20T18:25:03+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"279823821037624811700490575262180000207","date":"2026-04-20T14:46:43+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"323730229222429728093265927652669402753","date":"2026-04-18T02:01:22+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"8742489628302869306501803275890084060","date":"2026-04-16T00:49:29+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2026-04-16T00:47:47+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2026-02-09T17:37:49+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2026-02-06T19:35:18+00:00","index":"","fulltext":""},{"type":"submitted","content":"BMC Medical Imaging","date":"2026-02-06T19:27:30+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email 
protected]","identity":"bmc-medical-imaging","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"bmim","sideBox":"Learn more about [BMC Medical Imaging](http://bmcmedimaging.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/bmim/default.aspx","title":"BMC Medical Imaging","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"2b4979ea-5902-40c6-b3da-6b36424a149f","owner":[],"postedDate":"April 23rd, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[],"tags":[],"updatedAt":"2026-05-02T18:53:11+00:00","versionOfRecord":[],"versionCreatedAt":"2026-04-23 09:49:31","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8754688","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8754688","identity":"rs-8754688","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

