Automated Deep Learning–Based Demyelination Load Segmentation in Metachromatic Leukodystrophy

doi:10.21203/rs.3.rs-7924598/v1

2025 · doi:10.21203/rs.3.rs-7924598/v1

Research Article (Research Square)

Pascal Martin, Joël Schaerer, Thomas Cajgfinger, Allesandro Delmonte, and 9 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7924598/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 21 Apr, 2026 in Clinical Neuroradiology.
Version 1 posted (latest preprint version).

Abstract

Purpose
Metachromatic leukodystrophy (MLD) is a rare lysosomal storage disorder characterized by progressive white matter demyelination. Quantification of demyelinated white matter on MRI, typically expressed as the demyelination load, serves as a key imaging biomarker of disease burden, enabling objective monitoring beyond visual rating scales.
However, current semi-automated pipelines are limited by manual interaction, pediatric brain variability, and differences in MRI acquisition. This study aimed to develop and validate a self-configuring convolutional neural network (CNN) for automated segmentation of demyelinated white matter in MLD and to compare its performance with a conventional semi-automated method across heterogeneous MRI datasets.

Methods
An nnU-Net was trained on 189 3D T1- and axial T2-weighted scans from 35 MLD patients using visually controlled conventional masks as ground truth. Independent testing was performed on 130 scans (73 high-resolution 3D, 57 lower-resolution 2D T1-weighted) from 49 patients. Performance was assessed by Dice coefficient, Bland–Altman bias, correlation with Gross Motor Function Classification (GMFC-MLD), longitudinal consistency, and qualitative review of outliers.

Results
CNN-based segmentation showed strong spatial agreement with the reference method, highest in 3D T1-weighted and robust in 2D scans. Volumetric bias was minimal, and CNN-derived lesion volumes correlated well with motor impairment. Longitudinal analyses showed smooth, monotonic changes, and qualitative review revealed fewer boundary misclassifications.

Conclusion
The nnU-Net enables fast, reproducible, and clinically meaningful segmentation of demyelinated white matter in MLD. It generalizes across MRI protocols, correlates with motor function, and offers a scalable tool for standardized biomarker extraction in clinical trials and other leukodystrophies.

Keywords: Metachromatic leukodystrophy, convolutional neural network, demyelination load, quantitative MRI, automated lesion segmentation

Introduction
Metachromatic leukodystrophy (MLD) is a rare autosomal-recessive lysosomal storage disorder caused by mutations in the ARSA gene, leading to deficient arylsulfatase A activity and resulting in sulfatide accumulation.
This pathological cascade culminates in progressive demyelination of the central and peripheral nervous systems, clinically presenting with motor and cognitive decline [1–3]. MRI plays a pivotal role in diagnosing and monitoring MLD. T2-weighted hyperintensities, often exhibiting a tigroid pattern, are the hallmark imaging feature [2, 4]. To quantify disease burden, several imaging-based approaches have been developed. The most straightforward method is the MLD MRI severity score [5], a visual rating scale assessing the extent of white matter lesions and global brain atrophy across predefined regions. While widely used in clinical routine, this approach remains subject to inter-rater variability and does not quantify T2 hyperintensities in detail. To reduce subjectivity and enable objective, quantitative assessment, the “demyelination load” was introduced [6]. This method semi-automatically quantifies the proportion of T2-hyperintense white matter relative to total brain volume, allowing reproducible estimation of disease burden. It has been validated in landmark studies [2, 7–9] and has become a reference standard in MLD imaging.

The demyelination load method relies on tissue segmentation using standard MRI volumetry pipelines such as SPM12 and CAT12 to generate tissue probability maps. These maps are then used to define white matter regions, within which abnormal T2 signal intensity is segmented and quantified. While robust, the method carries limitations: segmentation accuracy may be reduced in a disease-affected pediatric population because of anatomical variability and atrophy, brain maturation effects, and scanner-dependent image quality. Furthermore, conventional segmentation algorithms typically rely on adult-derived tissue probability maps, which may inadequately represent age-specific anatomical features; this discrepancy, compounded by disease-related structural alterations, further undermines tissue classification accuracy [10, 11].
Thus, while the technique is semi-automated and reproducible, it remains constrained by methodological assumptions. To overcome these limitations and improve efficiency, machine learning approaches, particularly deep learning, have emerged as powerful tools for white matter lesion segmentation. Previous work has demonstrated their utility in adult populations with multiple sclerosis or cerebral small vessel disease, achieving high accuracy and reliability [12–14]. However, applications in pediatric leukodystrophies, and in MLD in particular, are missing.

In this study, we trained a convolutional neural network (CNN) on a curated dataset of MLD patients for whom demyelination load was previously calculated using high-resolution 3D T1-weighted images and axial T2-weighted sequences. The ground truth segmentations were derived using the established semi-automated pipeline with strict visual quality control. We then tested the model on an independent dataset including both high-resolution 3D and 2D T1-weighted images, each paired with T2 images, to assess generalizability. Evaluation metrics included the Dice coefficient (DC) and Bland-Altman analysis, alongside correlation with a clinical measure of motor symptoms and longitudinal evaluation. Our goal was to assess whether this CNN-based method can serve as a robust alternative to the conventional pipeline, facilitating more efficient and standardized lesion quantification in MLD.

Materials and Methods

Study Design and Cohorts
This study followed a two-phase design involving a training and a test cohort of MLD patients. MLD diagnosis was confirmed based on deficient arylsulfatase A (ARSA) enzyme activity, elevated urinary sulfatides, clinical presentation, and (where available) pathogenic mutations in the ARSA gene. Ethical approval was obtained from the local Ethics Committee (reference 401/2005), and written informed consent was provided by patients or legal guardians.
Imaging Data and Preprocessing
All MRI scans included both T1-weighted and T2-weighted sequences. Data were acquired across multiple clinical sites using scanners from different manufacturers (GE, Philips, and Siemens) and at field strengths of 1.5 T and 3 T, reflecting real-world heterogeneity in acquisition protocols. This diversity was considered advantageous for developing a robust and generalizable segmentation approach. Visual quality control (QC) was applied to all images. T1- and T2-weighted scans were manually inspected for motion artifacts and insufficient brain coverage. Scans with substantial technical limitations were excluded.

Conventional Demyelination Load Segmentation
The reference standard for both CNN training and evaluation was derived using a semi-automated lesion segmentation method validated in previous studies [6, 7]. The pipeline was applied to both training and test data as follows:
T2-weighted images were rigidly co-registered to the corresponding T1-weighted images.
Tissue segmentation was performed using the CAT12 toolbox within SPM12, generating grey matter, white matter, and CSF probability maps based on adult tissue priors.
White matter abnormalities were identified by modeling the T2 intensity distribution within the white matter mask using Gaussian mixture modeling.
Lesion voxels were defined by thresholding at the intersection of the Gaussian components and refined using a Markov Random Field (MRF) algorithm to reduce noise.
Clusters smaller than 200 voxels were excluded to minimize false positives.
All resulting segmentations were visually inspected. Scans with incorrect or anatomically implausible masks were excluded from CNN training.

CNN-Based Segmentation
A 3D U-Net model based on the nnU-Net framework [12] was trained to jointly segment MLD white matter lesions, gray matter, healthy white matter and cerebrospinal fluid (CSF).
The network follows an encoder-decoder architecture with six levels, each combining a series of 3D convolutions, instance normalization and LeakyReLU activation layers (see Fig. 1). Co-registered T1- and T2-weighted images were used as input to the model after resampling to 0.5 × 0.5 × 1.2 mm and random 3D patch cropping to a size of 64 × 192 × 160 voxels. The model was trained using 5-fold cross-validation, with extensive data augmentation to address the limited dataset size. Random data augmentation included flipping, rotation, scaling and intensity-based transformations. All folds were trained on an NVIDIA A10G GPU using the Adam optimizer with a batch size of 2. Predictions from the five cross-validated models were ensembled by averaging and applied to the independent test set (Group 2), which included both high-resolution 3D T1 and lower-resolution 2D T1 images. The ability of the CNN to generalize across scan types was a critical component of the evaluation.

Figure 1. Schematic overview of the nnU-Net pipeline used for automated demyelination load segmentation. The proposed model is based on a 3D U-Net architecture processing input patches of size 64 × 192 × 160 voxels. Co-registered and skull-stripped images are concatenated and used as input to the first layer of the model (dark gray). The 6-level encoder-decoder module with skip connections (dotted arrows) enforces multi-resolution image feature learning and was trained using a weighted Dice and cross-entropy loss function. Each level implements a series of convolutions with a 3 × 3 × 3 kernel, normalization and LeakyReLU activations. The final softmax layer (light gray) assigns voxel-wise probabilities to each of the three output classes, including MLD lesions, healthy white matter and gray matter. Joint learning of lesion and healthy brain tissue proved beneficial for model convergence. The final MLD lesion mask is computed and binarized from post-processed softmax probabilities.
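The fold ensembling described above can be illustrated with a minimal sketch. It assumes that each fold yields per-voxel softmax probabilities, that ensembling is a plain arithmetic mean over folds, and that the lesion mask is obtained by taking the class with the highest averaged probability; the text does not specify the post-processing, so the argmax rule, the function names and the class index are illustrative assumptions rather than the authors' implementation.

```python
from typing import List, Sequence

# Hypothetical layout: probs[voxel][class] holds softmax outputs of one fold.
Probabilities = Sequence[Sequence[float]]


def average_fold_probabilities(fold_probs: Sequence[Probabilities]) -> List[List[float]]:
    """Average per-voxel class probabilities over all cross-validation folds."""
    n_folds = len(fold_probs)
    n_voxels = len(fold_probs[0])
    n_classes = len(fold_probs[0][0])
    mean = [[0.0] * n_classes for _ in range(n_voxels)]
    for probs in fold_probs:
        for v in range(n_voxels):
            for c in range(n_classes):
                mean[v][c] += probs[v][c] / n_folds
    return mean


def lesion_mask(mean_probs: Probabilities, lesion_class: int) -> List[int]:
    """Binarize: 1 where the lesion class has the highest averaged probability.

    The argmax rule is an assumption made for this sketch.
    """
    mask = []
    for p in mean_probs:
        best_class = max(range(len(p)), key=lambda c: p[c])
        mask.append(1 if best_class == lesion_class else 0)
    return mask


# Two folds, two voxels, three classes (class 0 = lesion in this toy example).
fold_1 = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
fold_2 = [[0.4, 0.5, 0.1], [0.1, 0.3, 0.6]]
mean_probs = average_fold_probabilities([fold_1, fold_2])
mask = lesion_mask(mean_probs, lesion_class=0)
```

Averaging probabilities before binarizing lets a voxel be labeled as lesion even when individual folds disagree, as for the first voxel in the toy example.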
Following each segmentation process, the output maps generated by both the CNN and the conventional method were reviewed to ensure technical validity. Specifically, the alignment of T1 and T2 images, the plausibility of grey and white matter classification, and the anatomical consistency of lesion masks were assessed. Outputs with clear technical errors in the conventional segmentation approach (e.g., misregistration or incomplete segmentation) were excluded from subsequent analyses. Importantly, this procedure did not exclude cases based on prediction quality; it served solely to remove technically invalid results that would preclude meaningful evaluation.

Patient Cohort
This study included two independent datasets: a training dataset and a test dataset. To avoid overlap of patients between the training and test groups, MRI scans were drawn from separate projects. Specifically, the training cohort consisted of data from a previously published study [15], whereas the test dataset was collected independently at the Department of Diagnostic and Interventional Neuroradiology, University Hospital Tübingen. As a consequence, the two datasets are similar, but exact matching for age or gender could not be ensured.

Training Dataset: Initially, 221 MRI scans from 44 individual patients (20 female; median age 4.3 years, range 1.7–10.9 years) were included. Following rigorous visual quality control (QC), 189 MRI scans from 32 patients remained (14 female; median age 4.2 years, range 1.7–10.9 years; mean number of scans per patient: 4). For model training, a particularly stringent QC threshold was applied to ensure optimal input quality and minimize noise in the ground truth segmentations. Both T1- and T2-weighted images were required to be free of relevant motion artifacts, as high-quality, co-registered input from both modalities was essential for subsequent automated processing.
In some cases, or in settings where sedation was minimized for clinical reasons, residual motion led to consistently reduced image quality. Consequently, all scans from certain subjects had to be excluded when none of their scans met the predefined quality standards. These exclusions reflect the practical challenges of pediatric MRI acquisition rather than site-specific shortcomings and ensured that only technically reliable data contributed to model training.

Test Dataset: The test dataset initially consisted of 196 MRI scans from 49 individual patients (23 female; median age 13.0 years, range 1.4–33.4 years). Following visual quality control, 128 MRI scans from 42 individual patients remained (26 female; median age 12.4 years, range 1.5–33.7 years; mean number of scans per patient: 4). An overview is presented in Table 1.

Table 1. Overview of the patient cohorts included in the training and test datasets before and after visual quality control (QC).
Dataset | Initial scans (patients) | After QC scans (patients) | Female | Median age in years (range) | Mean scans per patient
Training dataset | 221 (44) | 189 (32) | 14 | 4.2 (1.7–10.9) | 4
Test dataset | 196 (49) | 128 (42) | 26 | 12.4 (1.5–33.7) | 3

The test dataset was further subdivided into two subgroups based on imaging protocols:
73 scans from 31 individual patients (14 female, median age 13.5 years, age range 1.5–33.4 years) with high-resolution 3D T1-weighted gradient-echo (GRE) sequences (median resolution 1 × 0.97 × 1 mm³) and matching T2 axial images (median resolution 0.5 × 0.5 × 3.3 mm³), hereafter referred to as the 3D T1-based group. These 3D gradient-echo acquisitions represent the high-resolution standard typically used in research and advanced clinical protocols and provide a consistent imaging basis for quantitative analysis.
57 scans from 24 individual patients (12 female, median age 9.7 years, age range 2.0–23.1 years) with lower-resolution 2D T1-weighted turbo spin echo (TSE) sequences (median resolution 0.66 × 0.74 × 5 mm³) and T2-weighted axial images (median resolution 0.5 × 0.5 × 5.3 mm³), hereafter referred to as the 2D T1-based group. The inclusion of T1-weighted TSE sequences, as opposed to the 3D gradient-echo acquisitions used in the other subgroup, reflects real-world variability in clinical MRI protocols and allows assessment of model generalizability across distinct acquisition settings.
Importantly, the CNN was trained only on 3D MPRAGE data but tested on both 3D and 2D T1-weighted inputs to assess generalizability.

Evaluation Strategy
To comprehensively assess CNN performance, we applied both quantitative and qualitative metrics. The conventional method was applied to the test cohort using the same pipeline described above. Comparative analyses were performed on the CNN and conventional segmentations:
Spatial Overlap: The Dice coefficient (DC) was calculated to measure spatial agreement between CNN and conventional outputs.
Volume Agreement: Bland-Altman analysis was used to assess volumetric differences between methods.
Outlier Analysis: Dice and Bland-Altman outliers were reviewed visually. These cases were evaluated across scanners, age groups, and anatomical variance (e.g., unmyelinated regions or enlarged ventricles) to determine which segmentation method better handled these conditions.
Clinical Correlation: Lesion volumes were correlated with the Gross Motor Function Classification for MLD (GMFC-MLD) using Spearman’s rank correlation.
Longitudinal Stability: For patients with multiple scans, lesion volume change between consecutive timepoints was measured to evaluate consistency and plausibility.
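The two quantitative agreement measures listed above can be sketched in a few lines of standard-library Python. This is a minimal illustration, assuming flattened binary masks and 95% limits of agreement computed as bias ± 1.96 standard deviations of the paired differences; the function names, the convention for two empty masks and the toy values are assumptions made for this example and are not taken from the study.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def dice_coefficient(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / (size_a + size_b)


def demyelination_load(lesion_volume: float, total_brain_volume: float) -> float:
    """Lesion volume as a fraction of total brain volume."""
    return lesion_volume / total_brain_volume


def bland_altman(method_a: Sequence[float],
                 method_b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) of the paired differences a - b."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation
    return bias, bias - spread, bias + spread


# Toy example: four voxels per mask, three paired demyelination loads.
dice = dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])
bias, lower, upper = bland_altman([0.10, 0.20, 0.30], [0.08, 0.21, 0.27])
```

Cases whose difference falls outside the interval from `lower` to `upper` correspond to the Bland-Altman outliers that were reviewed visually.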
For the calculation of the demyelination load (white matter lesion volume divided by total brain volume), the total brain volume was defined as the sum of gray matter and total white matter (normal-appearing white matter and white matter lesion volumes). Importantly, total brain volume was derived separately within each segmentation pipeline (using SPM-based tissue maps for the conventional approach and CNN-derived tissue classes for the deep learning approach) to ensure internal consistency of both methods. This multi-pronged strategy allowed evaluation not only of segmentation accuracy but also of clinical relevance and robustness to structural variability, especially in a pediatric rare disease context.

Results

Segmentation Performance
The CNN-based model demonstrated good agreement with the conventional demyelination load segmentation method. The median Dice coefficient (DC) across all 73 3D T1-based scans reached 0.82 (SD = 0.21). As shown in Fig. 2, the violin plot of Dice coefficients is tightly concentrated in the 0.8–0.9 range, reflecting consistently high overlap between CNN and conventional segmentations for the vast majority of cases. A small number of points extend well below this band, however, representing rare outliers (Dice scores down toward 0.2–0.4), which were evaluated visually in the outlier analysis. Bland-Altman analysis demonstrated low bias between the two segmentation methods. For 3D T1-based images, 6 of 73 cases (8.2%) fell outside the limits of agreement, with the CNN-based model estimating a slightly higher demyelination load than the conventional segmentation method (median difference: +0.0115).

Figure 2. (A) Distribution of Dice coefficients across all 3D T1-acquired test cases (n = 73) for the automated CNN-based segmentation of demyelinating lesions. The width of each violin corresponds to the frequency of Dice scores; dashed lines indicate the median (0.82) and interquartile range.
(B) Bland-Altman diagram depicting agreement between CNN-based and conventional segmentation methods for estimating demyelination load. Each point represents the difference in demyelination load estimates plotted against their mean. For 3D scans, 6 of 73 cases (8.2%) fell outside the 95% limits of agreement, with the CNN-based model yielding a slightly higher demyelination load (median difference: +0.0115). No systematic bias between the two methods was observed. Outliers predominantly occurred in scans with low lesion burden. (C) Representative segmentation example with a very good Dice coefficient of 0.91 (left: original T2-weighted image; middle: conventional segmentation of demyelination load projected onto the T2 image (turquoise); right: CNN-based segmentation of demyelination load (orange)).

Demyelination load derived from the CNN model correlated significantly with clinical severity (see Fig. 3). GMFC scores at the time of MRI demonstrated a significant positive association with lesion load as identified by the CNN-based model (r_s = 0.38, p < 0.001), suggesting that higher motor impairment corresponded to a greater extent of demyelination. The conventional segmentation yielded a statistically significant correlation as well, which was somewhat weaker than that of the CNN-based segmentation (r_s = 0.26, p = 0.025); however, the difference did not reach statistical significance (Williams' test: t = 1.81, p = 0.075). In patients with multiple longitudinal datasets, the median change in demyelination load between consecutive time points showed a slight progression consistent with the expected disease course. This trend was observed in both the CNN-based and the conventional segmentation approaches. Apart from a few outliers, no systematic implausibilities were detected in either method.
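The clinical correlation and the longitudinal check reported above rest on two simple computations, sketched here with the standard library only. Spearman's coefficient is computed as the Pearson correlation of average ranks (ties share their mean rank, which matters for an ordinal score such as the GMFC), and the longitudinal change is the difference between consecutive scans. Function names and toy values are illustrative assumptions; the sketch does not compute p-values.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def consecutive_changes(loads: Sequence[float]) -> List[float]:
    """Change in demyelination load between consecutive scans of one patient."""
    return [later - earlier for earlier, later in zip(loads, loads[1:])]


# Toy data: GMFC scores and demyelination loads of four hypothetical scans.
rho = spearman([0, 1, 2, 3], [0.01, 0.02, 0.05, 0.04])
changes = consecutive_changes([0.10, 0.12, 0.15])
```

In the toy data the two highest loads are swapped relative to the GMFC order, so the coefficient stays below 1 while remaining clearly positive.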
Figure 3. Correlation with clinical parameters and longitudinal consistency. A: Scatter plots illustrate the relationship between clinical motor impairment (GMFC score at time of MRI) and demyelination load derived from CNN-based (blue circles) and conventional (red squares) segmentation methods. Shaded areas represent the respective 95% confidence intervals. Demyelination load correlated significantly with GMFC score for both methods, with a stronger association for CNN-based segmentation (Spearman’s r = 0.38, p < 0.001) than for conventional segmentation (r = 0.26, p = 0.025). B and C: Box-and-whisker plots show the distribution of volumetric changes in demyelination load between consecutive MRI time points across all patients with multiple scans. Each box represents pooled difference values from several patients, with each data point corresponding to one interval between two consecutive scans of a single patient. Panel B displays results from the CNN-based segmentation, Panel C from the conventional method. The plotted values reflect the difference between value at later time point minus that at earlier time point, resulting in predominantly negative values, as demyelination load typically increased over time. This trend is consistent with disease progression in MLD. Both segmentation approaches demonstrated plausible and stable behavior across time points, with only a few outliers, supporting their suitability for longitudinal tracking.

Outlier Analysis
Qualitative review of individual cases with low Dice scores or outliers in the Bland-Altman diagram revealed that discrepancies clustered in a few individuals in whom misclassification arose from partial volume effects (especially near the ventricles/CSF), overinterpretation of incomplete myelination in young patients, or altered anatomy with expanded ventricles or advanced atrophy. In all cases, the CNN approach better matched the visual extent of demyelination, particularly in scans with lower contrast.
In cases of extensive white matter alterations, both methods were able to accurately capture the extent of the changes, demonstrating a high DC and strong visual concordance (see Fig. 4). Low values of demyelination load, MLD MRI severity score, age and GMFC were each associated with lower agreement (see Fig. 5).

Figure 4. Examples A–E illustrate cases with high (A) and low (B–E) Dice coefficients between conventional and CNN-based segmentation of T2-hyperintense white matter (demyelination load). Each case is presented in three columns: left, axial T2-weighted image; middle, conventional segmentation of demyelination load projected onto the T2 image (turquoise); right, CNN-based segmentation projected onto the T2 image (orange). (A) Depicts the case with the highest Dice coefficient (0.92), showing close visual correspondence between CNN-based and conventional segmentation, with no evident discrepancies. (B) A case with ventricle deformation and relatively low-resolution T2 imaging, where the extent of frontal white matter abnormalities is underestimated in the conventional segmentation. (C) Demonstrates undersegmentation by the conventional method near the frontal horn of the lateral ventricle, where adjacent hyperintensities are incompletely captured. (D) Shows spurious lesion detection in the conventional segmentation, especially in periventricular regions and the corpus callosum, which is not supported by visual T2 hyperintensities. (E) The case with the lowest Dice coefficient, obtained in a patient aged 1.4 years. Conventional segmentation overestimates lesion volume, likely due to misclassification of incompletely myelinated regions as lesions. Panel E-4 (same patient at age 3.4 years) shows nearly complete myelination with only subtle parietal T2-hyperintensities, likely reflecting early pathology. On the other hand, the CNN-based segmentation reveals a misclassification of gray matter as white matter lesion.
Figure 5. Comparison of segmentation overlap (Dice coefficient) between the CNN-based model and conventional SPM-based segmentation for T2-hyperintense white matter (demyelination load) across different clinical and radiological parameters. Each subplot shows the Dice coefficient as a function of one of the following variables: (A) patient age at scan, (B) CNN-based demyelination load, (C) GMFC score at scan time, and (D) MLD MRI severity score. Higher Dice values indicate better agreement. Lower segmentation concordance was observed in younger patients and in those with lower lesion burden, suggesting greater variability or structural immaturity in early disease stages.

Comparison Across Modalities
Despite not being trained on 2D T1-weighted sequences, the CNN model performed at least comparably to the conventional segmentation. As expected, the overall performance of both tested segmentation methods was more heterogeneous in the 2D T1-group than in the 3D T1-group. The mean Dice coefficient was lower at 0.75 and the standard deviation was higher at 0.32 (compared to 0.82 ± 0.21 in 3D T1). The 2D T1-group thus differed significantly from the 3D T1-group (p < 0.05). The violin plot in Fig. 6A shows good agreement for the vast majority of datasets, yet compared to Fig. 2A more data points fall in the lower DC range. As in the 3D T1 results, no systematic over- or underestimation of lesion volume by the CNN-based model was observed in the Bland-Altman diagram (Fig. 6C). Regarding the Dice coefficient, most outliers corresponded to scans with small volumes of T2-hyperintense white matter. GMFC scores showed a strong positive correlation with lesion volume derived from the CNN-based segmentation (r_s = 0.68, p < 0.001). While conventional segmentation yielded a similarly significant result, the strength of the relationship was moderately reduced (r_s = 0.57, p < 0.001).
Visual inspection supported the statistical finding that, in the case of 2D T1-weighted data, the conventional approach had incorrectly classified excessive white matter as demyelination load, possibly due to partial volume effects. Notably, small clusters located outside the typically confluent regions of demyelination were more frequently retained despite the application of the 200-voxel filter, which may not be sufficient for this modality.

Figure 6. (A) Violin plot of Dice coefficients for CNN-based and conventional lesion segmentation. The width of the violin reflects the frequency of values; the dashed lines mark the 25th, 50th (median), and 75th percentiles. The lower quartile “tail” illustrates that, while most cases achieve high overlap (> 0.8), a subset shows markedly lower agreement. (B) Scatter plot of demyelination load versus GMFC-MLD score, comparing CNN-based (blue circles, blue regression line) and conventional (red squares, red regression line) segmentations. The CNN method shows slightly higher correlation with increasing disability levels. (C) Bland–Altman analysis showed high agreement, with 2 data points falling outside the limits of agreement (3.6%). The CNN-based segmentation selected slightly fewer voxels as lesion compared to the conventional approach (median difference: −0.0065).

Discussion
This study demonstrates that a convolutional neural network (CNN)-based approach can reliably segment demyelinated white matter in MLD patients, showing substantial agreement with an established semi-automated reference method. The model performed robustly across heterogeneous MRI acquisition settings, including both 2D and 3D imaging protocols, and patient characteristics, with high spatial overlap (median Dice coefficient 0.82 for 3D T1 images and 0.75 for 2D T1 images), low segmentation bias, good correlation with clinical severity, and pathophysiologically plausible, steadily increasing lesion loads in longitudinal follow-up.
Segmentation accuracy and influencing factors
The overall segmentation agreement was good, with higher accuracy in 3D T1-weighted sequences compared to 2D T1 images. This discrepancy is expected, as 3D scans offer higher spatial resolution and contrast uniformity. The Dice coefficient was most affected by low lesion burden, young age, and low GMFC scores, likely due to developmental factors such as incomplete myelination or greater anatomical variability, which challenge both segmentation methods. Importantly, in these cases the CNN-based approach more closely matched the visual impression than the conventional method, especially in the presence of low contrast or structural abnormalities. Compared with recent deep-learning pipelines for multiple sclerosis lesion segmentation, which typically report Dice scores in the 0.75–0.85 range and 5–10% outliers [16–18], our CNN achieves comparable or even improved overlap on T1/T2 data. Unlike most MS models trained exclusively on adult FLAIR scans, it generalizes robustly across varied protocols and a markedly younger cohort, demonstrating that modern AI frameworks can be successfully adapted to challenging, heterogeneous pediatric leukodystrophy imaging.

Outliers and qualitative findings
Visual inspection of segmentation outliers confirmed that most discrepancies stemmed from known pitfalls of conventional segmentation in neuroimaging, which are even more pronounced in a pediatric population, such as partial volume effects near the ventricles/CSF or misclassification of unmyelinated white matter. In contrast, the CNN model was less prone to oversegmentation in such areas and showed greater spatial specificity. Notably, low Dice scores were not randomly distributed but clustered in a small number of patients and remained relatively stable across their longitudinal scans, even when acquired on different MRI scanners.
This suggests that certain anatomical or developmental conditions (such as early age, delayed or incomplete myelination, or ventricular deformation) create consistent challenges for conventional segmentation pipelines. These findings highlight the potential of deep learning approaches to better account for structural and maturational variability in pediatric MLD.

Cross-modality generalizability
Despite being trained exclusively on 3D data, the CNN model generalized well to 2D T1-weighted scans, albeit with moderately reduced Dice coefficients. This demonstrates its flexibility and potential applicability in real-world clinical settings where 2D acquisitions may still be common. Notably, small, isolated clusters outside typical lesion regions were more frequently retained in the conventional segmentation of low-resolution 2D scans. These likely reflect limitations of voxel-based filtering when applied to noisier or lower-contrast data. While increasing the filter threshold (e.g., above 200 voxels) might have reduced such spurious detections, predefined thresholds were deliberately kept constant across all analyses to preserve comparability between methods. This highlights the need for modality-specific adaptation in conventional pipelines, while the CNN model demonstrated better intrinsic robustness to such variations.

Correlation with clinical severity
Lesion volumes derived from both the CNN-based and the conventional segmentation methods showed significant correlations with motor impairment as assessed by the GMFC score. While the CNN-based model consistently yielded slightly stronger correlations in both cohorts, the differences were modest and did not indicate a substantial performance gap. Interestingly, the correlation was markedly stronger in the 2D T1-weighted dataset (r = 0.68) compared to the 3D T1-weighted dataset (r = 0.38), even though the CNN model was trained exclusively on 3D data. The reason for this discrepancy remains unclear.
One possible explanation for the weaker correlation in the 3D dataset lies in the distribution of clinical severity within each group: in the 3D cohort, only a few patients had a GMFC score of 6, potentially limiting the range of observable clinical impairment and thus weakening the strength of the association. In contrast, the 2D group included a broader spectrum of disease severity, allowing for a clearer trend. Importantly, the correlation coefficients observed for the 2D data align well with previous studies on demyelination load and clinical function [7], suggesting that the CNN-based segmentation captures disease burden in a clinically meaningful way even in lower-resolution imaging.

Longitudinal performance

In longitudinal datasets, both segmentation methods yielded plausible progression patterns, with volumetric changes consistent with natural disease evolution. The CNN approach showed no implausible fluctuations or outliers, supporting its suitability for longitudinal tracking. This aspect is crucial for applications in therapeutic monitoring, particularly in the context of emerging treatments like gene therapy.

Clinical implications and future applications

The CNN-based segmentation approach offers a promising alternative to conventional methods for quantifying demyelination in MLD. It demonstrated substantially reduced processing time and operator dependency, while maintaining strong concordance with semi-automated pipelines, even when evaluated across heterogeneous MRI sequence types and scanner acquisition parameters. Particularly in cases with subtle or ambiguous findings, such as early-stage disease or atypical patterns, the CNN model may provide more consistent results, helping to guide clinical interpretation. Furthermore, its applicability to both 3D and 2D T1-weighted scans makes it compatible with a broader range of clinical imaging protocols, including those used in resource-limited settings.
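The range-restriction explanation offered above for the weaker correlation in the 3D dataset can be illustrated with a small Spearman rank correlation sketch. All values below are invented for illustration and are not study data; the helper names (`average_ranks`, `pearson`, `spearman`) are likewise hypothetical. The sketch shows that restricting the same synthetic sample to lower severity scores can lower the rank correlation.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values receiving the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Synthetic example (NOT study data): severity score vs. demyelination load.
score = [0, 1, 2, 3, 4, 5, 6]
load = [0.05, 0.02, 0.10, 0.08, 0.20, 0.18, 0.30]

rho_full = spearman(score, load)
# Keep only the lower-severity cases to mimic a restricted clinical range.
restricted = [(s, l) for s, l in zip(score, load) if s <= 3]
rho_restricted = spearman([s for s, _ in restricted], [l for _, l in restricted])
print(f"full range: {rho_full:.2f}, restricted range: {rho_restricted:.2f}")  # 0.89 vs 0.60
```

Whether this mechanism accounts for the observed difference between the 2D and 3D cohorts cannot be concluded from a toy example; it only shows that a narrower severity range can weaken a rank correlation.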
Importantly, the observed correlation between demyelination load and motor impairment supports its use as a potential imaging biomarker in future clinical trials or therapeutic monitoring. Given the increasing use of gene therapy in MLD (e.g., arsa-cel), such automated and scalable methods could assist in standardized evaluation of treatment response across centers. Nevertheless, human oversight remains imperative to ensure plausibility of automated results.

While the current model was trained on MLD data, the underlying architecture and principles are likely transferable to other leukodystrophies characterized by confluent white matter changes, such as X-linked adrenoleukodystrophy (X-ALD). However, disease-specific adaptations and validations would be required before broader application, as lesion distribution, imaging characteristics, and age-dependent anatomical variability may differ.

Limitations

This study has several limitations. First, the ground truth was based on a conventional pipeline that itself is not error-free. Thus, segmentation discrepancies should not be interpreted as incorrect predictions per se, even though visual inspection in the vast majority of cases suggested that the CNN model more accurately reflected the true extent of demyelination. Second, although the model performed well across 2D and 3D data, it was not explicitly trained on mixed-modal datasets. Future training using diverse input formats and scanner types may further improve robustness. Third, FLAIR imaging was not included in this study. While FLAIR may improve lesion conspicuity beyond early childhood, when myelination is more advanced, its diagnostic utility in younger individuals is limited by age-dependent myelination effects. In incompletely myelinated brains, the contrast between normal and demyelinated white matter is often reduced on FLAIR, making T2-weighted imaging more reliable for lesion detection.
Moreover, FLAIR sequences were not consistently available across all sites and showed greater heterogeneity in acquisition parameters. Our approach therefore focused on routinely acquired T1- and T2-weighted sequences, which are available for practically all patients and form the basis of established semi-quantitative and volumetric scoring methods. Future studies incorporating standardized FLAIR or multi-contrast protocols may further enhance lesion characterization and model generalizability. Fourth, the tissue probability maps used for conventional segmentation were derived from standard adult templates provided by CAT12/SPM12. While these offer robust segmentation in healthy adult populations, they are not specifically tailored to pediatric brains. As a result, anatomical deviations due to age-related differences or disease-induced brain changes in MLD may lead to systematic misclassification of tissue types, which in turn could bias the ground truth. Finally, the cohort size, though substantial for a rare disease, remains limited. Larger, multicenter datasets are needed to validate generalizability and to train next-generation models with improved adaptability across ages and imaging protocols.

Conclusion

In conclusion, this CNN-based segmentation approach provides a fast, scalable, and clinically relevant method for quantifying demyelination in MLD. It aligns closely with conventional methods, correlates with clinical severity, and is robust across modalities and time points. As precision imaging becomes integral to treatment monitoring in leukodystrophies, such tools will be instrumental in translating advanced MRI into routine clinical care.

Declarations

Competing Interests

J.S., T.C., A.D., A.F., D.S. and J.S. are full-time employees of Clario with no conflicts of interest related to this work. During the preparation of this manuscript, C.J.M. and D.W.
were full-time employees and stockholders of Takeda Pharmaceuticals with no other conflicts of interest related to this work. S.G. received institutional research grants from Shire (a Takeda company) and Orchard Therapeutics, and serves as an adviser to Clario, Orchard Therapeutics, and Sanofi, without personal payments. P.M., B.B. and H.R. report no conflicts of interest.

Author Contribution

Samuel Groeschel and Luc Bracoud contributed equally. Pascal Martin made substantial contributions to the conception and design of the work, performed data analysis and interpretation, and drafted the manuscript. Joël Schaerer, Thomas Cajgfinger, Alessandro Delmonte, Alexandre Fusellier, David Scott, and Joyce Suhy developed, implemented, and applied the convolutional neural network (CNN) used in this study; Thomas Cajgfinger additionally prepared Figure 1. Benjamin Bender contributed to imaging data acquisition and evaluation. David Whiteman and C.J. Malanga contributed to data acquisition. Hendrik Rosewich contributed to clinical data interpretation. Samuel Groeschel and Luc Bracoud jointly conceived and supervised the study, contributed to study design and data interpretation, and critically revised the manuscript. All authors revised the work critically, approved the final version to be published, and agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

References

1. Gieselmann, V., and Krägeloh-Mann, I. (2010). Metachromatic leukodystrophy--an update. Neuropediatrics 41, 1-6.
2. Groeschel, S., Kehrer, C., Engel, C., i Dali, C., Bley, A., Steinfeld, R., Grodd, W., and Krägeloh-Mann, I. (2011). Metachromatic leukodystrophy: natural course of cerebral MRI changes in relation to clinical course. J Inherit Metab Dis 34, 1095-1102.
3. Kehrer, C., Groeschel, S., Kustermann-Kuhn, B., Bürger, F., Köhler, W., Kohlschütter, A., Bley, A., Steinfeld, R., Gieselmann, V., and Krägeloh-Mann, I. (2014). Language and cognition in children with metachromatic leukodystrophy: onset and natural course in a nationwide cohort. Orphanet Journal of Rare Diseases 9, 18.
4. van der Voorn, J.P., Pouwels, P.J., Kamphorst, W., Powers, J.M., Lammens, M., Barkhof, F., and van der Knaap, M.S. (2005). Histopathologic correlates of radial stripes on MR images in lysosomal storage disorders. AJNR Am J Neuroradiol 26, 442-446.
5. Eichler, F., Grodd, W., Grant, E., Sessa, M., Biffi, A., Bley, A., Kohlschütter, A., Loes, D.J., and Krägeloh-Mann, I. (2009). Metachromatic leukodystrophy: a scoring system for brain MR imaging observations. AJNR Am J Neuroradiol 30, 1893-1897.
6. Groeschel, S., i Dali, C., Clas, P., Bohringer, J., Duno, M., Krarup, C., Kehrer, C., Wilke, M., and Krägeloh-Mann, I. (2012). Cerebral gray and white matter changes and clinical course in metachromatic leukodystrophy. Neurology 79, 1662-1670.
7. Strolin, M., Krägeloh-Mann, I., Kehrer, C., Wilke, M., and Groeschel, S. (2017). Demyelination load as predictor for disease progression in juvenile metachromatic leukodystrophy. Ann Clin Transl Neurol 4, 403-410.
8. Clas, P., Groeschel, S., and Wilke, M. (2012). A semi-automatic algorithm for determining the demyelination load in metachromatic leukodystrophy. Acad Radiol 19, 26-34.
9. Tillema, J.M., Derks, M.G., Pouwels, P.J., de Graaf, P., van Rappard, D.F., Barkhof, F., Steenweg, M.E., van der Knaap, M.S., and Wolf, N.I. (2015). Volumetric MRI data correlate to disease severity in metachromatic leukodystrophy. Ann Clin Transl Neurol 2, 932-940.
10. Kim, M.J., Hong, E., Yum, M.S., Lee, Y.J., Kim, J., and Ko, T.S. (2024). Deep learning-based, fully automated, pediatric brain segmentation. Sci Rep 14, 4344.
11. Chacko, A., Schoeman, S., Venkatakrishna, S.S.B., Bolton, S., Shearn, A.I.U., and Andronikou, S. (2023). Caution: shortcomings of traditional segmentation methods from magnetic resonance imaging brain scans intended for 3-dimensional surface modelling in children with pathology. Pediatr Radiol 53, 1854-1862.
12. Chen, Y.T., Huang, Y.C., Chen, H.L., Lo, H.C., Chen, P.C., Yu, C.C., Tu, Y.C., Liu, T.L., and Lin, W.C. (2025). Automatic segmentation of white matter lesions on multi-parametric MRI: convolutional neural network versus vision transformer. BMC Neurol 25, 5.
13. Akkus, Z., Galimzianova, A., Hoogi, A., Rubin, D.L., and Erickson, B.J. (2017). Deep Learning for Brain MRI Segmentation: State of the Art and Future Directions. J Digit Imaging 30, 449-459.
14. Isensee, F., Jaeger, P.F., Kohl, S.A.A., Petersen, J., and Maier-Hein, K.H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods 18, 203-211.
15. Groeschel, S., Beerepoot, S., Amedick, L.B., Krägeloh-Mann, I., Li, J., Whiteman, D.A.H., Wolf, N.I., and Port, J.D. (2024). The effect of intrathecal recombinant arylsulfatase A therapy on structural brain magnetic resonance imaging in children with metachromatic leukodystrophy. J Inherit Metab Dis 47, 778-791.
16. McKinley, R., Wepfer, R., Aschwanden, F., Grunder, L., Muri, R., Rummel, C., Verma, R., Weisstanner, C., Reyes, M., Salmen, A., et al. (2021). Simultaneous lesion and brain segmentation in multiple sclerosis using deep neural networks. Sci Rep 11, 1087.
17. La Rosa, F., Abdulkadir, A., Fartaria, M.J., Rahmanzadeh, R., Lu, P.J., Galbusera, R., Barakovic, M., Thiran, J.P., Granziera, C., and Cuadra, M.B. (2020). Multiple sclerosis cortical and WM lesion segmentation at 3T MRI: a deep learning method based on FLAIR and MP2RAGE. Neuroimage Clin 27, 102335.
18. Cerri, S., Puonti, O., Meier, D.S., Wuerfel, J., Muhlau, M., Siebner, H.R., and Van Leemput, K. (2021). A contrast-adaptive method for simultaneous whole-brain and lesion segmentation in multiple sclerosis. Neuroimage 225, 117471.
Author affiliations: Pascal Martin (corresponding author) and Benjamin Bender, University of Tübingen; Joël Schaerer, Thomas Cajgfinger, Alessandro Delmonte, Alexandre Fusellier, David Scott, Joyce Suhy, and Luc Bracoud, Clario, Inc. (formerly Bioclinica, Inc.); David Whiteman and C.J. Malanga, Takeda Pharmaceutical Company Ltd; Hendrik Rosewich and Samuel Groeschel, University Children's Hospital Tübingen.

Preprint DOI: https://doi.org/10.21203/rs.3.rs-7924598/v1. Published version (21 April 2026): https://doi.org/10.1007/s00062-026-01652-6

Figure 1. Architecture of the implemented nnU-Net model for MLD lesion segmentation

Schematic overview of the nnU-Net pipeline used for automated demyelination load segmentation. The proposed model is based on a 3D U-Net architecture processing input patches of size 64×192×160 voxels. Co-registered and skull-stripped images are concatenated and used as input to the first layer of the model (dark gray). The 6-level encoder-decoder module with skip connections (dotted arrows) enforces multi-resolution image feature learning; the model was trained using a weighted Dice and cross-entropy loss function. Each level implements a series of convolutions with a 3×3×3 kernel, normalization, and LeakyReLU activations. The final softmax layer (light gray) assigns voxel-wise probabilities to each of the three output classes: MLD lesions, healthy white matter, and gray matter. Joint learning of lesion and healthy brain tissue proved beneficial for model convergence. The final MLD lesion mask is computed and binarized from post-processed softmax probabilities.

Figure 2. Performance of the CNN-based lesion segmentation

(A) Distribution of Dice coefficients across all 3D T1-acquired test cases (n = 73) for the automated CNN-based segmentation of demyelinating lesions. The width of each violin corresponds to the frequency of Dice scores; dashed lines indicate the median (0.82) and interquartile range.

(B) Bland–Altman diagram depicting agreement between CNN-based and conventional segmentation methods for estimating demyelination load. Each point represents the difference in demyelination load estimates plotted against their mean. For 3D scans, 6 out of 73 cases (8.2%) fell outside the 95% limits of agreement, with the CNN-based model yielding slightly higher demyelination load (median difference: +0.0115). No systematic bias between the two methods was observed. Outliers predominantly occurred in scans with low lesion burden.

(C) Representative segmentation example with a very good Dice coefficient of 0.91 (left: original T2-weighted image; middle: conventional segmentation of demyelination load projected onto the T2 image, turquoise; right: CNN-based segmentation of demyelination load, orange).

Figure 3. Correlation between demyelination load and GMFC score and longitudinal consistency of demyelination load changes across consecutive scans

A: Scatter plots illustrate the relationship between clinical motor impairment (GMFC score at time of MRI) and demyelination load derived from CNN-based (blue circles) and conventional (red squares) segmentation methods. Shaded areas represent the respective 95% confidence intervals. Demyelination load correlated significantly with GMFC score for both methods, with a stronger association for CNN-based segmentation (Spearman’s r = 0.38, p < 0.001) than for conventional segmentation (r = 0.26, p = 0.025).

B and C: Box-and-whisker plots show the distribution of volumetric changes in demyelination load between consecutive MRI time points across all patients with multiple scans. Each box represents pooled difference values from several patients, with each data point corresponding to one interval between two consecutive scans of a single patient. Panel B displays results from the CNN-based segmentation, Panel C from the conventional method. The plotted values reflect the difference between value at later time point minus that at earlier time point, resulting in predominantly negative values, as demyelination load typically increased over time. This trend is consistent with disease progression in MLD. Both segmentation approaches demonstrated plausible and stable behavior across time points, with only a few outliers, supporting their suitability for longitudinal tracking.

Figure 4. Representative cases with low and high Dice coefficients illustrating differences between CNN-based and conventional segmentation

Examples A–E illustrate cases with high (A) and low (B–E) Dice coefficients between conventional and CNN-based segmentation of T2-hyperintense white matter (demyelination load). Each case is presented in three columns: left, axial T2-weighted image; middle, conventional segmentation of demyelination load projected onto the T2 image (turquoise); right, CNN-based segmentation projected onto the T2 image (orange).

(A) Depicts a case with the highest Dice coefficient (0.92), showing close visual correspondence between CNN-based and conventional segmentation, with no evident discrepancies.
(B) A case with ventricle deformation and relatively low-resolution T2 imaging, where the extent of frontal white matter abnormalities is underestimated in the conventional segmentation.
(C) Demonstrates undersegmentation by the conventional method near the frontal horn of the lateral ventricle, where adjacent hyperintensities are incompletely captured.
(D) Shows spurious lesion detection in the conventional segmentation, especially in periventricular regions and the corpus callosum, which are not supported by visual T2 hyperintensities.
(E) The case with the lowest Dice coefficient, obtained in a patient aged 1.4 years. Conventional segmentation overestimates lesion volume, likely due to misclassification of incompletely myelinated regions as lesions. Panel E-4 (same patient at age 3.4 years) shows nearly complete myelination with only subtle parietal T2-hyperintensities, likely reflecting early pathology. On the other hand, the CNN-based segmentation reveals a misclassification of gray matter as white matter lesion.

Figure 5. Dice coefficient between CNN-based and conventional segmentations across clinical and radiological parameters

Comparison of segmentation overlap (Dice coefficient) between the CNN-based model and conventional SPM-based segmentation for T2-hyperintense white matter (demyelination load) across different clinical and radiological parameters. Each subplot shows the Dice coefficient as a function of one of the following variables: (A) patient age at scan, (B) CNN-based demyelination load, (C) GMFC score at scan time, and (D) MLD MRI severity score. Higher Dice values indicate better agreement. Lower segmentation concordance was observed in younger patients and in those with lower lesion burden, suggesting greater variability or structural immaturity in early disease stages.

Figure 6. Performance of CNN vs. conventional segmentation on 2D T1-weighted images

(A) Violin plot of Dice coefficients for CNN-based and conventional lesion segmentation. The width of the violin reflects the frequency of values; the dashed lines mark the 25th, 50th (median), and 75th percentiles. The lower quartile “tail” illustrates that, while most cases achieve high overlap (>0.8), a subset shows markedly lower agreement.

(B) Scatter plot of demyelination load versus GMFC-MLD score, comparing CNN-based (blue circles, blue regression line) and conventional (red squares, red regression line) segmentations. The CNN method shows slightly higher correlation with increasing disability levels.

(C) Bland–Altman analysis showed high agreement, with 2 data points falling outside the limits of agreement (3.6%).
The CNN-based segmentation selected slightly fewer voxels as lesion than the conventional approach (median difference: -0.0065).\u003c/p\u003e","description":"","filename":"floatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-7924598/v1/b51a287e565ffac9602b9e14.png"},{"id":107928179,"identity":"31088465-0895-4631-8968-5fd54a26fb7a","added_by":"auto","created_at":"2026-04-27 16:08:50","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1641899,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7924598/v1/7b4a83eb-f5b8-4ad7-8013-5be7a2968391.pdf"}],"financialInterests":"Competing interest reported. J.S., T.C., A.D., A.F., D.S. and J.S. are full-time employees of Clario with no conflicts of interest related to this work. During the preparation of this manuscript, C.J.M. and D.W. were full-time employees and stockholders of Takeda Pharmaceuticals with no other conflicts of interest related to this work. S.G. received institutional research grants from Shire (a Takeda company) and Orchard Therapeutics, and performs advisory activities for Clario, Orchard Therapeutics, and Sanofi, without personal payments. P.M., B.B. and H.R. report no conflicts of interest.","formattedTitle":"Automated Deep Learning–Based Demyelination Load Segmentation in Metachromatic Leukodystrophy","fulltext":[{"header":"Introduction","content":"\u003cp\u003eMetachromatic leukodystrophy (MLD) is a rare autosomal-recessive lysosomal storage disorder caused by mutations in the \u003cem\u003eARSA\u003c/em\u003e gene, leading to deficient arylsulfatase A activity and resulting in sulfatide accumulation. 
This pathological cascade culminates in progressive demyelination of the central and peripheral nervous systems, clinically presenting with motor and cognitive decline \u003csup\u003e1\u0026ndash;3\u003c/sup\u003e.\u003c/p\u003e\u003cp\u003eMRI plays a pivotal role in diagnosing and monitoring MLD. T2-weighted hyperintensities, often exhibiting a tigroid pattern, are the hallmark imaging feature\u003csup\u003e2; 4\u003c/sup\u003e. To quantify disease burden, several imaging-based approaches have been developed. The most straightforward method is the MLD MRI severity score\u003csup\u003e5\u003c/sup\u003e, a visual rating scale assessing the extent of white matter lesions and global brain atrophy across predefined regions. While widely used in clinical routine, this approach remains subject to inter-rater variability and lacks detailed quantification of T2-hyperintensities. To reduce subjectivity and enable objective, quantitative assessment, the \u0026ldquo;demyelination load\u0026rdquo; was introduced\u003csup\u003e6\u003c/sup\u003e. This method semi-automatically quantifies the proportion of T2-hyperintense white matter relative to total brain volume, allowing reproducible estimation of disease burden. It has been validated in landmark studies \u003csup\u003e2; 7\u0026ndash;9\u003c/sup\u003e and has become a reference standard in MLD imaging.\u003c/p\u003e\u003cp\u003eThe demyelination load method relies on tissue segmentation using standard MRI volumetry pipelines such as SPM12 and CAT12 to generate tissue probability maps. These maps are then used to define white matter regions, within which abnormal T2 signal intensity is segmented and quantified. While robust, the method carries limitations: segmentation accuracy may be reduced in a disease-affected pediatric population due to anatomical variability and atrophy, brain maturation effects, and scanner-dependent image quality. 
Furthermore, conventional segmentation algorithms typically rely on adult-derived tissue probability maps, which may inadequately represent age-specific anatomical features; this discrepancy, compounded by disease-related structural alterations, further undermines tissue classification accuracy\u003csup\u003e10; 11\u003c/sup\u003e. Thus, while the technique is semi-automated and reproducible, it remains constrained by methodological assumptions.\u003c/p\u003e\u003cp\u003eTo overcome these limitations and improve efficiency, machine learning approaches, particularly deep learning, have emerged as powerful tools for white matter lesion segmentation. Previous work has demonstrated their utility in adult populations with multiple sclerosis or cerebral small vessel disease, achieving high accuracy and reliability \u003csup\u003e12\u0026ndash;14\u003c/sup\u003e. However, applications in pediatric leukodystrophies, and in particular in MLD, are lacking.\u003c/p\u003e\u003cp\u003eIn this study, we trained a convolutional neural network (CNN) on a curated dataset of MLD patients for whom demyelination load was previously calculated using high-resolution 3D T1-weighted images and axial T2-weighted sequences. The ground truth segmentations were derived using the established semi-automated pipeline with strict visual quality control. We then tested the model on an independent dataset including both high-resolution 3D and 2D T1-weighted images, each paired with T2 images, to assess generalizability. Evaluation metrics included the Dice coefficient (DC) and Bland-Altman analysis, alongside correlation with a clinical measure of motor symptoms and longitudinal evaluation. 
Our goal was to assess whether this CNN-based method can serve as a robust alternative to the conventional pipeline, facilitating more efficient and standardized lesion quantification in MLD.\u003c/p\u003e"},{"header":"Materials and Methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\u003ch2\u003eStudy Design and Cohorts\u003c/h2\u003e\u003cp\u003eThis study followed a two-phase design involving a training and a test cohort in MLD patients. MLD diagnosis was confirmed based on deficient arylsulfatase A (ARSA) enzyme activity, elevated urinary sulfatides, clinical presentation, and (where available) pathogenic mutations in the ARSA gene. Ethical approval was obtained from the local Ethics Committee (reference 401/2005), and written informed consent was provided by patients or legal guardians.\u003c/p\u003e\u003c/div\u003e\n\u003ch3\u003eImaging Data and Preprocessing\u003c/h3\u003e\n\u003cp\u003eAll MRI scans included both T1-weighted and T2-weighted sequences. Data were acquired across multiple clinical sites using scanners from different manufacturers (GE, Philips, and Siemens) and at field strengths of 1.5 T and 3 T, reflecting real-world heterogeneity in acquisition protocols. This diversity was considered advantageous for developing a robust and generalizable segmentation approach.\u003c/p\u003e\u003cp\u003eVisual quality control (QC) was applied to all images. T1- and T2-weighted scans were manually inspected for motion artifacts and insufficient brain coverage. Scans with substantial technical limitations were excluded.\u003c/p\u003e\n\u003ch3\u003eConventional Demyelination Load Segmentation\u003c/h3\u003e\n\u003cp\u003eThe reference standard for both CNN training and evaluation was derived using a semi-automated lesion segmentation method validated in previous studies\u003csup\u003e6; 7\u003c/sup\u003e. 
The pipeline was applied to both training and test data as follows:\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003eT2-weighted images were rigidly co-registered to corresponding T1-weighted images.\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eTissue segmentation was performed using the CAT12 toolbox within SPM12, generating grey matter, white matter, and CSF probability maps based on adult tissue priors.\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eWhite matter abnormalities were identified by modeling the T2 intensity distribution within the white matter mask using Gaussian mixture modeling.\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eLesion voxels were defined by thresholding at the intersection of Gaussian components and refined using a Markov Random Field (MRF) algorithm to reduce noise.\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eVoxels in clusters\u0026thinsp;\u0026lt;\u0026thinsp;200 were excluded to minimize false positives.\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e\u003cp\u003eAll resulting segmentations were visually inspected. Scans with incorrect or anatomically implausible masks were excluded from CNN training.\u003c/p\u003e\n\u003ch3\u003eCNN-Based Segmentation\u003c/h3\u003e\n\u003cp\u003eA 3D U-Net model based on the nnUnet framework\u003csup\u003e12\u003c/sup\u003e was trained to jointly segment MLD white matter lesions, gray matter, healthy white matter and cerebrospinal fluid (CSF). The architecture follows an encoder-decoder architecture with six layers combining a series of 3D convolutions, instance normalization and LeakyReLU activation layers (see Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). 
Co-registered T1- and T2-weighted images were used as input to the model after resampling to 0.5x0.5x1.2 mm and random 3D patch cropping with a size of 64x192x160 voxels.\u003c/p\u003e\u003cp\u003eThe model was trained using 5-fold cross-validation, implementing extensive data augmentation to address the limited dataset size. Random data augmentation included flipping, rotation, scaling and intensity-based transformations. All folds were trained on an NVIDIA A10G GPU using the Adam optimizer with a batch size of 2.\u003c/p\u003e\u003cp\u003ePredictions from the five cross-validated models were ensembled, averaged and applied to the independent test set (Group 2), which included both high-resolution 3D T1 and lower-resolution 2D T1 images. The ability of the CNN to generalize across scan types was a critical component of the evaluation.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e\u003cem\u003eSchematic overview of the nnU-Net pipeline used for automated demyelination load segmentation. The proposed model is based on a 3D U-Net architecture processing input patches of size 64x192x160 voxels. Co-registered and skull-stripped images are concatenated and used as input to the first layer of the model (dark gray). The 6-level encoder-decoder module with skip connections (dotted arrows) enforces multi-resolution image feature learning and was trained using a weighted Dice and cross-entropy loss function. Each level implements a series of convolutions with a 3x3x3 kernel, normalization and LeakyReLU activations. The final softmax layer (light gray) assigns voxel-wise probabilities to each of the three output classes, including MLD lesions, healthy white matter and gray matter. Joint learning of lesion and healthy brain tissue proved beneficial for model convergence. 
The final MLD lesion mask is computed and binarized from post-processed softmax probabilities.\u003c/em\u003e\u003c/p\u003e\u003cp\u003eFollowing each segmentation process, the output maps generated by both the CNN and the conventional method were reviewed to ensure technical validity. Specifically, the alignment of T1 and T2 images, the plausibility of grey and white matter classification, and the anatomical consistency of lesion masks were assessed. Outputs with clear technical errors in the conventional segmentation approach (e.g., misregistration or incomplete segmentation) were excluded from subsequent analyses. Importantly, this procedure did not involve excluding cases based on prediction quality, but served solely to remove technically invalid results that would preclude meaningful evaluation.\u003c/p\u003e\n\u003ch3\u003ePatient Cohort\u003c/h3\u003e\n\u003cp\u003eThis study included two independent datasets: a training dataset and a test dataset. To avoid overlap of patients between the training and test groups, MRI scans were drawn from separate projects. Specifically, the training cohort consisted of data from a previously published study \u003csup\u003e15\u003c/sup\u003e, whereas the test dataset was collected independently at the Department of Diagnostic and Interventional Neuroradiology, University Hospital T\u0026uuml;bingen. Because of this approach, the two datasets, although similar, could not be matched exactly for age or gender.\u003c/p\u003e\u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003eTraining Dataset\u003c/span\u003e: Initially, 221 MRI scans from 44 individual patients (20 female; median age 4.3 years, range 1.7\u0026ndash;10.9 years) were included. Following rigorous visual quality control (QC), 189 MRI scans from 32 patients remained (14 female; median age 4.2 years, range 1.7\u0026ndash;10.9 years; mean number of scans per patient: 4). 
For model training, a particularly stringent QC threshold was applied to ensure optimal input quality and minimize noise in the ground truth segmentations. Both T1- and T2-weighted images were required to be free of relevant motion artifacts, as high-quality, co-registered input from both modalities was essential for subsequent automated processing. In some cases or in settings where sedation was minimized for clinical reasons, residual motion led to consistently reduced image quality. Consequently, all scans from certain subjects had to be excluded when neither scan met the predefined quality standards. These exclusions reflect the practical challenges of pediatric MRI acquisition rather than site-specific shortcomings and ensured that only technically reliable data contributed to model training.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003eTest Dataset\u003c/span\u003e: The test dataset initially consisted of 196 MRI scans from 49 individual patients (23 female; median age 13.0 years, range 1.4\u0026ndash;33.4 years). 
Following visual quality control, 128 MRI scans from 42 individual patients remained (26 female; median age 12.4 years, range 1.5\u0026ndash;33.7 years; mean number of scans per patient: 4).\u003c/p\u003e\u003cp\u003eAn overview is presented in Table \u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eOverview of the patient cohorts included in the training and test datasets before and after visual quality control (QC).\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"6\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDataset\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eInitial scans (patients)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eAfter QC scans (patients)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eFemale\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003eMedian age (range)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e\u003cp\u003eMean scans per 
patient\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eTraining dataset\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e221 (44)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e189 (32)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e14\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e4.2 (1.7\u0026ndash;10.9)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e4\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eTest dataset\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e196 (49)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e128 (42)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e26\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e12.4 (1.5\u0026ndash;33.7)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e3\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eThe test dataset was further subdivided into two subgroups based on imaging protocols:\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003e73 scans from 31 individual patients (14 female, median age 13.5 years, age range 1.5\u0026ndash;33.4 years) with high-resolution 3D T1-weighted gradient-echo (GRE) sequences (median resolution 1 \u0026times; 0.97 \u0026times; 1 mm\u0026sup3;) and matching T2 axial images (median resolution 0.5 \u0026times; 0.5 \u0026times; 3.3 mm\u0026sup3;) - hereafter referred to as the 3D T1-based 
group. These 3D gradient-echo acquisitions represent the high-resolution standard typically used in research and advanced clinical protocols and provide a consistent imaging basis for quantitative analysis.\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e57 scans from 24 individual patients (12 female, median age 9.7 years, age range 2.0\u0026ndash;23.1 years) with lower-resolution 2D T1-weighted turbo spin echo (TSE) sequences (median resolution 0.66 \u0026times; 0.74 \u0026times; 5 mm\u0026sup3;) and T2-weighted axial images (median resolution 0.5 \u0026times; 0.5 \u0026times; 5.3 mm\u0026sup3;) - hereafter referred to as 2D T1-based group. The inclusion of T1-weighted TSE sequences, as opposed to the 3D gradient-echo acquisitions used in the other subgroup, reflects real-world variability in clinical MRI protocols and allows assessment of model generalizability across distinct acquisition settings.\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e\u003cp\u003eImportantly, the CNN was trained only on 3D MPRAGE data but tested on both 3D and 2D T1-weighted inputs to assess generalizability.\u003c/p\u003e\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\u003ch2\u003eEvaluation Strategy\u003c/h2\u003e\u003cp\u003eTo comprehensively assess CNN performance, we applied both quantitative and qualitative metrics. The conventional method was applied to the test cohort using the same pipeline described above. 
Comparative analyses were performed on the CNN and conventional segmentations:\u003c/p\u003e\u003cp\u003e\u003col\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eSpatial Overlap: Dice coefficient (DC) was calculated to measure spatial agreement between CNN and conventional outputs.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eVolume Agreement: Bland-Altman analysis was used to assess volumetric differences between methods.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003e Outlier Analysis: Dice and Bland-Altman outliers were reviewed visually. These cases were evaluated across scanners, age groups, and anatomical variance (e.g., unmyelinated regions or enlarged ventricles) to determine which segmentation method better handled these conditions.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eClinical Correlation: Lesion volumes were correlated with Gross Motor Function Classification for MLD (GMFC-MLD) using Spearman\u0026rsquo;s rank correlation.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eLongitudinal Stability: For patients with multiple scans, lesion volume change between consecutive timepoints was measured to evaluate consistency and plausibility.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003c/ol\u003e\u003c/p\u003e\u003cp\u003eFor the calculation of the \u003cem\u003edemyelination load\u003c/em\u003e (white matter lesion volume divided by total brain volume), the total brain volume was defined as the sum of gray matter and total white matter (normal-appearing white matter and white matter lesion volumes). 
Importantly, total brain volume was derived separately within each segmentation pipeline (using SPM-based tissue maps for the conventional approach and CNN-derived tissue classes for the deep learning approach) to ensure internal consistency of both methods.\u003c/p\u003e\u003cp\u003eThis multi-pronged strategy allowed evaluation not only of segmentation accuracy but also of clinical relevance and robustness to structural variability, especially in a pediatric rare disease context.\u003c/p\u003e\u003c/div\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e\u003ch2\u003eSegmentation Performance\u003c/h2\u003e\u003cp\u003eThe CNN-based model demonstrated good agreement with the conventional demyelination load segmentation method. The median Dice coefficient (DC) across all 73 3D T1-based scans reached 0.82 (SD\u0026thinsp;=\u0026thinsp;0.21). As shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, the violin plot of Dice coefficients is tightly concentrated in the 0.8\u0026ndash;0.9 range, reflecting consistently high overlap between CNN and conventional segmentations for the vast majority of cases. A small number of points extend well below this band, however, representing rare outliers (Dice scores down toward 0.2\u0026ndash;0.4), which were evaluated visually in the outlier analysis.\u003c/p\u003e\u003cp\u003eBland-Altman analysis demonstrated low bias between the two segmentation methods. For 3D T1 images, 6 out of 73 cases (8.2%) fell outside the limits of agreement, with the CNN-based model estimating a slightly higher demyelination load than the conventional segmentation method (median\u0026thinsp;+\u0026thinsp;0.0115).\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e\u003cem\u003e(A) Distribution of Dice coefficients across all 3D T1-acquired test cases (n\u0026thinsp;=\u0026thinsp;73) for the automated CNN-based segmentation of demyelinating lesions. 
The width of each violin corresponds to the frequency of Dice scores; dashed lines indicate the median (0.82) and interquartile range.\u003c/em\u003e\u003c/p\u003e\u003cp\u003e\u003cem\u003e(B) Bland-Altman diagram depicting agreement between CNN-based and conventional segmentation methods for estimating demyelination load. Each point represents the difference in demyelination load estimates plotted against their mean. For 3D scans, 6 out of 73 cases (8.2%) fell outside the 95% limits of agreement, with the CNN-based model yielding a slightly higher demyelination load (median difference: +0.0115). No systematic bias between the two methods was observed. Outliers predominantly occurred in scans with low lesion burden.\u003c/em\u003e\u003c/p\u003e\u003cp\u003e\u003cem\u003e(C) Representative segmentation example with a very good Dice coefficient of 0.91 (left - original T2-weighted image; middle - conventional segmentation of demyelination load projected onto the T2 image (turquoise); right - CNN-based segmentation of demyelination load (orange)).\u003c/em\u003e\u003c/p\u003e\u003cp\u003eDemyelination load derived from the CNN model correlated significantly with clinical severity (see Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). GMFC scores at the time of MRI demonstrated a significant positive association with lesion load as identified by the CNN-based model (r\u003csub\u003eS\u003c/sub\u003e = 0.38, p\u0026thinsp;\u0026lt;\u0026thinsp;0.001), suggesting that higher motor impairment corresponded to a greater extent of demyelination. 
The conventional segmentation yielded a statistically significant correlation as well, which was somewhat weaker than that of the CNN-based segmentation (r\u003csub\u003eS\u003c/sub\u003e = 0.26, p\u0026thinsp;\u0026lt;\u0026thinsp;0.025); however, the difference did not reach statistical significance (Williams' test: t\u0026thinsp;=\u0026thinsp;1.81, p\u0026thinsp;=\u0026thinsp;0.075).\u003c/p\u003e\u003cp\u003eIn patients with multiple longitudinal datasets, the median change in demyelination load between consecutive time points showed a slight progression consistent with the expected disease course. This trend was observed in both the CNN-based and the conventional segmentation approaches. Apart from a few outliers, no systematic implausibilities were detected in either method.\u003c/p\u003e\u003cp\u003e\u003cspan type=\"BoldUnderline\" class=\"BoldUnderline\" name=\"Emphasis\"\u003eCorrelation with clinical parameters and longitudinal consistency\u003c/span\u003e\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e\u003cem\u003eA: Scatter plots illustrate the relationship between clinical motor impairment (GMFC score at time of MRI) and demyelination load derived from CNN-based (blue circles) and conventional (red squares) segmentation methods. Shaded areas represent the respective 95% confidence intervals. Demyelination load correlated significantly with GMFC score for both methods, with stronger association for CNN-based segmentation (Spearman\u0026rsquo;s r\u0026thinsp;=\u0026thinsp;0.38, p\u0026thinsp;\u0026lt;\u0026thinsp;0.001) than conventional segmentation (r\u0026thinsp;=\u0026thinsp;0.26, p\u0026thinsp;=\u0026thinsp;0.025). B and C: Box-and-whisker plots show the distribution of volumetric changes in demyelination load between consecutive MRI time points across all patients with multiple scans. 
Each box represents pooled difference values from several patients, with each data point corresponding to one interval between two consecutive scans of a single patient. Panel A displays results from the CNN-based segmentation, Panel B from the conventional method. The plotted values reflect the difference between value at later time point minus that at earlier time point, resulting in predominantly negative values, as demyelination load typically increased over time. This trend is consistent with disease progression in MLD. Both segmentation approaches demonstrated plausible and stable behavior across time points, with only a few outliers, supporting their suitability for longitudinal tracking.\u003c/em\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\u003ch2\u003eOutlier Analysis\u003c/h2\u003e\u003cp\u003eQualitative review of individual cases with low Dice scores or outliers in the Bland-Altman diagram revealed that discrepancies clustered in few individuals where misclassification arose from partial volume effects especially near the ventricles/CSF, overinterpretation of incomplete myelination in young patients or altered anatomy with expanded ventricles or advanced atrophy. In all cases, the CNN approach better matched the visual extent of demyelination, particularly in scans with lower contrast. In cases of extensive white matter alterations, both methods were able to accurately capture the extent of the changes, demonstrating a high DC and strong visual concordance (see Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e). 
Low values of demyelination load, MLD MRI severity score, age, and GMFC were each associated with lower agreement (see Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e).\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e\u003cem\u003eExamples A\u0026ndash;E illustrate cases with high (A) and low (B\u0026ndash;E) Dice coefficients between conventional and CNN-based segmentation of T2-hyperintense white matter (demyelination load). Each case is presented in three columns: left \u0026ndash; axial T2-weighted image, middle \u0026ndash; conventional segmentation of demyelination load projected onto the T2 image (turquoise), right \u0026ndash; CNN-based segmentation projected onto the T2 image (orange).\u003c/em\u003e\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003e(A)\u003c/b\u003e \u003cem\u003eDepicts a case with the highest Dice coefficient (0.92), showing close visual correspondence between CNN-based and conventional segmentation, with no evident discrepancies.\u003c/em\u003e\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003e(B)\u003c/b\u003e \u003cem\u003eA case with ventricle deformation and relatively low-resolution T2 imaging, where the extent of frontal white matter abnormalities is underestimated in the conventional segmentation.\u003c/em\u003e\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003e(C)\u003c/b\u003e \u003cem\u003eDemonstrates undersegmentation by the conventional method near the frontal horn of the lateral ventricle, where adjacent hyperintensities are incompletely captured.\u003c/em\u003e\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003e(D)\u003c/b\u003e \u003cem\u003eShows spurious lesion detection in the conventional segmentation, especially in periventricular regions and the corpus callosum, which are not supported by visual T2 
hyperintensities.\u003c/em\u003e\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003e(E)\u003c/b\u003e \u003cem\u003eThe case with the lowest Dice coefficient, obtained in a patient aged 1.4 years. Conventional segmentation overestimates lesion volume, likely due to misclassification of incompletely myelinated regions as lesions. Panel E-4 (same patient at age 3.4 years) shows nearly complete myelination with only subtle parietal T2-hyperintensities, likely reflecting early pathology. On the other hand, the CNN-based segmentation reveals a misclassification of gray matter as white matter lesion.\u003c/em\u003e\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e\u003cem\u003eComparison of segmentation overlap (Dice coefficient) between the CNN-based model and conventional SPM-based segmentation for T2-hyperintense white matter (demyelination load) across different clinical and radiological parameters. Each subplot shows the Dice coefficient as a function of one of the following variables: (A) patient age at scan, (B) CNN-based demyelination load, (C) GMFC score at scan time, and (D) MLD MRI severity score. Higher Dice values indicate better agreement. Lower segmentation concordance was observed in younger patients and in those with lower lesion burden, suggesting greater variability or structural immaturity in early disease stages.\u003c/em\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e\u003ch2\u003eComparison Across Modalities\u003c/h2\u003e\u003cp\u003eDespite not being trained on 2D T1-weighted sequences, the CNN model achieved at least comparable agreement with the conventional segmentation. As expected, the overall performance of both tested segmentation methods was more heterogeneous in the 2D T1-group than in the 3D T1-group. The mean Dice coefficient was lower at 0.75 and the standard deviation was higher at 0.32 (compared with 0.82 +/- 0.2 in 3D T1). 
The 2D T1-group thus differed significantly from the 3D T1-group (p\u0026thinsp;\u0026lt;\u0026thinsp;0.05). The violin plot in Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eA shows good agreement for the vast majority of datasets, yet compared with Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA more data points fall in the lower DC range. Comparable to the results from 3D T1, no systematic over- or underestimation of lesion volume by the CNN-based model was observed in the Bland-Altman diagram (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eC). Regarding the Dice coefficient, most outliers corresponded to scans with small volumes of T2-hyperintense white matter. A strong positive correlation was observed between the clinical parameter GMFC and lesion volume derived from the CNN-based segmentation (r\u003csub\u003eS\u003c/sub\u003e = 0.68, p\u0026thinsp;\u0026lt;\u0026thinsp;0.001). While conventional segmentation yielded a similarly significant result, the strength of the relationship was moderately reduced (r\u003csub\u003eS\u003c/sub\u003e = 0.57, p\u0026thinsp;\u0026lt;\u0026thinsp;0.001). Visual inspections supported the statistical finding that, in the case of 2D T1-weighted data, the conventional approach had incorrectly classified excessive white matter as demyelination load, possibly due to partial volume effects. Notably, small clusters located outside the typically confluent regions of demyelination were more frequently retained despite the application of the 200-voxel filter, which may not be sufficient for this modality.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e\u003col\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003e(A)\u003c/b\u003e Violin plot of Dice coefficients for CNN-based and conventional lesion segmentation. 
The width of the violin reflects the frequency of values; the dashed lines mark the 25th, 50th (median), and 75th percentiles. The lower quartile \u0026ldquo;tail\u0026rdquo; illustrates that, while most cases achieve high overlap (\u0026gt;\u0026thinsp;0.8), a subset shows markedly lower agreement.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003e(B)\u003c/b\u003e Scatter plot of demyelination load versus GMFC-MLD score, comparing CNN-based (blue circles, blue regression line) and conventional (red squares, red regression line) segmentations. The CNN method shows slightly higher correlation with increasing disability levels.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003e(C)\u003c/b\u003e Bland\u0026ndash;Altman analysis showed high agreement with 2 data points falling outside the limits of agreement (3.6%). The CNN-based segmentation selected slightly fewer voxels as lesion compared to the conventional approach (median \u0026minus;\u0026thinsp;0.0065).\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003c/ol\u003e\u003c/p\u003e\u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eThis study demonstrates that a convolutional neural network (CNN)-based approach can reliably segment demyelinated white matter in MLD patients, showing substantial agreement with an established semi-automated reference method. 
The model performed robustly across heterogeneous MRI acquisition settings, including both 2D and 3D imaging protocols, and patient characteristics, with high spatial overlap (median Dice coefficient for 3D T1-images\u0026thinsp;=\u0026thinsp;0.82, 0.75 for 2D T1-images), low segmentation bias, good correlation with clinical severity and pathophysiologically plausible, steadily increasing lesion loads in longitudinal follow-up.\u003c/p\u003e\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\u003ch2\u003eSegmentation accuracy and influencing factors\u003c/h2\u003e\u003cp\u003eThe overall segmentation agreement was good, with higher accuracy in 3D T1-weighted sequences compared to 2D T1 images. This discrepancy is expected, as 3D scans offer higher spatial resolution and contrast uniformity. The Dice coefficient was most affected by low lesion burden, young age, and low GMFC scores\u0026mdash;likely due to developmental factors such as incomplete myelination or greater anatomical variability, which challenge both segmentation methods. Importantly, in these cases the CNN-based approach more closely matched the visual impression than the conventional method, especially in the presence of low contrast or structural abnormalities.\u003c/p\u003e\u003cp\u003eCompared with recent deep-learning pipelines for multiple sclerosis lesion segmentation\u0026mdash;which typically report Dice scores in the 0.75\u0026ndash;0.85 range and 5\u0026ndash;10% outliers \u003csup\u003e16\u0026ndash;18\u003c/sup\u003e\u0026mdash;our CNN achieves comparable or even improved overlap on T1/T2 data. 
Unlike most MS models trained exclusively on adult FLAIR scans, it generalizes robustly across varied protocols and a markedly younger cohort, demonstrating that modern AI frameworks can be successfully adapted to challenging, heterogeneous pediatric leukodystrophy imaging.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e\u003ch2\u003eOutliers and qualitative findings\u003c/h2\u003e\u003cp\u003eVisual inspection of segmentation outliers confirmed that most discrepancies stemmed from known pitfalls of conventional segmentation in neuroimaging, even more so in a pediatric population\u0026mdash;such as partial volume effects near the ventricles/CSF or misclassification of unmyelinated white matter. In contrast, the CNN model was less prone to oversegmentation in such areas and showed greater spatial specificity. Notably, low Dice scores were not randomly distributed but clustered in a small number of patients and remained relatively stable across their longitudinal scans, even when acquired on different MRI scanners. This suggests that certain anatomical or developmental conditions\u0026mdash;such as early age, delayed or incomplete myelination, or ventricular deformation\u0026mdash;create consistent challenges for conventional segmentation pipelines. These findings highlight the potential of deep learning approaches to better account for structural and maturational variability in pediatric MLD.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e\u003ch2\u003eCross-modality generalizability\u003c/h2\u003e\u003cp\u003eDespite being trained exclusively on 3D data, the CNN model generalized well to 2D T1-weighted scans, albeit with moderately reduced Dice coefficients. This demonstrates its flexibility and potential applicability in real-world clinical settings where 2D acquisitions may still be common. 
Notably, small, isolated clusters outside typical lesion regions were more frequently retained in the conventional segmentation of low-resolution 2D scans. These likely reflect limitations of voxel-based filtering when applied to noisier or lower-contrast data. While increasing the filter threshold (e.g., above 200 voxels) might have reduced such spurious detections, predefined thresholds were deliberately kept constant across all analyses to preserve comparability between methods. This highlights the need for modality-specific adaptation in conventional pipelines, while the CNN model demonstrated better intrinsic robustness to such variations.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec17\" class=\"Section2\"\u003e\u003ch2\u003eCorrelation with clinical severity\u003c/h2\u003e\u003cp\u003eLesion volumes derived from both the CNN-based and the conventional segmentation methods showed significant correlations with motor impairment as assessed by the GMFC score. While the CNN-based model consistently yielded slightly stronger correlations in both cohorts, the differences were modest and did not indicate a substantial performance gap. Interestingly, the correlation was markedly stronger in the 2D T1-weighted dataset (r\u0026thinsp;=\u0026thinsp;0.68) compared to the 3D T1-weighted dataset (r\u0026thinsp;=\u0026thinsp;0.38), even though the CNN model was trained exclusively on 3D data. The reason for this discrepancy remains unclear. One possible explanation lies in the distribution of clinical severity within each group: in the 3D cohort, only a few patients had a GMFC score of 6, potentially limiting the range of observable clinical impairment and thus weakening the strength of the association. In contrast, the 2D group included a broader spectrum of disease severity, allowing for a clearer trend. 
Importantly, the correlation coefficients observed for the 2D data align well with previous studies on demyelination load and clinical function \u003csup\u003e7\u003c/sup\u003e, suggesting that the CNN-based segmentation captures disease burden in a clinically meaningful way even in lower-resolution imaging.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec18\" class=\"Section2\"\u003e\u003ch2\u003eLongitudinal performance\u003c/h2\u003e\u003cp\u003eIn longitudinal datasets, both segmentation methods yielded plausible progression patterns, with volumetric changes consistent with natural disease evolution. The CNN approach showed no implausible fluctuations or outliers, supporting its suitability for longitudinal tracking. This aspect is crucial for applications in therapeutic monitoring, particularly in the context of emerging treatments like gene therapy.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec19\" class=\"Section2\"\u003e\u003ch2\u003eClinical implications and future applications\u003c/h2\u003e\u003cp\u003eThe CNN-based segmentation approach offers a promising alternative to conventional methods for quantifying demyelination in MLD. It demonstrated substantially reduced processing time and operator dependency, while maintaining strong concordance with semi-automated pipelines, even when evaluated across heterogeneous MRI sequence types and scanner acquisition parameters. Particularly in cases with subtle or ambiguous findings\u0026mdash;such as early-stage disease or atypical patterns\u0026mdash;the CNN model may provide more consistent results, helping to guide clinical interpretation. 
Furthermore, its applicability to both 3D and 2D T1-weighted scans makes it compatible with a broader range of clinical imaging protocols, including those used in resource-limited settings.\u003c/p\u003e\u003cp\u003eImportantly, the observed correlation between demyelination load and motor impairment supports its use as a potential imaging biomarker in future clinical trials or therapeutic monitoring. Given the increasing use of gene therapy in MLD (e.g., arsa-cel), such automated and scalable methods could assist in standardized evaluation of treatment response across centers. Nevertheless, human oversight remains imperative to ensure plausibility of automated results.\u003c/p\u003e\u003cp\u003eWhile the current model was trained on MLD data, the underlying architecture and principles are likely transferable to other leukodystrophies characterized by confluent white matter changes, such as X-linked adrenoleukodystrophy (X-ALD). However, disease-specific adaptations and validations would be required before broader application, as lesion distribution, imaging characteristics, and age-dependent anatomical variability may differ.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec20\" class=\"Section2\"\u003e\u003ch2\u003eLimitations:\u003c/h2\u003e\u003cp\u003eThis study has limitations. First, the ground truth was based on a conventional pipeline that itself is not error-free. Thus, segmentation discrepancies should not be interpreted as incorrect predictions per se\u0026mdash;even though visual inspection in the vast majority of cases suggested that the CNN model more accurately reflected the true extent of demyelination. Second, although the model performed well across 2D and 3D data, it was not explicitly trained on mixed-modal datasets. Future training using diverse input formats and scanner types may further improve robustness. Third, FLAIR imaging was not included in this study. 
While FLAIR may improve lesion conspicuity beyond early childhood\u0026mdash;when myelination is more advanced\u0026mdash;its diagnostic utility in younger individuals is limited by age-dependent myelination effects. In incompletely myelinated brains, the contrast between normal and demyelinated white matter is often reduced on FLAIR, making T2-weighted imaging more reliable for lesion detection. Moreover, FLAIR sequences were not consistently available across all sites and showed greater heterogeneity in acquisition parameters. Our approach therefore focused on routinely acquired T1- and T2-weighted sequences, which are available for practically all patients and form the basis of established semi-quantitative and volumetric scoring methods. Future studies incorporating standardized FLAIR or multi-contrast protocols may further enhance lesion characterization and model generalizability.\u003c/p\u003e\u003cp\u003eFourth, the tissue probability maps used for conventional segmentation were derived from standard adult templates provided by CAT12/SPM12. While these offer robust segmentation in healthy adult populations, they are not specifically tailored to pediatric brains. As a result, anatomical deviations due to age-related differences or disease-induced brain changes in MLD may lead to systematic misclassification of tissue types, which in turn could bias the ground truth.\u003c/p\u003e\u003cp\u003eFinally, the cohort size\u0026mdash;though substantial for a rare disease\u0026mdash;remains limited. Larger, multicenter datasets are needed to validate generalizability and to train next-generation models with improved adaptability across ages and imaging protocols.\u003c/p\u003e\u003c/div\u003e"},{"header":"Conclusion","content":"\u003cp\u003eIn conclusion, this CNN-based segmentation approach provides a fast, scalable, and clinically relevant method for quantifying demyelination in MLD. 
It aligns closely with conventional methods, correlates with clinical severity, and is robust across modalities and time points. As precision imaging becomes integral to treatment monitoring in leukodystrophies, such tools will be instrumental in translating advanced MRI into routine clinical care.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eCompeting Interests\u003c/strong\u003e\u003cp\u003eJ.S., T.C., A.D., A. F., D.S. and J.S. are full-time employees of Clario with no conflicts of interest related to this work. During the preparation of this manuscript, C.J.M. and D.W. were full-time employees and stockholders of Takeda Pharmaceuticals with no other conflicts of interest related to this work. S.G. received institutional research grants from Shire (a Takeda company) and Orchard Therapeutics, and does adviser activities for Clario, Orchard Therapeutics, and Sanofi, without personal payments. P.M., B.B. and H.R. report no conflicts of interest.\u003c/p\u003e\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eSamuel Groeschel and Luc Bracoud contributed equally.Pascal Martin made substantial contributions to the conception and design of the work, performed data analysis and interpretation, and drafted the manuscript. Jo\u0026euml;l Schaerer, Thomas Cajgfinger, Alessandro Delmonte, Alexandre Fusellier, David Scott, and Joyce Suhy developed, implemented, and applied the convolutional neural network (CNN) used in this study; Thomas Cajgfinger additionally prepared Figure 1. Benjamin Bender contributed to imaging data acquisition and evaluation. David Whiteman and C.J. Malanga contributed to data acquisition. Hendrik Rosewich contributed to clinical data interpretation. Samuel Groeschel and Luc Bracoud jointly conceived and supervised the study, contributed to study design and data interpretation, and critically revised the manuscript. 
All authors revised the work critically and approved the final version to be published and agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eGieselmann, V., and Krageloh-Mann, I. (2010). Metachromatic leukodystrophy--an update. Neuropediatrics 41, 1-6.\u003c/li\u003e\n\u003cli\u003eGroeschel, S., Kehrer, C., Engel, C., C, I.D., Bley, A., Steinfeld, R., Grodd, W., and Krageloh-Mann, I. (2011). Metachromatic leukodystrophy: natural course of cerebral MRI changes in relation to clinical course. J Inherit Metab Dis 34, 1095-1102.\u003c/li\u003e\n\u003cli\u003eKehrer, C., Groeschel, S., Kustermann-Kuhn, B., B\u0026uuml;rger, F., K\u0026ouml;hler, W., Kohlsch\u0026uuml;tter, A., Bley, A., Steinfeld, R., Gieselmann, V., and Kr\u0026auml;geloh-Mann, I. (2014). Language and cognition in children with metachromatic leukodystrophy: onset and natural course in a nationwide cohort. Orphanet Journal of Rare Diseases 9, 18.\u003c/li\u003e\n\u003cli\u003evan der Voorn, J.P., Pouwels, P.J., Kamphorst, W., Powers, J.M., Lammens, M., Barkhof, F., and van der Knaap, M.S. (2005). Histopathologic correlates of radial stripes on MR images in lysosomal storage disorders. AJNR Am J Neuroradiol 26, 442-446.\u003c/li\u003e\n\u003cli\u003eEichler, F., Grodd, W., Grant, E., Sessa, M., Biffi, A., Bley, A., Kohlschuetter, A., Loes, D.J., and Kraegeloh-Mann, I. (2009). Metachromatic leukodystrophy: a scoring system for brain MR imaging observations. AJNR Am J Neuroradiol 30, 1893-1897.\u003c/li\u003e\n\u003cli\u003eGroeschel, S., i Dali, C., Clas, P., Bohringer, J., Duno, M., Krarup, C., Kehrer, C., Wilke, M., and Krageloh-Mann, I. (2012). Cerebral gray and white matter changes and clinical course in metachromatic leukodystrophy. 
Neurology 79, 1662-1670.\u003c/li\u003e\n\u003cli\u003eStrolin, M., Krageloh-Mann, I., Kehrer, C., Wilke, M., and Groeschel, S. (2017). Demyelination load as predictor for disease progression in juvenile metachromatic leukodystrophy. Ann Clin Transl Neurol 4, 403-410.\u003c/li\u003e\n\u003cli\u003eClas, P., Groeschel, S., and Wilke, M. (2012). A semi-automatic algorithm for determining the demyelination load in metachromatic leukodystrophy. Acad Radiol 19, 26-34.\u003c/li\u003e\n\u003cli\u003eTillema, J.M., Derks, M.G., Pouwels, P.J., de Graaf, P., van Rappard, D.F., Barkhof, F., Steenweg, M.E., van der Knaap, M.S., and Wolf, N.I. (2015). Volumetric MRI data correlate to disease severity in metachromatic leukodystrophy. Ann Clin Transl Neurol 2, 932-940.\u003c/li\u003e\n\u003cli\u003eKim, M.J., Hong, E., Yum, M.S., Lee, Y.J., Kim, J., and Ko, T.S. (2024). Deep learning-based, fully automated, pediatric brain segmentation. Sci Rep 14, 4344.\u003c/li\u003e\n\u003cli\u003eChacko, A., Schoeman, S., Venkatakrishna, S.S.B., Bolton, S., Shearn, A.I.U., and Andronikou, S. (2023). Caution: shortcomings of traditional segmentation methods from magnetic resonance imaging brain scans intended for 3-dimensional surface modelling in children with pathology. Pediatr Radiol 53, 1854-1862.\u003c/li\u003e\n\u003cli\u003eChen, Y.T., Huang, Y.C., Chen, H.L., Lo, H.C., Chen, P.C., Yu, C.C., Tu, Y.C., Liu, T.L., and Lin, W.C. (2025). Automatic segmentation of white matter lesions on multi-parametric MRI: convolutional neural network versus vision transformer. BMC Neurol 25, 5.\u003c/li\u003e\n\u003cli\u003eAkkus, Z., Galimzianova, A., Hoogi, A., Rubin, D.L., and Erickson, B.J. (2017). Deep Learning for Brain MRI Segmentation: State of the Art and Future Directions. J Digit Imaging 30, 449-459.\u003c/li\u003e\n\u003cli\u003eIsensee, F., Jaeger, P.F., Kohl, S.A.A., Petersen, J., and Maier-Hein, K.H. (2021). 
nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods 18, 203-211.\u003c/li\u003e\n\u003cli\u003eGroeschel, S., Beerepoot, S., Amedick, L.B., Krӓgeloh-Mann, I., Li, J., Whiteman, D.A.H., Wolf, N.I., and Port, J.D. (2024). The effect of intrathecal recombinant arylsulfatase A therapy on structural brain magnetic resonance imaging in children with metachromatic leukodystrophy. J Inherit Metab Dis 47, 778-791.\u003c/li\u003e\n\u003cli\u003eMcKinley, R., Wepfer, R., Aschwanden, F., Grunder, L., Muri, R., Rummel, C., Verma, R., Weisstanner, C., Reyes, M., Salmen, A., et al. (2021). Simultaneous lesion and brain segmentation in multiple sclerosis using deep neural networks. Sci Rep 11, 1087.\u003c/li\u003e\n\u003cli\u003eLa Rosa, F., Abdulkadir, A., Fartaria, M.J., Rahmanzadeh, R., Lu, P.J., Galbusera, R., Barakovic, M., Thiran, J.P., Granziera, C., and Cuadra, M.B. (2020). Multiple sclerosis cortical and WM lesion segmentation at 3T MRI: a deep learning method based on FLAIR and MP2RAGE. Neuroimage Clin 27, 102335.\u003c/li\u003e\n\u003cli\u003eCerri, S., Puonti, O., Meier, D.S., Wuerfel, J., Muhlau, M., Siebner, H.R., and Van Leemput, K. (2021). A contrast-adaptive method for simultaneous whole-brain and lesion segmentation in multiple sclerosis. 
Neuroimage 225, 117471.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Metachromatic leukodystrophy, convolutional neural network, demyelination load, quantitative MRI, automated lesion segmentation","lastPublishedDoi":"10.21203/rs.3.rs-7924598/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7924598/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003ePurpose\u003c/h2\u003e\u003cp\u003eMetachromatic leukodystrophy (MLD) is a rare lysosomal storage disorder characterized by progressive white matter demyelination. Quantification of demyelinated white matter on MRI\u0026mdash;typically expressed as the \u003cem\u003edemyelination load\u003c/em\u003e\u0026mdash;serves as a key imaging biomarker of disease burden, enabling objective monitoring beyond visual rating scales. However, current semi-automated pipelines are limited by manual interaction, pediatric brain variability, and differences in MRI acquisition. 
This study aimed to develop and validate a self-configuring convolutional neural network (CNN) for automated segmentation of demyelinated white matter in MLD and to compare its performance with a conventional semi-automated method across heterogeneous MRI datasets.\u003c/p\u003e\u003ch2\u003eMethods\u003c/h2\u003e\u003cp\u003eAn nnU-Net was trained on 189 3D T1- and axial T2-weighted scans from 35 MLD patients using visually controlled conventional masks as ground truth. Independent testing was performed on 130 scans (73 high-resolution 3D, 57 lower-resolution 2D T1-weighted) from 49 patients. Performance was assessed by Dice coefficient, Bland\u0026ndash;Altman bias, correlation with Gross Motor Function Classification (GMFC-MLD), longitudinal consistency, and qualitative review of outliers.\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e\u003cp\u003eCNN-based segmentation showed strong spatial agreement with the reference method, highest in 3D T1-weighted and robust in 2D scans. Volumetric bias was minimal, and CNN-derived lesion volumes correlated well with motor impairment. Longitudinal analyses showed smooth, monotonic changes, and qualitative review revealed fewer boundary misclassifications.\u003c/p\u003e\u003ch2\u003eConclusion\u003c/h2\u003e\u003cp\u003eThe nnU-Net enables fast, reproducible, and clinically meaningful segmentation of demyelinated white matter in MLD. 
It generalizes across MRI protocols, correlates with motor function, and offers a scalable tool for standardized biomarker extraction in clinical trials and other leukodystrophies.\u003c/p\u003e","manuscriptTitle":"Automated Deep Learning–Based Demyelination Load Segmentation in Metachromatic Leukodystrophy","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-11-10 16:22:09","doi":"10.21203/rs.3.rs-7924598/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"20a4f644-1d15-4b54-95fa-896dec966f3b","owner":[],"postedDate":"November 10th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2026-04-27T16:06:39+00:00","versionOfRecord":{"articleIdentity":"rs-7924598","link":"https://doi.org/10.1007/s00062-026-01652-6","journal":{"identity":"clinical-neuroradiology","isVorOnly":false,"title":"Clinical Neuroradiology"},"publishedOn":"2026-04-21 15:57:33","publishedOnDateReadable":"April 21st, 2026"},"versionCreatedAt":"2025-11-10 
16:22:09","video":"","vorDoi":"10.1007/s00062-026-01652-6","vorDoiUrl":"https://doi.org/10.1007/s00062-026-01652-6","workflowStages":[]},"version":"v1","identity":"rs-7924598","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7924598","identity":"rs-7924598","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
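The two agreement measures the paper relies on, the Dice coefficient and the Bland-Altman bias with limits of agreement, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the flat 0/1 mask representation, the convention for two empty masks, and the 1.96 standard deviation limits are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def dice_coefficient(mask_a, mask_b):
    """Dice overlap of two binary masks given as flat sequences of 0/1 voxels.

    Hypothetical helper; real masks would be 3D volumes flattened beforehand.
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    size_a = sum(1 for v in mask_a if v)
    size_b = sum(1 for v in mask_b if v)
    if size_a + size_b == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    return 2.0 * overlap / (size_a + size_b)


def bland_altman(values_a, values_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Bias is the mean of the paired differences (a - b); the limits are
    bias +/- 1.96 sample standard deviations of those differences.
    Requires at least two pairs.
    """
    diffs = [a - b for a, b in zip(values_a, values_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread
```

With this sign convention, a negative bias means the first method (for example the CNN) yields smaller volumes than the second. Low Dice values are expected to be more common when both masks are small, since a few mismatched voxels then dominate the ratio.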
